2017-12-15 09:52:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 00/27] 4.9.70-stable review

This is the start of the stable review cycle for the 4.9.70 release.
There are 27 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun Dec 17 09:22:42 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.70-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.9.70-rc1

Leon Romanovsky <[email protected]>
RDMA/cxgb4: Annotate r2 and stag as __be32

Zdenek Kabelac <[email protected]>
md: free unused memory after bitmap resize

Paul Moore <[email protected]>
audit: ensure that 'audit=1' actually enables audit for PID 1

Keefe Liu <[email protected]>
ipvlan: fix ipv6 outbound device

Masahiro Yamada <[email protected]>
kbuild: do not call cc-option before KBUILD_CFLAGS initialization

Paul Mackerras <[email protected]>
powerpc/64: Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold

Marc Zyngier <[email protected]>
KVM: arm/arm64: vgic-its: Preserve the revious read from the pending table

Al Viro <[email protected]>
fix kcm_clone()

Vincent Pelletier <[email protected]>
usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping

Heiko Carstens <[email protected]>
s390: always save and restore all registers on context switch

Masamitsu Yamazaki <[email protected]>
ipmi: Stop timers before cleaning up the module

Debabrata Banerjee <[email protected]>
Fix handling of verdicts after NF_QUEUE

Tommi Rantala <[email protected]>
tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()

Julian Wiedmann <[email protected]>
s390/qeth: fix thinko in IPv4 multicast address tracking

Julian Wiedmann <[email protected]>
s390/qeth: fix GSO throughput regression

Julian Wiedmann <[email protected]>
s390/qeth: build max size GSO skbs on L2 devices

Eric Dumazet <[email protected]>
tcp/dccp: block bh before arming time_wait timer

Lars Persson <[email protected]>
stmmac: reset last TSO segment size after device open

Eric Dumazet <[email protected]>
net: remove hlist_nulls_add_tail_rcu()

Bjørn Mork <[email protected]>
usbnet: fix alignment for frames with no ethernet header

Eric Dumazet <[email protected]>
net/packet: fix a race in packet_bind() and packet_notifier()

Mike Maloney <[email protected]>
packet: fix crash in fanout_demux_rollover()

Hangbin Liu <[email protected]>
sit: update frag_off info

Håkon Bugge <[email protected]>
rds: Fix NULL pointer dereference in __rds_rdma_map

Jon Maloy <[email protected]>
tipc: fix memory leak in tipc_accept_from_sock()

Julian Wiedmann <[email protected]>
s390/qeth: fix early exit from error path

Sebastian Sjoholm <[email protected]>
net: qmi_wwan: add Quectel BG96 2c7c:0296


-------------

Diffstat:

Makefile | 25 ++++----
arch/powerpc/include/asm/checksum.h | 17 ++++--
arch/s390/include/asm/switch_to.h | 19 +++---
drivers/char/ipmi/ipmi_si_intf.c | 44 +++++++-------
drivers/infiniband/hw/cxgb4/t4fw_ri_api.h | 4 +-
drivers/md/bitmap.c | 9 +++
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 +
drivers/net/ipvlan/ipvlan_core.c | 2 +-
drivers/net/usb/qmi_wwan.c | 3 +
drivers/net/usb/usbnet.c | 5 +-
drivers/s390/net/qeth_core.h | 3 +
drivers/s390/net/qeth_core_main.c | 31 ++++++++++
drivers/s390/net/qeth_l2_main.c | 4 +-
drivers/s390/net/qeth_l3_main.c | 13 +++--
drivers/usb/gadget/function/f_fs.c | 2 +-
include/linux/rculist_nulls.h | 38 ------------
include/linux/usb/usbnet.h | 1 +
include/net/sock.h | 6 +-
kernel/audit.c | 10 ++--
net/dccp/minisocks.c | 6 ++
net/ipv4/tcp_minisocks.c | 6 ++
net/ipv6/sit.c | 1 +
net/kcm/kcmsock.c | 71 ++++++++---------------
net/netfilter/core.c | 5 ++
net/packet/af_packet.c | 37 +++++-------
net/packet/internal.h | 1 -
net/rds/rdma.c | 2 +-
net/tipc/server.c | 1 +
net/tipc/udp_media.c | 4 --
virt/kvm/arm/vgic/vgic-its.c | 2 +-
30 files changed, 191 insertions(+), 182 deletions(-)



2017-12-15 09:52:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 10/27] stmmac: reset last TSO segment size after device open

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Lars Persson <[email protected]>


[ Upstream commit 45ab4b13e46325d00f4acdb365d406e941a15f81 ]

The mss variable tracks the last max segment size sent to the TSO
engine. We do not update the hardware as long as we receive skb:s with
the same value in gso_size.

During a network device down/up cycle (mapped to stmmac_release() and
stmmac_open() callbacks) we issue a reset to the hardware and it
forgets the setting for mss. However we did not zero out our mss
variable so the next transmission of a gso packet happens with an
undefined hardware setting.

This triggers a hang in the TSO engine and eventuelly the netdev
watchdog will bark.

Fixes: f748be531d70 ("stmmac: support new GMAC4")
Signed-off-by: Lars Persson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1795,6 +1795,7 @@ static int stmmac_open(struct net_device

priv->dma_buf_sz = STMMAC_ALIGN(buf_sz);
priv->rx_copybreak = STMMAC_RX_COPYBREAK;
+ priv->mss = 0;

ret = alloc_dma_desc_resources(priv);
if (ret < 0) {


2017-12-15 09:52:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 12/27] s390/qeth: build max size GSO skbs on L2 devices

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <[email protected]>


[ Upstream commit 0cbff6d4546613330a1c5f139f5c368e4ce33ca1 ]

The current GSO skb size limit was copy&pasted over from the L3 path,
where it is needed due to a TSO limitation.
As L2 devices don't offer TSO support (and thus all GSO skbs are
segmented before they reach the driver), there's no reason to restrict
the stack in how large it may build the GSO skbs.

Fixes: d52aec97e5bc ("qeth: enable scatter/gather in layer 2 mode")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/net/qeth_l2_main.c | 2 --
drivers/s390/net/qeth_l3_main.c | 4 ++--
2 files changed, 2 insertions(+), 4 deletions(-)

--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1140,8 +1140,6 @@ static int qeth_l2_setup_netdev(struct q
}
card->info.broadcast_capable = 1;
qeth_l2_request_initial_mac(card);
- card->dev->gso_max_size = (QETH_MAX_BUFFER_ELEMENTS(card) - 1) *
- PAGE_SIZE;
SET_NETDEV_DEV(card->dev, &card->gdev->dev);
netif_napi_add(card->dev, &card->napi, qeth_l2_poll, QETH_NAPI_WEIGHT);
netif_carrier_off(card->dev);
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3147,8 +3147,8 @@ static int qeth_l3_setup_netdev(struct q
NETIF_F_HW_VLAN_CTAG_RX |
NETIF_F_HW_VLAN_CTAG_FILTER;
netif_keep_dst(card->dev);
- card->dev->gso_max_size = (QETH_MAX_BUFFER_ELEMENTS(card) - 1) *
- PAGE_SIZE;
+ netif_set_gso_max_size(card->dev, (QETH_MAX_BUFFER_ELEMENTS(card) - 1) *
+ PAGE_SIZE);

SET_NETDEV_DEV(card->dev, &card->gdev->dev);
netif_napi_add(card->dev, &card->napi, qeth_l3_poll, QETH_NAPI_WEIGHT);


2017-12-15 09:52:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 14/27] s390/qeth: fix thinko in IPv4 multicast address tracking

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <[email protected]>


[ Upsteam commit bc3ab70584696cb798b9e1e0ac8e6ced5fd4c3b8 ]

Commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
reworked how secondary addresses are managed for qeth devices.
Instead of dropping & subsequently re-adding all addresses on every
ndo_set_rx_mode() call, qeth now keeps track of the addresses that are
currently registered with the HW.
On a ndo_set_rx_mode(), we thus only need to do (de-)registration
requests for the addresses that have actually changed.

On L3 devices, the lookup for IPv4 Multicast addresses checks the wrong
hashtable - and thus never finds a match. As a result, we first delete
*all* such addresses, and then re-add them again. So each set_rx_mode()
causes a short period where the IPv4 Multicast addresses are not
registered, and the card stops forwarding inbound traffic for them.

Fix this by setting the ->is_multicast flag on the lookup object, thus
enabling qeth_l3_ip_from_hash() to search the correct hashtable and
find a match there.

Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/net/qeth_l3_main.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -1416,6 +1416,7 @@ qeth_l3_add_mc_to_hash(struct qeth_card

tmp->u.a4.addr = im4->multiaddr;
memcpy(tmp->mac, buf, sizeof(tmp->mac));
+ tmp->is_multicast = 1;

ipm = qeth_l3_ip_from_hash(card, tmp);
if (ipm) {


2017-12-15 09:52:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 02/27] s390/qeth: fix early exit from error path

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <[email protected]>


[ Upstream commit 83cf79a2fec3cf499eb6cb9eb608656fc2a82776 ]

When the allocation of the addr buffer fails, we need to free
our refcount on the inetdevice before returning.

Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/net/qeth_l3_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -1593,7 +1593,7 @@ static void qeth_l3_free_vlan_addresses4

addr = qeth_l3_get_addr_buffer(QETH_PROT_IPV4);
if (!addr)
- return;
+ goto out;

spin_lock_bh(&card->ip_lock);

@@ -1607,6 +1607,7 @@ static void qeth_l3_free_vlan_addresses4
spin_unlock_bh(&card->ip_lock);

kfree(addr);
+out:
in_dev_put(in_dev);
}

@@ -1631,7 +1632,7 @@ static void qeth_l3_free_vlan_addresses6

addr = qeth_l3_get_addr_buffer(QETH_PROT_IPV6);
if (!addr)
- return;
+ goto out;

spin_lock_bh(&card->ip_lock);

@@ -1646,6 +1647,7 @@ static void qeth_l3_free_vlan_addresses6
spin_unlock_bh(&card->ip_lock);

kfree(addr);
+out:
in6_dev_put(in6_dev);
#endif /* CONFIG_QETH_IPV6 */
}


2017-12-15 09:53:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 22/27] powerpc/64: Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paul Mackerras <[email protected]>

commit b492f7e4e07a28e706db26cf4943bb0911435426 upstream.

These functions compute an IP checksum by computing a 64-bit sum and
folding it to 32 bits (the "nofold" in their names refers to folding
down to 16 bits). However, doing (u32) (s + (s >> 32)) is not
sufficient to fold a 64-bit sum to 32 bits correctly. The addition
can produce a carry out from bit 31, which needs to be added in to
the sum to produce the correct result.

To fix this, we copy the from64to32() function from lib/checksum.c
and use that.

Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/include/asm/checksum.h | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)

--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -53,17 +53,25 @@ static inline __sum16 csum_fold(__wsum s
return (__force __sum16)(~((__force u32)sum + tmp) >> 16);
}

+static inline u32 from64to32(u64 x)
+{
+ /* add up 32-bit and 32-bit for 32+c bit */
+ x = (x & 0xffffffff) + (x >> 32);
+ /* add up carry.. */
+ x = (x & 0xffffffff) + (x >> 32);
+ return (u32)x;
+}
+
static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr, __u32 len,
__u8 proto, __wsum sum)
{
#ifdef __powerpc64__
- unsigned long s = (__force u32)sum;
+ u64 s = (__force u32)sum;

s += (__force u32)saddr;
s += (__force u32)daddr;
s += proto + len;
- s += (s >> 32);
- return (__force __wsum) s;
+ return (__force __wsum) from64to32(s);
#else
__asm__("\n\
addc %0,%0,%1 \n\
@@ -123,8 +131,7 @@ static inline __wsum ip_fast_csum_nofold

for (i = 0; i < ihl - 1; i++, ptr++)
s += *ptr;
- s += (s >> 32);
- return (__force __wsum)s;
+ return (__force __wsum)from64to32(s);
#else
__wsum sum, tmp;



2017-12-15 09:53:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 23/27] kbuild: do not call cc-option before KBUILD_CFLAGS initialization

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Masahiro Yamada <[email protected]>


[ Upstream commit 433dc2ebe7d17dd21cba7ad5c362d37323592236 ]

Some $(call cc-option,...) are invoked very early, even before
KBUILD_CFLAGS, etc. are initialized.

The returned string from $(call cc-option,...) depends on
KBUILD_CPPFLAGS, KBUILD_CFLAGS, and GCC_PLUGINS_CFLAGS.

Since they are exported, they are not empty when the top Makefile
is recursively invoked.

The recursion occurs in several places. For example, the top
Makefile invokes itself for silentoldconfig. "make tinyconfig",
"make rpm-pkg" are the cases, too.

In those cases, the second call of cc-option from the same line
runs a different shell command due to non-pristine KBUILD_CFLAGS.

To get the same result all the time, KBUILD_* and GCC_PLUGINS_CFLAGS
must be initialized before any call of cc-option. This avoids
garbage data in the .cache.mk file.

Move all calls of cc-option below the config targets because target
compiler flags are unnecessary for Kconfig.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Makefile | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

--- a/Makefile
+++ b/Makefile
@@ -370,9 +370,6 @@ LDFLAGS_MODULE =
CFLAGS_KERNEL =
AFLAGS_KERNEL =
LDFLAGS_vmlinux =
-CFLAGS_GCOV := -fprofile-arcs -ftest-coverage -fno-tree-loop-im $(call cc-disable-warning,maybe-uninitialized,)
-CFLAGS_KCOV := $(call cc-option,-fsanitize-coverage=trace-pc,)
-

# Use USERINCLUDE when you must reference the UAPI directories only.
USERINCLUDE := \
@@ -393,21 +390,19 @@ LINUXINCLUDE := \

LINUXINCLUDE += $(filter-out $(LINUXINCLUDE),$(USERINCLUDE))

-KBUILD_CPPFLAGS := -D__KERNEL__
-
+KBUILD_AFLAGS := -D__ASSEMBLY__
KBUILD_CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
-fno-strict-aliasing -fno-common \
-Werror-implicit-function-declaration \
-Wno-format-security \
- -std=gnu89 $(call cc-option,-fno-PIE)
-
-
+ -std=gnu89
+KBUILD_CPPFLAGS := -D__KERNEL__
KBUILD_AFLAGS_KERNEL :=
KBUILD_CFLAGS_KERNEL :=
-KBUILD_AFLAGS := -D__ASSEMBLY__ $(call cc-option,-fno-PIE)
KBUILD_AFLAGS_MODULE := -DMODULE
KBUILD_CFLAGS_MODULE := -DMODULE
KBUILD_LDFLAGS_MODULE := -T $(srctree)/scripts/module-common.lds
+GCC_PLUGINS_CFLAGS :=

# Read KERNELRELEASE from include/config/kernel.release (if it exists)
KERNELRELEASE = $(shell cat include/config/kernel.release 2> /dev/null)
@@ -420,7 +415,7 @@ export MAKE AWK GENKSYMS INSTALLKERNEL P
export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS

export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS LDFLAGS
-export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE CFLAGS_GCOV CFLAGS_KCOV CFLAGS_KASAN CFLAGS_UBSAN
+export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE CFLAGS_KASAN CFLAGS_UBSAN
export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
@@ -620,6 +615,12 @@ endif
# Defaults to vmlinux, but the arch makefile usually adds further targets
all: vmlinux

+KBUILD_CFLAGS += $(call cc-option,-fno-PIE)
+KBUILD_AFLAGS += $(call cc-option,-fno-PIE)
+CFLAGS_GCOV := -fprofile-arcs -ftest-coverage -fno-tree-loop-im $(call cc-disable-warning,maybe-uninitialized,)
+CFLAGS_KCOV := $(call cc-option,-fsanitize-coverage=trace-pc,)
+export CFLAGS_GCOV CFLAGS_KCOV
+
# The arch Makefile can set ARCH_{CPP,A,C}FLAGS to override the default
# values of the respective KBUILD_* variables
ARCH_CPPFLAGS :=


2017-12-15 09:53:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 05/27] sit: update frag_off info

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hangbin Liu <[email protected]>


[ Upstream commit f859b4af1c52493ec21173ccc73d0b60029b5b88 ]

After parsing the sit netlink change info, we forget to update frag_off in
ipip6_tunnel_update(). Fix it by assigning frag_off with new value.

Reported-by: Jianlin Shi <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/sit.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1085,6 +1085,7 @@ static void ipip6_tunnel_update(struct i
ipip6_tunnel_link(sitn, t);
t->parms.iph.ttl = p->iph.ttl;
t->parms.iph.tos = p->iph.tos;
+ t->parms.iph.frag_off = p->iph.frag_off;
if (t->parms.link != p->link) {
t->parms.link = p->link;
ipip6_tunnel_bind_dev(t->dev);


2017-12-15 09:53:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 07/27] net/packet: fix a race in packet_bind() and packet_notifier()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 15fe076edea787807a7cdc168df832544b58eba6 ]

syzbot reported crashes [1] and provided a C repro easing bug hunting.

When/if packet_do_bind() calls __unregister_prot_hook() and releases
po->bind_lock, another thread can run packet_notifier() and process an
NETDEV_UP event.

This calls register_prot_hook() and hooks again the socket right before
first thread is able to grab again po->bind_lock.

Fixes this issue by temporarily setting po->num to 0, as suggested by
David Miller.

[1]
dev_remove_pack: ffff8801bf16fa80 not found
------------[ cut here ]------------
kernel BUG at net/core/dev.c:7945! ( BUG_ON(!list_empty(&dev->ptype_all)); )
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
device syz0 entered promiscuous mode
CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801cc57a500 task.stack: ffff8801cc588000
RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945
RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293
RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2
RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810
device syz0 entered promiscuous mode
RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8
R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0
FS: 0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106
tun_detach drivers/net/tun.c:670 [inline]
tun_chr_close+0x49/0x60 drivers/net/tun.c:2845
__fput+0x333/0x7f0 fs/file_table.c:210
____fput+0x15/0x20 fs/file_table.c:244
task_work_run+0x199/0x270 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x9bb/0x1ae0 kernel/exit.c:865
do_group_exit+0x149/0x400 kernel/exit.c:968
SYSC_exit_group kernel/exit.c:979 [inline]
SyS_exit_group+0x1d/0x20 kernel/exit.c:977
entry_SYSCALL_64_fastpath+0x1f/0x96
RIP: 0033:0x44ad19

Fixes: 30f7ea1c2b5f ("packet: race condition in packet_bind")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Francesco Ruggeri <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/packet/af_packet.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3101,6 +3101,10 @@ static int packet_do_bind(struct sock *s
if (need_rehook) {
if (po->running) {
rcu_read_unlock();
+ /* prevents packet_notifier() from calling
+ * register_prot_hook()
+ */
+ po->num = 0;
__unregister_prot_hook(sk, true);
rcu_read_lock();
dev_curr = po->prot_hook.dev;
@@ -3109,6 +3113,7 @@ static int packet_do_bind(struct sock *s
dev->ifindex);
}

+ BUG_ON(po->running);
po->num = proto;
po->prot_hook.type = proto;



2017-12-15 09:53:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 26/27] md: free unused memory after bitmap resize

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Zdenek Kabelac <[email protected]>


[ Upstream commit 0868b99c214a3d55486c700de7c3f770b7243e7c ]

When bitmap is resized, the old kalloced chunks just are not released
once the resized bitmap starts to use new space.

This fixes in particular kmemleak reports like this one:

unreferenced object 0xffff8f4311e9c000 (size 4096):
comm "lvm", pid 19333, jiffies 4295263268 (age 528.265s)
hex dump (first 32 bytes):
02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80 ................
02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80 ................
backtrace:
[<ffffffffa69471ca>] kmemleak_alloc+0x4a/0xa0
[<ffffffffa628c10e>] kmem_cache_alloc_trace+0x14e/0x2e0
[<ffffffffa676cfec>] bitmap_checkpage+0x7c/0x110
[<ffffffffa676d0c5>] bitmap_get_counter+0x45/0xd0
[<ffffffffa676d6b3>] bitmap_set_memory_bits+0x43/0xe0
[<ffffffffa676e41c>] bitmap_init_from_disk+0x23c/0x530
[<ffffffffa676f1ae>] bitmap_load+0xbe/0x160
[<ffffffffc04c47d3>] raid_preresume+0x203/0x2f0 [dm_raid]
[<ffffffffa677762f>] dm_table_resume_targets+0x4f/0xe0
[<ffffffffa6774b52>] dm_resume+0x122/0x140
[<ffffffffa6779b9f>] dev_suspend+0x18f/0x290
[<ffffffffa677a3a7>] ctl_ioctl+0x287/0x560
[<ffffffffa677a693>] dm_ctl_ioctl+0x13/0x20
[<ffffffffa62d6b46>] do_vfs_ioctl+0xa6/0x750
[<ffffffffa62d7269>] SyS_ioctl+0x79/0x90
[<ffffffffa6956d41>] entry_SYSCALL_64_fastpath+0x1f/0xc2

Signed-off-by: Zdenek Kabelac <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/md/bitmap.c | 9 +++++++++
1 file changed, 9 insertions(+)

--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -2084,6 +2084,7 @@ int bitmap_resize(struct bitmap *bitmap,
for (k = 0; k < page; k++) {
kfree(new_bp[k].map);
}
+ kfree(new_bp);

/* restore some fields from old_counts */
bitmap->counts.bp = old_counts.bp;
@@ -2134,6 +2135,14 @@ int bitmap_resize(struct bitmap *bitmap,
block += old_blocks;
}

+ if (bitmap->counts.bp != old_counts.bp) {
+ unsigned long k;
+ for (k = 0; k < old_counts.pages; k++)
+ if (!old_counts.bp[k].hijacked)
+ kfree(old_counts.bp[k].map);
+ kfree(old_counts.bp);
+ }
+
if (!init) {
int i;
while (block < (chunks << chunkshift)) {


2017-12-15 10:12:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 06/27] packet: fix crash in fanout_demux_rollover()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mike Maloney <[email protected]>


syzkaller found a race condition fanout_demux_rollover() while removing
a packet socket from a fanout group.

po->rollover is read and operated on during packet_rcv_fanout(), via
fanout_demux_rollover(), but the pointer is currently cleared before the
synchronization in packet_release(). It is safer to delay the cleanup
until after synchronize_net() has been called, ensuring all calls to
packet_rcv_fanout() for this socket have finished.

To further simplify synchronization around the rollover structure, set
po->rollover in fanout_add() only if there are no errors. This removes
the need for rcu in the struct and in the call to
packet_getsockopt(..., PACKET_ROLLOVER_STATS, ...).

Crashing stack trace:
fanout_demux_rollover+0xb6/0x4d0 net/packet/af_packet.c:1392
packet_rcv_fanout+0x649/0x7c8 net/packet/af_packet.c:1487
dev_queue_xmit_nit+0x835/0xc10 net/core/dev.c:1953
xmit_one net/core/dev.c:2975 [inline]
dev_hard_start_xmit+0x16b/0xac0 net/core/dev.c:2995
__dev_queue_xmit+0x17a4/0x2050 net/core/dev.c:3476
dev_queue_xmit+0x17/0x20 net/core/dev.c:3509
neigh_connected_output+0x489/0x720 net/core/neighbour.c:1379
neigh_output include/net/neighbour.h:482 [inline]
ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120
ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146
NF_HOOK_COND include/linux/netfilter.h:239 [inline]
ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163
dst_output include/net/dst.h:459 [inline]
NF_HOOK.constprop.35+0xff/0x630 include/linux/netfilter.h:250
mld_sendpack+0x6a8/0xcc0 net/ipv6/mcast.c:1660
mld_send_initial_cr.part.24+0x103/0x150 net/ipv6/mcast.c:2072
mld_send_initial_cr net/ipv6/mcast.c:2056 [inline]
ipv6_mc_dad_complete+0x99/0x130 net/ipv6/mcast.c:2079
addrconf_dad_completed+0x595/0x970 net/ipv6/addrconf.c:4039
addrconf_dad_work+0xac9/0x1160 net/ipv6/addrconf.c:3971
process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2113
worker_thread+0x223/0x1990 kernel/workqueue.c:2247
kthread+0x35e/0x430 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:432

Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state")
Fixes: 509c7a1ecc860 ("packet: avoid panic in packet_getsockopt()")
Reported-by: syzbot <[email protected]>
Signed-off-by: Mike Maloney <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/packet/af_packet.c | 32 ++++++++++----------------------
net/packet/internal.h | 1 -
2 files changed, 10 insertions(+), 23 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1661,7 +1661,6 @@ static int fanout_add(struct sock *sk, u
atomic_long_set(&rollover->num, 0);
atomic_long_set(&rollover->num_huge, 0);
atomic_long_set(&rollover->num_failed, 0);
- po->rollover = rollover;
}

match = NULL;
@@ -1706,6 +1705,8 @@ static int fanout_add(struct sock *sk, u
if (atomic_read(&match->sk_ref) < PACKET_FANOUT_MAX) {
__dev_remove_pack(&po->prot_hook);
po->fanout = match;
+ po->rollover = rollover;
+ rollover = NULL;
atomic_inc(&match->sk_ref);
__fanout_link(sk, po);
err = 0;
@@ -1719,10 +1720,7 @@ static int fanout_add(struct sock *sk, u
}

out:
- if (err && rollover) {
- kfree_rcu(rollover, rcu);
- po->rollover = NULL;
- }
+ kfree(rollover);
mutex_unlock(&fanout_mutex);
return err;
}
@@ -1746,11 +1744,6 @@ static struct packet_fanout *fanout_rele
list_del(&f->list);
else
f = NULL;
-
- if (po->rollover) {
- kfree_rcu(po->rollover, rcu);
- po->rollover = NULL;
- }
}
mutex_unlock(&fanout_mutex);

@@ -3039,6 +3032,7 @@ static int packet_release(struct socket
synchronize_net();

if (f) {
+ kfree(po->rollover);
fanout_release_data(f);
kfree(f);
}
@@ -3853,7 +3847,6 @@ static int packet_getsockopt(struct sock
void *data = &val;
union tpacket_stats_u st;
struct tpacket_rollover_stats rstats;
- struct packet_rollover *rollover;

if (level != SOL_PACKET)
return -ENOPROTOOPT;
@@ -3932,18 +3925,13 @@ static int packet_getsockopt(struct sock
0);
break;
case PACKET_ROLLOVER_STATS:
- rcu_read_lock();
- rollover = rcu_dereference(po->rollover);
- if (rollover) {
- rstats.tp_all = atomic_long_read(&rollover->num);
- rstats.tp_huge = atomic_long_read(&rollover->num_huge);
- rstats.tp_failed = atomic_long_read(&rollover->num_failed);
- data = &rstats;
- lv = sizeof(rstats);
- }
- rcu_read_unlock();
- if (!rollover)
+ if (!po->rollover)
return -EINVAL;
+ rstats.tp_all = atomic_long_read(&po->rollover->num);
+ rstats.tp_huge = atomic_long_read(&po->rollover->num_huge);
+ rstats.tp_failed = atomic_long_read(&po->rollover->num_failed);
+ data = &rstats;
+ lv = sizeof(rstats);
break;
case PACKET_TX_HAS_OFF:
val = po->tp_tx_has_off;
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -92,7 +92,6 @@ struct packet_fanout {

struct packet_rollover {
int sock;
- struct rcu_head rcu;
atomic_long_t num;
atomic_long_t num_huge;
atomic_long_t num_failed;


2017-12-15 09:53:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 03/27] tipc: fix memory leak in tipc_accept_from_sock()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jon Maloy <[email protected]>


[ Upstream commit a7d5f107b4978e08eeab599ee7449af34d034053 ]

When the function tipc_accept_from_sock() fails to create an instance of
struct tipc_subscriber it omits to free the already created instance of
struct tipc_conn instance before it returns.

We fix that with this commit.

Reported-by: David S. Miller <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/tipc/server.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/tipc/server.c
+++ b/net/tipc/server.c
@@ -313,6 +313,7 @@ static int tipc_accept_from_sock(struct
newcon->usr_data = s->tipc_conn_new(newcon->conid);
if (!newcon->usr_data) {
sock_release(newsock);
+ conn_put(newcon);
return -ENOMEM;
}



2017-12-15 10:13:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 27/27] RDMA/cxgb4: Annotate r2 and stag as __be32

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Leon Romanovsky <[email protected]>


[ Upstream commit 7d7d065a5eec7e218174d5c64a9f53f99ffdb119 ]

Chelsio cxgb4 HW is big-endian, hence there is need to properly
annotate r2 and stag fields as __be32 and not __u32 to fix the
following sparse warnings.

drivers/infiniband/hw/cxgb4/qp.c:614:16:
warning: incorrect type in assignment (different base types)
expected unsigned int [unsigned] [usertype] r2
got restricted __be32 [usertype] <noident>
drivers/infiniband/hw/cxgb4/qp.c:615:18:
warning: incorrect type in assignment (different base types)
expected unsigned int [unsigned] [usertype] stag
got restricted __be32 [usertype] <noident>

Cc: Steve Wise <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/infiniband/hw/cxgb4/t4fw_ri_api.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
+++ b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
@@ -675,8 +675,8 @@ struct fw_ri_fr_nsmr_tpte_wr {
__u16 wrid;
__u8 r1[3];
__u8 len16;
- __u32 r2;
- __u32 stag;
+ __be32 r2;
+ __be32 stag;
struct fw_ri_tpte tpte;
__u64 pbl[2];
};


2017-12-15 10:13:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 25/27] audit: ensure that audit=1 actually enables audit for PID 1

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paul Moore <[email protected]>


[ Upstream commit 173743dd99a49c956b124a74c8aacb0384739a4c ]

Prior to this patch we enabled audit in audit_init(), which is too
late for PID 1 as the standard initcalls are run after the PID 1 task
is forked. This means that we never allocate an audit_context (see
audit_alloc()) for PID 1 and therefore miss a lot of audit events
generated by PID 1.

This patch enables audit as early as possible to help ensure that when
PID 1 is forked it can allocate an audit_context if required.

Reviewed-by: Richard Guy Briggs <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/audit.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -79,13 +79,13 @@ static int audit_initialized;
#define AUDIT_OFF 0
#define AUDIT_ON 1
#define AUDIT_LOCKED 2
-u32 audit_enabled;
-u32 audit_ever_enabled;
+u32 audit_enabled = AUDIT_OFF;
+u32 audit_ever_enabled = !!AUDIT_OFF;

EXPORT_SYMBOL_GPL(audit_enabled);

/* Default state when kernel boots without any parameters. */
-static u32 audit_default;
+static u32 audit_default = AUDIT_OFF;

/* If auditing cannot proceed, audit_failure selects what happens. */
static u32 audit_failure = AUDIT_FAIL_PRINTK;
@@ -1199,8 +1199,6 @@ static int __init audit_init(void)
skb_queue_head_init(&audit_skb_queue);
skb_queue_head_init(&audit_skb_hold_queue);
audit_initialized = AUDIT_INITIALIZED;
- audit_enabled = audit_default;
- audit_ever_enabled |= !!audit_default;

audit_log(NULL, GFP_KERNEL, AUDIT_KERNEL, "initialized");

@@ -1217,6 +1215,8 @@ static int __init audit_enable(char *str
audit_default = !!simple_strtol(str, NULL, 0);
if (!audit_default)
audit_initialized = AUDIT_DISABLED;
+ audit_enabled = audit_default;
+ audit_ever_enabled = !!audit_enabled;

pr_info("%s\n", audit_default ?
"enabled (after initialization)" : "disabled (until reboot)");


2017-12-15 10:14:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 24/27] ipvlan: fix ipv6 outbound device

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Keefe Liu <[email protected]>


[ Upstream commit ca29fd7cce5a6444d57fb86517589a1a31c759e1 ]

When process the outbound packet of ipv6, we should assign the master
device to output device other than input device.

Signed-off-by: Keefe Liu <[email protected]>
Acked-by: Mahesh Bandewar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ipvlan/ipvlan_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -404,7 +404,7 @@ static int ipvlan_process_v6_outbound(st
struct dst_entry *dst;
int err, ret = NET_XMIT_DROP;
struct flowi6 fl6 = {
- .flowi6_iif = dev->ifindex,
+ .flowi6_oif = dev->ifindex,
.daddr = ip6h->daddr,
.saddr = ip6h->saddr,
.flowi6_flags = FLOWI_FLAG_ANYSRC,


2017-12-15 09:52:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 21/27] KVM: arm/arm64: vgic-its: Preserve the revious read from the pending table

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Zyngier <[email protected]>

commit 64afe6e9eb4841f35317da4393de21a047a883b3 upstream.

The current pending table parsing code assumes that we keep the
previous read of the pending bits, but keep that variable in
the current block, making sure it is discarded on each loop.

We end-up using whatever is on the stack. Who knows, it might
just be the right thing...

Fixes: 33d3bc9556a7d ("KVM: arm64: vgic-its: Read initial LPI pending table")
Cc: [email protected] # 4.8
Reported-by: AKASHI Takahiro <[email protected]>
Reviewed-by: Christoffer Dall <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
virt/kvm/arm/vgic/vgic-its.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -322,6 +322,7 @@ static int its_sync_lpi_pending_table(st
int ret = 0;
u32 *intids;
int nr_irqs, i;
+ u8 pendmask;

nr_irqs = vgic_copy_lpi_list(vcpu->kvm, &intids);
if (nr_irqs < 0)
@@ -329,7 +330,6 @@ static int its_sync_lpi_pending_table(st

for (i = 0; i < nr_irqs; i++) {
int byte_offset, bit_nr;
- u8 pendmask;

byte_offset = intids[i] / BITS_PER_BYTE;
bit_nr = intids[i] % BITS_PER_BYTE;


2017-12-15 10:15:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 20/27] fix kcm_clone()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit a5739435b5a3b8c449f8844ecd71a3b1e89f0a33 upstream.

1) it's fput() or sock_release(), not both
2) don't do fd_install() until the last failure exit.
3) not a bug per se, but... don't attach socket to struct file
until it's set up.

Take reserving descriptor into the caller, move fd_install() to the
caller, sanitize failure exits and calling conventions.

Acked-by: Tom Herbert <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/kcm/kcmsock.c | 73 +++++++++++++++++++-----------------------------------
1 file changed, 26 insertions(+), 47 deletions(-)

--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1624,60 +1624,35 @@ static struct proto kcm_proto = {
};

/* Clone a kcm socket. */
-static int kcm_clone(struct socket *osock, struct kcm_clone *info,
- struct socket **newsockp)
+static struct file *kcm_clone(struct socket *osock)
{
struct socket *newsock;
struct sock *newsk;
- struct file *newfile;
- int err, newfd;
+ struct file *file;

- err = -ENFILE;
newsock = sock_alloc();
if (!newsock)
- goto out;
+ return ERR_PTR(-ENFILE);

newsock->type = osock->type;
newsock->ops = osock->ops;

__module_get(newsock->ops->owner);

- newfd = get_unused_fd_flags(0);
- if (unlikely(newfd < 0)) {
- err = newfd;
- goto out_fd_fail;
- }
-
- newfile = sock_alloc_file(newsock, 0, osock->sk->sk_prot_creator->name);
- if (unlikely(IS_ERR(newfile))) {
- err = PTR_ERR(newfile);
- goto out_sock_alloc_fail;
- }
-
newsk = sk_alloc(sock_net(osock->sk), PF_KCM, GFP_KERNEL,
&kcm_proto, true);
if (!newsk) {
- err = -ENOMEM;
- goto out_sk_alloc_fail;
+ sock_release(newsock);
+ return ERR_PTR(-ENOMEM);
}
-
sock_init_data(newsock, newsk);
init_kcm_sock(kcm_sk(newsk), kcm_sk(osock->sk)->mux);

- fd_install(newfd, newfile);
- *newsockp = newsock;
- info->fd = newfd;
-
- return 0;
-
-out_sk_alloc_fail:
- fput(newfile);
-out_sock_alloc_fail:
- put_unused_fd(newfd);
-out_fd_fail:
- sock_release(newsock);
-out:
- return err;
+ file = sock_alloc_file(newsock, 0, osock->sk->sk_prot_creator->name);
+ if (IS_ERR(file))
+ sock_release(newsock);
+
+ return file;
}

static int kcm_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
@@ -1707,21 +1682,25 @@ static int kcm_ioctl(struct socket *sock
}
case SIOCKCMCLONE: {
struct kcm_clone info;
- struct socket *newsock = NULL;
+ struct file *file;

- if (copy_from_user(&info, (void __user *)arg, sizeof(info)))
+ info.fd = get_unused_fd_flags(0);
+ if (unlikely(info.fd < 0))
+ return info.fd;
+
+ file = kcm_clone(sock);
+ if (IS_ERR(file)) {
+ put_unused_fd(info.fd);
+ return PTR_ERR(file);
+ }
+ if (copy_to_user((void __user *)arg, &info,
+ sizeof(info))) {
+ put_unused_fd(info.fd);
+ fput(file);
return -EFAULT;
-
- err = kcm_clone(sock, &info, &newsock);
-
- if (!err) {
- if (copy_to_user((void __user *)arg, &info,
- sizeof(info))) {
- err = -EFAULT;
- sys_close(info.fd);
- }
}
-
+ fd_install(info.fd, file);
+ err = 0;
break;
}
default:


2017-12-15 09:52:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 19/27] usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Vincent Pelletier <[email protected]>

commit 30bf90ccdec1da9c8198b161ecbff39ce4e5a9ba upstream.

Found using DEBUG_ATOMIC_SLEEP while submitting an AIO read operation:

[ 100.853642] BUG: sleeping function called from invalid context at mm/slab.h:421
[ 100.861148] in_atomic(): 1, irqs_disabled(): 1, pid: 1880, name: python
[ 100.867954] 2 locks held by python/1880:
[ 100.867961] #0: (&epfile->mutex){....}, at: [<f8188627>] ffs_mutex_lock+0x27/0x30 [usb_f_fs]
[ 100.868020] #1: (&(&ffs->eps_lock)->rlock){....}, at: [<f818ad4b>] ffs_epfile_io.isra.17+0x24b/0x590 [usb_f_fs]
[ 100.868076] CPU: 1 PID: 1880 Comm: python Not tainted 4.14.0-edison+ #118
[ 100.868085] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
[ 100.868093] Call Trace:
[ 100.868122] dump_stack+0x47/0x62
[ 100.868156] ___might_sleep+0xfd/0x110
[ 100.868182] __might_sleep+0x68/0x70
[ 100.868217] kmem_cache_alloc_trace+0x4b/0x200
[ 100.868248] ? dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3]
[ 100.868302] dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3]
[ 100.868343] usb_ep_alloc_request+0x16/0xc0 [udc_core]
[ 100.868386] ffs_epfile_io.isra.17+0x444/0x590 [usb_f_fs]
[ 100.868424] ? _raw_spin_unlock_irqrestore+0x27/0x40
[ 100.868457] ? kiocb_set_cancel_fn+0x57/0x60
[ 100.868477] ? ffs_ep0_poll+0xc0/0xc0 [usb_f_fs]
[ 100.868512] ffs_epfile_read_iter+0xfe/0x157 [usb_f_fs]
[ 100.868551] ? security_file_permission+0x9c/0xd0
[ 100.868587] ? rw_verify_area+0xac/0x120
[ 100.868633] aio_read+0x9d/0x100
[ 100.868692] ? __fget+0xa2/0xd0
[ 100.868727] ? __might_sleep+0x68/0x70
[ 100.868763] SyS_io_submit+0x471/0x680
[ 100.868878] do_int80_syscall_32+0x4e/0xd0
[ 100.868921] entry_INT80_32+0x2a/0x2a
[ 100.868932] EIP: 0xb7fbb676
[ 100.868941] EFLAGS: 00000292 CPU: 1
[ 100.868951] EAX: ffffffda EBX: b7aa2000 ECX: 00000002 EDX: b7af8368
[ 100.868961] ESI: b7fbb660 EDI: b7aab000 EBP: bfb6c658 ESP: bfb6c638
[ 100.868973] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Signed-off-by: Vincent Pelletier <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/gadget/function/f_fs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1015,7 +1015,7 @@ static ssize_t ffs_epfile_io(struct file
else
ret = ep->status;
goto error_mutex;
- } else if (!(req = usb_ep_alloc_request(ep->ep, GFP_KERNEL))) {
+ } else if (!(req = usb_ep_alloc_request(ep->ep, GFP_ATOMIC))) {
ret = -ENOMEM;
} else {
req->buf = data;


2017-12-15 09:52:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 18/27] s390: always save and restore all registers on context switch

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Heiko Carstens <[email protected]>

commit fbbd7f1a51965b50dd12924841da0d478f3da71b upstream.

The switch_to() macro has an optimization to avoid saving and
restoring register contents that aren't needed for kernel threads.

There is however the possibility that a kernel thread execve's a user
space program. In such a case the execve'd process can partially see
the contents of the previous process, which shouldn't be allowed.

To avoid this, simply always save and restore register contents on
context switch.

Fixes: fdb6d070effba ("switch_to: dont restore/save access & fpu regs for kernel threads")
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/s390/include/asm/switch_to.h | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)

--- a/arch/s390/include/asm/switch_to.h
+++ b/arch/s390/include/asm/switch_to.h
@@ -29,17 +29,16 @@ static inline void restore_access_regs(u
}

#define switch_to(prev,next,last) do { \
- if (prev->mm) { \
- save_fpu_regs(); \
- save_access_regs(&prev->thread.acrs[0]); \
- save_ri_cb(prev->thread.ri_cb); \
- } \
+ /* save_fpu_regs() sets the CIF_FPU flag, which enforces \
+ * a restore of the floating point / vector registers as \
+ * soon as the next task returns to user space \
+ */ \
+ save_fpu_regs(); \
+ save_access_regs(&prev->thread.acrs[0]); \
+ save_ri_cb(prev->thread.ri_cb); \
update_cr_regs(next); \
- if (next->mm) { \
- set_cpu_flag(CIF_FPU); \
- restore_access_regs(&next->thread.acrs[0]); \
- restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \
- } \
+ restore_access_regs(&next->thread.acrs[0]); \
+ restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \
prev = __switch_to(prev,next); \
} while (0)



2017-12-15 09:52:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 15/27] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tommi Rantala <[email protected]>


[ Upstream commit c7799c067c2ae33e348508c8afec354f3257ff25 ]

Remove the second tipc_rcv() call in tipc_udp_recv(). We have just
checked that the bearer is not up, and calling tipc_rcv() with a bearer
that is not up leads to a TIPC div-by-zero crash in
tipc_node_calculate_timer(). The crash is rare in practice, but can
happen like this:

We're enabling a bearer, but it's not yet up and fully initialized.
At the same time we receive a discovery packet, and in tipc_udp_recv()
we end up calling tipc_rcv() with the not-yet-initialized bearer,
causing later the div-by-zero crash in tipc_node_calculate_timer().

Jon Maloy explains the impact of removing the second tipc_rcv() call:
"link setup in the worst case will be delayed until the next arriving
discovery messages, 1 sec later, and this is an acceptable delay."

As the tipc_rcv() call is removed, just leave the function via the
rcu_out label, so that we will kfree_skb().

[ 12.590450] Own node address <1.1.1>, network identity 1
[ 12.668088] divide error: 0000 [#1] SMP
[ 12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1
[ 12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000
[ 12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
[ 12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246
[ 12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000
[ 12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600
[ 12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001
[ 12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8
[ 12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800
[ 12.702338] FS: 0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000
[ 12.705099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0
[ 12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 12.712627] Call Trace:
[ 12.713390] <IRQ>
[ 12.714011] tipc_node_check_dest+0x2e8/0x350 [tipc]
[ 12.715286] tipc_disc_rcv+0x14d/0x1d0 [tipc]
[ 12.716370] tipc_rcv+0x8b0/0xd40 [tipc]
[ 12.717396] ? minmax_running_min+0x2f/0x60
[ 12.718248] ? dst_alloc+0x4c/0xa0
[ 12.718964] ? tcp_ack+0xaf1/0x10b0
[ 12.719658] ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc]
[ 12.720634] tipc_udp_recv+0x71/0x1d0 [tipc]
[ 12.721459] ? dst_alloc+0x4c/0xa0
[ 12.722130] udp_queue_rcv_skb+0x264/0x490
[ 12.722924] __udp4_lib_rcv+0x21e/0x990
[ 12.723670] ? ip_route_input_rcu+0x2dd/0xbf0
[ 12.724442] ? tcp_v4_rcv+0x958/0xa40
[ 12.725039] udp_rcv+0x1a/0x20
[ 12.725587] ip_local_deliver_finish+0x97/0x1d0
[ 12.726323] ip_local_deliver+0xaf/0xc0
[ 12.726959] ? ip_route_input_noref+0x19/0x20
[ 12.727689] ip_rcv_finish+0xdd/0x3b0
[ 12.728307] ip_rcv+0x2ac/0x360
[ 12.728839] __netif_receive_skb_core+0x6fb/0xa90
[ 12.729580] ? udp4_gro_receive+0x1a7/0x2c0
[ 12.730274] __netif_receive_skb+0x1d/0x60
[ 12.730953] ? __netif_receive_skb+0x1d/0x60
[ 12.731637] netif_receive_skb_internal+0x37/0xd0
[ 12.732371] napi_gro_receive+0xc7/0xf0
[ 12.732920] receive_buf+0x3c3/0xd40
[ 12.733441] virtnet_poll+0xb1/0x250
[ 12.733944] net_rx_action+0x23e/0x370
[ 12.734476] __do_softirq+0xc5/0x2f8
[ 12.734922] irq_exit+0xfa/0x100
[ 12.735315] do_IRQ+0x4f/0xd0
[ 12.735680] common_interrupt+0xa2/0xa2
[ 12.736126] </IRQ>
[ 12.736416] RIP: 0010:native_safe_halt+0x6/0x10
[ 12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d
[ 12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000
[ 12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88
[ 12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[ 12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000
[ 12.741831] default_idle+0x2a/0x100
[ 12.742323] arch_cpu_idle+0xf/0x20
[ 12.742796] default_idle_call+0x28/0x40
[ 12.743312] do_idle+0x179/0x1f0
[ 12.743761] cpu_startup_entry+0x1d/0x20
[ 12.744291] start_secondary+0x112/0x120
[ 12.744816] secondary_startup_64+0xa5/0xa5
[ 12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00
00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48
89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f
[ 12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0
[ 12.748555] ---[ end trace 1399ab83390650fd ]---
[ 12.749296] Kernel panic - not syncing: Fatal exception in interrupt
[ 12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 12.751215] Rebooting in 60 seconds..

Fixes: c9b64d492b1f ("tipc: add replicast peer discovery")
Signed-off-by: Tommi Rantala <[email protected]>
Cc: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/tipc/udp_media.c | 4 ----
1 file changed, 4 deletions(-)

--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -371,10 +371,6 @@ static int tipc_udp_recv(struct sock *sk
goto rcu_out;
}

- tipc_rcv(sock_net(sk), skb, b);
- rcu_read_unlock();
- return 0;
-
rcu_out:
rcu_read_unlock();
out:


2017-12-15 09:52:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 16/27] Fix handling of verdicts after NF_QUEUE

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Debabrata Banerjee <[email protected]>

[This fix is only needed for v4.9 stable since v4.10+ does not have the issue]

A verdict of NF_STOLEN after NF_QUEUE will cause an incorrect return value
and a potential kernel panic via double free of skb's

This was broken by commit 7034b566a4e7 ("netfilter: fix nf_queue handling")
and subsequently fixed in v4.10 by commit c63cbc460419 ("netfilter:
use switch() to handle verdict cases from nf_hook_slow()"). However that
commit cannot be cleanly cherry-picked to v4.9

Signed-off-by: Debabrata Banerjee <[email protected]>
Acked-by: Pablo Neira Ayuso <[email protected]>

---
net/netfilter/core.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -364,6 +364,11 @@ next_hook:
ret = nf_queue(skb, state, &entry, verdict);
if (ret == 1 && entry)
goto next_hook;
+ } else {
+ /* Implicit handling for NF_STOLEN, as well as any other
+ * non conventional verdicts.
+ */
+ ret = 0;
}
return ret;
}


2017-12-15 10:16:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 17/27] ipmi: Stop timers before cleaning up the module

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Masamitsu Yamazaki <[email protected]>

commit 4f7f5551a760eb0124267be65763008169db7087 upstream.

System may crash after unloading ipmi_si.ko module
because a timer may remain and fire after the module cleaned up resources.

cleanup_one_si() contains the following processing.

/*
* Make sure that interrupts, the timer and the thread are
* stopped and will not run again.
*/
if (to_clean->irq_cleanup)
to_clean->irq_cleanup(to_clean);
wait_for_timer_and_thread(to_clean);

/*
* Timeouts are stopped, now make sure the interrupts are off
* in the BMC. Note that timers and CPU interrupts are off,
* so no need for locks.
*/
while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) {
poll(to_clean);
schedule_timeout_uninterruptible(1);
}

si_state changes as following in the while loop calling poll(to_clean).

SI_GETTING_MESSAGES
=> SI_CHECKING_ENABLES
=> SI_SETTING_ENABLES
=> SI_GETTING_EVENTS
=> SI_NORMAL

As written in the code comments above,
timers are expected to stop before the polling loop and not to run again.
But the timer is set again in the following process
when si_state becomes SI_SETTING_ENABLES.

=> poll
=> smi_event_handler
=> handle_transaction_done
// smi_info->si_state == SI_SETTING_ENABLES
=> start_getting_events
=> start_new_msg
=> smi_mod_timer
=> mod_timer

As a result, before the timer set in start_new_msg() expires,
the polling loop may see si_state becoming SI_NORMAL
and the module clean-up finishes.

For example, hard LOCKUP and panic occurred as following.
smi_timeout was called after smi_event_handler,
kcs_event and hangs at port_inb()
trying to access I/O port after release.

[exception RIP: port_inb+19]
RIP: ffffffffc0473053 RSP: ffff88069fdc3d80 RFLAGS: 00000006
RAX: ffff8806800f8e00 RBX: ffff880682bd9400 RCX: 0000000000000000
RDX: 0000000000000ca3 RSI: 0000000000000ca3 RDI: ffff8806800f8e40
RBP: ffff88069fdc3d80 R8: ffffffff81d86dfc R9: ffffffff81e36426
R10: 00000000000509f0 R11: 0000000000100000 R12: 0000000000]:000000
R13: 0000000000000000 R14: 0000000000000246 R15: ffff8806800f8e00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
--- <NMI exception stack> ---

To fix the problem I defined a flag, timer_can_start,
as member of struct smi_info.
The flag is enabled immediately after initializing the timer
and disabled immediately before waiting for timer deletion.

Fixes: 0cfec916e86d ("ipmi: Start the timer and thread on internal msgs")
Signed-off-by: Yamazaki Masamitsu <[email protected]>
[Adjusted for recent changes in the driver.]
[Some fairly major changes went into the IPMI driver in 4.15, so this
required a backport as the code had changed and moved to a different
file. The 4.14 version of this patch moved some code under an
if statement causing it to not apply to 4.7-4.13.]
Signed-off-by: Corey Minyard <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/char/ipmi/ipmi_si_intf.c | 44 ++++++++++++++++++++-------------------
1 file changed, 23 insertions(+), 21 deletions(-)

--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -241,6 +241,9 @@ struct smi_info {
/* The timer for this si. */
struct timer_list si_timer;

+ /* This flag is set, if the timer can be set */
+ bool timer_can_start;
+
/* This flag is set, if the timer is running (timer_pending() isn't enough) */
bool timer_running;

@@ -416,6 +419,8 @@ out:

static void smi_mod_timer(struct smi_info *smi_info, unsigned long new_val)
{
+ if (!smi_info->timer_can_start)
+ return;
smi_info->last_timeout_jiffies = jiffies;
mod_timer(&smi_info->si_timer, new_val);
smi_info->timer_running = true;
@@ -435,21 +440,18 @@ static void start_new_msg(struct smi_inf
smi_info->handlers->start_transaction(smi_info->si_sm, msg, size);
}

-static void start_check_enables(struct smi_info *smi_info, bool start_timer)
+static void start_check_enables(struct smi_info *smi_info)
{
unsigned char msg[2];

msg[0] = (IPMI_NETFN_APP_REQUEST << 2);
msg[1] = IPMI_GET_BMC_GLOBAL_ENABLES_CMD;

- if (start_timer)
- start_new_msg(smi_info, msg, 2);
- else
- smi_info->handlers->start_transaction(smi_info->si_sm, msg, 2);
+ start_new_msg(smi_info, msg, 2);
smi_info->si_state = SI_CHECKING_ENABLES;
}

-static void start_clear_flags(struct smi_info *smi_info, bool start_timer)
+static void start_clear_flags(struct smi_info *smi_info)
{
unsigned char msg[3];

@@ -458,10 +460,7 @@ static void start_clear_flags(struct smi
msg[1] = IPMI_CLEAR_MSG_FLAGS_CMD;
msg[2] = WDT_PRE_TIMEOUT_INT;

- if (start_timer)
- start_new_msg(smi_info, msg, 3);
- else
- smi_info->handlers->start_transaction(smi_info->si_sm, msg, 3);
+ start_new_msg(smi_info, msg, 3);
smi_info->si_state = SI_CLEARING_FLAGS;
}

@@ -496,11 +495,11 @@ static void start_getting_events(struct
* Note that we cannot just use disable_irq(), since the interrupt may
* be shared.
*/
-static inline bool disable_si_irq(struct smi_info *smi_info, bool start_timer)
+static inline bool disable_si_irq(struct smi_info *smi_info)
{
if ((smi_info->irq) && (!smi_info->interrupt_disabled)) {
smi_info->interrupt_disabled = true;
- start_check_enables(smi_info, start_timer);
+ start_check_enables(smi_info);
return true;
}
return false;
@@ -510,7 +509,7 @@ static inline bool enable_si_irq(struct
{
if ((smi_info->irq) && (smi_info->interrupt_disabled)) {
smi_info->interrupt_disabled = false;
- start_check_enables(smi_info, true);
+ start_check_enables(smi_info);
return true;
}
return false;
@@ -528,7 +527,7 @@ static struct ipmi_smi_msg *alloc_msg_ha

msg = ipmi_alloc_smi_msg();
if (!msg) {
- if (!disable_si_irq(smi_info, true))
+ if (!disable_si_irq(smi_info))
smi_info->si_state = SI_NORMAL;
} else if (enable_si_irq(smi_info)) {
ipmi_free_smi_msg(msg);
@@ -544,7 +543,7 @@ retry:
/* Watchdog pre-timeout */
smi_inc_stat(smi_info, watchdog_pretimeouts);

- start_clear_flags(smi_info, true);
+ start_clear_flags(smi_info);
smi_info->msg_flags &= ~WDT_PRE_TIMEOUT_INT;
if (smi_info->intf)
ipmi_smi_watchdog_pretimeout(smi_info->intf);
@@ -927,7 +926,7 @@ restart:
* disable and messages disabled.
*/
if (smi_info->supports_event_msg_buff || smi_info->irq) {
- start_check_enables(smi_info, true);
+ start_check_enables(smi_info);
} else {
smi_info->curr_msg = alloc_msg_handle_irq(smi_info);
if (!smi_info->curr_msg)
@@ -1234,6 +1233,7 @@ static int smi_start_processing(void

/* Set up the timer that drives the interface. */
setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi);
+ new_smi->timer_can_start = true;
smi_mod_timer(new_smi, jiffies + SI_TIMEOUT_JIFFIES);

/* Try to claim any interrupts. */
@@ -3448,10 +3448,12 @@ static void check_for_broken_irqs(struct
check_set_rcv_irq(smi_info);
}

-static inline void wait_for_timer_and_thread(struct smi_info *smi_info)
+static inline void stop_timer_and_thread(struct smi_info *smi_info)
{
if (smi_info->thread != NULL)
kthread_stop(smi_info->thread);
+
+ smi_info->timer_can_start = false;
if (smi_info->timer_running)
del_timer_sync(&smi_info->si_timer);
}
@@ -3593,7 +3595,7 @@ static int try_smi_init(struct smi_info
* Start clearing the flags before we enable interrupts or the
* timer to avoid racing with the timer.
*/
- start_clear_flags(new_smi, false);
+ start_clear_flags(new_smi);

/*
* IRQ is defined to be set when non-zero. req_events will
@@ -3671,7 +3673,7 @@ static int try_smi_init(struct smi_info
return 0;

out_err_stop_timer:
- wait_for_timer_and_thread(new_smi);
+ stop_timer_and_thread(new_smi);

out_err:
new_smi->interrupt_disabled = true;
@@ -3865,7 +3867,7 @@ static void cleanup_one_si(struct smi_in
*/
if (to_clean->irq_cleanup)
to_clean->irq_cleanup(to_clean);
- wait_for_timer_and_thread(to_clean);
+ stop_timer_and_thread(to_clean);

/*
* Timeouts are stopped, now make sure the interrupts are off
@@ -3876,7 +3878,7 @@ static void cleanup_one_si(struct smi_in
poll(to_clean);
schedule_timeout_uninterruptible(1);
}
- disable_si_irq(to_clean, false);
+ disable_si_irq(to_clean);
while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) {
poll(to_clean);
schedule_timeout_uninterruptible(1);


2017-12-15 10:17:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 13/27] s390/qeth: fix GSO throughput regression

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <[email protected]>


[ Upstream commit 6d69b1f1eb7a2edf8a3547f361c61f2538e054bb ]

Using GSO with small MTUs currently results in a substantial throughput
regression - which is caused by how qeth needs to map non-linear skbs
into its IO buffer elements:
compared to a linear skb, each GSO-segmented skb effectively consumes
twice as many buffer elements (ie two instead of one) due to the
additional header-only part. This causes the Output Queue to be
congested with low-utilized IO buffers.

Fix this as follows:
If the MSS is low enough so that a non-SG GSO segmentation produces
order-0 skbs (currently ~3500 byte), opt out from NETIF_F_SG. This is
where we anticipate the biggest savings, since an SG-enabled
GSO segmentation produces skbs that always consume at least two
buffer elements.

Larger MSS values continue to get a SG-enabled GSO segmentation, since
1) the relative overhead of the additional header-only buffer element
becomes less noticeable, and
2) the linearization overhead increases.

With the throughput regression fixed, re-enable NETIF_F_SG by default to
reap the significant CPU savings of GSO.

Fixes: 5722963a8e83 ("qeth: do not turn on SG per default")
Reported-by: Nils Hoppmann <[email protected]>
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/net/qeth_core.h | 3 +++
drivers/s390/net/qeth_core_main.c | 31 +++++++++++++++++++++++++++++++
drivers/s390/net/qeth_l2_main.c | 2 ++
drivers/s390/net/qeth_l3_main.c | 2 ++
4 files changed, 38 insertions(+)

--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -1004,6 +1004,9 @@ struct qeth_cmd_buffer *qeth_get_setassp
int qeth_set_features(struct net_device *, netdev_features_t);
int qeth_recover_features(struct net_device *);
netdev_features_t qeth_fix_features(struct net_device *, netdev_features_t);
+netdev_features_t qeth_features_check(struct sk_buff *skb,
+ struct net_device *dev,
+ netdev_features_t features);

/* exports for OSN */
int qeth_osn_assist(struct net_device *, void *, int);
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -19,6 +19,11 @@
#include <linux/mii.h>
#include <linux/kthread.h>
#include <linux/slab.h>
+#include <linux/if_vlan.h>
+#include <linux/netdevice.h>
+#include <linux/netdev_features.h>
+#include <linux/skbuff.h>
+
#include <net/iucv/af_iucv.h>
#include <net/dsfield.h>

@@ -6240,6 +6245,32 @@ netdev_features_t qeth_fix_features(stru
}
EXPORT_SYMBOL_GPL(qeth_fix_features);

+netdev_features_t qeth_features_check(struct sk_buff *skb,
+ struct net_device *dev,
+ netdev_features_t features)
+{
+ /* GSO segmentation builds skbs with
+ * a (small) linear part for the headers, and
+ * page frags for the data.
+ * Compared to a linear skb, the header-only part consumes an
+ * additional buffer element. This reduces buffer utilization, and
+ * hurts throughput. So compress small segments into one element.
+ */
+ if (netif_needs_gso(skb, features)) {
+ /* match skb_segment(): */
+ unsigned int doffset = skb->data - skb_mac_header(skb);
+ unsigned int hsize = skb_shinfo(skb)->gso_size;
+ unsigned int hroom = skb_headroom(skb);
+
+ /* linearize only if resulting skb allocations are order-0: */
+ if (SKB_DATA_ALIGN(hroom + doffset + hsize) <= SKB_MAX_HEAD(0))
+ features &= ~NETIF_F_SG;
+ }
+
+ return vlan_features_check(skb, features);
+}
+EXPORT_SYMBOL_GPL(qeth_features_check);
+
static int __init qeth_core_init(void)
{
int rc;
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1084,6 +1084,7 @@ static const struct net_device_ops qeth_
.ndo_stop = qeth_l2_stop,
.ndo_get_stats = qeth_get_stats,
.ndo_start_xmit = qeth_l2_hard_start_xmit,
+ .ndo_features_check = qeth_features_check,
.ndo_validate_addr = eth_validate_addr,
.ndo_set_rx_mode = qeth_l2_set_rx_mode,
.ndo_do_ioctl = qeth_l2_do_ioctl,
@@ -1128,6 +1129,7 @@ static int qeth_l2_setup_netdev(struct q
if (card->info.type == QETH_CARD_TYPE_OSD && !card->info.guestlan) {
card->dev->hw_features = NETIF_F_SG;
card->dev->vlan_features = NETIF_F_SG;
+ card->dev->features |= NETIF_F_SG;
/* OSA 3S and earlier has no RX/TX support */
if (qeth_is_supported(card, IPA_OUTBOUND_CHECKSUM)) {
card->dev->hw_features |= NETIF_F_IP_CSUM;
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3066,6 +3066,7 @@ static const struct net_device_ops qeth_
.ndo_stop = qeth_l3_stop,
.ndo_get_stats = qeth_get_stats,
.ndo_start_xmit = qeth_l3_hard_start_xmit,
+ .ndo_features_check = qeth_features_check,
.ndo_validate_addr = eth_validate_addr,
.ndo_set_rx_mode = qeth_l3_set_multicast_list,
.ndo_do_ioctl = qeth_l3_do_ioctl,
@@ -3122,6 +3123,7 @@ static int qeth_l3_setup_netdev(struct q
card->dev->vlan_features = NETIF_F_SG |
NETIF_F_RXCSUM | NETIF_F_IP_CSUM |
NETIF_F_TSO;
+ card->dev->features |= NETIF_F_SG;
}
}
} else if (card->info.type == QETH_CARD_TYPE_IQD) {


2017-12-15 17:40:21

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.9 00/27] 4.9.70-stable review

On Fri, Dec 15, 2017 at 10:51:43AM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.70 release.
> There are 27 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Dec 17 09:22:42 UTC 2017.
> Anything received after that time might be too late.
>
Build results:
total: 145 pass: 145 fail: 0
Qemu test results:
total: 124 pass: 124 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

2017-12-15 21:13:02

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.9 00/27] 4.9.70-stable review

On 12/15/2017 02:51 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.70 release.
> There are 27 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Dec 17 09:22:42 UTC 2017.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.70-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compile and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

2017-12-16 05:56:00

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [PATCH 4.9 00/27] 4.9.70-stable review

On 15 December 2017 at 15:21, Greg Kroah-Hartman
<[email protected]> wrote:
> This is the start of the stable review cycle for the 4.9.70 release.
> There are 27 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Dec 17 09:22:42 UTC 2017.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.70-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64 and x86_64.

Note:
We do not have arm64 hikey board results due to the changes required
from the firmware update. Hikey board is under maintenance.

Newly added selftests/net/reuseport_bpf FAILED in full run on x86_64
and beagle board x15 and the independent test execution resulted as PASS.
For the internal investigation bug reported.
https://bugs.linaro.org/show_bug.cgi?id=3502#c4

Summary
------------------------------------------------------------------------

kernel: 4.9.70-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: cc35c65be8bb29a0282724447bc694353125a35d
git describe: v4.9.69-28-gcc35c65be8bb
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.69-28-gcc35c65be8bb

No regressions (compared to build v4.9.69-22-g9ba613a08872)

Boards, architectures and test suites:
-------------------------------------
juno-r2 - arm64
* boot - pass: 20,
* kselftest - pass: 40, skip: 23
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 985, skip: 121
* ltp-timers-tests - pass: 12,

x15 - arm
* boot - pass: 20,
* kselftest - fail: 1, pass: 36, skip: 25
* libhugetlbfs - pass: 87, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 20, skip: 2
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 13, skip: 1
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1036, skip: 66
* ltp-timers-tests - pass: 12,

x86_64
* boot - pass: 20,
* kselftest - fail: 1, pass: 54, skip: 23
* libhugetlbfs - pass: 89, skip: 1
* ltp-cap_bounds-tests - pass: 1,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 61, skip: 1
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 10,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 9,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 957, skip: 163
* ltp-timers-tests - pass: 12,

Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports
Tested-by: Naresh Kamboju <[email protected]>