2016-12-09 16:18:06

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 00/28] 4.4.38-stable review

This is the start of the stable review cycle for the 4.4.38 release.
There are 28 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun Dec 11 16:17:32 UTC 2016.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.38-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.4.38-rc1

Tobias Brunner <[email protected]>
esp6: Fix integrity verification when ESN are used

Tobias Brunner <[email protected]>
esp4: Fix integrity verification when ESN are used

Eli Cooper <[email protected]>
ipv4: Set skb->protocol properly for local output

Eli Cooper <[email protected]>
ipv6: Set skb->protocol properly for local output

Al Viro <[email protected]>
constify iov_iter_count() and iter_is_iovec()

Linus Torvalds <[email protected]>
Don't feed anything but regular iovec's to blk_rq_map_user_iov

Thomas Tai <[email protected]>
sparc64: fix compile warning section mismatch in find_node()

Thomas Tai <[email protected]>
sparc64: Fix find_node warning if numa node cannot be found

Andreas Larsson <[email protected]>
sparc32: Fix inverted invalid_frame_pointer checks on sigreturns

Kees Cook <[email protected]>
net: ping: check minimum size on ICMP header length

Eric Dumazet <[email protected]>
net: avoid signed overflows for SO_{SND|RCV}BUFFORCE

Sabrina Dubroca <[email protected]>
geneve: avoid use-after-free of skb->data

Chris Brandt <[email protected]>
sh_eth: remove unchecked interrupts for RZ/A1

Florian Fainelli <[email protected]>
net: bcmgenet: Utilize correct struct device for all DMA operations

Philip Pettersson <[email protected]>
packet: fix race condition in packet_set_ring

Eric Dumazet <[email protected]>
net/dccp: fix use-after-free in dccp_invalid_packet

Herbert Xu <[email protected]>
netlink: Do not schedule work from sk_destruct

Herbert Xu <[email protected]>
netlink: Call cb->done from a worker thread

Amir Vadai <[email protected]>
net/sched: pedit: make sure that offset is valid

Daniel Borkmann <[email protected]>
net, sched: respect rcu grace period on cls destruction

Florian Fainelli <[email protected]>
net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change

Guillaume Nault <[email protected]>
l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()

Sabrina Dubroca <[email protected]>
rtnetlink: fix FDB size computation

WANG Cong <[email protected]>
af_unix: conditionally use freezable blocking calls in read

Jeremy Linton <[email protected]>
net: sky2: Fix shutdown crash

Paolo Abeni <[email protected]>
ip6_tunnel: disable caching when the traffic class is inherited

WANG Cong <[email protected]>
net: check dead netns for peernet2id_alloc()

Eric Dumazet <[email protected]>
virtio-net: add a missing synchronize_net()


-------------

Diffstat:

Makefile | 4 +-
arch/sparc/kernel/signal_32.c | 4 +-
arch/sparc/mm/init_64.c | 71 +++++++++++++++++++++++---
block/blk-map.c | 3 ++
drivers/net/dsa/bcm_sf2.c | 4 ++
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 8 +--
drivers/net/ethernet/marvell/sky2.c | 13 +++++
drivers/net/ethernet/renesas/sh_eth.c | 2 +-
drivers/net/geneve.c | 14 ++---
drivers/net/virtio_net.c | 5 ++
include/linux/uio.h | 4 +-
net/core/net_namespace.c | 2 +
net/core/rtnetlink.c | 5 +-
net/core/sock.c | 4 +-
net/dccp/ipv4.c | 12 +++--
net/ipv4/esp4.c | 2 +-
net/ipv4/ip_output.c | 3 ++
net/ipv4/ping.c | 4 ++
net/ipv6/esp6.c | 2 +-
net/ipv6/ip6_tunnel.c | 13 ++++-
net/ipv6/output_core.c | 2 +
net/l2tp/l2tp_ip.c | 5 +-
net/l2tp/l2tp_ip6.c | 5 +-
net/netlink/af_netlink.c | 21 +++++++-
net/netlink/af_netlink.h | 2 +
net/packet/af_packet.c | 18 ++++---
net/sched/act_pedit.c | 24 +++++++--
net/sched/cls_basic.c | 4 --
net/sched/cls_bpf.c | 4 --
net/sched/cls_cgroup.c | 7 ++-
net/sched/cls_flow.c | 1 -
net/sched/cls_flower.c | 31 +++++++++--
net/sched/cls_rsvp.h | 3 +-
net/sched/cls_tcindex.c | 1 -
net/unix/af_unix.c | 17 +++---
35 files changed, 243 insertions(+), 81 deletions(-)



2016-12-09 16:18:14

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 01/28] virtio-net: add a missing synchronize_net()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 963abe5c8a0273a1cf5913556da1b1189de0e57a ]

It seems many drivers do not respect napi_hash_del() contract.

When napi_hash_del() is used before netif_napi_del(), an RCU grace
period is needed before freeing NAPI object.

Fixes: 91815639d880 ("virtio-net: rx busy polling support")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/virtio_net.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1465,6 +1465,11 @@ static void virtnet_free_queues(struct v
netif_napi_del(&vi->rq[i].napi);
}

+ /* We called napi_hash_del() before netif_napi_del(),
+ * we need to respect an RCU grace period before freeing vi->rq
+ */
+ synchronize_net();
+
kfree(vi->rq);
kfree(vi->sq);
}


2016-12-09 16:18:21

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 12/28] netlink: Do not schedule work from sk_destruct

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Herbert Xu <[email protected]>


[ Upstream commit ed5d7788a934a4b6d6d025e948ed4da496b4f12e ]

It is wrong to schedule a work from sk_destruct using the socket
as the memory reserve because the socket will be freed immediately
after the return from sk_destruct.

Instead we should do the deferral prior to sk_free.

This patch does just that.

Fixes: 707693c8a498 ("netlink: Call cb->done from a worker thread")
Signed-off-by: Herbert Xu <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/netlink/af_netlink.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)

--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -924,11 +924,13 @@ static void netlink_skb_set_owner_r(stru
sk_mem_charge(sk, skb->truesize);
}

-static void __netlink_sock_destruct(struct sock *sk)
+static void netlink_sock_destruct(struct sock *sk)
{
struct netlink_sock *nlk = nlk_sk(sk);

if (nlk->cb_running) {
+ if (nlk->cb.done)
+ nlk->cb.done(&nlk->cb);
module_put(nlk->cb.module);
kfree_skb(nlk->cb.skb);
}
@@ -962,21 +964,7 @@ static void netlink_sock_destruct_work(s
struct netlink_sock *nlk = container_of(work, struct netlink_sock,
work);

- nlk->cb.done(&nlk->cb);
- __netlink_sock_destruct(&nlk->sk);
-}
-
-static void netlink_sock_destruct(struct sock *sk)
-{
- struct netlink_sock *nlk = nlk_sk(sk);
-
- if (nlk->cb_running && nlk->cb.done) {
- INIT_WORK(&nlk->work, netlink_sock_destruct_work);
- schedule_work(&nlk->work);
- return;
- }
-
- __netlink_sock_destruct(sk);
+ sk_free(&nlk->sk);
}

/* This lock without WQ_FLAG_EXCLUSIVE is good on UP and it is _very_ bad on
@@ -1284,8 +1272,18 @@ out_module:
static void deferred_put_nlk_sk(struct rcu_head *head)
{
struct netlink_sock *nlk = container_of(head, struct netlink_sock, rcu);
+ struct sock *sk = &nlk->sk;
+
+ if (!atomic_dec_and_test(&sk->sk_refcnt))
+ return;
+
+ if (nlk->cb_running && nlk->cb.done) {
+ INIT_WORK(&nlk->work, netlink_sock_destruct_work);
+ schedule_work(&nlk->work);
+ return;
+ }

- sock_put(&nlk->sk);
+ sk_free(sk);
}

static int netlink_release(struct socket *sock)


2016-12-09 16:18:29

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 15/28] net: bcmgenet: Utilize correct struct device for all DMA operations

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Fainelli <[email protected]>


[ Upstream commit 8c4799ac799665065f9bf1364fd71bf4f7dc6a4a ]

__bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the
same struct device during unmap that was used for the map operation,
which makes DMA-API debugging warn about it. Fix this by always using
&priv->pdev->dev throughout the driver, using an identical device
reference for all map/unmap calls.

Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1168,6 +1168,7 @@ static unsigned int __bcmgenet_tx_reclai
struct bcmgenet_tx_ring *ring)
{
struct bcmgenet_priv *priv = netdev_priv(dev);
+ struct device *kdev = &priv->pdev->dev;
struct enet_cb *tx_cb_ptr;
struct netdev_queue *txq;
unsigned int pkts_compl = 0;
@@ -1195,7 +1196,7 @@ static unsigned int __bcmgenet_tx_reclai
pkts_compl++;
dev->stats.tx_packets++;
dev->stats.tx_bytes += tx_cb_ptr->skb->len;
- dma_unmap_single(&dev->dev,
+ dma_unmap_single(kdev,
dma_unmap_addr(tx_cb_ptr, dma_addr),
dma_unmap_len(tx_cb_ptr, dma_len),
DMA_TO_DEVICE);
@@ -1203,7 +1204,7 @@ static unsigned int __bcmgenet_tx_reclai
} else if (dma_unmap_addr(tx_cb_ptr, dma_addr)) {
dev->stats.tx_bytes +=
dma_unmap_len(tx_cb_ptr, dma_len);
- dma_unmap_page(&dev->dev,
+ dma_unmap_page(kdev,
dma_unmap_addr(tx_cb_ptr, dma_addr),
dma_unmap_len(tx_cb_ptr, dma_len),
DMA_TO_DEVICE);
@@ -1754,6 +1755,7 @@ static int bcmgenet_alloc_rx_buffers(str

static void bcmgenet_free_rx_buffers(struct bcmgenet_priv *priv)
{
+ struct device *kdev = &priv->pdev->dev;
struct enet_cb *cb;
int i;

@@ -1761,7 +1763,7 @@ static void bcmgenet_free_rx_buffers(str
cb = &priv->rx_cbs[i];

if (dma_unmap_addr(cb, dma_addr)) {
- dma_unmap_single(&priv->dev->dev,
+ dma_unmap_single(kdev,
dma_unmap_addr(cb, dma_addr),
priv->rx_buf_len, DMA_FROM_DEVICE);
dma_unmap_addr_set(cb, dma_addr, 0);


2016-12-09 16:18:34

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 16/28] sh_eth: remove unchecked interrupts for RZ/A1

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chris Brandt <[email protected]>


[ Upstream commit 33d446dbba4d4d6a77e1e900d434fa99e0f02c86 ]

When streaming a lot of data and the RZ/A1 can't keep up, some status bits
will get set that are not being checked or cleared which cause the
following messages and the Ethernet driver to stop working. This
patch fixes that issue.

irq 21: nobody cared (try booting with the "irqpoll" option)
handlers:
[<c036b71c>] sh_eth_interrupt
Disabling IRQ #21

Fixes: db893473d313a4ad ("sh_eth: Add support for r7s72100")
Signed-off-by: Chris Brandt <[email protected]>
Acked-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/renesas/sh_eth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -832,7 +832,7 @@ static struct sh_eth_cpu_data r7s72100_d

.ecsr_value = ECSR_ICD,
.ecsipr_value = ECSIPR_ICDIP,
- .eesipr_value = 0xff7f009f,
+ .eesipr_value = 0xe77f009f,

.tx_check = EESR_TC1 | EESR_FTC,
.eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |


2016-12-09 16:18:38

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 18/28] net: avoid signed overflows for SO_{SND|RCV}BUFFORCE

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit b98b0bc8c431e3ceb4b26b0dfc8db509518fb290 ]

CAP_NET_ADMIN users should not be allowed to set negative
sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
corruptions, crashes, OOM...

Note that before commit 82981930125a ("net: cleanups in
sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
and SO_RCVBUF were vulnerable.

This needs to be backported to all known linux kernels.

Again, many thanks to syzkaller team for discovering this gem.

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/sock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -745,7 +745,7 @@ int sock_setsockopt(struct socket *sock,
val = min_t(u32, val, sysctl_wmem_max);
set_sndbuf:
sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
- sk->sk_sndbuf = max_t(u32, val * 2, SOCK_MIN_SNDBUF);
+ sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
/* Wake up sending tasks if we upped the value. */
sk->sk_write_space(sk);
break;
@@ -781,7 +781,7 @@ set_rcvbuf:
* returning the value we actually used in getsockopt
* is the most desirable behavior.
*/
- sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
+ sk->sk_rcvbuf = max_t(int, val * 2, SOCK_MIN_RCVBUF);
break;

case SO_RCVBUFFORCE:


2016-12-09 16:18:48

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 04/28] net: sky2: Fix shutdown crash

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jeremy Linton <[email protected]>


[ Upstream commit 06ba3b2133dc203e1e9bc36cee7f0839b79a9e8b ]

The sky2 frequently crashes during machine shutdown with:

sky2_get_stats+0x60/0x3d8 [sky2]
dev_get_stats+0x68/0xd8
rtnl_fill_stats+0x54/0x140
rtnl_fill_ifinfo+0x46c/0xc68
rtmsg_ifinfo_build_skb+0x7c/0xf0
rtmsg_ifinfo.part.22+0x3c/0x70
rtmsg_ifinfo+0x50/0x5c
netdev_state_change+0x4c/0x58
linkwatch_do_dev+0x50/0x88
__linkwatch_run_queue+0x104/0x1a4
linkwatch_event+0x30/0x3c
process_one_work+0x140/0x3e0
worker_thread+0x60/0x44c
kthread+0xdc/0xf0
ret_from_fork+0x10/0x50

This is caused by the sky2 being called after it has been shutdown.
A previous thread about this can be found here:

https://lkml.org/lkml/2016/4/12/410

An alternative fix is to assure that IFF_UP gets cleared by
calling dev_close() during shutdown. This is similar to what the
bnx2/tg3/xgene and maybe others are doing to assure that the driver
isn't being called following _shutdown().

Signed-off-by: Jeremy Linton <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/marvell/sky2.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -5220,6 +5220,19 @@ static SIMPLE_DEV_PM_OPS(sky2_pm_ops, sk

static void sky2_shutdown(struct pci_dev *pdev)
{
+ struct sky2_hw *hw = pci_get_drvdata(pdev);
+ int port;
+
+ for (port = 0; port < hw->ports; port++) {
+ struct net_device *ndev = hw->dev[port];
+
+ rtnl_lock();
+ if (netif_running(ndev)) {
+ dev_close(ndev);
+ netif_device_detach(ndev);
+ }
+ rtnl_unlock();
+ }
sky2_suspend(&pdev->dev);
pci_wake_from_d3(pdev, device_may_wakeup(&pdev->dev));
pci_set_power_state(pdev, PCI_D3hot);


2016-12-09 16:18:58

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 24/28] constify iov_iter_count() and iter_is_iovec()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit b57332b4105abf1d518d93886e547ee2f98cd414 upstream.

[stable note, need this to prevent build warning in commit
a0ac402cfcdc904f9772e1762b3fda112dcc56a0]

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/uio.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -101,12 +101,12 @@ int iov_iter_npages(const struct iov_ite

const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags);

-static inline size_t iov_iter_count(struct iov_iter *i)
+static inline size_t iov_iter_count(const struct iov_iter *i)
{
return i->count;
}

-static inline bool iter_is_iovec(struct iov_iter *i)
+static inline bool iter_is_iovec(const struct iov_iter *i)
{
return !(i->type & (ITER_BVEC | ITER_KVEC));
}


2016-12-09 16:19:07

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 26/28] ipv4: Set skb->protocol properly for local output

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eli Cooper <[email protected]>

commit f4180439109aa720774baafdd798b3234ab1a0d2 upstream.

When xfrm is applied to TSO/GSO packets, it follows this path:

xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()

where skb_gso_segment() relies on skb->protocol to function properly.

This patch sets skb->protocol to ETH_P_IP before dst_output() is called,
fixing a bug where GSO packets sent through a sit tunnel are dropped
when xfrm is involved.

Signed-off-by: Eli Cooper <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/ipv4/ip_output.c | 3 +++
1 file changed, 3 insertions(+)

--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -102,6 +102,9 @@ int __ip_local_out(struct net *net, stru

iph->tot_len = htons(skb->len);
ip_send_check(iph);
+
+ skb->protocol = htons(ETH_P_IP);
+
return nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT,
net, sk, skb, NULL, skb_dst(skb)->dev,
dst_output);


2016-12-09 16:18:41

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 19/28] net: ping: check minimum size on ICMP header length

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <[email protected]>


[ Upstream commit 0eab121ef8750a5c8637d51534d5e9143fb0633f ]

Prior to commit c0371da6047a ("put iov_iter into msghdr") in v3.19, there
was no check that the iovec contained enough bytes for an ICMP header,
and the read loop would walk across neighboring stack contents. Since the
iov_iter conversion, bad arguments are noticed, but the returned error is
EFAULT. Returning EINVAL is a clearer error and also solves the problem
prior to v3.19.

This was found using trinity with KASAN on v3.18:

BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffffffc071077da0
Read of size 8 by task trinity-c2/9623
page:ffffffbe034b9a08 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: G BU 3.18.0-dirty #15
Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT)
Call trace:
[<ffffffc000209c98>] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90
[<ffffffc000209e54>] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffc000f18dc4>] dump_stack+0x7c/0xd0 lib/dump_stack.c:50
[< inline >] print_address_description mm/kasan/report.c:147
[< inline >] kasan_report_error mm/kasan/report.c:236
[<ffffffc000373dcc>] kasan_report+0x380/0x4b8 mm/kasan/report.c:259
[< inline >] check_memory_region mm/kasan/kasan.c:264
[<ffffffc00037352c>] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507
[<ffffffc0005b9624>] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15
[< inline >] memcpy_from_msg include/linux/skbuff.h:2667
[<ffffffc000ddeba0>] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674
[<ffffffc000dded30>] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714
[<ffffffc000dc91dc>] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749
[< inline >] __sock_sendmsg_nosec net/socket.c:624
[< inline >] __sock_sendmsg net/socket.c:632
[<ffffffc000cab61c>] sock_sendmsg+0x124/0x164 net/socket.c:643
[< inline >] SYSC_sendto net/socket.c:1797
[<ffffffc000cad270>] SyS_sendto+0x178/0x1d8 net/socket.c:1761

CVE-2016-8399

Reported-by: Qidan He <[email protected]>
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ping.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -660,6 +660,10 @@ int ping_common_sendmsg(int family, stru
if (len > 0xFFFF)
return -EMSGSIZE;

+ /* Must have at least a full ICMP header. */
+ if (len < icmph_len)
+ return -EINVAL;
+
/*
* Check the flags.
*/


2016-12-09 16:18:55

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 23/28] Dont feed anything but regular iovecs to blk_rq_map_user_iov

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <[email protected]>

commit a0ac402cfcdc904f9772e1762b3fda112dcc56a0 upstream.

In theory we could map other things, but there's a reason that function
is called "user_iov". Using anything else (like splice can do) just
confuses it.

Reported-and-tested-by: Johannes Thumshirn <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
block/blk-map.c | 3 +++
1 file changed, 3 insertions(+)

--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -90,6 +90,9 @@ int blk_rq_map_user_iov(struct request_q
if (!iter || !iter->count)
return -EINVAL;

+ if (!iter_is_iovec(iter))
+ return -EINVAL;
+
iov_for_each(iov, i, *iter) {
unsigned long uaddr = (unsigned long) iov.iov_base;



2016-12-09 16:19:02

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 25/28] ipv6: Set skb->protocol properly for local output

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eli Cooper <[email protected]>

commit b4e479a96fc398ccf83bb1cffb4ffef8631beaf1 upstream.

When xfrm is applied to TSO/GSO packets, it follows this path:

xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()

where skb_gso_segment() relies on skb->protocol to function properly.

This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
when xfrm is involved.

Signed-off-by: Eli Cooper <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/ipv6/output_core.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -148,6 +148,8 @@ int __ip6_local_out(struct net *net, str
ipv6_hdr(skb)->payload_len = htons(len);
IP6CB(skb)->nhoff = offsetof(struct ipv6hdr, nexthdr);

+ skb->protocol = htons(ETH_P_IPV6);
+
return nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT,
net, sk, skb, NULL, skb_dst(skb)->dev,
dst_output);


2016-12-09 16:18:51

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 22/28] sparc64: fix compile warning section mismatch in find_node()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Tai <[email protected]>


[ Upstream commit 87a349f9cc0908bc0cfac0c9ece3179f650ae95a ]

A compile warning is introduced by a commit to fix the find_node().
This patch fix the compile warning by moving find_node() into __init
section. Because find_node() is only used by memblock_nid_range() which
is only used by a __init add_node_ranges(). find_node() and
memblock_nid_range() should also be inside __init section.

Signed-off-by: Thomas Tai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/mm/init_64.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -803,7 +803,7 @@ static int num_mblocks;
static int find_numa_node_for_addr(unsigned long pa,
struct node_mem_mask *pnode_mask);

-static unsigned long ra_to_pa(unsigned long addr)
+static unsigned long __init ra_to_pa(unsigned long addr)
{
int i;

@@ -819,7 +819,7 @@ static unsigned long ra_to_pa(unsigned l
return addr;
}

-static int find_node(unsigned long addr)
+static int __init find_node(unsigned long addr)
{
static bool search_mdesc = true;
static struct node_mem_mask last_mem_mask = { ~0UL, ~0UL };
@@ -856,7 +856,7 @@ static int find_node(unsigned long addr)
return last_index;
}

-static u64 memblock_nid_range(u64 start, u64 end, int *nid)
+static u64 __init memblock_nid_range(u64 start, u64 end, int *nid)
{
*nid = find_node(start);
start += PAGE_SIZE;


2016-12-09 16:19:12

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 28/28] esp6: Fix integrity verification when ESN are used

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tobias Brunner <[email protected]>

commit a55e23864d381c5a4ef110df94b00b2fe121a70d upstream.

When handling inbound packets, the two halves of the sequence number
stored on the skb are already in network order.

Fixes: 000ae7b2690e ("esp6: Switch to new AEAD interface")
Signed-off-by: Tobias Brunner <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/ipv6/esp6.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -418,7 +418,7 @@ static int esp6_input(struct xfrm_state
esph = (void *)skb_push(skb, 4);
*seqhi = esph->spi;
esph->spi = esph->seq_no;
- esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.input.hi);
+ esph->seq_no = XFRM_SKB_CB(skb)->seq.input.hi;
aead_request_set_callback(req, 0, esp_input_done_esn, skb);
}



2016-12-09 16:19:18

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 05/28] af_unix: conditionally use freezable blocking calls in read

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: WANG Cong <[email protected]>


[ Upstream commit 06a77b07e3b44aea2b3c0e64de420ea2cfdcbaa9 ]

Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
converts schedule_timeout() to its freezable version, it was probably
correct at that time, but later, commit 2b514574f7e8
("net: af_unix: implement splice for stream af_unix sockets") breaks
the strong requirement for a freezable sleep, according to
commit 0f9548ca1091:

We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
deadlock if the lock is later acquired in the suspend or hibernate path
(e.g. by dpm). Holding a lock can also cause a deadlock in the case of
cgroup_freezer if a lock is held inside a frozen cgroup that is later
acquired by a process outside that group.

The pipe_lock is still held at that point.

So use freezable version only for the recvmsg call path, avoid impact for
Android.

Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: Dmitry Vyukov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Hannes Frederic Sowa <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/unix/af_unix.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)

--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2194,7 +2194,8 @@ out:
* Sleep until more data has arrived. But check for races..
*/
static long unix_stream_data_wait(struct sock *sk, long timeo,
- struct sk_buff *last, unsigned int last_len)
+ struct sk_buff *last, unsigned int last_len,
+ bool freezable)
{
struct sk_buff *tail;
DEFINE_WAIT(wait);
@@ -2215,7 +2216,10 @@ static long unix_stream_data_wait(struct

sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
unix_state_unlock(sk);
- timeo = freezable_schedule_timeout(timeo);
+ if (freezable)
+ timeo = freezable_schedule_timeout(timeo);
+ else
+ timeo = schedule_timeout(timeo);
unix_state_lock(sk);

if (sock_flag(sk, SOCK_DEAD))
@@ -2245,7 +2249,8 @@ struct unix_stream_read_state {
unsigned int splice_flags;
};

-static int unix_stream_read_generic(struct unix_stream_read_state *state)
+static int unix_stream_read_generic(struct unix_stream_read_state *state,
+ bool freezable)
{
struct scm_cookie scm;
struct socket *sock = state->socket;
@@ -2324,7 +2329,7 @@ again:
mutex_unlock(&u->iolock);

timeo = unix_stream_data_wait(sk, timeo, last,
- last_len);
+ last_len, freezable);

if (signal_pending(current)) {
err = sock_intr_errno(timeo);
@@ -2466,7 +2471,7 @@ static int unix_stream_recvmsg(struct so
.flags = flags
};

- return unix_stream_read_generic(&state);
+ return unix_stream_read_generic(&state, true);
}

static ssize_t skb_unix_socket_splice(struct sock *sk,
@@ -2512,7 +2517,7 @@ static ssize_t unix_stream_splice_read(s
flags & SPLICE_F_NONBLOCK)
state.flags = MSG_DONTWAIT;

- return unix_stream_read_generic(&state);
+ return unix_stream_read_generic(&state, false);
}

static int unix_shutdown(struct socket *sock, int mode)


2016-12-09 16:19:26

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 07/28] l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Guillaume Nault <[email protected]>


[ Upstream commit 32c231164b762dddefa13af5a0101032c70b50ef ]

Lock socket before checking the SOCK_ZAPPED flag in l2tp_ip6_bind().
Without lock, a concurrent call could modify the socket flags between
the sock_flag(sk, SOCK_ZAPPED) test and the lock_sock() call. This way,
a socket could be inserted twice in l2tp_ip6_bind_table. Releasing it
would then leave a stale pointer there, generating use-after-free
errors when walking through the list or modifying adjacent entries.

BUG: KASAN: use-after-free in l2tp_ip6_close+0x22e/0x290 at addr ffff8800081b0ed8
Write of size 8 by task syz-executor/10987
CPU: 0 PID: 10987 Comm: syz-executor Not tainted 4.8.0+ #39
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
ffff880031d97838 ffffffff829f835b ffff88001b5a1640 ffff8800081b0ec0
ffff8800081b15a0 ffff8800081b6d20 ffff880031d97860 ffffffff8174d3cc
ffff880031d978f0 ffff8800081b0e80 ffff88001b5a1640 ffff880031d978e0
Call Trace:
[<ffffffff829f835b>] dump_stack+0xb3/0x118 lib/dump_stack.c:15
[<ffffffff8174d3cc>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
[< inline >] print_address_description mm/kasan/report.c:194
[<ffffffff8174d666>] kasan_report_error+0x1f6/0x4d0 mm/kasan/report.c:283
[< inline >] kasan_report mm/kasan/report.c:303
[<ffffffff8174db7e>] __asan_report_store8_noabort+0x3e/0x40 mm/kasan/report.c:329
[< inline >] __write_once_size ./include/linux/compiler.h:249
[< inline >] __hlist_del ./include/linux/list.h:622
[< inline >] hlist_del_init ./include/linux/list.h:637
[<ffffffff8579047e>] l2tp_ip6_close+0x22e/0x290 net/l2tp/l2tp_ip6.c:239
[<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
[<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
[<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
[<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
[<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
[<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
[<ffffffff813774f9>] task_work_run+0xf9/0x170
[<ffffffff81324aae>] do_exit+0x85e/0x2a00
[<ffffffff81326dc8>] do_group_exit+0x108/0x330
[<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
[<ffffffff811b49af>] do_signal+0x7f/0x18f0
[<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
[< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
[<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
[<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Object at ffff8800081b0ec0, in cache L2TP/IPv6 size: 1448
Allocated:
PID = 10987
[ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
[ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
[ 1116.897025] [<ffffffff8174c9ad>] kasan_kmalloc+0xad/0xe0
[ 1116.897025] [<ffffffff8174cee2>] kasan_slab_alloc+0x12/0x20
[ 1116.897025] [< inline >] slab_post_alloc_hook mm/slab.h:417
[ 1116.897025] [< inline >] slab_alloc_node mm/slub.c:2708
[ 1116.897025] [< inline >] slab_alloc mm/slub.c:2716
[ 1116.897025] [<ffffffff817476a8>] kmem_cache_alloc+0xc8/0x2b0 mm/slub.c:2721
[ 1116.897025] [<ffffffff84c4f6a9>] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1326
[ 1116.897025] [<ffffffff84c58ac8>] sk_alloc+0x38/0xae0 net/core/sock.c:1388
[ 1116.897025] [<ffffffff851ddf67>] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:182
[ 1116.897025] [<ffffffff84c4af7b>] __sock_create+0x37b/0x640 net/socket.c:1153
[ 1116.897025] [< inline >] sock_create net/socket.c:1193
[ 1116.897025] [< inline >] SYSC_socket net/socket.c:1223
[ 1116.897025] [<ffffffff84c4b46f>] SyS_socket+0xef/0x1b0 net/socket.c:1203
[ 1116.897025] [<ffffffff85e4d685>] entry_SYSCALL_64_fastpath+0x23/0xc6
Freed:
PID = 10987
[ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
[ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
[ 1116.897025] [<ffffffff8174cf61>] kasan_slab_free+0x71/0xb0
[ 1116.897025] [< inline >] slab_free_hook mm/slub.c:1352
[ 1116.897025] [< inline >] slab_free_freelist_hook mm/slub.c:1374
[ 1116.897025] [< inline >] slab_free mm/slub.c:2951
[ 1116.897025] [<ffffffff81748b28>] kmem_cache_free+0xc8/0x330 mm/slub.c:2973
[ 1116.897025] [< inline >] sk_prot_free net/core/sock.c:1369
[ 1116.897025] [<ffffffff84c541eb>] __sk_destruct+0x32b/0x4f0 net/core/sock.c:1444
[ 1116.897025] [<ffffffff84c5aca4>] sk_destruct+0x44/0x80 net/core/sock.c:1452
[ 1116.897025] [<ffffffff84c5ad33>] __sk_free+0x53/0x220 net/core/sock.c:1460
[ 1116.897025] [<ffffffff84c5af23>] sk_free+0x23/0x30 net/core/sock.c:1471
[ 1116.897025] [<ffffffff84c5cb6c>] sk_common_release+0x28c/0x3e0 ./include/net/sock.h:1589
[ 1116.897025] [<ffffffff8579044e>] l2tp_ip6_close+0x1fe/0x290 net/l2tp/l2tp_ip6.c:243
[ 1116.897025] [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
[ 1116.897025] [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
[ 1116.897025] [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
[ 1116.897025] [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
[ 1116.897025] [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
[ 1116.897025] [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
[ 1116.897025] [<ffffffff813774f9>] task_work_run+0xf9/0x170
[ 1116.897025] [<ffffffff81324aae>] do_exit+0x85e/0x2a00
[ 1116.897025] [<ffffffff81326dc8>] do_group_exit+0x108/0x330
[ 1116.897025] [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
[ 1116.897025] [<ffffffff811b49af>] do_signal+0x7f/0x18f0
[ 1116.897025] [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
[ 1116.897025] [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
[ 1116.897025] [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
[ 1116.897025] [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Memory state around the buggy address:
ffff8800081b0d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8800081b0e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8800081b0e80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^
ffff8800081b0f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8800081b0f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

==================================================================

The same issue exists with l2tp_ip_bind() and l2tp_ip_bind_table.

Fixes: c51ce49735c1 ("l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case")
Reported-by: Baozeng Ding <[email protected]>
Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Baozeng Ding <[email protected]>
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/l2tp/l2tp_ip.c | 5 +++--
net/l2tp/l2tp_ip6.c | 5 +++--
2 files changed, 6 insertions(+), 4 deletions(-)

--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -251,8 +251,6 @@ static int l2tp_ip_bind(struct sock *sk,
int ret;
int chk_addr_ret;

- if (!sock_flag(sk, SOCK_ZAPPED))
- return -EINVAL;
if (addr_len < sizeof(struct sockaddr_l2tpip))
return -EINVAL;
if (addr->l2tp_family != AF_INET)
@@ -267,6 +265,9 @@ static int l2tp_ip_bind(struct sock *sk,
read_unlock_bh(&l2tp_ip_lock);

lock_sock(sk);
+ if (!sock_flag(sk, SOCK_ZAPPED))
+ goto out;
+
if (sk->sk_state != TCP_CLOSE || addr_len < sizeof(struct sockaddr_l2tpip))
goto out;

--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -266,8 +266,6 @@ static int l2tp_ip6_bind(struct sock *sk
int addr_type;
int err;

- if (!sock_flag(sk, SOCK_ZAPPED))
- return -EINVAL;
if (addr->l2tp_family != AF_INET6)
return -EINVAL;
if (addr_len < sizeof(*addr))
@@ -293,6 +291,9 @@ static int l2tp_ip6_bind(struct sock *sk
lock_sock(sk);

err = -EINVAL;
+ if (!sock_flag(sk, SOCK_ZAPPED))
+ goto out_unlock;
+
if (sk->sk_state != TCP_CLOSE)
goto out_unlock;



2016-12-09 16:19:39

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 11/28] netlink: Call cb->done from a worker thread

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Herbert Xu <[email protected]>


[ Upstream commit 707693c8a498697aa8db240b93eb76ec62e30892 ]

The cb->done interface expects to be called in process context.
This was broken by the netlink RCU conversion. This patch fixes
it by adding a worker struct to make the cb->done call where
necessary.

Fixes: 21e4902aea80 ("netlink: Lockless lookup with RCU grace...")
Reported-by: Subash Abhinov Kasiviswanathan <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Acked-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/netlink/af_netlink.c | 27 +++++++++++++++++++++++----
net/netlink/af_netlink.h | 2 ++
2 files changed, 25 insertions(+), 4 deletions(-)

--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -924,14 +924,11 @@ static void netlink_skb_set_owner_r(stru
sk_mem_charge(sk, skb->truesize);
}

-static void netlink_sock_destruct(struct sock *sk)
+static void __netlink_sock_destruct(struct sock *sk)
{
struct netlink_sock *nlk = nlk_sk(sk);

if (nlk->cb_running) {
- if (nlk->cb.done)
- nlk->cb.done(&nlk->cb);
-
module_put(nlk->cb.module);
kfree_skb(nlk->cb.skb);
}
@@ -960,6 +957,28 @@ static void netlink_sock_destruct(struct
WARN_ON(nlk_sk(sk)->groups);
}

+static void netlink_sock_destruct_work(struct work_struct *work)
+{
+ struct netlink_sock *nlk = container_of(work, struct netlink_sock,
+ work);
+
+ nlk->cb.done(&nlk->cb);
+ __netlink_sock_destruct(&nlk->sk);
+}
+
+static void netlink_sock_destruct(struct sock *sk)
+{
+ struct netlink_sock *nlk = nlk_sk(sk);
+
+ if (nlk->cb_running && nlk->cb.done) {
+ INIT_WORK(&nlk->work, netlink_sock_destruct_work);
+ schedule_work(&nlk->work);
+ return;
+ }
+
+ __netlink_sock_destruct(sk);
+}
+
/* This lock without WQ_FLAG_EXCLUSIVE is good on UP and it is _very_ bad on
* SMP. Look, when several writers sleep and reader wakes them up, all but one
* immediately hit write lock and grab all the cpus. Exclusive sleep solves
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -3,6 +3,7 @@

#include <linux/rhashtable.h>
#include <linux/atomic.h>
+#include <linux/workqueue.h>
#include <net/sock.h>

#define NLGRPSZ(x) (ALIGN(x, sizeof(unsigned long) * 8) / 8)
@@ -53,6 +54,7 @@ struct netlink_sock {

struct rhash_head node;
struct rcu_head rcu;
+ struct work_struct work;
};

static inline struct netlink_sock *nlk_sk(struct sock *sk)


2016-12-09 16:19:52

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 02/28] net: check dead netns for peernet2id_alloc()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: WANG Cong <[email protected]>


[ Upstream commit cfc44a4d147ea605d66ccb917cc24467d15ff867 ]

Andrei reports we still allocate netns ID from idr after we destroy
it in cleanup_net().

cleanup_net():
...
idr_destroy(&net->netns_ids);
...
list_for_each_entry_reverse(ops, &pernet_list, list)
ops_exit_list(ops, &net_exit_list);
-> rollback_registered_many()
-> rtmsg_ifinfo_build_skb()
-> rtnl_fill_ifinfo()
-> peernet2id_alloc()

After that point we should not even access net->netns_ids, we
should check the death of the current netns as early as we can in
peernet2id_alloc().

For net-next we can consider to avoid sending rtmsg totally,
it is a good optimization for netns teardown path.

Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
Reported-by: Andrei Vagin <[email protected]>
Cc: Nicolas Dichtel <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Acked-by: Andrei Vagin <[email protected]>
Signed-off-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/net_namespace.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -217,6 +217,8 @@ int peernet2id_alloc(struct net *net, st
bool alloc;
int id;

+ if (atomic_read(&net->count) == 0)
+ return NETNSA_NSID_NOT_ASSIGNED;
spin_lock_irqsave(&net->nsid_lock, flags);
alloc = atomic_read(&peer->count) == 0 ? false : true;
id = __peernet2id_alloc(net, peer, &alloc);


2016-12-09 16:19:31

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 09/28] net, sched: respect rcu grace period on cls destruction

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>


[ Upstream commit d936377414fadbafb4d17148d222fe45ca5442d4 ]

Roi reported a crash in flower where tp->root was NULL in ->classify()
callbacks. Reason is that in ->destroy() tp->root is set to NULL via
RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
this doesn't respect RCU grace period for them, and as a result, still
outstanding readers from tc_classify() will try to blindly dereference
a NULL tp->root.

The tp->root object is strictly private to the classifier implementation
and holds internal data the core such as tc_ctl_tfilter() doesn't know
about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
is only checked for NULL in ->get() callback, but nowhere else. This is
misleading and seemed to be copied from old classifier code that was not
cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
fix NULL pointer dereference") moved tp->root initialization into ->init()
routine, where before it was part of ->change(), so ->get() had to deal
with tp->root being NULL back then, so that was indeed a valid case, after
d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
in packet classifiers"); but the NULLifying was reintroduced with the
RCUification, but it's not correct for every classifier implementation.

In the cases that are fixed here with one exception of cls_cgroup, tp->root
object is allocated and initialized inside ->init() callback, which is always
performed at a point in time after we allocate a new tp, which means tp and
thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
handler, same for the tp which is kfree_rcu()'ed right when we return
from ->destroy() in tcf_destroy(). This means, the head object's lifetime
for such classifiers is always tied to the tp lifetime. The RCU callback
invocation for the two kfree_rcu() could be out of order, but that's fine
since both are independent.

Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
means that 1) we don't need a useless NULL check in fast-path and, 2) that
outstanding readers of that tp in tc_classify() can still execute under
respect with RCU grace period as it is actually expected.

Things that haven't been touched here: cls_fw and cls_route. They each
handle tp->root being NULL in ->classify() path for historic reasons, so
their ->destroy() implementation can stay as is. If someone actually
cares, they could get cleaned up at some point to avoid the test in fast
path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
!head should anyone actually be using/testing it, so it at least aligns with
cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
destruction (to a sleepable context) after RCU grace period as concurrent
readers might still access it. (Note that in this case we need to hold module
reference to keep work callback address intact, since we only wait on module
unload for all call_rcu()s to finish.)

This fixes one race to bring RCU grace period guarantees back. Next step
as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
proto tp when all filters are gone") to get the order of unlinking the tp
in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
RCU_INIT_POINTER() before tcf_destroy() and let the notification for
removal be done through the prior ->delete() callback. Both are independant
issues. Once we have that right, we can then clean tp->root up for a number
of classifiers by not making them RCU pointers, which requires a new callback
(->uninit) that is triggered from tp's RCU callback, where we just kfree()
tp->root from there.

Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
Reported-by: Roi Dayan <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Roi Dayan <[email protected]>
Cc: Jiri Pirko <[email protected]>
Acked-by: John Fastabend <[email protected]>
Acked-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sched/cls_basic.c | 4 ----
net/sched/cls_bpf.c | 4 ----
net/sched/cls_cgroup.c | 7 +++----
net/sched/cls_flow.c | 1 -
net/sched/cls_flower.c | 31 ++++++++++++++++++++++++++-----
net/sched/cls_rsvp.h | 3 ++-
net/sched/cls_tcindex.c | 1 -
7 files changed, 31 insertions(+), 20 deletions(-)

--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -62,9 +62,6 @@ static unsigned long basic_get(struct tc
struct basic_head *head = rtnl_dereference(tp->root);
struct basic_filter *f;

- if (head == NULL)
- return 0UL;
-
list_for_each_entry(f, &head->flist, link) {
if (f->handle == handle) {
l = (unsigned long) f;
@@ -109,7 +106,6 @@ static bool basic_destroy(struct tcf_pro
tcf_unbind_filter(tp, &f->res);
call_rcu(&f->rcu, basic_delete_filter);
}
- RCU_INIT_POINTER(tp->root, NULL);
kfree_rcu(head, rcu);
return true;
}
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -199,7 +199,6 @@ static bool cls_bpf_destroy(struct tcf_p
call_rcu(&prog->rcu, __cls_bpf_delete_prog);
}

- RCU_INIT_POINTER(tp->root, NULL);
kfree_rcu(head, rcu);
return true;
}
@@ -210,9 +209,6 @@ static unsigned long cls_bpf_get(struct
struct cls_bpf_prog *prog;
unsigned long ret = 0UL;

- if (head == NULL)
- return 0UL;
-
list_for_each_entry(prog, &head->plist, link) {
if (prog->handle == handle) {
ret = (unsigned long) prog;
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -130,11 +130,10 @@ static bool cls_cgroup_destroy(struct tc

if (!force)
return false;
-
- if (head) {
- RCU_INIT_POINTER(tp->root, NULL);
+ /* Head can still be NULL due to cls_cgroup_init(). */
+ if (head)
call_rcu(&head->rcu, cls_cgroup_destroy_rcu);
- }
+
return true;
}

--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -583,7 +583,6 @@ static bool flow_destroy(struct tcf_prot
list_del_rcu(&f->list);
call_rcu(&f->rcu, flow_destroy_filter);
}
- RCU_INIT_POINTER(tp->root, NULL);
kfree_rcu(head, rcu);
return true;
}
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -13,6 +13,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/rhashtable.h>
+#include <linux/workqueue.h>

#include <linux/if_ether.h>
#include <linux/in6.h>
@@ -55,7 +56,10 @@ struct cls_fl_head {
bool mask_assigned;
struct list_head filters;
struct rhashtable_params ht_params;
- struct rcu_head rcu;
+ union {
+ struct work_struct work;
+ struct rcu_head rcu;
+ };
};

struct cls_fl_filter {
@@ -165,6 +169,24 @@ static void fl_destroy_filter(struct rcu
kfree(f);
}

+static void fl_destroy_sleepable(struct work_struct *work)
+{
+ struct cls_fl_head *head = container_of(work, struct cls_fl_head,
+ work);
+ if (head->mask_assigned)
+ rhashtable_destroy(&head->ht);
+ kfree(head);
+ module_put(THIS_MODULE);
+}
+
+static void fl_destroy_rcu(struct rcu_head *rcu)
+{
+ struct cls_fl_head *head = container_of(rcu, struct cls_fl_head, rcu);
+
+ INIT_WORK(&head->work, fl_destroy_sleepable);
+ schedule_work(&head->work);
+}
+
static bool fl_destroy(struct tcf_proto *tp, bool force)
{
struct cls_fl_head *head = rtnl_dereference(tp->root);
@@ -177,10 +199,9 @@ static bool fl_destroy(struct tcf_proto
list_del_rcu(&f->list);
call_rcu(&f->rcu, fl_destroy_filter);
}
- RCU_INIT_POINTER(tp->root, NULL);
- if (head->mask_assigned)
- rhashtable_destroy(&head->ht);
- kfree_rcu(head, rcu);
+
+ __module_get(THIS_MODULE);
+ call_rcu(&head->rcu, fl_destroy_rcu);
return true;
}

--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -152,7 +152,8 @@ static int rsvp_classify(struct sk_buff
return -1;
nhptr = ip_hdr(skb);
#endif
-
+ if (unlikely(!head))
+ return -1;
restart:

#if RSVP_DST_LEN == 4
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -503,7 +503,6 @@ static bool tcindex_destroy(struct tcf_p
walker.fn = tcindex_destroy_element;
tcindex_walk(tp, &walker);

- RCU_INIT_POINTER(tp->root, NULL);
call_rcu(&p->rcu, __tcindex_destroy);
return true;
}


2016-12-09 16:20:10

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 10/28] net/sched: pedit: make sure that offset is valid

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Amir Vadai <[email protected]>


[ Upstream commit 95c2027bfeda21a28eb245121e6a249f38d0788e ]

Add a validation function to make sure offset is valid:
1. Not below skb head (could happen when offset is negative).
2. Validate both 'offset' and 'at'.

Signed-off-by: Amir Vadai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sched/act_pedit.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)

--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -104,6 +104,17 @@ static void tcf_pedit_cleanup(struct tc_
kfree(keys);
}

+static bool offset_valid(struct sk_buff *skb, int offset)
+{
+ if (offset > 0 && offset > skb->len)
+ return false;
+
+ if (offset < 0 && -offset > skb_headroom(skb))
+ return false;
+
+ return true;
+}
+
static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
struct tcf_result *res)
{
@@ -130,6 +141,11 @@ static int tcf_pedit(struct sk_buff *skb
if (tkey->offmask) {
char *d, _d;

+ if (!offset_valid(skb, off + tkey->at)) {
+ pr_info("tc filter pedit 'at' offset %d out of bounds\n",
+ off + tkey->at);
+ goto bad;
+ }
d = skb_header_pointer(skb, off + tkey->at, 1,
&_d);
if (!d)
@@ -142,10 +158,10 @@ static int tcf_pedit(struct sk_buff *skb
" offset must be on 32 bit boundaries\n");
goto bad;
}
- if (offset > 0 && offset > skb->len) {
- pr_info("tc filter pedit"
- " offset %d can't exceed pkt length %d\n",
- offset, skb->len);
+
+ if (!offset_valid(skb, off + offset)) {
+ pr_info("tc filter pedit offset %d out of bounds\n",
+ offset);
goto bad;
}



2016-12-09 16:20:37

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 08/28] net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Fainelli <[email protected]>


[ Upstream commit 76da8706d90d8641eeb9b8e579942ed80b6c0880 ]

In case the link change and EEE is enabled or disabled, always try to
re-negotiate this with the link partner.

Fixes: 450b05c15f9c ("net: dsa: bcm_sf2: add support for controlling EEE")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/dsa/bcm_sf2.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1137,6 +1137,7 @@ static void bcm_sf2_sw_adjust_link(struc
struct phy_device *phydev)
{
struct bcm_sf2_priv *priv = ds_to_priv(ds);
+ struct ethtool_eee *p = &priv->port_sts[port].eee;
u32 id_mode_dis = 0, port_mode;
const char *str = NULL;
u32 reg;
@@ -1211,6 +1212,9 @@ force_link:
reg |= DUPLX_MODE;

core_writel(priv, reg, CORE_STS_OVERRIDE_GMIIP_PORT(port));
+
+ if (!phydev->is_pseudo_fixed_link)
+ p->eee_enabled = bcm_sf2_eee_init(ds, port, phydev);
}

static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,


2016-12-09 16:31:58

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 06/28] rtnetlink: fix FDB size computation

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sabrina Dubroca <[email protected]>


[ Upstream commit f82ef3e10a870acc19fa04f80ef5877eaa26f41e ]

Add missing NDA_VLAN attribute's size.

Fixes: 1e53d5bb8878 ("net: Pass VLAN ID to rtnl_fdb_notify.")
Signed-off-by: Sabrina Dubroca <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/rtnetlink.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2600,7 +2600,10 @@ nla_put_failure:

static inline size_t rtnl_fdb_nlmsg_size(void)
{
- return NLMSG_ALIGN(sizeof(struct ndmsg)) + nla_total_size(ETH_ALEN);
+ return NLMSG_ALIGN(sizeof(struct ndmsg)) +
+ nla_total_size(ETH_ALEN) + /* NDA_LLADDR */
+ nla_total_size(sizeof(u16)) + /* NDA_VLAN */
+ 0;
}

static void rtnl_fdb_notify(struct net_device *dev, u8 *addr, u16 vid, int type)


2016-12-09 16:32:25

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 27/28] esp4: Fix integrity verification when ESN are used

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tobias Brunner <[email protected]>

commit 7c7fedd51c02f4418e8b2eed64bdab601f882aa4 upstream.

When handling inbound packets, the two halves of the sequence number
stored on the skb are already in network order.

Fixes: 7021b2e1cddd ("esp4: Switch to new AEAD interface")
Signed-off-by: Tobias Brunner <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/ipv4/esp4.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -476,7 +476,7 @@ static int esp_input(struct xfrm_state *
esph = (void *)skb_push(skb, 4);
*seqhi = esph->spi;
esph->spi = esph->seq_no;
- esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.input.hi);
+ esph->seq_no = XFRM_SKB_CB(skb)->seq.input.hi;
aead_request_set_callback(req, 0, esp_input_done_esn, skb);
}



2016-12-09 16:34:07

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 21/28] sparc64: Fix find_node warning if numa node cannot be found

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Tai <[email protected]>


[ Upstream commit 74a5ed5c4f692df2ff0a2313ea71e81243525519 ]

When booting up LDOM, find_node() warns that a physical address
doesn't match a NUMA node.

WARNING: CPU: 0 PID: 0 at arch/sparc/mm/init_64.c:835
find_node+0xf4/0x120 find_node: A physical address doesn't
match a NUMA node rule. Some physical memory will be
owned by node 0.Modules linked in:

CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc3 #4
Call Trace:
[0000000000468ba0] __warn+0xc0/0xe0
[0000000000468c74] warn_slowpath_fmt+0x34/0x60
[00000000004592f4] find_node+0xf4/0x120
[0000000000dd0774] add_node_ranges+0x38/0xe4
[0000000000dd0b1c] numa_parse_mdesc+0x268/0x2e4
[0000000000dd0e9c] bootmem_init+0xb8/0x160
[0000000000dd174c] paging_init+0x808/0x8fc
[0000000000dcb0d0] setup_arch+0x2c8/0x2f0
[0000000000dc68a0] start_kernel+0x48/0x424
[0000000000dcb374] start_early_boot+0x27c/0x28c
[0000000000a32c08] tlb_fixup_done+0x4c/0x64
[0000000000027f08] 0x27f08

It is because linux use an internal structure node_masks[] to
keep the best memory latency node only. However, LDOM mdesc can
contain single latency-group with multiple memory latency nodes.

If the address doesn't match the best latency node within
node_masks[], it should check for an alternative via mdesc.
The warning message should only be printed if the address
doesn't match any node_masks[] nor within mdesc. To minimize
the impact of searching mdesc every time, the last matched
mask and index is stored in a variable.

Signed-off-by: Thomas Tai <[email protected]>
Reviewed-by: Chris Hyser <[email protected]>
Reviewed-by: Liam Merwick <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/mm/init_64.c | 65 +++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 61 insertions(+), 4 deletions(-)

--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -800,6 +800,8 @@ struct mdesc_mblock {
};
static struct mdesc_mblock *mblocks;
static int num_mblocks;
+static int find_numa_node_for_addr(unsigned long pa,
+ struct node_mem_mask *pnode_mask);

static unsigned long ra_to_pa(unsigned long addr)
{
@@ -819,6 +821,9 @@ static unsigned long ra_to_pa(unsigned l

static int find_node(unsigned long addr)
{
+ static bool search_mdesc = true;
+ static struct node_mem_mask last_mem_mask = { ~0UL, ~0UL };
+ static int last_index;
int i;

addr = ra_to_pa(addr);
@@ -828,10 +833,27 @@ static int find_node(unsigned long addr)
if ((addr & p->mask) == p->val)
return i;
}
- /* The following condition has been observed on LDOM guests.*/
- WARN_ONCE(1, "find_node: A physical address doesn't match a NUMA node"
- " rule. Some physical memory will be owned by node 0.");
- return 0;
+ /* The following condition has been observed on LDOM guests because
+ * node_masks only contains the best latency mask and value.
+ * LDOM guest's mdesc can contain a single latency group to
+ * cover multiple address range. Print warning message only if the
+ * address cannot be found in node_masks nor mdesc.
+ */
+ if ((search_mdesc) &&
+ ((addr & last_mem_mask.mask) != last_mem_mask.val)) {
+ /* find the available node in the mdesc */
+ last_index = find_numa_node_for_addr(addr, &last_mem_mask);
+ numadbg("find_node: latency group for address 0x%lx is %d\n",
+ addr, last_index);
+ if ((last_index < 0) || (last_index >= num_node_masks)) {
+ /* WARN_ONCE() and use default group 0 */
+ WARN_ONCE(1, "find_node: A physical address doesn't match a NUMA node rule. Some physical memory will be owned by node 0.");
+ search_mdesc = false;
+ last_index = 0;
+ }
+ }
+
+ return last_index;
}

static u64 memblock_nid_range(u64 start, u64 end, int *nid)
@@ -1158,6 +1180,41 @@ int __node_distance(int from, int to)
return numa_latency[from][to];
}

+static int find_numa_node_for_addr(unsigned long pa,
+ struct node_mem_mask *pnode_mask)
+{
+ struct mdesc_handle *md = mdesc_grab();
+ u64 node, arc;
+ int i = 0;
+
+ node = mdesc_node_by_name(md, MDESC_NODE_NULL, "latency-groups");
+ if (node == MDESC_NODE_NULL)
+ goto out;
+
+ mdesc_for_each_node_by_name(md, node, "group") {
+ mdesc_for_each_arc(arc, md, node, MDESC_ARC_TYPE_FWD) {
+ u64 target = mdesc_arc_target(md, arc);
+ struct mdesc_mlgroup *m = find_mlgroup(target);
+
+ if (!m)
+ continue;
+ if ((pa & m->mask) == m->match) {
+ if (pnode_mask) {
+ pnode_mask->mask = m->mask;
+ pnode_mask->val = m->match;
+ }
+ mdesc_release(md);
+ return i;
+ }
+ }
+ i++;
+ }
+
+out:
+ mdesc_release(md);
+ return -1;
+}
+
static int find_best_numa_node_for_mlgroup(struct mdesc_mlgroup *grp)
{
int i;


2016-12-09 16:34:29

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 20/28] sparc32: Fix inverted invalid_frame_pointer checks on sigreturns

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andreas Larsson <[email protected]>


[ Upstream commit 07b5ab3f71d318e52c18cc3b73c1d44c908aacfa ]

Signed-off-by: Andreas Larsson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/kernel/signal_32.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/sparc/kernel/signal_32.c
+++ b/arch/sparc/kernel/signal_32.c
@@ -89,7 +89,7 @@ asmlinkage void do_sigreturn(struct pt_r
sf = (struct signal_frame __user *) regs->u_regs[UREG_FP];

/* 1. Make sure we are not getting garbage from the user */
- if (!invalid_frame_pointer(sf, sizeof(*sf)))
+ if (invalid_frame_pointer(sf, sizeof(*sf)))
goto segv_and_exit;

if (get_user(ufp, &sf->info.si_regs.u_regs[UREG_FP]))
@@ -150,7 +150,7 @@ asmlinkage void do_rt_sigreturn(struct p

synchronize_user_stack();
sf = (struct rt_signal_frame __user *) regs->u_regs[UREG_FP];
- if (!invalid_frame_pointer(sf, sizeof(*sf)))
+ if (invalid_frame_pointer(sf, sizeof(*sf)))
goto segv;

if (get_user(ufp, &sf->regs.u_regs[UREG_FP]))


2016-12-09 16:35:03

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 17/28] geneve: avoid use-after-free of skb->data

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sabrina Dubroca <[email protected]>


[ Upstream commit 5b01014759991887b1e450c9def01e58c02ab81b ]

geneve{,6}_build_skb can end up doing a pskb_expand_head(), which
makes the ip_hdr(skb) reference we stashed earlier stale. Since it's
only needed as an argument to ip_tunnel_ecn_encap(), move this
directly in the function call.

Fixes: 08399efc6319 ("geneve: ensure ECN info is handled properly in all tx/rx paths")
Signed-off-by: Sabrina Dubroca <[email protected]>
Reviewed-by: John W. Linville <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/geneve.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)

--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -815,7 +815,6 @@ static netdev_tx_t geneve_xmit_skb(struc
struct geneve_dev *geneve = netdev_priv(dev);
struct geneve_sock *gs4 = geneve->sock4;
struct rtable *rt = NULL;
- const struct iphdr *iip; /* interior IP header */
int err = -EINVAL;
struct flowi4 fl4;
__u8 tos, ttl;
@@ -842,8 +841,6 @@ static netdev_tx_t geneve_xmit_skb(struc
sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
skb_reset_mac_header(skb);

- iip = ip_hdr(skb);
-
if (info) {
const struct ip_tunnel_key *key = &info->key;
u8 *opts = NULL;
@@ -859,7 +856,7 @@ static netdev_tx_t geneve_xmit_skb(struc
if (unlikely(err))
goto err;

- tos = ip_tunnel_ecn_encap(key->tos, iip, skb);
+ tos = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
ttl = key->ttl;
df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
} else {
@@ -869,7 +866,7 @@ static netdev_tx_t geneve_xmit_skb(struc
if (unlikely(err))
goto err;

- tos = ip_tunnel_ecn_encap(fl4.flowi4_tos, iip, skb);
+ tos = ip_tunnel_ecn_encap(fl4.flowi4_tos, ip_hdr(skb), skb);
ttl = geneve->ttl;
if (!ttl && IN_MULTICAST(ntohl(fl4.daddr)))
ttl = 1;
@@ -903,7 +900,6 @@ static netdev_tx_t geneve6_xmit_skb(stru
struct geneve_dev *geneve = netdev_priv(dev);
struct geneve_sock *gs6 = geneve->sock6;
struct dst_entry *dst = NULL;
- const struct iphdr *iip; /* interior IP header */
int err = -EINVAL;
struct flowi6 fl6;
__u8 prio, ttl;
@@ -927,8 +923,6 @@ static netdev_tx_t geneve6_xmit_skb(stru
sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
skb_reset_mac_header(skb);

- iip = ip_hdr(skb);
-
if (info) {
const struct ip_tunnel_key *key = &info->key;
u8 *opts = NULL;
@@ -945,7 +939,7 @@ static netdev_tx_t geneve6_xmit_skb(stru
if (unlikely(err))
goto err;

- prio = ip_tunnel_ecn_encap(key->tos, iip, skb);
+ prio = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
ttl = key->ttl;
} else {
udp_csum = false;
@@ -954,7 +948,7 @@ static netdev_tx_t geneve6_xmit_skb(stru
if (unlikely(err))
goto err;

- prio = ip_tunnel_ecn_encap(fl6.flowi6_tos, iip, skb);
+ prio = ip_tunnel_ecn_encap(fl6.flowi6_tos, ip_hdr(skb), skb);
ttl = geneve->ttl;
if (!ttl && ipv6_addr_is_multicast(&fl6.daddr))
ttl = 1;


2016-12-09 16:18:27

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 14/28] packet: fix race condition in packet_set_ring

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Philip Pettersson <[email protected]>


[ Upstream commit 84ac7260236a49c79eede91617700174c2c19b0c ]

When packet_set_ring creates a ring buffer it will initialize a
struct timer_list if the packet version is TPACKET_V3. This value
can then be raced by a different thread calling setsockopt to
set the version to TPACKET_V1 before packet_set_ring has finished.

This leads to a use-after-free on a function pointer in the
struct timer_list when the socket is closed as the previously
initialized timer will not be deleted.

The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
changing the packet version while also taking the lock at the start
of packet_set_ring.

Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Philip Pettersson <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/packet/af_packet.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3572,19 +3572,25 @@ packet_setsockopt(struct socket *sock, i

if (optlen != sizeof(val))
return -EINVAL;
- if (po->rx_ring.pg_vec || po->tx_ring.pg_vec)
- return -EBUSY;
if (copy_from_user(&val, optval, sizeof(val)))
return -EFAULT;
switch (val) {
case TPACKET_V1:
case TPACKET_V2:
case TPACKET_V3:
- po->tp_version = val;
- return 0;
+ break;
default:
return -EINVAL;
}
+ lock_sock(sk);
+ if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) {
+ ret = -EBUSY;
+ } else {
+ po->tp_version = val;
+ ret = 0;
+ }
+ release_sock(sk);
+ return ret;
}
case PACKET_RESERVE:
{
@@ -4067,6 +4073,7 @@ static int packet_set_ring(struct sock *
/* Added to avoid minimal code churn */
struct tpacket_req *req = &req_u->req;

+ lock_sock(sk);
/* Opening a Tx-ring is NOT supported in TPACKET_V3 */
if (!closing && tx_ring && (po->tp_version > TPACKET_V2)) {
WARN(1, "Tx-ring is not supported.\n");
@@ -4148,7 +4155,6 @@ static int packet_set_ring(struct sock *
goto out;
}

- lock_sock(sk);

/* Detach socket from network */
spin_lock(&po->bind_lock);
@@ -4197,11 +4203,11 @@ static int packet_set_ring(struct sock *
if (!tx_ring)
prb_shutdown_retire_blk_timer(po, rb_queue);
}
- release_sock(sk);

if (pg_vec)
free_pg_vec(pg_vec, order, req->tp_block_nr);
out:
+ release_sock(sk);
return err;
}



2016-12-09 16:35:51

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 13/28] net/dccp: fix use-after-free in dccp_invalid_packet

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 648f0c28df282636c0c8a7a19ca3ce5fc80a39c3 ]

pskb_may_pull() can reallocate skb->head, we need to reload dh pointer
in dccp_invalid_packet() or risk use after free.

Bug found by Andrey Konovalov using syzkaller.

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dccp/ipv4.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -698,6 +698,7 @@ int dccp_invalid_packet(struct sk_buff *
{
const struct dccp_hdr *dh;
unsigned int cscov;
+ u8 dccph_doff;

if (skb->pkt_type != PACKET_HOST)
return 1;
@@ -719,18 +720,19 @@ int dccp_invalid_packet(struct sk_buff *
/*
* If P.Data Offset is too small for packet type, drop packet and return
*/
- if (dh->dccph_doff < dccp_hdr_len(skb) / sizeof(u32)) {
- DCCP_WARN("P.Data Offset(%u) too small\n", dh->dccph_doff);
+ dccph_doff = dh->dccph_doff;
+ if (dccph_doff < dccp_hdr_len(skb) / sizeof(u32)) {
+ DCCP_WARN("P.Data Offset(%u) too small\n", dccph_doff);
return 1;
}
/*
* If P.Data Offset is too too large for packet, drop packet and return
*/
- if (!pskb_may_pull(skb, dh->dccph_doff * sizeof(u32))) {
- DCCP_WARN("P.Data Offset(%u) too large\n", dh->dccph_doff);
+ if (!pskb_may_pull(skb, dccph_doff * sizeof(u32))) {
+ DCCP_WARN("P.Data Offset(%u) too large\n", dccph_doff);
return 1;
}
-
+ dh = dccp_hdr(skb);
/*
* If P.type is not Data, Ack, or DataAck and P.X == 0 (the packet
* has short sequence numbers), drop packet and return


2016-12-09 16:36:05

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 03/28] ip6_tunnel: disable caching when the traffic class is inherited

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paolo Abeni <[email protected]>


[ Upstream commit b5c2d49544e5930c96e2632a7eece3f4325a1888 ]

If an ip6 tunnel is configured to inherit the traffic class from
the inner header, the dst_cache must be disabled or it will foul
the policy routing.

The issue is apprently there since at leat Linux-2.6.12-rc2.

Reported-by: Liam McBirnie <[email protected]>
Cc: Liam McBirnie <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_tunnel.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1043,6 +1043,7 @@ static int ip6_tnl_xmit2(struct sk_buff
struct ipv6_tel_txoption opt;
struct dst_entry *dst = NULL, *ndst = NULL;
struct net_device *tdev;
+ bool use_cache = false;
int mtu;
unsigned int max_headroom = sizeof(struct ipv6hdr);
u8 proto;
@@ -1070,7 +1071,15 @@ static int ip6_tnl_xmit2(struct sk_buff

memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
neigh_release(neigh);
- } else if (!fl6->flowi6_mark)
+ } else if (!(t->parms.flags &
+ (IP6_TNL_F_USE_ORIG_TCLASS | IP6_TNL_F_USE_ORIG_FWMARK))) {
+ /* enable the cache only only if the routing decision does
+ * not depend on the current inner header value
+ */
+ use_cache = true;
+ }
+
+ if (use_cache)
dst = ip6_tnl_dst_get(t);

if (!ip6_tnl_xmit_ctl(t, &fl6->saddr, &fl6->daddr))
@@ -1134,7 +1143,7 @@ static int ip6_tnl_xmit2(struct sk_buff
skb = new_skb;
}

- if (!fl6->flowi6_mark && ndst)
+ if (use_cache && ndst)
ip6_tnl_dst_set(t, ndst);
skb_dst_set(skb, dst);



2016-12-09 18:23:00

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/28] 4.4.38-stable review

On 12/09/2016 09:17 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.38 release.
> There are 28 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Dec 11 16:17:32 UTC 2016.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.38-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

--
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America(Silicon Valley)
[email protected]

2016-12-09 22:35:24

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/28] 4.4.38-stable review

On Fri, Dec 09, 2016 at 05:17:42PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.38 release.
> There are 28 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Dec 11 16:17:32 UTC 2016.
> Anything received after that time might be too late.
>

Build results:
total: 149 pass: 149 fail: 0
Qemu test results:
total: 115 pass: 115 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter