2017-08-09 19:42:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 00/58] 4.4.81-stable review

This is the start of the stable review cycle for the 4.4.81 release.
There are 58 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri Aug 11 19:41:25 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.81-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.4.81-rc1

Michal Kubeček <[email protected]>
net: account for current skb length when deciding about UFO

zheng li <[email protected]>
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output

Ard Biesheuvel <[email protected]>
mm: don't dereference struct page fields of invalid pages

Jamie Iles <[email protected]>
signal: protect SIGNAL_UNKILLABLE from unintentional clearing.

Sudip Mukherjee <[email protected]>
lib/Kconfig.debug: fix frv build failure

Michal Hocko <[email protected]>
mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER

Rabin Vincent <[email protected]>
ARM: 8632/1: ftrace: fix syscall name matching

Omar Sandoval <[email protected]>
virtio_blk: fix panic in initialization error path

Gerd Hoffmann <[email protected]>
drm/virtio: fix framebuffer sparse warning

Milan P. Gandhi <[email protected]>
scsi: qla2xxx: Get mutex lock before checking optrom_state

Zefir Kurtisi <[email protected]>
phy state machine: failsafe leave invalid RUNNING state

Nicholas Mc Guire <[email protected]>
x86/boot: Add missing declaration of string functions

Michael Chan <[email protected]>
tg3: Fix race condition in tg3_get_stats64().

Grygorii Strashko <[email protected]>
net: phy: dp83867: fix irq generation

Sergei Shtylyov <[email protected]>
sh_eth: R8A7740 supports packet shecksumming

Arnd Bergmann <[email protected]>
wext: handle NULL extra data in iwe_stream_add_point better

Rob Gardner <[email protected]>
sparc64: Prevent perf from running during super critical sections

Jane Chu <[email protected]>
sparc64: Measure receiver forward progress to avoid send mondo timeout

Wei Liu <[email protected]>
xen-netback: correctly schedule rate-limited queues

Florian Fainelli <[email protected]>
net: phy: Correctly process PHY_HALTED in phy_stop_machine()

Moshe Shemesh <[email protected]>
net/mlx5: Fix command bad flow on command entry allocation failure

Xin Long <[email protected]>
sctp: fix the check for _sctp_walk_params and _sctp_walk_errors

Alexander Potapenko <[email protected]>
sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()

Xin Long <[email protected]>
dccp: fix a memleak for dccp_feat_init err process

Xin Long <[email protected]>
dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly

Xin Long <[email protected]>
dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly

Marc Gonzalez <[email protected]>
net: ethernet: nb8800: Handle all 4 RGMII modes identically

Stefano Brivio <[email protected]>
ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()

WANG Cong <[email protected]>
packet: fix use-after-free in prb_retire_rx_blk_timer_expired()

Liping Zhang <[email protected]>
openvswitch: fix potential out of bound access in parse_ct

Thomas Jarosch <[email protected]>
mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled

WANG Cong <[email protected]>
rtnetlink: allocate more memory for dev_set_mac_address()

Mahesh Bandewar <[email protected]>
ipv4: initialize fib_trie prior to register_netdev_notifier call.

Sabrina Dubroca <[email protected]>
ipv6: avoid overflow of offset in ip6_find_1stfragopt

David S. Miller <[email protected]>
net: Zero terminate ifr_name in dev_ifname().

Alexander Potapenko <[email protected]>
ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()

Steven Toth <[email protected]>
saa7164: fix double fetch PCIe access condition

Greg Kroah-Hartman <[email protected]>
drm: rcar-du: fix backport bug

Jin Qian <[email protected]>
f2fs: sanity check checkpoint segno and blkoff

Sean Young <[email protected]>
media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds

Mel Gorman <[email protected]>
mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries

Nicholas Bellinger <[email protected]>
iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_done

Nicholas Bellinger <[email protected]>
iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP

Nicholas Bellinger <[email protected]>
iscsi-target: Fix initial login PDU asynchronous socket close OOPs

Nicholas Bellinger <[email protected]>
iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race

Jiang Yi <[email protected]>
iscsi-target: Always wait for kthread_should_stop() before kthread exit

Nicholas Bellinger <[email protected]>
target: Avoid mappedlun symlink creation during lun shutdown

Prabhakar Lad <[email protected]>
media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl

Gregory CLEMENT <[email protected]>
ARM: dts: armada-38x: Fix irq type for pca955

Jerry Lee <[email protected]>
ext4: fix overflow caused by missing cast in ext4_resize_fs()

Jan Kara <[email protected]>
ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize

Josh Poimboeuf <[email protected]>
mm/page_alloc: Remove kernel address exposure in free_reserved_area()

Wanpeng Li <[email protected]>
KVM: async_pf: make rcu irq exit if not triggered from idle task

Banajit Goswami <[email protected]>
ASoC: do not close shared backend dailink

Sergei A. Trusov <[email protected]>
ALSA: hda - Fix speaker output from VAIO VPCL14M1R

Tejun Heo <[email protected]>
workqueue: restore WQ_UNBOUND/max_active==1 to be ordered

Dan Carpenter <[email protected]>
libata: array underflow in ata_find_dev()

Helge Deller <[email protected]>
parisc: Increase thread and stack size to 32kb


-------------

Diffstat:

Makefile | 4 +-
arch/arm/boot/dts/armada-388-gp.dts | 4 +-
arch/arm/include/asm/ftrace.h | 18 +++
arch/parisc/include/asm/thread_info.h | 2 +-
arch/parisc/kernel/irq.c | 2 +-
arch/sparc/include/asm/mmu_context_64.h | 12 +-
arch/sparc/include/asm/trap_block.h | 1 +
arch/sparc/kernel/smp_64.c | 185 ++++++++++++++---------
arch/sparc/kernel/sun4v_ivec.S | 15 ++
arch/sparc/kernel/traps_64.c | 1 +
arch/sparc/kernel/tsb.S | 12 ++
arch/sparc/power/hibernate.c | 3 +-
arch/x86/boot/string.c | 1 +
arch/x86/boot/string.h | 9 ++
arch/x86/kernel/kvm.c | 6 +-
drivers/ata/libata-scsi.c | 6 +-
drivers/block/virtio_blk.c | 3 +-
drivers/gpu/drm/rcar-du/rcar_du_drv.c | 2 +-
drivers/gpu/drm/virtio/virtgpu_fb.c | 2 +-
drivers/infiniband/ulp/isert/ib_isert.c | 2 +-
drivers/media/pci/saa7164/saa7164-bus.c | 13 +-
drivers/media/platform/davinci/vpfe_capture.c | 22 +--
drivers/media/rc/ir-lirc-codec.c | 2 +-
drivers/net/ethernet/aurora/nb8800.c | 9 +-
drivers/net/ethernet/broadcom/tg3.c | 3 +
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 19 ++-
drivers/net/ethernet/renesas/sh_eth.c | 1 +
drivers/net/irda/mcs7780.c | 16 +-
drivers/net/phy/dp83867.c | 10 ++
drivers/net/phy/phy.c | 12 ++
drivers/net/xen-netback/common.h | 1 +
drivers/net/xen-netback/interface.c | 6 +-
drivers/net/xen-netback/netback.c | 6 +-
drivers/scsi/qla2xxx/qla_attr.c | 18 ++-
drivers/target/iscsi/iscsi_target.c | 38 ++++-
drivers/target/iscsi/iscsi_target_erl0.c | 6 +-
drivers/target/iscsi/iscsi_target_erl0.h | 2 +-
drivers/target/iscsi/iscsi_target_login.c | 4 +
drivers/target/iscsi/iscsi_target_nego.c | 208 +++++++++++++++++---------
drivers/target/target_core_fabric_configfs.c | 5 +
drivers/target/target_core_tpg.c | 4 +
fs/ext4/file.c | 3 +
fs/ext4/resize.c | 3 +-
fs/f2fs/super.c | 16 ++
include/linux/mm_types.h | 4 +
include/linux/sched.h | 10 ++
include/linux/slab.h | 4 +-
include/net/iw_handler.h | 3 +-
include/net/sctp/sctp.h | 4 +
include/target/iscsi/iscsi_target_core.h | 1 +
include/target/target_core_base.h | 1 +
kernel/signal.c | 4 +-
kernel/workqueue.c | 10 ++
lib/Kconfig.debug | 2 +-
mm/internal.h | 5 +-
mm/memory.c | 1 +
mm/mprotect.c | 1 +
mm/mremap.c | 1 +
mm/page_alloc.c | 10 +-
mm/rmap.c | 36 +++++
net/core/dev_ioctl.c | 1 +
net/core/rtnetlink.c | 3 +-
net/dccp/feat.c | 7 +-
net/dccp/ipv4.c | 1 +
net/dccp/ipv6.c | 1 +
net/ipv4/fib_frontend.c | 9 +-
net/ipv4/ip_output.c | 3 +-
net/ipv4/syncookies.c | 1 +
net/ipv6/ip6_output.c | 6 +-
net/ipv6/output_core.c | 8 +-
net/ipv6/syncookies.c | 1 +
net/openvswitch/conntrack.c | 7 +-
net/packet/af_packet.c | 2 +-
sound/pci/hda/patch_realtek.c | 1 +
sound/soc/soc-pcm.c | 4 +
75 files changed, 618 insertions(+), 251 deletions(-)



2017-08-09 19:42:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 01/58] parisc: Increase thread and stack size to 32kb

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Helge Deller <[email protected]>

commit 8f8201dfed91a43ac38c899c82f81eef3d36afd9 upstream.

Since kernel 4.11 the thread and irq stacks on parisc randomly overflow
the default size of 16k. The reason why stack usage suddenly grew is yet
unknown.

Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/parisc/include/asm/thread_info.h | 2 +-
arch/parisc/kernel/irq.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

--- a/arch/parisc/include/asm/thread_info.h
+++ b/arch/parisc/include/asm/thread_info.h
@@ -34,7 +34,7 @@ struct thread_info {

/* thread information allocation */

-#define THREAD_SIZE_ORDER 2 /* PA-RISC requires at least 16k stack */
+#define THREAD_SIZE_ORDER 3 /* PA-RISC requires at least 32k stack */
/* Be sure to hunt all references to this down when you change the size of
* the kernel stack */
#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
--- a/arch/parisc/kernel/irq.c
+++ b/arch/parisc/kernel/irq.c
@@ -380,7 +380,7 @@ static inline int eirr_to_irq(unsigned l
/*
* IRQ STACK - used for irq handler
*/
-#define IRQ_STACK_SIZE (4096 << 2) /* 16k irq stack size */
+#define IRQ_STACK_SIZE (4096 << 3) /* 32k irq stack size */

union irq_stack_union {
unsigned long stack[IRQ_STACK_SIZE/sizeof(unsigned long)];


2017-08-09 19:42:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 13/58] iscsi-target: Always wait for kthread_should_stop() before kthread exit

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jiang Yi <[email protected]>

commit 5e0cf5e6c43b9e19fc0284f69e5cd2b4a47523b0 upstream.

There are three timing problems in the kthread usages of iscsi_target_mod:

- np_thread of struct iscsi_np
- rx_thread and tx_thread of struct iscsi_conn

In iscsit_close_connection(), it calls

send_sig(SIGINT, conn->tx_thread, 1);
kthread_stop(conn->tx_thread);

In conn->tx_thread, which is iscsi_target_tx_thread(), when it receive
SIGINT the kthread will exit without checking the return value of
kthread_should_stop().

So if iscsi_target_tx_thread() exit right between send_sig(SIGINT...)
and kthread_stop(...), the kthread_stop() will try to stop an already
stopped kthread.

This is invalid according to the documentation of kthread_stop().

(Fix -ECONNRESET logout handling in iscsi_target_tx_thread and
early iscsi_target_rx_thread failure case - nab)

Signed-off-by: Jiang Yi <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
drivers/target/iscsi/iscsi_target.c | 28 ++++++++++++++++++++++------
drivers/target/iscsi/iscsi_target_erl0.c | 6 +++++-
drivers/target/iscsi/iscsi_target_erl0.h | 2 +-
drivers/target/iscsi/iscsi_target_login.c | 4 ++++
4 files changed, 32 insertions(+), 8 deletions(-)

--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -3965,6 +3965,8 @@ int iscsi_target_tx_thread(void *arg)
{
int ret = 0;
struct iscsi_conn *conn = arg;
+ bool conn_freed = false;
+
/*
* Allow ourselves to be interrupted by SIGINT so that a
* connection recovery / failure event can be triggered externally.
@@ -3990,12 +3992,14 @@ get_immediate:
goto transport_err;

ret = iscsit_handle_response_queue(conn);
- if (ret == 1)
+ if (ret == 1) {
goto get_immediate;
- else if (ret == -ECONNRESET)
+ } else if (ret == -ECONNRESET) {
+ conn_freed = true;
goto out;
- else if (ret < 0)
+ } else if (ret < 0) {
goto transport_err;
+ }
}

transport_err:
@@ -4005,8 +4009,13 @@ transport_err:
* responsible for cleaning up the early connection failure.
*/
if (conn->conn_state != TARG_CONN_STATE_IN_LOGIN)
- iscsit_take_action_for_connection_exit(conn);
+ iscsit_take_action_for_connection_exit(conn, &conn_freed);
out:
+ if (!conn_freed) {
+ while (!kthread_should_stop()) {
+ msleep(100);
+ }
+ }
return 0;
}

@@ -4105,6 +4114,7 @@ int iscsi_target_rx_thread(void *arg)
u32 checksum = 0, digest = 0;
struct iscsi_conn *conn = arg;
struct kvec iov;
+ bool conn_freed = false;
/*
* Allow ourselves to be interrupted by SIGINT so that a
* connection recovery / failure event can be triggered externally.
@@ -4116,7 +4126,7 @@ int iscsi_target_rx_thread(void *arg)
*/
rc = wait_for_completion_interruptible(&conn->rx_login_comp);
if (rc < 0 || iscsi_target_check_conn_state(conn))
- return 0;
+ goto out;

if (conn->conn_transport->transport_type == ISCSI_INFINIBAND) {
struct completion comp;
@@ -4201,7 +4211,13 @@ int iscsi_target_rx_thread(void *arg)
transport_err:
if (!signal_pending(current))
atomic_set(&conn->transport_failed, 1);
- iscsit_take_action_for_connection_exit(conn);
+ iscsit_take_action_for_connection_exit(conn, &conn_freed);
+out:
+ if (!conn_freed) {
+ while (!kthread_should_stop()) {
+ msleep(100);
+ }
+ }
return 0;
}

--- a/drivers/target/iscsi/iscsi_target_erl0.c
+++ b/drivers/target/iscsi/iscsi_target_erl0.c
@@ -930,8 +930,10 @@ static void iscsit_handle_connection_cle
}
}

-void iscsit_take_action_for_connection_exit(struct iscsi_conn *conn)
+void iscsit_take_action_for_connection_exit(struct iscsi_conn *conn, bool *conn_freed)
{
+ *conn_freed = false;
+
spin_lock_bh(&conn->state_lock);
if (atomic_read(&conn->connection_exit)) {
spin_unlock_bh(&conn->state_lock);
@@ -942,6 +944,7 @@ void iscsit_take_action_for_connection_e
if (conn->conn_state == TARG_CONN_STATE_IN_LOGOUT) {
spin_unlock_bh(&conn->state_lock);
iscsit_close_connection(conn);
+ *conn_freed = true;
return;
}

@@ -955,4 +958,5 @@ void iscsit_take_action_for_connection_e
spin_unlock_bh(&conn->state_lock);

iscsit_handle_connection_cleanup(conn);
+ *conn_freed = true;
}
--- a/drivers/target/iscsi/iscsi_target_erl0.h
+++ b/drivers/target/iscsi/iscsi_target_erl0.h
@@ -9,6 +9,6 @@ extern int iscsit_stop_time2retain_timer
extern void iscsit_connection_reinstatement_rcfr(struct iscsi_conn *);
extern void iscsit_cause_connection_reinstatement(struct iscsi_conn *, int);
extern void iscsit_fall_back_to_erl0(struct iscsi_session *);
-extern void iscsit_take_action_for_connection_exit(struct iscsi_conn *);
+extern void iscsit_take_action_for_connection_exit(struct iscsi_conn *, bool *);

#endif /*** ISCSI_TARGET_ERL0_H ***/
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -1436,5 +1436,9 @@ int iscsi_target_login_thread(void *arg)
break;
}

+ while (!kthread_should_stop()) {
+ msleep(100);
+ }
+
return 0;
}


2017-08-09 19:42:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 17/58] iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_done

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <[email protected]>

commit fce50a2fa4e9c6e103915c351b6d4a98661341d6 upstream.

This patch fixes a NULL pointer dereference in isert_login_recv_done()
of isert_conn->cm_id due to isert_cma_handler() -> isert_connect_error()
resetting isert_conn->cm_id = NULL during a failed login attempt.

As per Sagi, we will always see the completion of all recv wrs posted
on the qp (given that we assigned a ->done handler), this is a FLUSH
error completion, we just don't get to verify that because we deref
NULL before.

The issue here, was the assumption that dereferencing the connection
cm_id is always safe, which is not true since:

commit 4a579da2586bd3b79b025947ea24ede2bbfede62
Author: Sagi Grimberg <[email protected]>
Date: Sun Mar 29 15:52:04 2015 +0300

iser-target: Fix possible deadlock in RDMA_CM connection error

As I see it, we have a direct reference to the isert_device from
isert_conn which is the one-liner fix that we actually need like
we do in isert_rdma_read_done() and isert_rdma_write_done().

Reported-by: Andrea Righi <[email protected]>
Tested-by: Andrea Righi <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/ulp/isert/ib_isert.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1581,7 +1581,7 @@ isert_rcv_completion(struct iser_rx_desc
struct isert_conn *isert_conn,
u32 xfer_len)
{
- struct ib_device *ib_dev = isert_conn->cm_id->device;
+ struct ib_device *ib_dev = isert_conn->device->ib_device;
struct iscsi_hdr *hdr;
u64 rx_dma;
int rx_buflen;


2017-08-09 19:42:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 02/58] libata: array underflow in ata_find_dev()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

commit 59a5e266c3f5c1567508888dd61a45b86daed0fa upstream.

My static checker complains that "devno" can be negative, meaning that
we read before the start of the loop. I've looked at the code, and I
think the warning is right. This come from /proc so it's root only or
it would be quite a quite a serious bug. The call tree looks like this:

proc_scsi_write() <- gets id and channel from simple_strtoul()
-> scsi_add_single_device() <- calls shost->transportt->user_scan()
-> ata_scsi_user_scan()
-> ata_find_dev()

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/ata/libata-scsi.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2832,10 +2832,12 @@ static unsigned int atapi_xlat(struct at
static struct ata_device *ata_find_dev(struct ata_port *ap, int devno)
{
if (!sata_pmp_attached(ap)) {
- if (likely(devno < ata_link_max_devices(&ap->link)))
+ if (likely(devno >= 0 &&
+ devno < ata_link_max_devices(&ap->link)))
return &ap->link.device[devno];
} else {
- if (likely(devno < ap->nr_pmp_links))
+ if (likely(devno >= 0 &&
+ devno < ap->nr_pmp_links))
return &ap->pmp_link[devno].device[0];
}



2017-08-09 19:42:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 19/58] media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sean Young <[email protected]>

commit 9f5039ba440e499d85c29b1ddbc3cbc9dc90e44b upstream.

Since commit e8f4818895b3 ("[media] lirc: advertise
LIRC_CAN_GET_REC_RESOLUTION and improve") lircd uses the ioctl
LIRC_GET_REC_RESOLUTION to determine the shortest pulse or space that
the hardware can detect. This breaks decoding in lirc because lircd
expects the answer in microseconds, but nanoseconds is returned.

Reported-by: Derek <[email protected]>
Tested-by: Derek <[email protected]>
Signed-off-by: Sean Young <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
drivers/media/rc/ir-lirc-codec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/media/rc/ir-lirc-codec.c
+++ b/drivers/media/rc/ir-lirc-codec.c
@@ -254,7 +254,7 @@ static long ir_lirc_ioctl(struct file *f
return 0;

case LIRC_GET_REC_RESOLUTION:
- val = dev->rx_resolution;
+ val = dev->rx_resolution / 1000;
break;

case LIRC_SET_WIDEBAND_RECEIVER:


2017-08-09 19:42:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 23/58] ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexander Potapenko <[email protected]>


[ Upstream commit 18bcf2907df935981266532e1e0d052aff2e6fae ]

KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(),
which originated from the TCP request socket created in
cookie_v6_check():

==================================================================
BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0
CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters.
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16
dump_stack+0x172/0x1c0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
skb_set_hash_from_sk ./include/net/sock.h:2011
tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983
tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493
tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284
tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309
call_timer_fn+0x240/0x520 kernel/time/timer.c:1268
expire_timers kernel/time/timer.c:1307
__run_timers+0xc13/0xf10 kernel/time/timer.c:1601
run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614
__do_softirq+0x485/0x942 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364
irq_exit+0x1fa/0x230 kernel/softirq.c:405
exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657
smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966
apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489
RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36
RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77
RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440
RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005
RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770
RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004
R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810
R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4
</IRQ>
poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293
SYSC_select+0x4b4/0x4e0 fs/select.c:653
SyS_select+0x76/0xa0 fs/select.c:634
entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
RIP: 0033:0x4597e7
RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20
R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003
chained origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_save_stack mm/kmsan/kmsan.c:317
kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547
__msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259
tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472
tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103
tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212
cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337
kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766
reqsk_alloc ./include/net/request_sock.h:87
inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200
cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
==================================================================

Similar error is reported for cookie_v4_check().

Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets")
Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/syncookies.c | 1 +
net/ipv6/syncookies.c | 1 +
2 files changed, 2 insertions(+)

--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -337,6 +337,7 @@ struct sock *cookie_v4_check(struct sock
treq = tcp_rsk(req);
treq->rcv_isn = ntohl(th->seq) - 1;
treq->snt_isn = cookie;
+ treq->txhash = net_tx_rndhash();
req->mss = mss;
ireq->ir_num = ntohs(th->dest);
ireq->ir_rmt_port = th->source;
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -210,6 +210,7 @@ struct sock *cookie_v6_check(struct sock
treq->snt_synack.v64 = 0;
treq->rcv_isn = ntohl(th->seq) - 1;
treq->snt_isn = cookie;
+ treq->txhash = net_tx_rndhash();

/*
* We need to lookup the dst_entry to get the correct window size.


2017-08-09 19:42:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 12/58] target: Avoid mappedlun symlink creation during lun shutdown

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <[email protected]>

commit 49cb77e297dc611a1b795cfeb79452b3002bd331 upstream.

This patch closes a race between se_lun deletion during configfs
unlink in target_fabric_port_unlink() -> core_dev_del_lun()
-> core_tpg_remove_lun(), when transport_clear_lun_ref() blocks
waiting for percpu_ref RCU grace period to finish, but a new
NodeACL mappedlun is added before the RCU grace period has
completed.

This can happen in target_fabric_mappedlun_link() because it
only checks for se_lun->lun_se_dev, which is not cleared until
after transport_clear_lun_ref() percpu_ref RCU grace period
finishes.

This bug originally manifested as NULL pointer dereference
OOPsen in target_stat_scsi_att_intr_port_show_attr_dev() on
v4.1.y code, because it dereferences lun->lun_se_dev without
a explicit NULL pointer check.

In post v4.1 code with target-core RCU conversion, the code
in target_stat_scsi_att_intr_port_show_attr_dev() no longer
uses se_lun->lun_se_dev, but the same race still exists.

To address the bug, go ahead and set se_lun>lun_shutdown as
early as possible in core_tpg_remove_lun(), and ensure new
NodeACL mappedlun creation in target_fabric_mappedlun_link()
fails during se_lun shutdown.

Reported-by: James Shen <[email protected]>
Cc: James Shen <[email protected]>
Tested-by: James Shen <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/target/target_core_fabric_configfs.c | 5 +++++
drivers/target/target_core_tpg.c | 4 ++++
include/target/target_core_base.h | 1 +
3 files changed, 10 insertions(+)

--- a/drivers/target/target_core_fabric_configfs.c
+++ b/drivers/target/target_core_fabric_configfs.c
@@ -92,6 +92,11 @@ static int target_fabric_mappedlun_link(
pr_err("Source se_lun->lun_se_dev does not exist\n");
return -EINVAL;
}
+ if (lun->lun_shutdown) {
+ pr_err("Unable to create mappedlun symlink because"
+ " lun->lun_shutdown=true\n");
+ return -EINVAL;
+ }
se_tpg = lun->lun_tpg;

nacl_ci = &lun_acl_ci->ci_parent->ci_group->cg_item;
--- a/drivers/target/target_core_tpg.c
+++ b/drivers/target/target_core_tpg.c
@@ -673,6 +673,8 @@ void core_tpg_remove_lun(
*/
struct se_device *dev = rcu_dereference_raw(lun->lun_se_dev);

+ lun->lun_shutdown = true;
+
core_clear_lun_from_tpg(lun, tpg);
/*
* Wait for any active I/O references to percpu se_lun->lun_ref to
@@ -694,6 +696,8 @@ void core_tpg_remove_lun(
}
if (!(dev->se_hba->hba_flags & HBA_FLAGS_INTERNAL_USE))
hlist_del_rcu(&lun->link);
+
+ lun->lun_shutdown = false;
mutex_unlock(&tpg->tpg_lun_mutex);

percpu_ref_exit(&lun->lun_ref);
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -714,6 +714,7 @@ struct se_lun {
#define SE_LUN_LINK_MAGIC 0xffff7771
u32 lun_link_magic;
u32 lun_access;
+ bool lun_shutdown;
u32 lun_index;

/* RELATIVE TARGET PORT IDENTIFER */


2017-08-09 19:43:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 18/58] mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mel Gorman <[email protected]>

commit 3ea277194daaeaa84ce75180ec7c7a2075027a68 upstream.

Stable note for 4.4: The upstream patch patches madvise(MADV_FREE) but 4.4
does not have support for that feature. The changelog is left
as-is but the hunk related to madvise is omitted from the backport.

Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.

He described the race as follows:

CPU0 CPU1
---- ----
user accesses memory using RW PTE
[PTE now cached in TLB]
try_to_unmap_one()
==> ptep_get_and_clear()
==> set_tlb_ubc_flush_pending()
mprotect(addr, PROT_READ)
==> change_pte_range()
==> [ PTE non-present - no flush ]

user writes using cached RW PTE
...

try_to_unmap_flush()

The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.

For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access. For munmap, it's
potentially a data integrity issue although the race is massive as an
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB. However, it's
theoritically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.

Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption. In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch. Other users such as gup as a race with
reclaim looks just at PTEs. huge page variants should be ok as they
don't race with reclaim. mincore only looks at PTEs. userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.

Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected. His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.

[[email protected]: tweak comments]
[[email protected]: fix spello]
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Nadav Amit <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/mm_types.h | 4 ++++
mm/internal.h | 5 ++++-
mm/memory.c | 1 +
mm/mprotect.c | 1 +
mm/mremap.c | 1 +
mm/rmap.c | 36 ++++++++++++++++++++++++++++++++++++
6 files changed, 47 insertions(+), 1 deletion(-)

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -504,6 +504,10 @@ struct mm_struct {
*/
bool tlb_flush_pending;
#endif
+#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+ /* See flush_tlb_batched_pending() */
+ bool tlb_flush_batched;
+#endif
struct uprobes_state uprobes_state;
#ifdef CONFIG_X86_INTEL_MPX
/* address of the bounds directory */
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -453,6 +453,7 @@ struct tlbflush_unmap_batch;
#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
void try_to_unmap_flush(void);
void try_to_unmap_flush_dirty(void);
+void flush_tlb_batched_pending(struct mm_struct *mm);
#else
static inline void try_to_unmap_flush(void)
{
@@ -460,6 +461,8 @@ static inline void try_to_unmap_flush(vo
static inline void try_to_unmap_flush_dirty(void)
{
}
-
+static inline void flush_tlb_batched_pending(struct mm_struct *mm)
+{
+}
#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
#endif /* __MM_INTERNAL_H */
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1127,6 +1127,7 @@ again:
init_rss_vec(rss);
start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
pte = start_pte;
+ flush_tlb_batched_pending(mm);
arch_enter_lazy_mmu_mode();
do {
pte_t ptent = *pte;
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -72,6 +72,7 @@ static unsigned long change_pte_range(st
if (!pte)
return 0;

+ flush_tlb_batched_pending(vma->vm_mm);
arch_enter_lazy_mmu_mode();
do {
oldpte = *pte;
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -135,6 +135,7 @@ static void move_ptes(struct vm_area_str
new_ptl = pte_lockptr(mm, new_pmd);
if (new_ptl != old_ptl)
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+ flush_tlb_batched_pending(vma->vm_mm);
arch_enter_lazy_mmu_mode();

for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -649,6 +649,13 @@ static void set_tlb_ubc_flush_pending(st
tlb_ubc->flush_required = true;

/*
+ * Ensure compiler does not re-order the setting of tlb_flush_batched
+ * before the PTE is cleared.
+ */
+ barrier();
+ mm->tlb_flush_batched = true;
+
+ /*
* If the PTE was dirty then it's best to assume it's writable. The
* caller must use try_to_unmap_flush_dirty() or try_to_unmap_flush()
* before the page is queued for IO.
@@ -675,6 +682,35 @@ static bool should_defer_flush(struct mm

return should_defer;
}
+
+/*
+ * Reclaim unmaps pages under the PTL but do not flush the TLB prior to
+ * releasing the PTL if TLB flushes are batched. It's possible for a parallel
+ * operation such as mprotect or munmap to race between reclaim unmapping
+ * the page and flushing the page. If this race occurs, it potentially allows
+ * access to data via a stale TLB entry. Tracking all mm's that have TLB
+ * batching in flight would be expensive during reclaim so instead track
+ * whether TLB batching occurred in the past and if so then do a flush here
+ * if required. This will cost one additional flush per reclaim cycle paid
+ * by the first operation at risk such as mprotect and mumap.
+ *
+ * This must be called under the PTL so that an access to tlb_flush_batched
+ * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise
+ * via the PTL.
+ */
+void flush_tlb_batched_pending(struct mm_struct *mm)
+{
+ if (mm->tlb_flush_batched) {
+ flush_tlb_mm(mm);
+
+ /*
+ * Do not allow the compiler to re-order the clearing of
+ * tlb_flush_batched before the tlb is flushed.
+ */
+ barrier();
+ mm->tlb_flush_batched = false;
+ }
+}
#else
static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
struct page *page, bool writable)


2017-08-09 19:43:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 15/58] iscsi-target: Fix initial login PDU asynchronous socket close OOPs

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <[email protected]>

commit 25cdda95fda78d22d44157da15aa7ea34be3c804 upstream.

This patch fixes a OOPs originally introduced by:

commit bb048357dad6d604520c91586334c9c230366a14
Author: Nicholas Bellinger <[email protected]>
Date: Thu Sep 5 14:54:04 2013 -0700

iscsi-target: Add sk->sk_state_change to cleanup after TCP failure

which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.

To address this issue, this patch makes the following changes.

First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.

Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running. For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().

The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
closed. For both of these cases, the cleanup up of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.

Finally, to handle to case where iscsi_target_sk_state_change() is
called after the initial PDU procesing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.

Reported-by: Mike Christie <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Tested-by: Mike Christie <[email protected]>
Cc: Mike Christie <[email protected]>
Reported-by: Hannes Reinecke <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Varun Prakash <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
drivers/target/iscsi/iscsi_target_nego.c | 204 ++++++++++++++++++++-----------
include/target/iscsi/iscsi_target_core.h | 1
2 files changed, 138 insertions(+), 67 deletions(-)

--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -489,14 +489,60 @@ static void iscsi_target_restore_sock_ca

static int iscsi_target_do_login(struct iscsi_conn *, struct iscsi_login *);

-static bool iscsi_target_sk_state_check(struct sock *sk)
+static bool __iscsi_target_sk_check_close(struct sock *sk)
{
if (sk->sk_state == TCP_CLOSE_WAIT || sk->sk_state == TCP_CLOSE) {
- pr_debug("iscsi_target_sk_state_check: TCP_CLOSE_WAIT|TCP_CLOSE,"
+ pr_debug("__iscsi_target_sk_check_close: TCP_CLOSE_WAIT|TCP_CLOSE,"
"returning FALSE\n");
- return false;
+ return true;
}
- return true;
+ return false;
+}
+
+static bool iscsi_target_sk_check_close(struct iscsi_conn *conn)
+{
+ bool state = false;
+
+ if (conn->sock) {
+ struct sock *sk = conn->sock->sk;
+
+ read_lock_bh(&sk->sk_callback_lock);
+ state = (__iscsi_target_sk_check_close(sk) ||
+ test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags));
+ read_unlock_bh(&sk->sk_callback_lock);
+ }
+ return state;
+}
+
+static bool iscsi_target_sk_check_flag(struct iscsi_conn *conn, unsigned int flag)
+{
+ bool state = false;
+
+ if (conn->sock) {
+ struct sock *sk = conn->sock->sk;
+
+ read_lock_bh(&sk->sk_callback_lock);
+ state = test_bit(flag, &conn->login_flags);
+ read_unlock_bh(&sk->sk_callback_lock);
+ }
+ return state;
+}
+
+static bool iscsi_target_sk_check_and_clear(struct iscsi_conn *conn, unsigned int flag)
+{
+ bool state = false;
+
+ if (conn->sock) {
+ struct sock *sk = conn->sock->sk;
+
+ write_lock_bh(&sk->sk_callback_lock);
+ state = (__iscsi_target_sk_check_close(sk) ||
+ test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags));
+ if (!state)
+ clear_bit(flag, &conn->login_flags);
+ write_unlock_bh(&sk->sk_callback_lock);
+ }
+ return state;
}

static void iscsi_target_login_drop(struct iscsi_conn *conn, struct iscsi_login *login)
@@ -536,6 +582,20 @@ static void iscsi_target_do_login_rx(str

pr_debug("entering iscsi_target_do_login_rx, conn: %p, %s:%d\n",
conn, current->comm, current->pid);
+ /*
+ * If iscsi_target_do_login_rx() has been invoked by ->sk_data_ready()
+ * before initial PDU processing in iscsi_target_start_negotiation()
+ * has completed, go ahead and retry until it's cleared.
+ *
+ * Otherwise if the TCP connection drops while this is occuring,
+ * iscsi_target_start_negotiation() will detect the failure, call
+ * cancel_delayed_work_sync(&conn->login_work), and cleanup the
+ * remaining iscsi connection resources from iscsi_np process context.
+ */
+ if (iscsi_target_sk_check_flag(conn, LOGIN_FLAGS_INITIAL_PDU)) {
+ schedule_delayed_work(&conn->login_work, msecs_to_jiffies(10));
+ return;
+ }

spin_lock(&tpg->tpg_state_lock);
state = (tpg->tpg_state == TPG_STATE_ACTIVE);
@@ -543,26 +603,12 @@ static void iscsi_target_do_login_rx(str

if (!state) {
pr_debug("iscsi_target_do_login_rx: tpg_state != TPG_STATE_ACTIVE\n");
- iscsi_target_restore_sock_callbacks(conn);
- iscsi_target_login_drop(conn, login);
- iscsit_deaccess_np(np, tpg, tpg_np);
- return;
+ goto err;
}

- if (conn->sock) {
- struct sock *sk = conn->sock->sk;
-
- read_lock_bh(&sk->sk_callback_lock);
- state = iscsi_target_sk_state_check(sk);
- read_unlock_bh(&sk->sk_callback_lock);
-
- if (!state) {
- pr_debug("iscsi_target_do_login_rx, TCP state CLOSE\n");
- iscsi_target_restore_sock_callbacks(conn);
- iscsi_target_login_drop(conn, login);
- iscsit_deaccess_np(np, tpg, tpg_np);
- return;
- }
+ if (iscsi_target_sk_check_close(conn)) {
+ pr_debug("iscsi_target_do_login_rx, TCP state CLOSE\n");
+ goto err;
}

conn->login_kworker = current;
@@ -580,34 +626,29 @@ static void iscsi_target_do_login_rx(str
flush_signals(current);
conn->login_kworker = NULL;

- if (rc < 0) {
- iscsi_target_restore_sock_callbacks(conn);
- iscsi_target_login_drop(conn, login);
- iscsit_deaccess_np(np, tpg, tpg_np);
- return;
- }
+ if (rc < 0)
+ goto err;

pr_debug("iscsi_target_do_login_rx after rx_login_io, %p, %s:%d\n",
conn, current->comm, current->pid);

rc = iscsi_target_do_login(conn, login);
if (rc < 0) {
- iscsi_target_restore_sock_callbacks(conn);
- iscsi_target_login_drop(conn, login);
- iscsit_deaccess_np(np, tpg, tpg_np);
+ goto err;
} else if (!rc) {
- if (conn->sock) {
- struct sock *sk = conn->sock->sk;
-
- write_lock_bh(&sk->sk_callback_lock);
- clear_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags);
- write_unlock_bh(&sk->sk_callback_lock);
- }
+ if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_READ_ACTIVE))
+ goto err;
} else if (rc == 1) {
iscsi_target_nego_release(conn);
iscsi_post_login_handler(np, conn, zero_tsih);
iscsit_deaccess_np(np, tpg, tpg_np);
}
+ return;
+
+err:
+ iscsi_target_restore_sock_callbacks(conn);
+ iscsi_target_login_drop(conn, login);
+ iscsit_deaccess_np(np, tpg, tpg_np);
}

static void iscsi_target_do_cleanup(struct work_struct *work)
@@ -655,31 +696,54 @@ static void iscsi_target_sk_state_change
orig_state_change(sk);
return;
}
+ state = __iscsi_target_sk_check_close(sk);
+ pr_debug("__iscsi_target_sk_close_change: state: %d\n", state);
+
if (test_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1 sk_state_change"
" conn: %p\n", conn);
+ if (state)
+ set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags);
write_unlock_bh(&sk->sk_callback_lock);
orig_state_change(sk);
return;
}
- if (test_and_set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
+ if (test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
pr_debug("Got LOGIN_FLAGS_CLOSED=1 sk_state_change conn: %p\n",
conn);
write_unlock_bh(&sk->sk_callback_lock);
orig_state_change(sk);
return;
}
+ /*
+ * If the TCP connection has dropped, go ahead and set LOGIN_FLAGS_CLOSED,
+ * but only queue conn->login_work -> iscsi_target_do_login_rx()
+ * processing if LOGIN_FLAGS_INITIAL_PDU has already been cleared.
+ *
+ * When iscsi_target_do_login_rx() runs, iscsi_target_sk_check_close()
+ * will detect the dropped TCP connection from delayed workqueue context.
+ *
+ * If LOGIN_FLAGS_INITIAL_PDU is still set, which means the initial
+ * iscsi_target_start_negotiation() is running, iscsi_target_do_login()
+ * via iscsi_target_sk_check_close() or iscsi_target_start_negotiation()
+ * via iscsi_target_sk_check_and_clear() is responsible for detecting the
+ * dropped TCP connection in iscsi_np process context, and cleaning up
+ * the remaining iscsi connection resources.
+ */
+ if (state) {
+ pr_debug("iscsi_target_sk_state_change got failed state\n");
+ set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags);
+ state = test_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
+ write_unlock_bh(&sk->sk_callback_lock);

- state = iscsi_target_sk_state_check(sk);
- write_unlock_bh(&sk->sk_callback_lock);
-
- pr_debug("iscsi_target_sk_state_change: state: %d\n", state);
+ orig_state_change(sk);

- if (!state) {
- pr_debug("iscsi_target_sk_state_change got failed state\n");
- schedule_delayed_work(&conn->login_cleanup_work, 0);
+ if (!state)
+ schedule_delayed_work(&conn->login_work, 0);
return;
}
+ write_unlock_bh(&sk->sk_callback_lock);
+
orig_state_change(sk);
}

@@ -944,6 +1008,15 @@ static int iscsi_target_do_login(struct
if (iscsi_target_handle_csg_one(conn, login) < 0)
return -1;
if (login_rsp->flags & ISCSI_FLAG_LOGIN_TRANSIT) {
+ /*
+ * Check to make sure the TCP connection has not
+ * dropped asynchronously while session reinstatement
+ * was occuring in this kthread context, before
+ * transitioning to full feature phase operation.
+ */
+ if (iscsi_target_sk_check_close(conn))
+ return -1;
+
login->tsih = conn->sess->tsih;
login->login_complete = 1;
iscsi_target_restore_sock_callbacks(conn);
@@ -970,21 +1043,6 @@ static int iscsi_target_do_login(struct
break;
}

- if (conn->sock) {
- struct sock *sk = conn->sock->sk;
- bool state;
-
- read_lock_bh(&sk->sk_callback_lock);
- state = iscsi_target_sk_state_check(sk);
- read_unlock_bh(&sk->sk_callback_lock);
-
- if (!state) {
- pr_debug("iscsi_target_do_login() failed state for"
- " conn: %p\n", conn);
- return -1;
- }
- }
-
return 0;
}

@@ -1251,13 +1309,25 @@ int iscsi_target_start_negotiation(
if (conn->sock) {
struct sock *sk = conn->sock->sk;

- write_lock_bh(&sk->sk_callback_lock);
- set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
- write_unlock_bh(&sk->sk_callback_lock);
- }
+ write_lock_bh(&sk->sk_callback_lock);
+ set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
+ set_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
+ write_unlock_bh(&sk->sk_callback_lock);
+ }
+ /*
+ * If iscsi_target_do_login returns zero to signal more PDU
+ * exchanges are required to complete the login, go ahead and
+ * clear LOGIN_FLAGS_INITIAL_PDU but only if the TCP connection
+ * is still active.
+ *
+ * Otherwise if TCP connection dropped asynchronously, go ahead
+ * and perform connection cleanup now.
+ */
+ ret = iscsi_target_do_login(conn, login);
+ if (!ret && iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU))
+ ret = -1;

- ret = iscsi_target_do_login(conn, login);
- if (ret < 0) {
+ if (ret < 0) {
cancel_delayed_work_sync(&conn->login_work);
cancel_delayed_work_sync(&conn->login_cleanup_work);
iscsi_target_restore_sock_callbacks(conn);
--- a/include/target/iscsi/iscsi_target_core.h
+++ b/include/target/iscsi/iscsi_target_core.h
@@ -562,6 +562,7 @@ struct iscsi_conn {
#define LOGIN_FLAGS_READ_ACTIVE 1
#define LOGIN_FLAGS_CLOSED 2
#define LOGIN_FLAGS_READY 4
+#define LOGIN_FLAGS_INITIAL_PDU 8
unsigned long login_flags;
struct delayed_work login_work;
struct delayed_work login_cleanup_work;


2017-08-09 19:43:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 32/58] net: ethernet: nb8800: Handle all 4 RGMII modes identically

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Gonzalez <[email protected]>


[ Upstream commit 4813497b537c6208c90d6cbecac5072d347de900 ]

Before commit bf8f6952a233 ("Add blurb about RGMII") it was unclear
whose responsibility it was to insert the required clock skew, and
in hindsight, some PHY drivers got it wrong. The solution forward
is to introduce a new property, explicitly requiring skew from the
node to which it is attached. In the interim, this driver will handle
all 4 RGMII modes identically (no skew).

Fixes: 52dfc8301248 ("net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller")
Signed-off-by: Marc Gonzalez <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/aurora/nb8800.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -608,7 +608,7 @@ static void nb8800_mac_config(struct net
mac_mode |= HALF_DUPLEX;

if (gigabit) {
- if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
+ if (phy_interface_is_rgmii(dev->phydev))
mac_mode |= RGMII_MODE;

mac_mode |= GMAC_MODE;
@@ -1295,11 +1295,10 @@ static int nb8800_tangox_init(struct net
break;

case PHY_INTERFACE_MODE_RGMII:
- pad_mode = PAD_MODE_RGMII;
- break;
-
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_TXID:
- pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
+ pad_mode = PAD_MODE_RGMII;
break;

default:


2017-08-09 19:43:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 03/58] workqueue: restore WQ_UNBOUND/max_active==1 to be ordered

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tejun Heo <[email protected]>

commit 5c0338c68706be53b3dc472e4308961c36e4ece1 upstream.

The combination of WQ_UNBOUND and max_active == 1 used to imply
ordered execution. After NUMA affinity 4c16bd327c74 ("workqueue:
implement NUMA affinity for unbound workqueues"), this is no longer
true due to per-node worker pools.

While the right way to create an ordered workqueue is
alloc_ordered_workqueue(), the documentation has been misleading for a
long time and people do use WQ_UNBOUND and max_active == 1 for ordered
workqueues which can lead to subtle bugs which are very difficult to
trigger.

It's unlikely that we'd see noticeable performance impact by enforcing
ordering on WQ_UNBOUND / max_active == 1 workqueues. Let's
automatically set __WQ_ORDERED for those workqueues.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Christoph Hellwig <[email protected]>
Reported-by: Alexei Potashnik <[email protected]>
Fixes: 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues")
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/workqueue.c | 10 ++++++++++
1 file changed, 10 insertions(+)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3834,6 +3834,16 @@ struct workqueue_struct *__alloc_workque
struct workqueue_struct *wq;
struct pool_workqueue *pwq;

+ /*
+ * Unbound && max_active == 1 used to imply ordered, which is no
+ * longer the case on NUMA machines due to per-node pools. While
+ * alloc_ordered_workqueue() is the right way to create an ordered
+ * workqueue, keep the previous behavior to avoid subtle breakages
+ * on NUMA.
+ */
+ if ((flags & WQ_UNBOUND) && max_active == 1)
+ flags |= __WQ_ORDERED;
+
/* see the comment above the definition of WQ_POWER_EFFICIENT */
if ((flags & WQ_POWER_EFFICIENT) && wq_power_efficient)
flags |= WQ_UNBOUND;


2017-08-09 19:43:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 16/58] iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <[email protected]>

commit 105fa2f44e504c830697b0c794822112d79808dc upstream.

This patch fixes a BUG() in iscsit_close_session() that could be
triggered when iscsit_logout_post_handler() execution from within
tx thread context was not run for more than SECONDS_FOR_LOGOUT_COMP
(15 seconds), and the TCP connection didn't already close before
then forcing tx thread context to automatically exit.

This would manifest itself during explicit logout as:

[33206.974254] 1 connection(s) still exist for iSCSI session to iqn.1993-08.org.debian:01:3f5523242179
[33206.980184] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 2100.772 msecs
[33209.078643] ------------[ cut here ]------------
[33209.078646] kernel BUG at drivers/target/iscsi/iscsi_target.c:4346!

Normally when explicit logout attempt fails, the tx thread context
exits and iscsit_close_connection() from rx thread context does the
extra cleanup once it detects conn->conn_logout_remove has not been
cleared by the logout type specific post handlers.

To address this special case, if the logout post handler in tx thread
context detects conn->tx_thread_active has already been cleared, simply
return and exit in order for existing iscsit_close_connection()
logic from rx thread context do failed logout cleanup.

Reported-by: Bart Van Assche <[email protected]>
Tested-by: Bart Van Assche <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Tested-by: Gary Guo <[email protected]>
Tested-by: Chu Yuan Lin <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
drivers/target/iscsi/iscsi_target.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -4591,8 +4591,11 @@ static void iscsit_logout_post_handler_c
* always sleep waiting for RX/TX thread shutdown to complete
* within iscsit_close_connection().
*/
- if (conn->conn_transport->transport_type == ISCSI_TCP)
+ if (conn->conn_transport->transport_type == ISCSI_TCP) {
sleep = cmpxchg(&conn->tx_thread_active, true, false);
+ if (!sleep)
+ return;
+ }

atomic_set(&conn->conn_logout_remove, 0);
complete(&conn->conn_logout_comp);
@@ -4608,8 +4611,11 @@ static void iscsit_logout_post_handler_s
{
int sleep = 1;

- if (conn->conn_transport->transport_type == ISCSI_TCP)
+ if (conn->conn_transport->transport_type == ISCSI_TCP) {
sleep = cmpxchg(&conn->tx_thread_active, true, false);
+ if (!sleep)
+ return;
+ }

atomic_set(&conn->conn_logout_remove, 0);
complete(&conn->conn_logout_comp);


2017-08-09 19:43:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 28/58] mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Jarosch <[email protected]>


[ Upstream commit 9476d393667968b4a02afbe9d35a3558482b943e ]

DMA transfers are not allowed to buffers that are on the stack.
Therefore allocate a buffer to store the result of usb_control_message().

Fixes these bugreports:
https://bugzilla.kernel.org/show_bug.cgi?id=195217

https://bugzilla.redhat.com/show_bug.cgi?id=1421387
https://bugzilla.redhat.com/show_bug.cgi?id=1427398

Shortened kernel backtrace from 4.11.9-200.fc25.x86_64:
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 3 PID: 2957 at drivers/usb/core/hcd.c:1587
kernel: transfer buffer not dma capable
kernel: Call Trace:
kernel: dump_stack+0x63/0x86
kernel: __warn+0xcb/0xf0
kernel: warn_slowpath_fmt+0x5a/0x80
kernel: usb_hcd_map_urb_for_dma+0x37f/0x570
kernel: ? try_to_del_timer_sync+0x53/0x80
kernel: usb_hcd_submit_urb+0x34e/0xb90
kernel: ? schedule_timeout+0x17e/0x300
kernel: ? del_timer_sync+0x50/0x50
kernel: ? __slab_free+0xa9/0x300
kernel: usb_submit_urb+0x2f4/0x560
kernel: ? urb_destroy+0x24/0x30
kernel: usb_start_wait_urb+0x6e/0x170
kernel: usb_control_msg+0xdc/0x120
kernel: mcs_get_reg+0x36/0x40 [mcs7780]
kernel: mcs_net_open+0xb5/0x5c0 [mcs7780]
...

Regression goes back to 4.9, so it's a good candidate for -stable.
Though it's the decision of the maintainer.

Thanks to Dan Williams for adding the "transfer buffer not dma capable"
warning in the first place. It instantly pointed me in the right direction.

Patch has been tested with transferring data from a Polar watch.

Signed-off-by: Thomas Jarosch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/irda/mcs7780.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

--- a/drivers/net/irda/mcs7780.c
+++ b/drivers/net/irda/mcs7780.c
@@ -141,9 +141,19 @@ static int mcs_set_reg(struct mcs_cb *mc
static int mcs_get_reg(struct mcs_cb *mcs, __u16 reg, __u16 * val)
{
struct usb_device *dev = mcs->usbdev;
- int ret = usb_control_msg(dev, usb_rcvctrlpipe(dev, 0), MCS_RDREQ,
- MCS_RD_RTYPE, 0, reg, val, 2,
- msecs_to_jiffies(MCS_CTRL_TIMEOUT));
+ void *dmabuf;
+ int ret;
+
+ dmabuf = kmalloc(sizeof(__u16), GFP_KERNEL);
+ if (!dmabuf)
+ return -ENOMEM;
+
+ ret = usb_control_msg(dev, usb_rcvctrlpipe(dev, 0), MCS_RDREQ,
+ MCS_RD_RTYPE, 0, reg, dmabuf, 2,
+ msecs_to_jiffies(MCS_CTRL_TIMEOUT));
+
+ memcpy(val, dmabuf, sizeof(__u16));
+ kfree(dmabuf);

return ret;
}


2017-08-09 19:43:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 04/58] ALSA: hda - Fix speaker output from VAIO VPCL14M1R

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sergei A. Trusov <[email protected]>

commit 3f3c371421e601fa93b6cb7fb52da9ad59ec90b4 upstream.

Sony VAIO VPCL14M1R needs the quirk to make the speaker working properly.

Tested-by: Dmitriy <[email protected]>
Signed-off-by: Sergei A. Trusov <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/pci/hda/patch_realtek.c | 1 +
1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -2233,6 +2233,7 @@ static const struct snd_pci_quirk alc882
SND_PCI_QUIRK(0x1043, 0x8691, "ASUS ROG Ranger VIII", ALC882_FIXUP_GPIO3),
SND_PCI_QUIRK(0x104d, 0x9047, "Sony Vaio TT", ALC889_FIXUP_VAIO_TT),
SND_PCI_QUIRK(0x104d, 0x905a, "Sony Vaio Z", ALC882_FIXUP_NO_PRIMARY_HP),
+ SND_PCI_QUIRK(0x104d, 0x9060, "Sony Vaio VPCL14M1R", ALC882_FIXUP_NO_PRIMARY_HP),
SND_PCI_QUIRK(0x104d, 0x9043, "Sony Vaio VGC-LN51JGB", ALC882_FIXUP_NO_PRIMARY_HP),
SND_PCI_QUIRK(0x104d, 0x9044, "Sony VAIO AiO", ALC882_FIXUP_NO_PRIMARY_HP),



2017-08-09 19:43:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 36/58] sctp: dont dereference ptr before leaving _sctp_walk_{params, errors}()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexander Potapenko <[email protected]>


[ Upstream commit b1f5bfc27a19f214006b9b4db7b9126df2dfdf5a ]

If the length field of the iterator (|pos.p| or |err|) is past the end
of the chunk, we shouldn't access it.

This bug has been detected by KMSAN. For the following pair of system
calls:

socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3
sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0),
inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0,
sin6_scope_id=0}, 28) = 1

the tool has reported a use of uninitialized memory:

==================================================================
BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0
CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16
dump_stack+0x172/0x1c0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
__sctp_rcv_init_lookup net/sctp/input.c:1074
__sctp_rcv_lookup_harder net/sctp/input.c:1233
__sctp_rcv_lookup net/sctp/input.c:1255
sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170
sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
</IRQ>
do_softirq kernel/softirq.c:328
__local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181
local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31
rcu_read_unlock_bh ./include/linux/rcupdate.h:931
ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124
ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149
NF_HOOK_COND ./include/linux/netfilter.h:246
ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163
dst_output ./include/net/dst.h:486
NF_HOOK ./include/linux/netfilter.h:257
ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261
sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225
sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632
sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
sctp_side_effects net/sctp/sm_sideeffect.c:1773
sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633
sock_sendmsg net/socket.c:643
SYSC_sendto+0x608/0x710 net/socket.c:1696
SyS_sendto+0x8a/0xb0 net/socket.c:1664
do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
RIP: 0033:0x401133
RSP: 002b:00007fff6d99cd38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000401133
RDX: 0000000000000001 RSI: 0000000000494088 RDI: 0000000000000003
RBP: 00007fff6d99cd90 R08: 00007fff6d99cd50 R09: 000000000000001c
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 00000000004063d0 R14: 0000000000406460 R15: 0000000000000000
origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211
slab_alloc_node mm/slub.c:2743
__kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351
__kmalloc_reserve net/core/skbuff.c:138
__alloc_skb+0x26b/0x840 net/core/skbuff.c:231
alloc_skb ./include/linux/skbuff.h:933
sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570
sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
sctp_side_effects net/sctp/sm_sideeffect.c:1773
sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633
sock_sendmsg net/socket.c:643
SYSC_sendto+0x608/0x710 net/socket.c:1696
SyS_sendto+0x8a/0xb0 net/socket.c:1664
do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
==================================================================

Signed-off-by: Alexander Potapenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/sctp/sctp.h | 4 ++++
1 file changed, 4 insertions(+)

--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -444,6 +444,8 @@ _sctp_walk_params((pos), (chunk), ntohs(

#define _sctp_walk_params(pos, chunk, end, member)\
for (pos.v = chunk->member;\
+ (pos.v + offsetof(struct sctp_paramhdr, length) + sizeof(pos.p->length) <\
+ (void *)chunk + end) &&\
pos.v <= (void *)chunk + end - ntohs(pos.p->length) &&\
ntohs(pos.p->length) >= sizeof(sctp_paramhdr_t);\
pos.v += WORD_ROUND(ntohs(pos.p->length)))
@@ -454,6 +456,8 @@ _sctp_walk_errors((err), (chunk_hdr), nt
#define _sctp_walk_errors(err, chunk_hdr, end)\
for (err = (sctp_errhdr_t *)((void *)chunk_hdr + \
sizeof(sctp_chunkhdr_t));\
+ ((void *)err + offsetof(sctp_errhdr_t, length) + sizeof(err->length) <\
+ (void *)chunk_hdr + end) &&\
(void *)err <= (void *)chunk_hdr + end - ntohs(err->length) &&\
ntohs(err->length) >= sizeof(sctp_errhdr_t); \
err = (sctp_errhdr_t *)((void *)err + WORD_ROUND(ntohs(err->length))))


2017-08-09 19:43:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 37/58] sctp: fix the check for _sctp_walk_params and _sctp_walk_errors

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <[email protected]>


[ Upstream commit 6b84202c946cd3da3a8daa92c682510e9ed80321 ]

Commit b1f5bfc27a19 ("sctp: don't dereference ptr before leaving
_sctp_walk_{params, errors}()") tried to fix the issue that it
may overstep the chunk end for _sctp_walk_{params, errors} with
'chunk_end > offset(length) + sizeof(length)'.

But it introduced a side effect: When processing INIT, it verifies
the chunks with 'param.v == chunk_end' after iterating all params
by sctp_walk_params(). With the check 'chunk_end > offset(length)
+ sizeof(length)', it would return when the last param is not yet
accessed. Because the last param usually is fwdtsn supported param
whose size is 4 and 'chunk_end == offset(length) + sizeof(length)'

This is a badly issue even causing sctp couldn't process 4-shakes.
Client would always get abort when connecting to server, due to
the failure of INIT chunk verification on server.

The patch is to use 'chunk_end <= offset(length) + sizeof(length)'
instead of 'chunk_end < offset(length) + sizeof(length)' for both
_sctp_walk_params and _sctp_walk_errors.

Fixes: b1f5bfc27a19 ("sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()")
Signed-off-by: Xin Long <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/sctp/sctp.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -444,7 +444,7 @@ _sctp_walk_params((pos), (chunk), ntohs(

#define _sctp_walk_params(pos, chunk, end, member)\
for (pos.v = chunk->member;\
- (pos.v + offsetof(struct sctp_paramhdr, length) + sizeof(pos.p->length) <\
+ (pos.v + offsetof(struct sctp_paramhdr, length) + sizeof(pos.p->length) <=\
(void *)chunk + end) &&\
pos.v <= (void *)chunk + end - ntohs(pos.p->length) &&\
ntohs(pos.p->length) >= sizeof(sctp_paramhdr_t);\
@@ -456,7 +456,7 @@ _sctp_walk_errors((err), (chunk_hdr), nt
#define _sctp_walk_errors(err, chunk_hdr, end)\
for (err = (sctp_errhdr_t *)((void *)chunk_hdr + \
sizeof(sctp_chunkhdr_t));\
- ((void *)err + offsetof(sctp_errhdr_t, length) + sizeof(err->length) <\
+ ((void *)err + offsetof(sctp_errhdr_t, length) + sizeof(err->length) <=\
(void *)chunk_hdr + end) &&\
(void *)err <= (void *)chunk_hdr + end - ntohs(err->length) &&\
ntohs(err->length) >= sizeof(sctp_errhdr_t); \


2017-08-09 19:44:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 11/58] media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Prabhakar Lad <[email protected]>

commit da05d52d2f0f6bd61094a0cd045fed94bf7d673a upstream.

this patch makes sure VPFE_CMD_S_CCDC_RAW_PARAMS ioctl no longer works
for vpfe_capture driver with a minimal patch suitable for backporting.

- This ioctl was never in public api and was only defined in kernel header.
- The function set_params constantly mixes up pointers and phys_addr_t
numbers.
- This is part of a 'VPFE_CMD_S_CCDC_RAW_PARAMS' ioctl command that is
described as an 'experimental ioctl that will change in future kernels'.
- The code to allocate the table never gets called after we copy_from_user
the user input over the kernel settings, and then compare them
for inequality.
- We then go on to use an address provided by user space as both the
__user pointer for input and pass it through phys_to_virt to come up
with a kernel pointer to copy the data to. This looks like a trivially
exploitable root hole.

Due to these reasons we make sure this ioctl now returns -EINVAL and backport
this patch as far as possible.

Fixes: 5f15fbb68fd7 ("V4L/DVB (12251): v4l: dm644x ccdc module for vpfe capture driver")

Signed-off-by: Lad, Prabhakar <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/media/platform/davinci/vpfe_capture.c | 22 ++--------------------
1 file changed, 2 insertions(+), 20 deletions(-)

--- a/drivers/media/platform/davinci/vpfe_capture.c
+++ b/drivers/media/platform/davinci/vpfe_capture.c
@@ -1709,27 +1709,9 @@ static long vpfe_param_handler(struct fi

switch (cmd) {
case VPFE_CMD_S_CCDC_RAW_PARAMS:
+ ret = -EINVAL;
v4l2_warn(&vpfe_dev->v4l2_dev,
- "VPFE_CMD_S_CCDC_RAW_PARAMS: experimental ioctl\n");
- if (ccdc_dev->hw_ops.set_params) {
- ret = ccdc_dev->hw_ops.set_params(param);
- if (ret) {
- v4l2_dbg(1, debug, &vpfe_dev->v4l2_dev,
- "Error setting parameters in CCDC\n");
- goto unlock_out;
- }
- ret = vpfe_get_ccdc_image_format(vpfe_dev,
- &vpfe_dev->fmt);
- if (ret < 0) {
- v4l2_dbg(1, debug, &vpfe_dev->v4l2_dev,
- "Invalid image format at CCDC\n");
- goto unlock_out;
- }
- } else {
- ret = -EINVAL;
- v4l2_dbg(1, debug, &vpfe_dev->v4l2_dev,
- "VPFE_CMD_S_CCDC_RAW_PARAMS not supported\n");
- }
+ "VPFE_CMD_S_CCDC_RAW_PARAMS not supported\n");
break;
default:
ret = -ENOTTY;


2017-08-09 19:43:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 49/58] scsi: qla2xxx: Get mutex lock before checking optrom_state

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Milan P. Gandhi" <[email protected]>


[ Upstream commit c7702b8c22712a06080e10f1d2dee1a133ec8809 ]

There is a race condition with qla2xxx optrom functions where one thread
might modify optrom buffer, optrom_state while other thread is still
reading from it.

In couple of crashes, it was found that we had successfully passed the
following 'if' check where we confirm optrom_state to be
QLA_SREADING. But by the time we acquired mutex lock to proceed with
memory_read_from_buffer function, some other thread/process had already
modified that option rom buffer and optrom_state from QLA_SREADING to
QLA_SWAITING. Then we got ha->optrom_buffer 0x0 and crashed the system:

if (ha->optrom_state != QLA_SREADING)
return 0;

mutex_lock(&ha->optrom_mutex);
rval = memory_read_from_buffer(buf, count, &off, ha->optrom_buffer,
ha->optrom_region_size);
mutex_unlock(&ha->optrom_mutex);

With current optrom function we get following crash due to a race
condition:

[ 1479.466679] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1479.466707] IP: [<ffffffff81326756>] memcpy+0x6/0x110
[...]
[ 1479.473673] Call Trace:
[ 1479.474296] [<ffffffff81225cbc>] ? memory_read_from_buffer+0x3c/0x60
[ 1479.474941] [<ffffffffa01574dc>] qla2x00_sysfs_read_optrom+0x9c/0xc0 [qla2xxx]
[ 1479.475571] [<ffffffff8127e76b>] read+0xdb/0x1f0
[ 1479.476206] [<ffffffff811fdf9e>] vfs_read+0x9e/0x170
[ 1479.476839] [<ffffffff811feb6f>] SyS_read+0x7f/0xe0
[ 1479.477466] [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b

Below patch modifies qla2x00_sysfs_read_optrom,
qla2x00_sysfs_write_optrom functions to get the mutex_lock before
checking ha->optrom_state to avoid similar crashes.

The patch was applied and tested and same crashes were no longer
observed again.

Tested-by: Milan P. Gandhi <[email protected]>
Signed-off-by: Milan P. Gandhi <[email protected]>
Reviewed-by: Laurence Oberman <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/qla2xxx/qla_attr.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

--- a/drivers/scsi/qla2xxx/qla_attr.c
+++ b/drivers/scsi/qla2xxx/qla_attr.c
@@ -329,12 +329,15 @@ qla2x00_sysfs_read_optrom(struct file *f
struct qla_hw_data *ha = vha->hw;
ssize_t rval = 0;

+ mutex_lock(&ha->optrom_mutex);
+
if (ha->optrom_state != QLA_SREADING)
- return 0;
+ goto out;

- mutex_lock(&ha->optrom_mutex);
rval = memory_read_from_buffer(buf, count, &off, ha->optrom_buffer,
ha->optrom_region_size);
+
+out:
mutex_unlock(&ha->optrom_mutex);

return rval;
@@ -349,14 +352,19 @@ qla2x00_sysfs_write_optrom(struct file *
struct device, kobj)));
struct qla_hw_data *ha = vha->hw;

- if (ha->optrom_state != QLA_SWRITING)
+ mutex_lock(&ha->optrom_mutex);
+
+ if (ha->optrom_state != QLA_SWRITING) {
+ mutex_unlock(&ha->optrom_mutex);
return -EINVAL;
- if (off > ha->optrom_region_size)
+ }
+ if (off > ha->optrom_region_size) {
+ mutex_unlock(&ha->optrom_mutex);
return -ERANGE;
+ }
if (off + count > ha->optrom_region_size)
count = ha->optrom_region_size - off;

- mutex_lock(&ha->optrom_mutex);
memcpy(&ha->optrom_buffer[off], buf, count);
mutex_unlock(&ha->optrom_mutex);



2017-08-09 19:44:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 57/58] ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: zheng li <[email protected]>


[ Upstream commit 0a28cfd51e17f4f0a056bcf66bfbe492c3b99f38 ]

There is an inconsistent conditional judgement in __ip_append_data and
ip_finish_output functions, the variable length in __ip_append_data just
include the length of application's payload and udp header, don't include
the length of ip header, but in ip_finish_output use
(skb->len > ip_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ip header.

That causes some particular application's udp payload whose length is
between (MTU - IP Header) and MTU were fragmented by ip_fragment even
though the rst->dev support UFO feature.

Add the length of ip header to length in __ip_append_data to keep
consistent conditional judgement as ip_finish_output for ip fragment.

Signed-off-by: Zheng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_output.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -922,7 +922,7 @@ static int __ip_append_data(struct sock
csummode = CHECKSUM_PARTIAL;

cork->length += length;
- if (((length > mtu) || (skb && skb_is_gso(skb))) &&
+ if ((((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb))) &&
(sk->sk_protocol == IPPROTO_UDP) &&
(rt->dst.dev->features & NETIF_F_UFO) && !rt->dst.header_len &&
(sk->sk_type == SOCK_DGRAM) && !sk->sk_no_check_tx) {


2017-08-09 19:44:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 56/58] mm: dont dereference struct page fields of invalid pages

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ard Biesheuvel <[email protected]>


[ Upstream commit f073bdc51771f5a5c7a8d1191bfc3ae371d44de7 ]

The VM_BUG_ON() check in move_freepages() checks whether the node id of
a page matches the node id of its zone. However, it does this before
having checked whether the struct page pointer refers to a valid struct
page to begin with. This is guaranteed in most cases, but may not be
the case if CONFIG_HOLES_IN_ZONE=y.

So reorder the VM_BUG_ON() with the pfn_valid_within() check.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ard Biesheuvel <[email protected]>
Acked-by: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Hanjun Guo <[email protected]>
Cc: Yisheng Xie <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: James Morse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_alloc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1527,14 +1527,14 @@ int move_freepages(struct zone *zone,
#endif

for (page = start_page; page <= end_page;) {
- /* Make sure we are not inadvertently changing nodes */
- VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
-
if (!pfn_valid_within(page_to_pfn(page))) {
page++;
continue;
}

+ /* Make sure we are not inadvertently changing nodes */
+ VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
+
if (!PageBuddy(page)) {
page++;
continue;


2017-08-09 19:43:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 50/58] drm/virtio: fix framebuffer sparse warning

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gerd Hoffmann <[email protected]>


[ Upstream commit 71d3f6ef7f5af38dea2975ec5715c88bae92e92d ]

virtio uses normal ram as backing storage for the framebuffer, so we
should assign the address to new screen_buffer (added by commit
17a7b0b4d9749f80d365d7baff5dec2f54b0e992) instead of screen_base.

Reported-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/virtio/virtgpu_fb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/virtio/virtgpu_fb.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fb.c
@@ -338,7 +338,7 @@ static int virtio_gpufb_create(struct dr
info->fbops = &virtio_gpufb_ops;
info->pixmap.flags = FB_PIXMAP_SYSTEM;

- info->screen_base = obj->vmap;
+ info->screen_buffer = obj->vmap;
info->screen_size = obj->gem_base.size;
drm_fb_helper_fill_fix(info, fb->pitches[0], fb->depth);
drm_fb_helper_fill_var(info, &vfbdev->helper,


2017-08-09 19:45:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 55/58] signal: protect SIGNAL_UNKILLABLE from unintentional clearing.

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jamie Iles <[email protected]>


[ Upstream commit 2d39b3cd34e6d323720d4c61bd714f5ae202c022 ]

Since commit 00cd5c37afd5 ("ptrace: permit ptracing of /sbin/init") we
can now trace init processes. init is initially protected with
SIGNAL_UNKILLABLE which will prevent fatal signals such as SIGSTOP, but
there are a number of paths during tracing where SIGNAL_UNKILLABLE can
be implicitly cleared.

This can result in init becoming stoppable/killable after tracing. For
example, running:

while true; do kill -STOP 1; done &
strace -p 1

and then stopping strace and the kill loop will result in init being
left in state TASK_STOPPED. Sending SIGCONT to init will resume it, but
init will now respond to future SIGSTOP signals rather than ignoring
them.

Make sure that when setting SIGNAL_STOP_CONTINUED/SIGNAL_STOP_STOPPED
that we don't clear SIGNAL_UNKILLABLE.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jamie Iles <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/sched.h | 10 ++++++++++
kernel/signal.c | 4 ++--
2 files changed, 12 insertions(+), 2 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -801,6 +801,16 @@ struct signal_struct {

#define SIGNAL_UNKILLABLE 0x00000040 /* for init: ignore fatal signals */

+#define SIGNAL_STOP_MASK (SIGNAL_CLD_MASK | SIGNAL_STOP_STOPPED | \
+ SIGNAL_STOP_CONTINUED)
+
+static inline void signal_set_stop_flags(struct signal_struct *sig,
+ unsigned int flags)
+{
+ WARN_ON(sig->flags & (SIGNAL_GROUP_EXIT|SIGNAL_GROUP_COREDUMP));
+ sig->flags = (sig->flags & ~SIGNAL_STOP_MASK) | flags;
+}
+
/* If true, all threads except ->group_exit_task have pending SIGKILL */
static inline int signal_group_exit(const struct signal_struct *sig)
{
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -346,7 +346,7 @@ static bool task_participate_group_stop(
* fresh group stop. Read comment in do_signal_stop() for details.
*/
if (!sig->group_stop_count && !(sig->flags & SIGNAL_STOP_STOPPED)) {
- sig->flags = SIGNAL_STOP_STOPPED;
+ signal_set_stop_flags(sig, SIGNAL_STOP_STOPPED);
return true;
}
return false;
@@ -845,7 +845,7 @@ static bool prepare_signal(int sig, stru
* will take ->siglock, notice SIGNAL_CLD_MASK, and
* notify its parent. See get_signal_to_deliver().
*/
- signal->flags = why | SIGNAL_STOP_CONTINUED;
+ signal_set_stop_flags(signal, why | SIGNAL_STOP_CONTINUED);
signal->group_stop_count = 0;
signal->group_exit_code = 0;
}


2017-08-09 19:45:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 54/58] lib/Kconfig.debug: fix frv build failure

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sudip Mukherjee <[email protected]>


[ Upstream commit da0510c47519fe0999cffe316e1d370e29f952be ]

The build of frv allmodconfig was failing with the errors like:

/tmp/cc0JSPc3.s: Assembler messages:
/tmp/cc0JSPc3.s:1839: Error: symbol `.LSLT0' is already defined
/tmp/cc0JSPc3.s:1842: Error: symbol `.LASLTP0' is already defined
/tmp/cc0JSPc3.s:1969: Error: symbol `.LELTP0' is already defined
/tmp/cc0JSPc3.s:1970: Error: symbol `.LELT0' is already defined

Commit 866ced950bcd ("kbuild: Support split debug info v4") introduced
splitting the debug info and keeping that in a separate file. Somehow,
the frv-linux gcc did not like that and I am guessing that instead of
splitting it started copying. The first report about this is at:

https://lists.01.org/pipermail/kbuild-all/2015-July/010527.html.

I will try and see if this can work with frv and if still fails I will
open a bug report with gcc. But meanwhile this is the easiest option to
solve build failure of frv.

Fixes: 866ced950bcd ("kbuild: Support split debug info v4")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Sudip Mukherjee <[email protected]>
Reported-by: Fengguang Wu <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
lib/Kconfig.debug | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -145,7 +145,7 @@ config DEBUG_INFO_REDUCED

config DEBUG_INFO_SPLIT
bool "Produce split debuginfo in .dwo files"
- depends on DEBUG_INFO
+ depends on DEBUG_INFO && !FRV
help
Generate debug info into separate .dwo files. This significantly
reduces the build directory size for builds with DEBUG_INFO,


2017-08-09 19:45:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 53/58] mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michal Hocko <[email protected]>


[ Upstream commit bb1107f7c6052c863692a41f78c000db792334bf ]

Andrey Konovalov has reported the following warning triggered by the
syzkaller fuzzer.

WARNING: CPU: 1 PID: 9935 at mm/page_alloc.c:3511 __alloc_pages_nodemask+0x159c/0x1e20
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 9935 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__alloc_pages_slowpath mm/page_alloc.c:3511
__alloc_pages_nodemask+0x159c/0x1e20 mm/page_alloc.c:3781
alloc_pages_current+0x1c7/0x6b0 mm/mempolicy.c:2072
alloc_pages include/linux/gfp.h:469
kmalloc_order+0x1f/0x70 mm/slab_common.c:1015
kmalloc_order_trace+0x1f/0x160 mm/slab_common.c:1026
kmalloc_large include/linux/slab.h:422
__kmalloc+0x210/0x2d0 mm/slub.c:3723
kmalloc include/linux/slab.h:495
ep_write_iter+0x167/0xb50 drivers/usb/gadget/legacy/inode.c:664
new_sync_write fs/read_write.c:499
__vfs_write+0x483/0x760 fs/read_write.c:512
vfs_write+0x170/0x4e0 fs/read_write.c:560
SYSC_write fs/read_write.c:607
SyS_write+0xfb/0x230 fs/read_write.c:599
entry_SYSCALL_64_fastpath+0x1f/0xc2

The issue is caused by a lack of size check for the request size in
ep_write_iter which should be fixed. It, however, points to another
problem, that SLUB defines KMALLOC_MAX_SIZE too large because the its
KMALLOC_SHIFT_MAX is (MAX_ORDER + PAGE_SHIFT) which means that the
resulting page allocator request might be MAX_ORDER which is too large
(see __alloc_pages_slowpath).

The same applies to the SLOB allocator which allows even larger sizes.
Make sure that they are capped properly and never request more than
MAX_ORDER order.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Andrey Konovalov <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/slab.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -203,7 +203,7 @@ size_t ksize(const void *);
* (PAGE_SIZE*2). Larger requests are passed to the page allocator.
*/
#define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 1)
-#define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT)
+#define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT - 1)
#ifndef KMALLOC_SHIFT_LOW
#define KMALLOC_SHIFT_LOW 3
#endif
@@ -216,7 +216,7 @@ size_t ksize(const void *);
* be allocated from the same page.
*/
#define KMALLOC_SHIFT_HIGH PAGE_SHIFT
-#define KMALLOC_SHIFT_MAX 30
+#define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT - 1)
#ifndef KMALLOC_SHIFT_LOW
#define KMALLOC_SHIFT_LOW 3
#endif


2017-08-09 19:46:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 52/58] ARM: 8632/1: ftrace: fix syscall name matching

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Rabin Vincent <[email protected]>


[ Upstream commit 270c8cf1cacc69cb8d99dea812f06067a45e4609 ]

ARM has a few system calls (most notably mmap) for which the names of
the functions which are referenced in the syscall table do not match the
names of the syscall tracepoints. As a consequence of this, these
tracepoints are not made available. Implement
arch_syscall_match_sym_name to fix this and allow tracing even these
system calls.

Signed-off-by: Rabin Vincent <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/include/asm/ftrace.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

--- a/arch/arm/include/asm/ftrace.h
+++ b/arch/arm/include/asm/ftrace.h
@@ -54,6 +54,24 @@ static inline void *return_address(unsig

#define ftrace_return_address(n) return_address(n)

+#define ARCH_HAS_SYSCALL_MATCH_SYM_NAME
+
+static inline bool arch_syscall_match_sym_name(const char *sym,
+ const char *name)
+{
+ if (!strcmp(sym, "sys_mmap2"))
+ sym = "sys_mmap_pgoff";
+ else if (!strcmp(sym, "sys_statfs64_wrapper"))
+ sym = "sys_statfs64";
+ else if (!strcmp(sym, "sys_fstatfs64_wrapper"))
+ sym = "sys_fstatfs64";
+ else if (!strcmp(sym, "sys_arm_fadvise64_64"))
+ sym = "sys_fadvise64_64";
+
+ /* Ignore case since sym may start with "SyS" instead of "sys" */
+ return !strcasecmp(sym, name);
+}
+
#endif /* ifndef __ASSEMBLY__ */

#endif /* _ASM_ARM_FTRACE */


2017-08-09 19:46:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 51/58] virtio_blk: fix panic in initialization error path

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Omar Sandoval <[email protected]>


[ Upstream commit 6bf6b0aa3da84a3d9126919a94c49c0fb7ee2fb3 ]

If blk_mq_init_queue() returns an error, it gets assigned to
vblk->disk->queue. Then, when we call put_disk(), we end up calling
blk_put_queue() with the ERR_PTR, causing a bad dereference. Fix it by
only assigning to vblk->disk->queue on success.

Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Jeff Moyer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/block/virtio_blk.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -641,11 +641,12 @@ static int virtblk_probe(struct virtio_d
if (err)
goto out_put_disk;

- q = vblk->disk->queue = blk_mq_init_queue(&vblk->tag_set);
+ q = blk_mq_init_queue(&vblk->tag_set);
if (IS_ERR(q)) {
err = -ENOMEM;
goto out_free_tags;
}
+ vblk->disk->queue = q;

q->queuedata = vblk;



2017-08-09 19:43:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 08/58] ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit fcf5ea10992fbac3c7473a1db33d56a139333cd1 upstream.

ext4_find_unwritten_pgoff() does not properly handle a situation when
starting index is in the middle of a page and blocksize < pagesize. The
following command shows the bug on filesystem with 1k blocksize:

xfs_io -f -c "falloc 0 4k" \
-c "pwrite 1k 1k" \
-c "pwrite 3k 1k" \
-c "seek -a -r 0" foo

In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.

Fix the problem by neglecting buffers in a page before starting offset.

Reported-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/file.c | 3 +++
1 file changed, 3 insertions(+)

--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -500,6 +500,8 @@ static int ext4_find_unwritten_pgoff(str
lastoff = page_offset(page);
bh = head = page_buffers(page);
do {
+ if (lastoff + bh->b_size <= startoff)
+ goto next;
if (buffer_uptodate(bh) ||
buffer_unwritten(bh)) {
if (whence == SEEK_DATA)
@@ -514,6 +516,7 @@ static int ext4_find_unwritten_pgoff(str
unlock_page(page);
goto out;
}
+next:
lastoff += bh->b_size;
bh = bh->b_this_page;
} while (bh != head);


2017-08-09 19:46:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 48/58] phy state machine: failsafe leave invalid RUNNING state

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Zefir Kurtisi <[email protected]>


[ Upstream commit 811a919135b980bac8009d042acdccf10dc1ef5e ]

While in RUNNING state, phy_state_machine() checks for link changes by
comparing phydev->link before and after calling phy_read_status().
This works as long as it is guaranteed that phydev->link is never
changed outside the phy_state_machine().

If in some setups this happens, it causes the state machine to miss
a link loss and remain RUNNING despite phydev->link being 0.

This has been observed running a dsa setup with a process continuously
polling the link states over ethtool each second (SNMPD RFC-1213
agent). Disconnecting the link on a phy followed by a ETHTOOL_GSET
causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
call phy_read_status() and with that modify the link status - and
with that bricking the phy state machine.

This patch adds a fail-safe check while in RUNNING, which causes to
move to CHANGELINK when the link is gone and we are still RUNNING.

Signed-off-by: Zefir Kurtisi <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/phy.c | 9 +++++++++
1 file changed, 9 insertions(+)

--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -921,6 +921,15 @@ void phy_state_machine(struct work_struc
if (old_link != phydev->link)
phydev->state = PHY_CHANGELINK;
}
+ /*
+ * Failsafe: check that nobody set phydev->link=0 between two
+ * poll cycles, otherwise we won't leave RUNNING state as long
+ * as link remains down.
+ */
+ if (!phydev->link && phydev->state == PHY_RUNNING) {
+ phydev->state = PHY_CHANGELINK;
+ dev_err(&phydev->dev, "no link in PHY_RUNNING\n");
+ }
break;
case PHY_CHANGELINK:
err = phy_read_status(phydev);


2017-08-09 19:46:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 46/58] tg3: Fix race condition in tg3_get_stats64().

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Chan <[email protected]>


[ Upstream commit f5992b72ebe0dde488fa8f706b887194020c66fc ]

The driver's ndo_get_stats64() method is not always called under RTNL.
So it can race with driver close or ethtool reconfigurations. Fix the
race condition by taking tp->lock spinlock in tg3_free_consistent()
when freeing the tp->hw_stats memory block. tg3_get_stats64() is
already taking tp->lock.

Reported-by: Wang Yufen <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/tg3.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -8722,11 +8722,14 @@ static void tg3_free_consistent(struct t
tg3_mem_rx_release(tp);
tg3_mem_tx_release(tp);

+ /* Protect tg3_get_stats64() from reading freed tp->hw_stats. */
+ tg3_full_lock(tp, 0);
if (tp->hw_stats) {
dma_free_coherent(&tp->pdev->dev, sizeof(struct tg3_hw_stats),
tp->hw_stats, tp->stats_mapping);
tp->hw_stats = NULL;
}
+ tg3_full_unlock(tp);
}

/*


2017-08-09 19:46:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 47/58] x86/boot: Add missing declaration of string functions

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Mc Guire <[email protected]>


[ Upstream commit fac69d0efad08fc15e4dbfc116830782acc0dc9a ]

Add the missing declarations of basic string functions to string.h to allow
a clean build.

Fixes: 5be865661516 ("String-handling functions for the new x86 setup code.")
Signed-off-by: Nicholas Mc Guire <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/boot/string.c | 1 +
arch/x86/boot/string.h | 9 +++++++++
2 files changed, 10 insertions(+)

--- a/arch/x86/boot/string.c
+++ b/arch/x86/boot/string.c
@@ -14,6 +14,7 @@

#include <linux/types.h>
#include "ctype.h"
+#include "string.h"

int memcmp(const void *s1, const void *s2, size_t len)
{
--- a/arch/x86/boot/string.h
+++ b/arch/x86/boot/string.h
@@ -18,4 +18,13 @@ int memcmp(const void *s1, const void *s
#define memset(d,c,l) __builtin_memset(d,c,l)
#define memcmp __builtin_memcmp

+extern int strcmp(const char *str1, const char *str2);
+extern int strncmp(const char *cs, const char *ct, size_t count);
+extern size_t strlen(const char *s);
+extern char *strstr(const char *s1, const char *s2);
+extern size_t strnlen(const char *s, size_t maxlen);
+extern unsigned int atou(const char *s);
+extern unsigned long long simple_strtoull(const char *cp, char **endp,
+ unsigned int base);
+
#endif /* BOOT_STRING_H */


2017-08-09 19:47:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 45/58] net: phy: dp83867: fix irq generation

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Grygorii Strashko <[email protected]>


[ Upstream commit 5ca7d1ca77dc23934504b95a96d2660d345f83c2 ]

For proper IRQ generation by DP83867 phy the INT/PWDN pin has to be
programmed as an interrupt output instead of a Powerdown input in
Configuration Register 3 (CFG3), Address 0x001E, bit 7 INT_OE = 1. The
current driver doesn't do this and as result IRQs will not be generated by
DP83867 phy even if they are properly configured in DT.

Hence, fix IRQ generation by properly configuring CFG3.INT_OE bit and
ensure that Link Status Change (LINK_STATUS_CHNG_INT) and Auto-Negotiation
Complete (AUTONEG_COMP_INT) interrupt are enabled. After this the DP83867
driver will work properly in interrupt enabled mode.

Signed-off-by: Grygorii Strashko <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/dp83867.c | 10 ++++++++++
1 file changed, 10 insertions(+)

--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -29,6 +29,7 @@
#define MII_DP83867_MICR 0x12
#define MII_DP83867_ISR 0x13
#define DP83867_CTRL 0x1f
+#define DP83867_CFG3 0x1e

/* Extended Registers */
#define DP83867_RGMIICTL 0x0032
@@ -89,6 +90,8 @@ static int dp83867_config_intr(struct ph
micr_status |=
(MII_DP83867_MICR_AN_ERR_INT_EN |
MII_DP83867_MICR_SPEED_CHNG_INT_EN |
+ MII_DP83867_MICR_AUTONEG_COMP_INT_EN |
+ MII_DP83867_MICR_LINK_STS_CHNG_INT_EN |
MII_DP83867_MICR_DUP_MODE_CHNG_INT_EN |
MII_DP83867_MICR_SLEEP_MODE_CHNG_INT_EN);

@@ -184,6 +187,13 @@ static int dp83867_config_init(struct ph
DP83867_DEVADDR, phydev->addr, delay);
}

+ /* Enable Interrupt output INT_OE in CFG3 register */
+ if (phy_interrupt_is_valid(phydev)) {
+ val = phy_read(phydev, DP83867_CFG3);
+ val |= BIT(7);
+ phy_write(phydev, DP83867_CFG3, val);
+ }
+
return 0;
}



2017-08-09 19:47:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 43/58] wext: handle NULL extra data in iwe_stream_add_point better

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Arnd Bergmann <[email protected]>

commit 93be2b74279c15c2844684b1a027fdc71dd5d9bf upstream.

gcc-7 complains that wl3501_cs passes NULL into a function that
then uses the argument as the input for memcpy:

drivers/net/wireless/wl3501_cs.c: In function 'wl3501_get_scan':
include/net/iw_handler.h:559:3: error: argument 2 null where non-null expected [-Werror=nonnull]
memcpy(stream + point_len, extra, iwe->u.data.length);

This works fine here because iwe->u.data.length is guaranteed to be 0
and the memcpy doesn't actually have an effect.

Making the length check explicit avoids the warning and should have
no other effect here.

Also check the pointer itself, since otherwise we get warnings
elsewhere in the code.

Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/net/iw_handler.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/include/net/iw_handler.h
+++ b/include/net/iw_handler.h
@@ -556,7 +556,8 @@ iwe_stream_add_point(struct iw_request_i
memcpy(stream + lcp_len,
((char *) &iwe->u) + IW_EV_POINT_OFF,
IW_EV_POINT_PK_LEN - IW_EV_LCP_PK_LEN);
- memcpy(stream + point_len, extra, iwe->u.data.length);
+ if (iwe->u.data.length && extra)
+ memcpy(stream + point_len, extra, iwe->u.data.length);
stream += event_len;
}
return stream;


2017-08-09 19:47:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 44/58] sh_eth: R8A7740 supports packet shecksumming

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sergei Shtylyov <[email protected]>


[ Upstream commit 0f1f9cbc04dbb3cc310f70a11cba0cf1f2109d9c ]

The R8A7740 GEther controller supports the packet checksum offloading
but the 'hw_crc' (bad name, I'll fix it) flag isn't set in the R8A7740
data, thus CSMR isn't cleared...

Fixes: 73a0d907301e ("net: sh_eth: add support R8A7740")
Signed-off-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/renesas/sh_eth.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -819,6 +819,7 @@ static struct sh_eth_cpu_data r8a7740_da
.rpadir_value = 2 << 16,
.no_trimd = 1,
.no_ade = 1,
+ .hw_crc = 1,
.tsu = 1,
.select_mii = 1,
.shift_rd0 = 1,


2017-08-09 19:48:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 40/58] xen-netback: correctly schedule rate-limited queues

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Wei Liu <[email protected]>


[ Upstream commit dfa523ae9f2542bee4cddaea37b3be3e157f6e6b ]

Add a flag to indicate if a queue is rate-limited. Test the flag in
NAPI poll handler and avoid rescheduling the queue if true, otherwise
we risk locking up the host. The rescheduling will be done in the
timer callback function.

Reported-by: Jean-Louis Dupond <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
Tested-by: Jean-Louis Dupond <[email protected]>
Reviewed-by: Paul Durrant <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/xen-netback/common.h | 1 +
drivers/net/xen-netback/interface.c | 6 +++++-
drivers/net/xen-netback/netback.c | 6 +++++-
3 files changed, 11 insertions(+), 2 deletions(-)

--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -201,6 +201,7 @@ struct xenvif_queue { /* Per-queue data
unsigned long remaining_credit;
struct timer_list credit_timeout;
u64 credit_window_start;
+ bool rate_limited;

/* Statistics */
struct xenvif_stats stats;
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -105,7 +105,11 @@ static int xenvif_poll(struct napi_struc

if (work_done < budget) {
napi_complete(napi);
- xenvif_napi_schedule_or_enable_events(queue);
+ /* If the queue is rate-limited, it shall be
+ * rescheduled in the timer callback.
+ */
+ if (likely(!queue->rate_limited))
+ xenvif_napi_schedule_or_enable_events(queue);
}

return work_done;
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -687,6 +687,7 @@ static void tx_add_credit(struct xenvif_
max_credit = ULONG_MAX; /* wrapped: clamp to ULONG_MAX */

queue->remaining_credit = min(max_credit, max_burst);
+ queue->rate_limited = false;
}

void xenvif_tx_credit_callback(unsigned long data)
@@ -1184,8 +1185,10 @@ static bool tx_credit_exceeded(struct xe
msecs_to_jiffies(queue->credit_usec / 1000);

/* Timer could already be pending in rare cases. */
- if (timer_pending(&queue->credit_timeout))
+ if (timer_pending(&queue->credit_timeout)) {
+ queue->rate_limited = true;
return true;
+ }

/* Passed the point where we can replenish credit? */
if (time_after_eq64(now, next_credit)) {
@@ -1200,6 +1203,7 @@ static bool tx_credit_exceeded(struct xe
mod_timer(&queue->credit_timeout,
next_credit);
queue->credit_window_start = next_credit;
+ queue->rate_limited = true;

return true;
}


2017-08-09 19:43:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 07/58] mm/page_alloc: Remove kernel address exposure in free_reserved_area()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Josh Poimboeuf <[email protected]>

commit adb1fe9ae2ee6ef6bc10f3d5a588020e7664dfa7 upstream.

Linus suggested we try to remove some of the low-hanging fruit related
to kernel address exposure in dmesg. The only leaks I see on my local
system are:

Freeing SMP alternatives memory: 32K (ffffffff9e309000 - ffffffff9e311000)
Freeing initrd memory: 10588K (ffffa0b736b42000 - ffffa0b737599000)
Freeing unused kernel memory: 3592K (ffffffff9df87000 - ffffffff9e309000)
Freeing unused kernel memory: 1352K (ffffa0b7288ae000 - ffffa0b728a00000)
Freeing unused kernel memory: 632K (ffffa0b728d62000 - ffffa0b728e00000)

Linus says:

"I suspect we should just remove [the addresses in the 'Freeing'
messages]. I'm sure they are useful in theory, but I suspect they
were more useful back when the whole "free init memory" was
originally done.

These days, if we have a use-after-free, I suspect the init-mem
situation is the easiest situation by far. Compared to all the dynamic
allocations which are much more likely to show it anyway. So having
debug output for that case is likely not all that productive."

With this patch the freeing messages now look like this:

Freeing SMP alternatives memory: 32K
Freeing initrd memory: 10588K
Freeing unused kernel memory: 3592K
Freeing unused kernel memory: 1352K
Freeing unused kernel memory: 632K

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/6836ff90c45b71d38e5d4405aec56fa9e5d1d4b2.1477405374.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5847,8 +5847,8 @@ unsigned long free_reserved_area(void *s
}

if (pages && s)
- pr_info("Freeing %s memory: %ldK (%p - %p)\n",
- s, pages << (PAGE_SHIFT - 10), start, end);
+ pr_info("Freeing %s memory: %ldK\n",
+ s, pages << (PAGE_SHIFT - 10));

return pages;
}


2017-08-09 19:48:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 41/58] sparc64: Measure receiver forward progress to avoid send mondo timeout

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jane Chu <[email protected]>


[ Upstream commit 9d53caec84c7c5700e7c1ed744ea584fff55f9ac ]

A large sun4v SPARC system may have moments of intensive xcall activities,
usually caused by unmapping many pages on many CPUs concurrently. This can
flood receivers with CPU mondo interrupts for an extended period, causing
some unlucky senders to hit send-mondo timeout. This problem gets worse
as cpu count increases because sometimes mappings must be invalidated on
all CPUs, and sometimes all CPUs may gang up on a single CPU.

But a busy system is not a broken system. In the above scenario, as long
as the receiver is making forward progress processing mondo interrupts,
the sender should continue to retry.

This patch implements the receiver's forward progress meter by introducing
a per cpu counter 'cpu_mondo_counter[cpu]' where 'cpu' is in the range
of 0..NR_CPUS. The receiver increments its counter as soon as it receives
a mondo and the sender tracks the receiver's counter. If the receiver has
stopped making forward progress when the retry limit is reached, the sender
declares send-mondo-timeout and panic; otherwise, the receiver is allowed
to keep making forward progress.

In addition, it's been observed that PCIe hotplug events generate Correctable
Errors that are handled by hypervisor and then OS. Hypervisor 'borrows'
a guest cpu strand briefly to provide the service. If the cpu strand is
simultaneously the only cpu targeted by a mondo, it may not be available
for the mondo in 20msec, causing SUN4V mondo timeout. It appears that 1 second
is the agreed wait time between hypervisor and guest OS, this patch makes
the adjustment.

Orabug: 25476541
Orabug: 26417466

Signed-off-by: Jane Chu <[email protected]>
Reviewed-by: Steve Sistare <[email protected]>
Reviewed-by: Anthony Yznaga <[email protected]>
Reviewed-by: Rob Gardner <[email protected]>
Reviewed-by: Thomas Tai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/include/asm/trap_block.h | 1
arch/sparc/kernel/smp_64.c | 189 ++++++++++++++++++++++--------------
arch/sparc/kernel/sun4v_ivec.S | 15 ++
arch/sparc/kernel/traps_64.c | 1
4 files changed, 134 insertions(+), 72 deletions(-)

--- a/arch/sparc/include/asm/trap_block.h
+++ b/arch/sparc/include/asm/trap_block.h
@@ -54,6 +54,7 @@ extern struct trap_per_cpu trap_block[NR
void init_cur_cpu_trap(struct thread_info *);
void setup_tba(void);
extern int ncpus_probed;
+extern u64 cpu_mondo_counter[NR_CPUS];

unsigned long real_hard_smp_processor_id(void);

--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -617,22 +617,48 @@ retry:
}
}

-/* Multi-cpu list version. */
+#define CPU_MONDO_COUNTER(cpuid) (cpu_mondo_counter[cpuid])
+#define MONDO_USEC_WAIT_MIN 2
+#define MONDO_USEC_WAIT_MAX 100
+#define MONDO_RETRY_LIMIT 500000
+
+/* Multi-cpu list version.
+ *
+ * Deliver xcalls to 'cnt' number of cpus in 'cpu_list'.
+ * Sometimes not all cpus receive the mondo, requiring us to re-send
+ * the mondo until all cpus have received, or cpus are truly stuck
+ * unable to receive mondo, and we timeout.
+ * Occasionally a target cpu strand is borrowed briefly by hypervisor to
+ * perform guest service, such as PCIe error handling. Consider the
+ * service time, 1 second overall wait is reasonable for 1 cpu.
+ * Here two in-between mondo check wait time are defined: 2 usec for
+ * single cpu quick turn around and up to 100usec for large cpu count.
+ * Deliver mondo to large number of cpus could take longer, we adjusts
+ * the retry count as long as target cpus are making forward progress.
+ */
static void hypervisor_xcall_deliver(struct trap_per_cpu *tb, int cnt)
{
- int retries, this_cpu, prev_sent, i, saw_cpu_error;
+ int this_cpu, tot_cpus, prev_sent, i, rem;
+ int usec_wait, retries, tot_retries;
+ u16 first_cpu = 0xffff;
+ unsigned long xc_rcvd = 0;
unsigned long status;
+ int ecpuerror_id = 0;
+ int enocpu_id = 0;
u16 *cpu_list;
+ u16 cpu;

this_cpu = smp_processor_id();
-
cpu_list = __va(tb->cpu_list_pa);
-
- saw_cpu_error = 0;
- retries = 0;
+ usec_wait = cnt * MONDO_USEC_WAIT_MIN;
+ if (usec_wait > MONDO_USEC_WAIT_MAX)
+ usec_wait = MONDO_USEC_WAIT_MAX;
+ retries = tot_retries = 0;
+ tot_cpus = cnt;
prev_sent = 0;
+
do {
- int forward_progress, n_sent;
+ int n_sent, mondo_delivered, target_cpu_busy;

status = sun4v_cpu_mondo_send(cnt,
tb->cpu_list_pa,
@@ -640,94 +666,113 @@ static void hypervisor_xcall_deliver(str

/* HV_EOK means all cpus received the xcall, we're done. */
if (likely(status == HV_EOK))
- break;
+ goto xcall_done;
+
+ /* If not these non-fatal errors, panic */
+ if (unlikely((status != HV_EWOULDBLOCK) &&
+ (status != HV_ECPUERROR) &&
+ (status != HV_ENOCPU)))
+ goto fatal_errors;

/* First, see if we made any forward progress.
*
+ * Go through the cpu_list, count the target cpus that have
+ * received our mondo (n_sent), and those that did not (rem).
+ * Re-pack cpu_list with the cpus remain to be retried in the
+ * front - this simplifies tracking the truly stalled cpus.
+ *
* The hypervisor indicates successful sends by setting
* cpu list entries to the value 0xffff.
+ *
+ * EWOULDBLOCK means some target cpus did not receive the
+ * mondo and retry usually helps.
+ *
+ * ECPUERROR means at least one target cpu is in error state,
+ * it's usually safe to skip the faulty cpu and retry.
+ *
+ * ENOCPU means one of the target cpu doesn't belong to the
+ * domain, perhaps offlined which is unexpected, but not
+ * fatal and it's okay to skip the offlined cpu.
*/
+ rem = 0;
n_sent = 0;
for (i = 0; i < cnt; i++) {
- if (likely(cpu_list[i] == 0xffff))
+ cpu = cpu_list[i];
+ if (likely(cpu == 0xffff)) {
n_sent++;
+ } else if ((status == HV_ECPUERROR) &&
+ (sun4v_cpu_state(cpu) == HV_CPU_STATE_ERROR)) {
+ ecpuerror_id = cpu + 1;
+ } else if (status == HV_ENOCPU && !cpu_online(cpu)) {
+ enocpu_id = cpu + 1;
+ } else {
+ cpu_list[rem++] = cpu;
+ }
}

- forward_progress = 0;
- if (n_sent > prev_sent)
- forward_progress = 1;
+ /* No cpu remained, we're done. */
+ if (rem == 0)
+ break;

- prev_sent = n_sent;
+ /* Otherwise, update the cpu count for retry. */
+ cnt = rem;

- /* If we get a HV_ECPUERROR, then one or more of the cpus
- * in the list are in error state. Use the cpu_state()
- * hypervisor call to find out which cpus are in error state.
+ /* Record the overall number of mondos received by the
+ * first of the remaining cpus.
*/
- if (unlikely(status == HV_ECPUERROR)) {
- for (i = 0; i < cnt; i++) {
- long err;
- u16 cpu;
-
- cpu = cpu_list[i];
- if (cpu == 0xffff)
- continue;
-
- err = sun4v_cpu_state(cpu);
- if (err == HV_CPU_STATE_ERROR) {
- saw_cpu_error = (cpu + 1);
- cpu_list[i] = 0xffff;
- }
- }
- } else if (unlikely(status != HV_EWOULDBLOCK))
- goto fatal_mondo_error;
+ if (first_cpu != cpu_list[0]) {
+ first_cpu = cpu_list[0];
+ xc_rcvd = CPU_MONDO_COUNTER(first_cpu);
+ }

- /* Don't bother rewriting the CPU list, just leave the
- * 0xffff and non-0xffff entries in there and the
- * hypervisor will do the right thing.
- *
- * Only advance timeout state if we didn't make any
- * forward progress.
+ /* Was any mondo delivered successfully? */
+ mondo_delivered = (n_sent > prev_sent);
+ prev_sent = n_sent;
+
+ /* or, was any target cpu busy processing other mondos? */
+ target_cpu_busy = (xc_rcvd < CPU_MONDO_COUNTER(first_cpu));
+ xc_rcvd = CPU_MONDO_COUNTER(first_cpu);
+
+ /* Retry count is for no progress. If we're making progress,
+ * reset the retry count.
*/
- if (unlikely(!forward_progress)) {
- if (unlikely(++retries > 10000))
- goto fatal_mondo_timeout;
-
- /* Delay a little bit to let other cpus catch up
- * on their cpu mondo queue work.
- */
- udelay(2 * cnt);
+ if (likely(mondo_delivered || target_cpu_busy)) {
+ tot_retries += retries;
+ retries = 0;
+ } else if (unlikely(retries > MONDO_RETRY_LIMIT)) {
+ goto fatal_mondo_timeout;
}
- } while (1);

- if (unlikely(saw_cpu_error))
- goto fatal_mondo_cpu_error;
+ /* Delay a little bit to let other cpus catch up on
+ * their cpu mondo queue work.
+ */
+ if (!mondo_delivered)
+ udelay(usec_wait);

- return;
+ retries++;
+ } while (1);

-fatal_mondo_cpu_error:
- printk(KERN_CRIT "CPU[%d]: SUN4V mondo cpu error, some target cpus "
- "(including %d) were in error state\n",
- this_cpu, saw_cpu_error - 1);
+xcall_done:
+ if (unlikely(ecpuerror_id > 0)) {
+ pr_crit("CPU[%d]: SUN4V mondo cpu error, target cpu(%d) was in error state\n",
+ this_cpu, ecpuerror_id - 1);
+ } else if (unlikely(enocpu_id > 0)) {
+ pr_crit("CPU[%d]: SUN4V mondo cpu error, target cpu(%d) does not belong to the domain\n",
+ this_cpu, enocpu_id - 1);
+ }
return;

+fatal_errors:
+ /* fatal errors include bad alignment, etc */
+ pr_crit("CPU[%d]: Args were cnt(%d) cpulist_pa(%lx) mondo_block_pa(%lx)\n",
+ this_cpu, tot_cpus, tb->cpu_list_pa, tb->cpu_mondo_block_pa);
+ panic("Unexpected SUN4V mondo error %lu\n", status);
+
fatal_mondo_timeout:
- printk(KERN_CRIT "CPU[%d]: SUN4V mondo timeout, no forward "
- " progress after %d retries.\n",
- this_cpu, retries);
- goto dump_cpu_list_and_out;
-
-fatal_mondo_error:
- printk(KERN_CRIT "CPU[%d]: Unexpected SUN4V mondo error %lu\n",
- this_cpu, status);
- printk(KERN_CRIT "CPU[%d]: Args were cnt(%d) cpulist_pa(%lx) "
- "mondo_block_pa(%lx)\n",
- this_cpu, cnt, tb->cpu_list_pa, tb->cpu_mondo_block_pa);
-
-dump_cpu_list_and_out:
- printk(KERN_CRIT "CPU[%d]: CPU list [ ", this_cpu);
- for (i = 0; i < cnt; i++)
- printk("%u ", cpu_list[i]);
- printk("]\n");
+ /* some cpus being non-responsive to the cpu mondo */
+ pr_crit("CPU[%d]: SUN4V mondo timeout, cpu(%d) made no forward progress after %d retries. Total target cpus(%d).\n",
+ this_cpu, first_cpu, (tot_retries + retries), tot_cpus);
+ panic("SUN4V mondo timeout panic\n");
}

static void (*xcall_deliver_impl)(struct trap_per_cpu *, int);
--- a/arch/sparc/kernel/sun4v_ivec.S
+++ b/arch/sparc/kernel/sun4v_ivec.S
@@ -26,6 +26,21 @@ sun4v_cpu_mondo:
ldxa [%g0] ASI_SCRATCHPAD, %g4
sub %g4, TRAP_PER_CPU_FAULT_INFO, %g4

+ /* Get smp_processor_id() into %g3 */
+ sethi %hi(trap_block), %g5
+ or %g5, %lo(trap_block), %g5
+ sub %g4, %g5, %g3
+ srlx %g3, TRAP_BLOCK_SZ_SHIFT, %g3
+
+ /* Increment cpu_mondo_counter[smp_processor_id()] */
+ sethi %hi(cpu_mondo_counter), %g5
+ or %g5, %lo(cpu_mondo_counter), %g5
+ sllx %g3, 3, %g3
+ add %g5, %g3, %g5
+ ldx [%g5], %g3
+ add %g3, 1, %g3
+ stx %g3, [%g5]
+
/* Get CPU mondo queue base phys address into %g7. */
ldx [%g4 + TRAP_PER_CPU_CPU_MONDO_PA], %g7

--- a/arch/sparc/kernel/traps_64.c
+++ b/arch/sparc/kernel/traps_64.c
@@ -2659,6 +2659,7 @@ void do_getpsr(struct pt_regs *regs)
}
}

+u64 cpu_mondo_counter[NR_CPUS] = {0};
struct trap_per_cpu trap_block[NR_CPUS];
EXPORT_SYMBOL(trap_block);



2017-08-09 19:48:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 42/58] sparc64: Prevent perf from running during super critical sections

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Rob Gardner <[email protected]>


[ Upstream commit fc290a114fc6034b0f6a5a46e2fb7d54976cf87a ]

This fixes another cause of random segfaults and bus errors that may
occur while running perf with the callgraph option.

Critical sections beginning with spin_lock_irqsave() raise the interrupt
level to PIL_NORMAL_MAX (14) and intentionally do not block performance
counter interrupts, which arrive at PIL_NMI (15).

But some sections of code are "super critical" with respect to perf
because the perf_callchain_user() path accesses user space and may cause
TLB activity as well as faults as it unwinds the user stack.

One particular critical section occurs in switch_mm:

spin_lock_irqsave(&mm->context.lock, flags);
...
load_secondary_context(mm);
tsb_context_switch(mm);
...
spin_unlock_irqrestore(&mm->context.lock, flags);

If a perf interrupt arrives in between load_secondary_context() and
tsb_context_switch(), then perf_callchain_user() could execute with
the context ID of one process, but with an active TSB for a different
process. When the user stack is accessed, it is very likely to
incur a TLB miss, since the h/w context ID has been changed. The TLB
will then be reloaded with a translation from the TSB for one process,
but using a context ID for another process. This exposes memory from
one process to another, and since it is a mapping for stack memory,
this usually causes the new process to crash quickly.

This super critical section needs more protection than is provided
by spin_lock_irqsave() since perf interrupts must not be allowed in.

Since __tsb_context_switch already goes through the trouble of
disabling interrupts completely, we fix this by moving the secondary
context load down into this better protected region.

Orabug: 25577560

Signed-off-by: Dave Aldridge <[email protected]>
Signed-off-by: Rob Gardner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/include/asm/mmu_context_64.h | 12 +++++++-----
arch/sparc/kernel/tsb.S | 12 ++++++++++++
arch/sparc/power/hibernate.c | 3 +--
3 files changed, 20 insertions(+), 7 deletions(-)

--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -25,9 +25,11 @@ void destroy_context(struct mm_struct *m
void __tsb_context_switch(unsigned long pgd_pa,
struct tsb_config *tsb_base,
struct tsb_config *tsb_huge,
- unsigned long tsb_descr_pa);
+ unsigned long tsb_descr_pa,
+ unsigned long secondary_ctx);

-static inline void tsb_context_switch(struct mm_struct *mm)
+static inline void tsb_context_switch_ctx(struct mm_struct *mm,
+ unsigned long ctx)
{
__tsb_context_switch(__pa(mm->pgd),
&mm->context.tsb_block[0],
@@ -38,7 +40,8 @@ static inline void tsb_context_switch(st
#else
NULL
#endif
- , __pa(&mm->context.tsb_descr[0]));
+ , __pa(&mm->context.tsb_descr[0]),
+ ctx);
}

void tsb_grow(struct mm_struct *mm,
@@ -110,8 +113,7 @@ static inline void switch_mm(struct mm_s
* cpu0 to update it's TSB because at that point the cpu_vm_mask
* only had cpu1 set in it.
*/
- load_secondary_context(mm);
- tsb_context_switch(mm);
+ tsb_context_switch_ctx(mm, CTX_HWBITS(mm->context));

/* Any time a processor runs a context on an address space
* for the first time, we must flush that context out of the
--- a/arch/sparc/kernel/tsb.S
+++ b/arch/sparc/kernel/tsb.S
@@ -375,6 +375,7 @@ tsb_flush:
* %o1: TSB base config pointer
* %o2: TSB huge config pointer, or NULL if none
* %o3: Hypervisor TSB descriptor physical address
+ * %o4: Secondary context to load, if non-zero
*
* We have to run this whole thing with interrupts
* disabled so that the current cpu doesn't change
@@ -387,6 +388,17 @@ __tsb_context_switch:
rdpr %pstate, %g1
wrpr %g1, PSTATE_IE, %pstate

+ brz,pn %o4, 1f
+ mov SECONDARY_CONTEXT, %o5
+
+661: stxa %o4, [%o5] ASI_DMMU
+ .section .sun4v_1insn_patch, "ax"
+ .word 661b
+ stxa %o4, [%o5] ASI_MMU
+ .previous
+ flush %g6
+
+1:
TRAP_LOAD_TRAP_BLOCK(%g2, %g3)

stx %o0, [%g2 + TRAP_PER_CPU_PGD_PADDR]
--- a/arch/sparc/power/hibernate.c
+++ b/arch/sparc/power/hibernate.c
@@ -35,6 +35,5 @@ void restore_processor_state(void)
{
struct mm_struct *mm = current->active_mm;

- load_secondary_context(mm);
- tsb_context_switch(mm);
+ tsb_context_switch_ctx(mm, CTX_HWBITS(mm->context));
}


2017-08-09 19:49:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 39/58] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Fainelli <[email protected]>


[ Upstream commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ]

Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

Reported-by: Marc Gonzalez <[email protected]>
Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/phy.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -541,6 +541,9 @@ void phy_stop_machine(struct phy_device
if (phydev->state > PHY_UP && phydev->state != PHY_HALTED)
phydev->state = PHY_UP;
mutex_unlock(&phydev->lock);
+
+ /* Now we can run the state machine synchronously */
+ phy_state_machine(&phydev->state_queue.work);
}

/**


2017-08-09 19:49:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 33/58] dccp: fix a memleak that dccp_ipv6 doesnt put reqsk properly

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <[email protected]>


[ Upstream commit 0c2232b0a71db0ac1d22f751aa1ac0cadb950fd2 ]

In dccp_v6_conn_request, after reqsk gets alloced and hashed into
ehash table, reqsk's refcnt is set 3. one is for req->rsk_timer,
one is for hlist, and the other one is for current using.

The problem is when dccp_v6_conn_request returns and finishes using
reqsk, it doesn't put reqsk. This will cause reqsk refcnt leaks and
reqsk obj never gets freed.

Jianlin found this issue when running dccp_memleak.c in a loop, the
system memory would run out.

dccp_memleak.c:
int s1 = socket(PF_INET6, 6, IPPROTO_IP);
bind(s1, &sa1, 0x20);
listen(s1, 0x9);
int s2 = socket(PF_INET6, 6, IPPROTO_IP);
connect(s2, &sa1, 0x20);
close(s1);
close(s2);

This patch is to put the reqsk before dccp_v6_conn_request returns,
just as what tcp_conn_request does.

Reported-by: Jianlin Shi <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dccp/ipv6.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -376,6 +376,7 @@ static int dccp_v6_conn_request(struct s
goto drop_and_free;

inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
+ reqsk_put(req);
return 0;

drop_and_free:


2017-08-09 19:49:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 30/58] packet: fix use-after-free in prb_retire_rx_blk_timer_expired()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: WANG Cong <[email protected]>


[ Upstream commit c800aaf8d869f2b9b47b10c5c312fe19f0a94042 ]

There are multiple reports showing we have a use-after-free in
the timer prb_retire_rx_blk_timer_expired(), where we use struct
tpacket_kbdq_core::pkbdq, a pg_vec, after it gets freed by
free_pg_vec().

The interesting part is it is not freed via packet_release() but
via packet_setsockopt(), which means we are not closing the socket.
Looking into the big and fat function packet_set_ring(), this could
happen if we satisfy the following conditions:

1. closing == 0, not on packet_release() path
2. req->tp_block_nr == 0, we don't allocate a new pg_vec
3. rx_ring->pg_vec is already set as V3, which means we already called
packet_set_ring() wtih req->tp_block_nr > 0 previously
4. req->tp_frame_nr == 0, pass sanity check
5. po->mapped == 0, never called mmap()

In this scenario we are clearing the old rx_ring->pg_vec, so we need
to free this pg_vec, but we don't stop the timer on this path because
of closing==0.

The timer has to be stopped as long as we need to free pg_vec, therefore
the check on closing!=0 is wrong, we should check pg_vec!=NULL instead.

Thanks to liujian for testing different fixes.

Reported-by: [email protected]
Reported-by: Dave Jones <[email protected]>
Reported-by: liujian (CE) <[email protected]>
Tested-by: liujian (CE) <[email protected]>
Cc: Ding Tianhong <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/packet/af_packet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4225,7 +4225,7 @@ static int packet_set_ring(struct sock *
register_prot_hook(sk);
}
spin_unlock(&po->bind_lock);
- if (closing && (po->tp_version > TPACKET_V2)) {
+ if (pg_vec && (po->tp_version > TPACKET_V2)) {
/* Because we don't support block-based V3 on tx-ring */
if (!tx_ring)
prb_shutdown_retire_blk_timer(po, rb_queue);


2017-08-09 19:49:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 29/58] openvswitch: fix potential out of bound access in parse_ct

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Liping Zhang <[email protected]>


[ Upstream commit 69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c ]

Before the 'type' is validated, we shouldn't use it to fetch the
ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
may happen.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Liping Zhang <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/openvswitch/conntrack.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -577,8 +577,8 @@ static int parse_ct(const struct nlattr

nla_for_each_nested(a, attr, rem) {
int type = nla_type(a);
- int maxlen = ovs_ct_attr_lens[type].maxlen;
- int minlen = ovs_ct_attr_lens[type].minlen;
+ int maxlen;
+ int minlen;

if (type > OVS_CT_ATTR_MAX) {
OVS_NLERR(log,
@@ -586,6 +586,9 @@ static int parse_ct(const struct nlattr
type, OVS_CT_ATTR_MAX);
return -EINVAL;
}
+
+ maxlen = ovs_ct_attr_lens[type].maxlen;
+ minlen = ovs_ct_attr_lens[type].minlen;
if (nla_len(a) < minlen || nla_len(a) > maxlen) {
OVS_NLERR(log,
"Conntrack attr type has unexpected length (type=%d, length=%d, expected=%d)",


2017-08-09 19:50:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 38/58] net/mlx5: Fix command bad flow on command entry allocation failure

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Moshe Shemesh <[email protected]>


[ Upstream commit 219c81f7d1d5a89656cb3b53d3b4e11e93608d80 ]

When driver fail to allocate an entry to send command to FW, it must
notify the calling function and release the memory allocated for
this command.

Fixes: e126ba97dba9e ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Moshe Shemesh <[email protected]>
Cc: [email protected]
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -630,6 +630,10 @@ static void dump_command(struct mlx5_cor
pr_debug("\n");
}

+static void free_msg(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *msg);
+static void mlx5_free_cmd_msg(struct mlx5_core_dev *dev,
+ struct mlx5_cmd_msg *msg);
+
static void cmd_work_handler(struct work_struct *work)
{
struct mlx5_cmd_work_ent *ent = container_of(work, struct mlx5_cmd_work_ent, work);
@@ -638,16 +642,27 @@ static void cmd_work_handler(struct work
struct mlx5_cmd_layout *lay;
struct semaphore *sem;
unsigned long flags;
+ int alloc_ret;

sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem;
down(sem);
if (!ent->page_queue) {
- ent->idx = alloc_ent(cmd);
- if (ent->idx < 0) {
+ alloc_ret = alloc_ent(cmd);
+ if (alloc_ret < 0) {
+ if (ent->callback) {
+ ent->callback(-EAGAIN, ent->context);
+ mlx5_free_cmd_msg(dev, ent->out);
+ free_msg(dev, ent->in);
+ free_cmd(ent);
+ } else {
+ ent->ret = -EAGAIN;
+ complete(&ent->done);
+ }
mlx5_core_err(dev, "failed to allocate command entry\n");
up(sem);
return;
}
+ ent->idx = alloc_ret;
} else {
ent->idx = cmd->max_reg_cmds;
spin_lock_irqsave(&cmd->alloc_lock, flags);


2017-08-09 19:50:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 35/58] dccp: fix a memleak for dccp_feat_init err process

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <[email protected]>


[ Upstream commit e90ce2fc27cad7e7b1e72b9e66201a7a4c124c2b ]

In dccp_feat_init, when ccid_get_builtin_ccids failsto alloc
memory for rx.val, it should free tx.val before returning an
error.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dccp/feat.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/net/dccp/feat.c
+++ b/net/dccp/feat.c
@@ -1471,9 +1471,12 @@ int dccp_feat_init(struct sock *sk)
* singleton values (which always leads to failure).
* These settings can still (later) be overridden via sockopts.
*/
- if (ccid_get_builtin_ccids(&tx.val, &tx.len) ||
- ccid_get_builtin_ccids(&rx.val, &rx.len))
+ if (ccid_get_builtin_ccids(&tx.val, &tx.len))
return -ENOBUFS;
+ if (ccid_get_builtin_ccids(&rx.val, &rx.len)) {
+ kfree(tx.val);
+ return -ENOBUFS;
+ }

if (!dccp_feat_prefer(sysctl_dccp_tx_ccid, tx.val, tx.len) ||
!dccp_feat_prefer(sysctl_dccp_rx_ccid, rx.val, rx.len))


2017-08-09 19:51:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 21/58] drm: rcar-du: fix backport bug

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Greg Kroah-Hartman <[email protected]>

In the backport of commit 4f7b0d263833 ("drm: rcar-du: Simplify and fix
probe error handling"), which is commit 8255d26322a3 in this tree, the
error handling path was incorrect. This patch fixes it up.

Reported-by: Ben Hutchings <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: thongsyho <[email protected]>
Cc: Nhan Nguyen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/rcar-du/rcar_du_drv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
@@ -296,7 +296,7 @@ static int rcar_du_probe(struct platform
mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
rcdu->mmio = devm_ioremap_resource(&pdev->dev, mem);
if (IS_ERR(rcdu->mmio))
- ret = PTR_ERR(rcdu->mmio);
+ return PTR_ERR(rcdu->mmio);

/* DRM/KMS objects */
ddev = drm_dev_alloc(&rcar_du_driver, &pdev->dev);


2017-08-09 19:51:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 22/58] [media] saa7164: fix double fetch PCIe access condition

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Toth <[email protected]>

commit 6fb05e0dd32e566facb96ea61a48c7488daa5ac3 upstream.

Avoid a double fetch by reusing the values from the prior transfer.

Originally reported via https://bugzilla.kernel.org/show_bug.cgi?id=195559

Thanks to Pengfei Wang <[email protected]> for reporting.

Signed-off-by: Steven Toth <[email protected]>
Reported-by: Pengfei Wang <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/media/pci/saa7164/saa7164-bus.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)

--- a/drivers/media/pci/saa7164/saa7164-bus.c
+++ b/drivers/media/pci/saa7164/saa7164-bus.c
@@ -393,11 +393,11 @@ int saa7164_bus_get(struct saa7164_dev *
msg_tmp.size = le16_to_cpu((__force __le16)msg_tmp.size);
msg_tmp.command = le32_to_cpu((__force __le32)msg_tmp.command);
msg_tmp.controlselector = le16_to_cpu((__force __le16)msg_tmp.controlselector);
+ memcpy(msg, &msg_tmp, sizeof(*msg));

/* No need to update the read positions, because this was a peek */
/* If the caller specifically want to peek, return */
if (peekonly) {
- memcpy(msg, &msg_tmp, sizeof(*msg));
goto peekout;
}

@@ -442,21 +442,15 @@ int saa7164_bus_get(struct saa7164_dev *
space_rem = bus->m_dwSizeGetRing - curr_grp;

if (space_rem < sizeof(*msg)) {
- /* msg wraps around the ring */
- memcpy_fromio(msg, bus->m_pdwGetRing + curr_grp, space_rem);
- memcpy_fromio((u8 *)msg + space_rem, bus->m_pdwGetRing,
- sizeof(*msg) - space_rem);
if (buf)
memcpy_fromio(buf, bus->m_pdwGetRing + sizeof(*msg) -
space_rem, buf_size);

} else if (space_rem == sizeof(*msg)) {
- memcpy_fromio(msg, bus->m_pdwGetRing + curr_grp, sizeof(*msg));
if (buf)
memcpy_fromio(buf, bus->m_pdwGetRing, buf_size);
} else {
/* Additional data wraps around the ring */
- memcpy_fromio(msg, bus->m_pdwGetRing + curr_grp, sizeof(*msg));
if (buf) {
memcpy_fromio(buf, bus->m_pdwGetRing + curr_grp +
sizeof(*msg), space_rem - sizeof(*msg));
@@ -469,15 +463,10 @@ int saa7164_bus_get(struct saa7164_dev *

} else {
/* No wrapping */
- memcpy_fromio(msg, bus->m_pdwGetRing + curr_grp, sizeof(*msg));
if (buf)
memcpy_fromio(buf, bus->m_pdwGetRing + curr_grp + sizeof(*msg),
buf_size);
}
- /* Convert from little endian to CPU */
- msg->size = le16_to_cpu((__force __le16)msg->size);
- msg->command = le32_to_cpu((__force __le32)msg->command);
- msg->controlselector = le16_to_cpu((__force __le16)msg->controlselector);

/* Update the read positions, adjusting the ring */
saa7164_writel(bus->m_dwGetReadPos, new_grp);


2017-08-09 19:51:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 34/58] dccp: fix a memleak that dccp_ipv4 doesnt put reqsk properly

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <[email protected]>


[ Upstream commit b7953d3c0e30a5fc944f6b7bd0bcceb0794bcd85 ]

The patch "dccp: fix a memleak that dccp_ipv6 doesn't put reqsk
properly" fixed reqsk refcnt leak for dccp_ipv6. The same issue
exists on dccp_ipv4.

This patch is to fix it for dccp_ipv4.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dccp/ipv4.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -635,6 +635,7 @@ int dccp_v4_conn_request(struct sock *sk
goto drop_and_free;

inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
+ reqsk_put(req);
return 0;

drop_and_free:


2017-08-09 19:43:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 31/58] ipv6: Dont increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Stefano Brivio <[email protected]>


[ Upstream commit afce615aaabfbaad02550e75c0bec106dafa1adf ]

RFC 2465 defines ipv6IfStatsOutFragFails as:

"The number of IPv6 datagrams that have been discarded
because they needed to be fragmented at this output
interface but could not be."

The existing implementation, instead, would increase the counter
twice in case we fail to allocate room for single fragments:
once for the fragment, once for the datagram.

This didn't look intentional though. In one of the two affected
affected failure paths, the double increase was simply a result
of a new 'goto fail' statement, introduced to avoid a skb leak.
The other path appears to be affected since at least 2.6.12-rc2.

Reported-by: Sabrina Dubroca <[email protected]>
Fixes: 1d325d217c7f ("ipv6: ip6_fragment: fix headroom tests and skb leak")
Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_output.c | 4 ----
1 file changed, 4 deletions(-)

--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -647,8 +647,6 @@ int ip6_fragment(struct net *net, struct
*prevhdr = NEXTHDR_FRAGMENT;
tmp_hdr = kmemdup(skb_network_header(skb), hlen, GFP_ATOMIC);
if (!tmp_hdr) {
- IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_FRAGFAILS);
err = -ENOMEM;
goto fail;
}
@@ -767,8 +765,6 @@ slow_path:
frag = alloc_skb(len + hlen + sizeof(struct frag_hdr) +
hroom + troom, GFP_ATOMIC);
if (!frag) {
- IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_FRAGFAILS);
err = -ENOMEM;
goto fail;
}


2017-08-09 19:43:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 09/58] ext4: fix overflow caused by missing cast in ext4_resize_fs()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jerry Lee <[email protected]>

commit aec51758ce10a9c847a62a48a168f8c804c6e053 upstream.

On a 32-bit platform, the value of n_blcoks_count may be wrong during
the file system is resized to size larger than 2^32 blocks. This may
caused the superblock being corrupted with zero blocks count.

Fixes: 1c6bd7173d66
Signed-off-by: Jerry Lee <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/resize.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1926,7 +1926,8 @@ retry:
n_desc_blocks = o_desc_blocks +
le16_to_cpu(es->s_reserved_gdt_blocks);
n_group = n_desc_blocks * EXT4_DESC_PER_BLOCK(sb);
- n_blocks_count = n_group * EXT4_BLOCKS_PER_GROUP(sb);
+ n_blocks_count = (ext4_fsblk_t)n_group *
+ EXT4_BLOCKS_PER_GROUP(sb);
n_group--; /* set to last group number */
}



2017-08-09 19:43:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 26/58] ipv4: initialize fib_trie prior to register_netdev_notifier call.

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mahesh Bandewar <[email protected]>


[ Upstream commit 8799a221f5944a7d74516ecf46d58c28ec1d1f75 ]

Net stack initialization currently initializes fib-trie after the
first call to netdevice_notifier() call. In fact fib_trie initialization
needs to happen before first rtnl_register(). It does not cause any problem
since there are no devices UP at this moment, but trying to bring 'lo'
UP at initialization would make this assumption wrong and exposes the issue.

Fixes following crash

Call Trace:
? alternate_node_alloc+0x76/0xa0
fib_table_insert+0x1b7/0x4b0
fib_magic.isra.17+0xea/0x120
fib_add_ifaddr+0x7b/0x190
fib_netdev_event+0xc0/0x130
register_netdevice_notifier+0x1c1/0x1d0
ip_fib_init+0x72/0x85
ip_rt_init+0x187/0x1e9
ip_init+0xe/0x1a
inet_init+0x171/0x26c
? ipv4_offload_init+0x66/0x66
do_one_initcall+0x43/0x160
kernel_init_freeable+0x191/0x219
? rest_init+0x80/0x80
kernel_init+0xe/0x150
ret_from_fork+0x22/0x30
Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08
RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: ffff9b1500017c28
CR2: 0000000000000014

Fixes: 7b1a74fdbb9e ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.")
Fixes: 7f9b80529b8a ("[IPV4]: fib hash|trie initialization")

Signed-off-by: Mahesh Bandewar <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/fib_frontend.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1319,13 +1319,14 @@ static struct pernet_operations fib_net_

void __init ip_fib_init(void)
{
- rtnl_register(PF_INET, RTM_NEWROUTE, inet_rtm_newroute, NULL, NULL);
- rtnl_register(PF_INET, RTM_DELROUTE, inet_rtm_delroute, NULL, NULL);
- rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib, NULL);
+ fib_trie_init();

register_pernet_subsys(&fib_net_ops);
+
register_netdevice_notifier(&fib_netdev_notifier);
register_inetaddr_notifier(&fib_inetaddr_notifier);

- fib_trie_init();
+ rtnl_register(PF_INET, RTM_NEWROUTE, inet_rtm_newroute, NULL, NULL);
+ rtnl_register(PF_INET, RTM_DELROUTE, inet_rtm_delroute, NULL, NULL);
+ rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib, NULL);
}


2017-08-09 19:52:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 25/58] ipv6: avoid overflow of offset in ip6_find_1stfragopt

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sabrina Dubroca <[email protected]>


[ Upstream commit 6399f1fae4ec29fab5ec76070435555e256ca3a6 ]

In some cases, offset can overflow and can cause an infinite loop in
ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.

This problem has been here since before the beginning of git history.

Signed-off-by: Sabrina Dubroca <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/output_core.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -78,7 +78,7 @@ EXPORT_SYMBOL(ipv6_select_ident);

int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
{
- u16 offset = sizeof(struct ipv6hdr);
+ unsigned int offset = sizeof(struct ipv6hdr);
unsigned int packet_len = skb_tail_pointer(skb) -
skb_network_header(skb);
int found_rhdr = 0;
@@ -86,6 +86,7 @@ int ip6_find_1stfragopt(struct sk_buff *

while (offset <= packet_len) {
struct ipv6_opt_hdr *exthdr;
+ unsigned int len;

switch (**nexthdr) {

@@ -111,7 +112,10 @@ int ip6_find_1stfragopt(struct sk_buff *

exthdr = (struct ipv6_opt_hdr *)(skb_network_header(skb) +
offset);
- offset += ipv6_optlen(exthdr);
+ len = ipv6_optlen(exthdr);
+ if (len + offset >= IPV6_MAXPLEN)
+ return -EINVAL;
+ offset += len;
*nexthdr = &exthdr->nexthdr;
}



2017-08-09 19:43:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 27/58] rtnetlink: allocate more memory for dev_set_mac_address()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: WANG Cong <[email protected]>


[ Upstream commit 153711f9421be5dbc973dc57a4109dc9d54c89b1 ]

virtnet_set_mac_address() interprets mac address as struct
sockaddr, but upper layer only allocates dev->addr_len
which is ETH_ALEN + sizeof(sa_family_t) in this case.

We lack a unified definition for mac address, so just fix
the upper layer, this also allows drivers to interpret it
to struct sockaddr freely.

Reported-by: David Ahern <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/rtnetlink.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1742,7 +1742,8 @@ static int do_setlink(const struct sk_bu
struct sockaddr *sa;
int len;

- len = sizeof(sa_family_t) + dev->addr_len;
+ len = sizeof(sa_family_t) + max_t(size_t, dev->addr_len,
+ sizeof(*sa));
sa = kmalloc(len, GFP_KERNEL);
if (!sa) {
err = -ENOMEM;


2017-08-09 19:53:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 05/58] ASoC: do not close shared backend dailink

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Banajit Goswami <[email protected]>

commit b1cd2e34c69a2f3988786af451b6e17967c293a0 upstream.

Multiple frontend dailinks may be connected to a backend
dailink at the same time. When one of frontend dailinks is
closed, the associated backend dailink should not be closed
if it is connected to other active frontend dailinks. Change
ensures that backend dailink is closed only after all
connected frontend dailinks are closed.

Signed-off-by: Gopikrishnaiah Anandan <[email protected]>
Signed-off-by: Banajit Goswami <[email protected]>
Signed-off-by: Patrick Lai <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/soc/soc-pcm.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -181,6 +181,10 @@ int dpcm_dapm_stream_event(struct snd_so
dev_dbg(be->dev, "ASoC: BE %s event %d dir %d\n",
be->dai_link->name, event, dir);

+ if ((event == SND_SOC_DAPM_STREAM_STOP) &&
+ (be->dpcm[dir].users >= 1))
+ continue;
+
snd_soc_dapm_stream_event(be, dir, event);
}



2017-08-09 19:53:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 20/58] f2fs: sanity check checkpoint segno and blkoff

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jin Qian <[email protected]>

commit 15d3042a937c13f5d9244241c7a9c8416ff6e82a upstream.

Make sure segno and blkoff read from raw image are valid.

Cc: [email protected]
Signed-off-by: Jin Qian <[email protected]>
[Jaegeuk Kim: adjust minor coding style]
Signed-off-by: Jaegeuk Kim <[email protected]>
[AmitP: Found in Android Security bulletin for Aug'17, fixes CVE-2017-10663]
Signed-off-by: Amit Pundir <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/super.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1078,6 +1078,8 @@ static int sanity_check_ckpt(struct f2fs
unsigned int total, fsmeta;
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+ unsigned int main_segs, blocks_per_seg;
+ int i;

total = le32_to_cpu(raw_super->segment_count);
fsmeta = le32_to_cpu(raw_super->segment_count_ckpt);
@@ -1089,6 +1091,20 @@ static int sanity_check_ckpt(struct f2fs
if (unlikely(fsmeta >= total))
return 1;

+ main_segs = le32_to_cpu(raw_super->segment_count_main);
+ blocks_per_seg = sbi->blocks_per_seg;
+
+ for (i = 0; i < NR_CURSEG_NODE_TYPE; i++) {
+ if (le32_to_cpu(ckpt->cur_node_segno[i]) >= main_segs ||
+ le16_to_cpu(ckpt->cur_node_blkoff[i]) >= blocks_per_seg)
+ return 1;
+ }
+ for (i = 0; i < NR_CURSEG_DATA_TYPE; i++) {
+ if (le32_to_cpu(ckpt->cur_data_segno[i]) >= main_segs ||
+ le16_to_cpu(ckpt->cur_data_blkoff[i]) >= blocks_per_seg)
+ return 1;
+ }
+
if (unlikely(f2fs_cp_error(sbi))) {
f2fs_msg(sbi->sb, KERN_ERR, "A bug case: need to run fsck");
return 1;


2017-08-09 19:54:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 24/58] net: Zero terminate ifr_name in dev_ifname().

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>


[ Upstream commit 63679112c536289826fec61c917621de95ba2ade ]

The ifr.ifr_name is passed around and assumed to be NULL terminated.

Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/dev_ioctl.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -28,6 +28,7 @@ static int dev_ifname(struct net *net, s

if (copy_from_user(&ifr, arg, sizeof(struct ifreq)))
return -EFAULT;
+ ifr.ifr_name[IFNAMSIZ-1] = 0;

error = netdev_get_name(net, ifr.ifr_name, ifr.ifr_ifindex);
if (error)


2017-08-09 19:42:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 14/58] iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <[email protected]>

commit 8f0dfb3d8b1120c61f6e2cc3729290db10772b2d upstream.

There is a iscsi-target/tcp login race in LOGIN_FLAGS_READY
state assignment that can result in frequent errors during
iscsi discovery:

"iSCSI Login negotiation failed."

To address this bug, move the initial LOGIN_FLAGS_READY
assignment ahead of iscsi_target_do_login() when handling
the initial iscsi_target_start_negotiation() request PDU
during connection login.

As iscsi_target_do_login_rx() work_struct callback is
clearing LOGIN_FLAGS_READ_ACTIVE after subsequent calls
to iscsi_target_do_login(), the early sk_data_ready
ahead of the first iscsi_target_do_login() expects
LOGIN_FLAGS_READY to also be set for the initial
login request PDU.

As reported by Maged, this was first obsered using an
MSFT initiator running across multiple VMWare host
virtual machines with iscsi-target/tcp.

Reported-by: Maged Mokhtar <[email protected]>
Tested-by: Maged Mokhtar <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
drivers/target/iscsi/iscsi_target_nego.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -1248,16 +1248,16 @@ int iscsi_target_start_negotiation(
{
int ret;

- ret = iscsi_target_do_login(conn, login);
- if (!ret) {
- if (conn->sock) {
- struct sock *sk = conn->sock->sk;
+ if (conn->sock) {
+ struct sock *sk = conn->sock->sk;

- write_lock_bh(&sk->sk_callback_lock);
- set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
- write_unlock_bh(&sk->sk_callback_lock);
- }
- } else if (ret < 0) {
+ write_lock_bh(&sk->sk_callback_lock);
+ set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
+ write_unlock_bh(&sk->sk_callback_lock);
+ }
+
+ ret = iscsi_target_do_login(conn, login);
+ if (ret < 0) {
cancel_delayed_work_sync(&conn->login_work);
cancel_delayed_work_sync(&conn->login_cleanup_work);
iscsi_target_restore_sock_callbacks(conn);


2017-08-09 19:55:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 10/58] ARM: dts: armada-38x: Fix irq type for pca955

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gregory CLEMENT <[email protected]>

commit 8d4514173211586c6238629b1ef1e071927735f5 upstream.

As written in the datasheet the PCA955 can only handle low level irq and
not edge irq.

Without this fix the interrupt is not usable for pca955: the gpio-pca953x
driver already set the irq type as low level which is incompatible with
edge type, then the kernel prevents using the interrupt:

"irq: type mismatch, failed to map hwirq-18 for
/soc/internal-regs/gpio@18100!"

Fixes: 928413bd859c ("ARM: mvebu: Add Armada 388 General Purpose
Development Board support")
Signed-off-by: Gregory CLEMENT <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/boot/dts/armada-388-gp.dts | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm/boot/dts/armada-388-gp.dts
+++ b/arch/arm/boot/dts/armada-388-gp.dts
@@ -89,7 +89,7 @@
pinctrl-names = "default";
pinctrl-0 = <&pca0_pins>;
interrupt-parent = <&gpio0>;
- interrupts = <18 IRQ_TYPE_EDGE_FALLING>;
+ interrupts = <18 IRQ_TYPE_LEVEL_LOW>;
gpio-controller;
#gpio-cells = <2>;
interrupt-controller;
@@ -101,7 +101,7 @@
compatible = "nxp,pca9555";
pinctrl-names = "default";
interrupt-parent = <&gpio0>;
- interrupts = <18 IRQ_TYPE_EDGE_FALLING>;
+ interrupts = <18 IRQ_TYPE_LEVEL_LOW>;
gpio-controller;
#gpio-cells = <2>;
interrupt-controller;


2017-08-10 00:02:14

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/58] 4.4.81-stable review

On 08/09/2017 01:41 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.81 release.
> There are 58 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Aug 11 19:41:25 UTC 2017.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.81-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

2017-08-10 00:37:49

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/58] 4.4.81-stable review

On 08/09/2017 12:41 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.81 release.
> There are 58 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Aug 11 19:41:25 UTC 2017.
> Anything received after that time might be too late.
>

Build results:
total: 145 pass: 143 fail: 2
Failed builds:
sparc64:allmodconfig
sparc64:allnoconfig

Qemu test results:
total: 115 pass: 110 fail: 5
Failed tests:
nios2:10m50-ghrd:10m50_defconfig:10m50_devboard.dts
sparc64:sun4u:smp:sparc64_defconfig
sparc64:sun4v:smp:sparc64_defconfig
sparc64:sun4u:nosmp:sparc64_defconfig
sparc64:sun4v:nosmp:sparc64_defconfig

Build error:

arch/sparc/mm/tsb.c:478:3: error: implicit declaration of function 'tsb_context_switch'

The nios2 test crashes at reboot.

Unable to handle kernel NULL pointer dereference at virtual address 00000000
ea = 00000000, ra = c803f110, cause = 13
Kernel panic - not syncing: Oops

I'll start a bisect on it. Note that it also crashes in 4.9.

Details are available at http://kerneltests.org/builders.

Guenter

2017-08-10 00:58:39

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/58] 4.4.81-stable review

On Wed, Aug 09, 2017 at 12:41:12PM -0700, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.81 release.
> There are 58 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Aug 11 19:41:25 UTC 2017.
> Anything received after that time might be too late.
>

Bisect results for the nios2 crash:

# bad: [a7b4f512d9ce04f1c4133838371a2f3a8a31d323] Linux 4.4.81-rc1
# good: [09e69607e47ce9f422da4310c68d7a9b399d4f8c] Linux 4.4.80
git bisect start 'HEAD' 'v4.4.80'
# good: [a3bfc7a6f7e2a469d261d50a4bd4a9797ccc38d8] openvswitch: fix potential out of bound access in parse_ct
git bisect good a3bfc7a6f7e2a469d261d50a4bd4a9797ccc38d8
# bad: [fec6ca05cae44222e0d778ae9f2662de4c7da6e8] sh_eth: R8A7740 supports packet shecksumming
git bisect bad fec6ca05cae44222e0d778ae9f2662de4c7da6e8
# good: [f9a6e277a4290a30aaeb43b74ec78fa0ea8b125a] sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()
git bisect good f9a6e277a4290a30aaeb43b74ec78fa0ea8b125a
# bad: [9a8cf5c3ae379556f25c47f0e26276101045c421] xen-netback: correctly schedule rate-limited queues
git bisect bad 9a8cf5c3ae379556f25c47f0e26276101045c421
# good: [eded791aa722d68d443c1722489339ba52930e44] net/mlx5: Fix command bad flow on command entry allocation failure
git bisect good eded791aa722d68d443c1722489339ba52930e44
# bad: [75f7f0b78d6ba311c902871649561fb6a1a66956] net: phy: Correctly process PHY_HALTED in phy_stop_machine()
git bisect bad 75f7f0b78d6ba311c902871649561fb6a1a66956
# first bad commit: [75f7f0b78d6ba311c902871649561fb6a1a66956] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

Reverting the offending patch fixes the problem. Reverting the same patch
in v4.9.y fixes the problem there as well.

I did a wild guess and found that applying

7b9a88a390da net: phy: Fix PHY unbind crash

instead of reverting the offending patch fixes the problem in both
v4.4.y and v4.9.y.

Copying Florian and David for additional input.

Guenter

2017-08-10 16:17:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/58] 4.4.81-stable review

On Wed, Aug 09, 2017 at 05:37:45PM -0700, Guenter Roeck wrote:
> On 08/09/2017 12:41 PM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.81 release.
> > There are 58 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Fri Aug 11 19:41:25 UTC 2017.
> > Anything received after that time might be too late.
> >
>
> Build results:
> total: 145 pass: 143 fail: 2
> Failed builds:
> sparc64:allmodconfig
> sparc64:allnoconfig
>
> Qemu test results:
> total: 115 pass: 110 fail: 5
> Failed tests:
> nios2:10m50-ghrd:10m50_defconfig:10m50_devboard.dts
> sparc64:sun4u:smp:sparc64_defconfig
> sparc64:sun4v:smp:sparc64_defconfig
> sparc64:sun4u:nosmp:sparc64_defconfig
> sparc64:sun4v:nosmp:sparc64_defconfig
>
> Build error:
>
> arch/sparc/mm/tsb.c:478:3: error: implicit declaration of function 'tsb_context_switch'
>
> The nios2 test crashes at reboot.
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> ea = 00000000, ra = c803f110, cause = 13
> Kernel panic - not syncing: Oops
>
> I'll start a bisect on it. Note that it also crashes in 4.9.

Odd that this also doesn't crash on 3.18 right now, as it has the same
offending patch. I've queued up the fix now, thanks.

greg k-h

2017-08-10 16:18:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/58] 4.4.81-stable review

On Wed, Aug 09, 2017 at 05:58:32PM -0700, Guenter Roeck wrote:
> On Wed, Aug 09, 2017 at 12:41:12PM -0700, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.81 release.
> > There are 58 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Fri Aug 11 19:41:25 UTC 2017.
> > Anything received after that time might be too late.
> >
>
> Bisect results for the nios2 crash:
>
> # bad: [a7b4f512d9ce04f1c4133838371a2f3a8a31d323] Linux 4.4.81-rc1
> # good: [09e69607e47ce9f422da4310c68d7a9b399d4f8c] Linux 4.4.80
> git bisect start 'HEAD' 'v4.4.80'
> # good: [a3bfc7a6f7e2a469d261d50a4bd4a9797ccc38d8] openvswitch: fix potential out of bound access in parse_ct
> git bisect good a3bfc7a6f7e2a469d261d50a4bd4a9797ccc38d8
> # bad: [fec6ca05cae44222e0d778ae9f2662de4c7da6e8] sh_eth: R8A7740 supports packet shecksumming
> git bisect bad fec6ca05cae44222e0d778ae9f2662de4c7da6e8
> # good: [f9a6e277a4290a30aaeb43b74ec78fa0ea8b125a] sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()
> git bisect good f9a6e277a4290a30aaeb43b74ec78fa0ea8b125a
> # bad: [9a8cf5c3ae379556f25c47f0e26276101045c421] xen-netback: correctly schedule rate-limited queues
> git bisect bad 9a8cf5c3ae379556f25c47f0e26276101045c421
> # good: [eded791aa722d68d443c1722489339ba52930e44] net/mlx5: Fix command bad flow on command entry allocation failure
> git bisect good eded791aa722d68d443c1722489339ba52930e44
> # bad: [75f7f0b78d6ba311c902871649561fb6a1a66956] net: phy: Correctly process PHY_HALTED in phy_stop_machine()
> git bisect bad 75f7f0b78d6ba311c902871649561fb6a1a66956
> # first bad commit: [75f7f0b78d6ba311c902871649561fb6a1a66956] net: phy: Correctly process PHY_HALTED in phy_stop_machine()
>
> Reverting the offending patch fixes the problem. Reverting the same patch
> in v4.9.y fixes the problem there as well.
>
> I did a wild guess and found that applying
>
> 7b9a88a390da net: phy: Fix PHY unbind crash
>
> instead of reverting the offending patch fixes the problem in both
> v4.4.y and v4.9.y.
>
> Copying Florian and David for additional input.

I've queued up this fix for 4.9, 4.4, and 3.18, as the same patch was
there as well.

thanks,

greg k-h

2017-08-10 16:20:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.4 42/58] sparc64: Prevent perf from running during super critical sections

On Wed, Aug 09, 2017 at 12:41:54PM -0700, Greg Kroah-Hartman wrote:
> 4.4-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Rob Gardner <[email protected]>
>
>
> [ Upstream commit fc290a114fc6034b0f6a5a46e2fb7d54976cf87a ]
>
> This fixes another cause of random segfaults and bus errors that may
> occur while running perf with the callgraph option.
>
> Critical sections beginning with spin_lock_irqsave() raise the interrupt
> level to PIL_NORMAL_MAX (14) and intentionally do not block performance
> counter interrupts, which arrive at PIL_NMI (15).
>
> But some sections of code are "super critical" with respect to perf
> because the perf_callchain_user() path accesses user space and may cause
> TLB activity as well as faults as it unwinds the user stack.
>
> One particular critical section occurs in switch_mm:
>
> spin_lock_irqsave(&mm->context.lock, flags);
> ...
> load_secondary_context(mm);
> tsb_context_switch(mm);
> ...
> spin_unlock_irqrestore(&mm->context.lock, flags);
>
> If a perf interrupt arrives in between load_secondary_context() and
> tsb_context_switch(), then perf_callchain_user() could execute with
> the context ID of one process, but with an active TSB for a different
> process. When the user stack is accessed, it is very likely to
> incur a TLB miss, since the h/w context ID has been changed. The TLB
> will then be reloaded with a translation from the TSB for one process,
> but using a context ID for another process. This exposes memory from
> one process to another, and since it is a mapping for stack memory,
> this usually causes the new process to crash quickly.
>
> This super critical section needs more protection than is provided
> by spin_lock_irqsave() since perf interrupts must not be allowed in.
>
> Since __tsb_context_switch already goes through the trouble of
> disabling interrupts completely, we fix this by moving the secondary
> context load down into this better protected region.
>
> Orabug: 25577560
>
> Signed-off-by: Dave Aldridge <[email protected]>
> Signed-off-by: Rob Gardner <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> ---
> arch/sparc/include/asm/mmu_context_64.h | 12 +++++++-----
> arch/sparc/kernel/tsb.S | 12 ++++++++++++
> arch/sparc/power/hibernate.c | 3 +--
> 3 files changed, 20 insertions(+), 7 deletions(-)
>
> --- a/arch/sparc/include/asm/mmu_context_64.h
> +++ b/arch/sparc/include/asm/mmu_context_64.h
> @@ -25,9 +25,11 @@ void destroy_context(struct mm_struct *m
> void __tsb_context_switch(unsigned long pgd_pa,
> struct tsb_config *tsb_base,
> struct tsb_config *tsb_huge,
> - unsigned long tsb_descr_pa);
> + unsigned long tsb_descr_pa,
> + unsigned long secondary_ctx);
>
> -static inline void tsb_context_switch(struct mm_struct *mm)
> +static inline void tsb_context_switch_ctx(struct mm_struct *mm,
> + unsigned long ctx)
> {
> __tsb_context_switch(__pa(mm->pgd),
> &mm->context.tsb_block[0],
> @@ -38,7 +40,8 @@ static inline void tsb_context_switch(st
> #else
> NULL
> #endif
> - , __pa(&mm->context.tsb_descr[0]));
> + , __pa(&mm->context.tsb_descr[0]),
> + ctx);
> }
>
> void tsb_grow(struct mm_struct *mm,
> @@ -110,8 +113,7 @@ static inline void switch_mm(struct mm_s
> * cpu0 to update it's TSB because at that point the cpu_vm_mask
> * only had cpu1 set in it.
> */
> - load_secondary_context(mm);
> - tsb_context_switch(mm);
> + tsb_context_switch_ctx(mm, CTX_HWBITS(mm->context));
>
> /* Any time a processor runs a context on an address space
> * for the first time, we must flush that context out of the
> --- a/arch/sparc/kernel/tsb.S
> +++ b/arch/sparc/kernel/tsb.S
> @@ -375,6 +375,7 @@ tsb_flush:
> * %o1: TSB base config pointer
> * %o2: TSB huge config pointer, or NULL if none
> * %o3: Hypervisor TSB descriptor physical address
> + * %o4: Secondary context to load, if non-zero
> *
> * We have to run this whole thing with interrupts
> * disabled so that the current cpu doesn't change
> @@ -387,6 +388,17 @@ __tsb_context_switch:
> rdpr %pstate, %g1
> wrpr %g1, PSTATE_IE, %pstate
>
> + brz,pn %o4, 1f
> + mov SECONDARY_CONTEXT, %o5
> +
> +661: stxa %o4, [%o5] ASI_DMMU
> + .section .sun4v_1insn_patch, "ax"
> + .word 661b
> + stxa %o4, [%o5] ASI_MMU
> + .previous
> + flush %g6
> +
> +1:
> TRAP_LOAD_TRAP_BLOCK(%g2, %g3)
>
> stx %o0, [%g2 + TRAP_PER_CPU_PGD_PADDR]
> --- a/arch/sparc/power/hibernate.c
> +++ b/arch/sparc/power/hibernate.c
> @@ -35,6 +35,5 @@ void restore_processor_state(void)
> {
> struct mm_struct *mm = current->active_mm;
>
> - load_secondary_context(mm);
> - tsb_context_switch(mm);
> + tsb_context_switch_ctx(mm, CTX_HWBITS(mm->context));
> }
>

This also breaks the build here as well, so I'm dropping it from the
patch queue.

thanks,

greg k-h

2017-08-10 16:21:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/58] 4.4.81-stable review

On Wed, Aug 09, 2017 at 05:37:45PM -0700, Guenter Roeck wrote:
> On 08/09/2017 12:41 PM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.81 release.
> > There are 58 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Fri Aug 11 19:41:25 UTC 2017.
> > Anything received after that time might be too late.
> >
>
> Build results:
> total: 145 pass: 143 fail: 2
> Failed builds:
> sparc64:allmodconfig
> sparc64:allnoconfig
>
> Qemu test results:
> total: 115 pass: 110 fail: 5
> Failed tests:
> nios2:10m50-ghrd:10m50_defconfig:10m50_devboard.dts
> sparc64:sun4u:smp:sparc64_defconfig
> sparc64:sun4v:smp:sparc64_defconfig
> sparc64:sun4u:nosmp:sparc64_defconfig
> sparc64:sun4v:nosmp:sparc64_defconfig
>
> Build error:
>
> arch/sparc/mm/tsb.c:478:3: error: implicit declaration of function 'tsb_context_switch'

I've dropped the offending patch for this issue from here, and for 4.9.

thanks,

greg k-h

2017-08-10 17:34:41

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/58] 4.4.81-stable review

On Thu, Aug 10, 2017 at 09:17:37AM -0700, Greg Kroah-Hartman wrote:
> On Wed, Aug 09, 2017 at 05:37:45PM -0700, Guenter Roeck wrote:
> > On 08/09/2017 12:41 PM, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.4.81 release.
> > > There are 58 patches in this series, all will be posted as a response
> > > to this one. If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Fri Aug 11 19:41:25 UTC 2017.
> > > Anything received after that time might be too late.
> > >
> >
> > Build results:
> > total: 145 pass: 143 fail: 2
> > Failed builds:
> > sparc64:allmodconfig
> > sparc64:allnoconfig
> >
> > Qemu test results:
> > total: 115 pass: 110 fail: 5
> > Failed tests:
> > nios2:10m50-ghrd:10m50_defconfig:10m50_devboard.dts
> > sparc64:sun4u:smp:sparc64_defconfig
> > sparc64:sun4v:smp:sparc64_defconfig
> > sparc64:sun4u:nosmp:sparc64_defconfig
> > sparc64:sun4v:nosmp:sparc64_defconfig
> >
> > Build error:
> >
> > arch/sparc/mm/tsb.c:478:3: error: implicit declaration of function 'tsb_context_switch'
> >
> > The nios2 test crashes at reboot.
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > ea = 00000000, ra = c803f110, cause = 13
> > Kernel panic - not syncing: Oops
> >
> > I'll start a bisect on it. Note that it also crashes in 4.9.
>
> Odd that this also doesn't crash on 3.18 right now, as it has the same
> offending patch. I've queued up the fix now, thanks.
>
The nios2 architecture is not supported in 3.18.

Guenter

2017-08-11 01:33:10

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 4.4 01/58] parisc: Increase thread and stack size to 32kb

On Wed, 2017-08-09 at 12:41 -0700, Greg Kroah-Hartman wrote:
> 4.4-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Helge Deller <[email protected]>
>
> commit 8f8201dfed91a43ac38c899c82f81eef3d36afd9 upstream.
>
> Since kernel 4.11 the thread and irq stacks on parisc randomly overflow
> the default size of 16k. The reason why stack usage suddenly grew is yet
> unknown.

So we don't need this for 4.4.

Ben.

> Signed-off-by: Helge Deller <[email protected]>
> Signed-off-by: Helge Deller <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> ---
> arch/parisc/include/asm/thread_info.h | 2 +-
> arch/parisc/kernel/irq.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> --- a/arch/parisc/include/asm/thread_info.h
> +++ b/arch/parisc/include/asm/thread_info.h
> @@ -34,7 +34,7 @@ struct thread_info {
>
> /* thread information allocation */
>
> -#define THREAD_SIZE_ORDER 2 /* PA-RISC requires at least 16k stack */
> +#define THREAD_SIZE_ORDER 3 /* PA-RISC requires at least 32k stack */
> /* Be sure to hunt all references to this down when you change the size of
> * the kernel stack */
> #define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
> --- a/arch/parisc/kernel/irq.c
> +++ b/arch/parisc/kernel/irq.c
> @@ -380,7 +380,7 @@ static inline int eirr_to_irq(unsigned l
> /*
> * IRQ STACK - used for irq handler
> */
> -#define IRQ_STACK_SIZE (4096 << 2) /* 16k irq stack size */
> +#define IRQ_STACK_SIZE (4096 << 3) /* 32k irq stack size */
>
> union irq_stack_union {
> unsigned long stack[IRQ_STACK_SIZE/sizeof(unsigned long)];
>
>

--
Ben Hutchings
Software Developer, Codethink Ltd.


2017-08-11 07:21:21

by Helge Deller

[permalink] [raw]
Subject: Re: [PATCH 4.4 01/58] parisc: Increase thread and stack size to 32kb

On 11.08.2017 03:33, Ben Hutchings wrote:
> On Wed, 2017-08-09 at 12:41 -0700, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch. If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: Helge Deller <[email protected]>
>>
>> commit 8f8201dfed91a43ac38c899c82f81eef3d36afd9 upstream.
>>
>> Since kernel 4.11 the thread and irq stacks on parisc randomly overflow
>> the default size of 16k. The reason why stack usage suddenly grew is yet
>> unknown.
>
> So we don't need this for 4.4.

Correct.

I had Cc: [email protected] # 4.11+
in the commit itself:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f8201dfed91a43ac38c899c82f81eef3d36afd9

Helge

>
> Ben.
>
>> Signed-off-by: Helge Deller <[email protected]>
>> Signed-off-by: Helge Deller <[email protected]>
>> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>>
>> ---
>> arch/parisc/include/asm/thread_info.h | 2 +-
>> arch/parisc/kernel/irq.c | 2 +-
>> 2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> --- a/arch/parisc/include/asm/thread_info.h
>> +++ b/arch/parisc/include/asm/thread_info.h
>> @@ -34,7 +34,7 @@ struct thread_info {
>>
>> /* thread information allocation */
>>
>> -#define THREAD_SIZE_ORDER 2 /* PA-RISC requires at least 16k stack */
>> +#define THREAD_SIZE_ORDER 3 /* PA-RISC requires at least 32k stack */
>> /* Be sure to hunt all references to this down when you change the size of
>> * the kernel stack */
>> #define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
>> --- a/arch/parisc/kernel/irq.c
>> +++ b/arch/parisc/kernel/irq.c
>> @@ -380,7 +380,7 @@ static inline int eirr_to_irq(unsigned l
>> /*
>> * IRQ STACK - used for irq handler
>> */
>> -#define IRQ_STACK_SIZE (4096 << 2) /* 16k irq stack size */
>> +#define IRQ_STACK_SIZE (4096 << 3) /* 32k irq stack size */
>>
>> union irq_stack_union {
>> unsigned long stack[IRQ_STACK_SIZE/sizeof(unsigned long)];
>>
>>
>

2017-08-11 15:33:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.4 01/58] parisc: Increase thread and stack size to 32kb

On Fri, Aug 11, 2017 at 09:21:15AM +0200, Helge Deller wrote:
> On 11.08.2017 03:33, Ben Hutchings wrote:
> > On Wed, 2017-08-09 at 12:41 -0700, Greg Kroah-Hartman wrote:
> >> 4.4-stable review patch. If anyone has any objections, please let me know.
> >>
> >> ------------------
> >>
> >> From: Helge Deller <[email protected]>
> >>
> >> commit 8f8201dfed91a43ac38c899c82f81eef3d36afd9 upstream.
> >>
> >> Since kernel 4.11 the thread and irq stacks on parisc randomly overflow
> >> the default size of 16k. The reason why stack usage suddenly grew is yet
> >> unknown.
> >
> > So we don't need this for 4.4.
>
> Correct.
>
> I had Cc: [email protected] # 4.11+
> in the commit itself:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f8201dfed91a43ac38c899c82f81eef3d36afd9
>

Yes, my fault, I shouldn't have applied it there, now dropped.

thanks,

greg k-h

2017-08-11 16:12:52

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 4.4 15/58] iscsi-target: Fix initial login PDU asynchronous socket close OOPs

On Wed, 2017-08-09 at 12:41 -0700, Greg Kroah-Hartman wrote:
> 4.4-stable review patch. If anyone has any objections, please let me
> know.
>
> ------------------
>
> From: Nicholas Bellinger <[email protected]>
>
> commit 25cdda95fda78d22d44157da15aa7ea34be3c804 upstream.
[...]
> --- a/drivers/target/iscsi/iscsi_target_nego.c
> +++ b/drivers/target/iscsi/iscsi_target_nego.c
> @@ -489,14 +489,60 @@ static void iscsi_target_restore_sock_ca
>
> static int iscsi_target_do_login(struct iscsi_conn *, struct iscsi_login *);
>
> -static bool iscsi_target_sk_state_check(struct sock *sk)
> +static bool __iscsi_target_sk_check_close(struct sock *sk)
> {
> if (sk->sk_state == TCP_CLOSE_WAIT || sk->sk_state == TCP_CLOSE) {
> - pr_debug("iscsi_target_sk_state_check: TCP_CLOSE_WAIT|TCP_CLOSE,"
> + pr_debug("__iscsi_target_sk_check_close: TCP_CLOSE_WAIT|TCP_CLOSE,"
> "returning FALSE\n");
> - return false;
> + return true;
[...]

The log message no longer matches the code. This seems to be unfixed
upstream.

Ben.

--
Ben Hutchings
Software Developer, Codethink Ltd.


2017-08-11 17:46:01

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 4.4 18/58] mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries

On Wed, 2017-08-09 at 12:41 -0700, Greg Kroah-Hartman wrote:
> 4.4-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Mel Gorman <[email protected]>
>
> commit 3ea277194daaeaa84ce75180ec7c7a2075027a68 upstream.
[...]
> +/*
> + * Reclaim unmaps pages under the PTL but do not flush the TLB prior to
> + * releasing the PTL if TLB flushes are batched. It's possible for a parallel
> + * operation such as mprotect or munmap to race between reclaim unmapping
> + * the page and flushing the page. If this race occurs, it potentially allows
> + * access to data via a stale TLB entry. Tracking all mm's that have TLB
> + * batching in flight would be expensive during reclaim so instead track
> + * whether TLB batching occurred in the past and if so then do a flush here
> + * if required. This will cost one additional flush per reclaim cycle paid
> + * by the first operation at risk such as mprotect and mumap.
> + *
> + * This must be called under the PTL so that an access to tlb_flush_batched
> + * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise
> + * via the PTL.

What about USE_SPLIT_PTE_PTLOCKS? I don't see how you can use "the PTL"
to synchronise access to a per-mm flag.

Ben.

> + */
> +void flush_tlb_batched_pending(struct mm_struct *mm)
> +{
> + if (mm->tlb_flush_batched) {
> + flush_tlb_mm(mm);
> +
> + /*
> + * Do not allow the compiler to re-order the clearing of
> + * tlb_flush_batched before the tlb is flushed.
> + */
> + barrier();
> + mm->tlb_flush_batched = false;
> + }
> +}
> #else
> static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
> struct page *page, bool writable)
>
>

--
Ben Hutchings
Software Developer, Codethink Ltd.


2017-08-13 06:27:33

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 4.4 18/58] mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries

Ben Hutchings <[email protected]> wrote:

> On Wed, 2017-08-09 at 12:41 -0700, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch. If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: Mel Gorman <[email protected]>
>>
>> commit 3ea277194daaeaa84ce75180ec7c7a2075027a68 upstream.
> [...]
>> +/*
>> + * Reclaim unmaps pages under the PTL but do not flush the TLB prior to
>> + * releasing the PTL if TLB flushes are batched. It's possible for a parallel
>> + * operation such as mprotect or munmap to race between reclaim unmapping
>> + * the page and flushing the page. If this race occurs, it potentially allows
>> + * access to data via a stale TLB entry. Tracking all mm's that have TLB
>> + * batching in flight would be expensive during reclaim so instead track
>> + * whether TLB batching occurred in the past and if so then do a flush here
>> + * if required. This will cost one additional flush per reclaim cycle paid
>> + * by the first operation at risk such as mprotect and mumap.
>> + *
>> + * This must be called under the PTL so that an access to tlb_flush_batched
>> + * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise
>> + * via the PTL.
>
> What about USE_SPLIT_PTE_PTLOCKS? I don't see how you can use "the PTL"
> to synchronise access to a per-mm flag.

Although it is a per-mm flag, the only situations we care about it are those
in which “the PTL” (i.e. the same PTL) is accessed by both the reclaimer
(which batches the flushes) and mprotect/munmap/etc.

Nadav

2017-08-14 08:00:19

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 4.4 18/58] mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries

On Fri, Aug 11, 2017 at 06:45:49PM +0100, Ben Hutchings wrote:
> On Wed, 2017-08-09 at 12:41 -0700, Greg Kroah-Hartman wrote:
> > 4.4-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Mel Gorman <[email protected]>
> >
> > commit 3ea277194daaeaa84ce75180ec7c7a2075027a68 upstream.
> [...]
> > +/*
> > + * Reclaim unmaps pages under the PTL but do not flush the TLB prior to
> > + * releasing the PTL if TLB flushes are batched. It's possible for a parallel
> > + * operation such as mprotect or munmap to race between reclaim unmapping
> > + * the page and flushing the page. If this race occurs, it potentially allows
> > + * access to data via a stale TLB entry. Tracking all mm's that have TLB
> > + * batching in flight would be expensive during reclaim so instead track
> > + * whether TLB batching occurred in the past and if so then do a flush here
> > + * if required. This will cost one additional flush per reclaim cycle paid
> > + * by the first operation at risk such as mprotect and mumap.
> > + *
> > + * This must be called under the PTL so that an access to tlb_flush_batched
> > + * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise
> > + * via the PTL.
>
> What about USE_SPLIT_PTE_PTLOCKS? I don't see how you can use "the PTL"
> to synchronise access to a per-mm flag.
>

In this context, the primary concern is a race with clearing and
checking PTEs at the location protected by a single PTL lock. While the
flag in question is a per-mm flag, the ordering only matters when a race
can potentially occur.

--
Mel Gorman
SUSE Labs

2017-08-15 13:36:44

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 4.4 18/58] mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries

On Sat, 2017-08-12 at 23:27 -0700, Nadav Amit wrote:
> Ben Hutchings <[email protected]> wrote:
>
> > On Wed, 2017-08-09 at 12:41 -0700, Greg Kroah-Hartman wrote:
> >> 4.4-stable review patch. If anyone has any objections, please let me know.
> >>
> >> ------------------
> >>
> >> From: Mel Gorman <[email protected]>
> >>
> >> commit 3ea277194daaeaa84ce75180ec7c7a2075027a68 upstream.
> > [...]
> >> +/*
> >> + * Reclaim unmaps pages under the PTL but do not flush the TLB prior to
> >> + * releasing the PTL if TLB flushes are batched. It's possible for a parallel
> >> + * operation such as mprotect or munmap to race between reclaim unmapping
> >> + * the page and flushing the page. If this race occurs, it potentially allows
> >> + * access to data via a stale TLB entry. Tracking all mm's that have TLB
> >> + * batching in flight would be expensive during reclaim so instead track
> >> + * whether TLB batching occurred in the past and if so then do a flush here
> >> + * if required. This will cost one additional flush per reclaim cycle paid
> >> + * by the first operation at risk such as mprotect and mumap.
> >> + *
> >> + * This must be called under the PTL so that an access to tlb_flush_batched
> >> + * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise
> >> + * via the PTL.
> >
> > What about USE_SPLIT_PTE_PTLOCKS? I don't see how you can use "the PTL"
> > to synchronise access to a per-mm flag.
>
> Although it is a per-mm flag, the only situations we care about it are those
> in which “the PTL” (i.e. the same PTL) is accessed by both the reclaimer
> (which batches the flushes) and mprotect/munmap/etc.

Is there anything that presents this sequence?

P0 P1 P2
-- -- --

change_pte_range() [ptl=X]
-> flush_tlb_batch_pending()
-> flush_tlb_mm()
try_to_unmap_one() [ptl=Y]
-> set_tlb_ubc_flush_pending()
-> tlb_flush_batched = true
-> tlb_flush_batched = false

change_pte_range() [ptl=Y]
->
flush_tlb_batch_pending()
(nop)

Ben.

--
Ben Hutchings
Software Developer, Codethink Ltd.


2017-08-15 16:39:51

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 4.4 18/58] mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries

Ben Hutchings <[email protected]> wrote:

> On Sat, 2017-08-12 at 23:27 -0700, Nadav Amit wrote:
>> Ben Hutchings <[email protected]> wrote:
>>
>>> On Wed, 2017-08-09 at 12:41 -0700, Greg Kroah-Hartman wrote:
>>>> 4.4-stable review patch. If anyone has any objections, please let me know.
>>>>
>>>> ------------------
>>>>
>>>> From: Mel Gorman <[email protected]>
>>>>
>>>> commit 3ea277194daaeaa84ce75180ec7c7a2075027a68 upstream.
>>> [...]
>>>> +/*
>>>> + * Reclaim unmaps pages under the PTL but do not flush the TLB prior to
>>>> + * releasing the PTL if TLB flushes are batched. It's possible for a parallel
>>>> + * operation such as mprotect or munmap to race between reclaim unmapping
>>>> + * the page and flushing the page. If this race occurs, it potentially allows
>>>> + * access to data via a stale TLB entry. Tracking all mm's that have TLB
>>>> + * batching in flight would be expensive during reclaim so instead track
>>>> + * whether TLB batching occurred in the past and if so then do a flush here
>>>> + * if required. This will cost one additional flush per reclaim cycle paid
>>>> + * by the first operation at risk such as mprotect and mumap.
>>>> + *
>>>> + * This must be called under the PTL so that an access to tlb_flush_batched
>>>> + * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise
>>>> + * via the PTL.
>>>
>>> What about USE_SPLIT_PTE_PTLOCKS? I don't see how you can use "the PTL"
>>> to synchronise access to a per-mm flag.
>>
>> Although it is a per-mm flag, the only situations we care about it are those
>> in which “the PTL” (i.e. the same PTL) is accessed by both the reclaimer
>> (which batches the flushes) and mprotect/munmap/etc.
>
> Is there anything that presents this sequence?
>
> P0 P1 P2
> -- -- --
>
> change_pte_range() [ptl=X]
> -> flush_tlb_batch_pending()
> -> flush_tlb_mm()
> try_to_unmap_one() [ptl=Y]
> -> set_tlb_ubc_flush_pending()
> -> tlb_flush_batched = true
> -> tlb_flush_batched = false
>
> change_pte_range() [ptl=Y]
> ->
> flush_tlb_batch_pending()
> (nop)

I think (but not sure) that you regard a similar concern I raised before.
Mel gave an answer [1], but I cannot say I feel very comfortable with it.

[1] http://www.spinics.net/lists/linux-mm/msg131265.html

Nadav