2017-08-28 08:58:43

by Greg Kroah-Hartman

Subject: [PATCH 4.9 00/84] 4.9.46-stable review

This is the start of the stable review cycle for the 4.9.46 release.
There are 84 patches in this series, all of which will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Aug 30 08:05:10 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.46-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.9.46-rc1

Benjamin Herrenschmidt <[email protected]>
powerpc/mm: Ensure cpumask update is ordered

Lv Zheng <[email protected]>
ACPI: EC: Fix regression related to wrong ECDT initialization order

James Morse <[email protected]>
ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal

Joerg Roedel <[email protected]>
ACPI: ioapic: Clear on-stack resource before using it

Dave Jiang <[email protected]>
ntb: transport shouldn't disable link due to bogus values in SPADs

Logan Gunthorpe <[email protected]>
ntb: ntb_test: ensure the link is up before trying to configure the mws

Allen Hubbe <[email protected]>
ntb: no sleep in ntb_async_tx_submit

Logan Gunthorpe <[email protected]>
NTB: ntb_test: fix bug printing ntb_perf results

Logan Gunthorpe <[email protected]>
ntb_transport: fix bug calculating num_qps_mw

Logan Gunthorpe <[email protected]>
ntb_transport: fix qp count bug

Linus Torvalds <[email protected]>
Clarify (and fix) MAX_LFS_FILESIZE macros

Charles Milette <[email protected]>
staging: rtl8188eu: add RNX-N150NUB support

Srinivas Pandruvada <[email protected]>
iio: hid-sensor-trigger: Fix the race with user space powering up sensors

Dragos Bogdan <[email protected]>
iio: imu: adis16480: Fix acceleration scale factor for adis16480

Martijn Coenen <[email protected]>
ANDROID: binder: fix proc->tsk check.

Riley Andrews <[email protected]>
binder: Use wake up hint for synchronous transactions.

Todd Kjos <[email protected]>
binder: use group leader instead of open thread

Todd Kjos <[email protected]>
Revert "android: binder: Sanity check at binder ioctl"

Jeffy Chen <[email protected]>
Bluetooth: bnep: fix possible might sleep error in bnep_session

Jeffy Chen <[email protected]>
Bluetooth: cmtp: fix possible might sleep error in cmtp_session

Jeffy Chen <[email protected]>
Bluetooth: hidp: fix possible might sleep error in hidp_session_thread

Florian Westphal <[email protected]>
netfilter: nat: fix src map lookup

Zhang Bo <[email protected]>
Revert "leds: handle suspend/resume in heartbeat trigger"

Vadim Lomovtsev <[email protected]>
net: sunrpc: svcsock: fix NULL-pointer exception

Eric Biggers <[email protected]>
x86/mm: Fix use-after-free of ldt_struct

Nicholas Piggin <[email protected]>
timers: Fix excessive granularity of new timers after a nohz idle

Thomas Gleixner <[email protected]>
perf/x86/intel/rapl: Make package handling more robust

Masami Hiramatsu <[email protected]>
perf probe: Fix --funcs to show correct symbols for offline module

Mark Rutland <[email protected]>
perf/core: Fix group {cpu,task} validation

Steven Rostedt (VMware) <[email protected]>
ftrace: Check for null ret_stack on profile function graph entry function

Chuck Lever <[email protected]>
nfsd: Limit end of page list when decoding NFSv4 WRITE

Ronnie Sahlberg <[email protected]>
cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()

Sachin Prabhu <[email protected]>
cifs: Fix df output for users with quota limits

Nicholas Piggin <[email protected]>
kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured

Steven Rostedt (VMware) <[email protected]>
tracing: Fix freeing of filter in create_filter() when set_str is false

Chunyu Hu <[email protected]>
tracing: Fix kmemleak in tracing_map_array_free()

Steven Rostedt (VMware) <[email protected]>
tracing: Call clear_boot_tracer() at lateinit_sync

Koji Matsuoka <[email protected]>
drm: rcar-du: Fix H/V sync signal polarity configuration

Koji Matsuoka <[email protected]>
drm: rcar-du: Fix display timing controller parameter

Laurent Pinchart <[email protected]>
drm: rcar-du: Fix crash in encoder failure error path

Maarten Lankhorst <[email protected]>
drm/atomic: If the atomic check fails, return its value first

Chris Wilson <[email protected]>
drm: Release driver tracking before making the object available again

Pavel Tatashin <[email protected]>
mm/memblock.c: reversed logic in memblock_discard()

Eric Biggers <[email protected]>
fork: fix incorrect fput of ->exe_file causing use-after-free

Eric Biggers <[email protected]>
mm/madvise.c: fix freeing of locked page with MADV_FREE

Ulf Hansson <[email protected]>
i2c: designware: Fix system suspend

Kirill A. Shutemov <[email protected]>
mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled

Alexey Brodkin <[email protected]>
ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses

Takashi Sakamoto <[email protected]>
ALSA: firewire: fix NULL pointer dereference when releasing uninitialized data of iso-resource

Takashi Iwai <[email protected]>
ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)

Takashi Iwai <[email protected]>
ALSA: core: Fix unexpected error at replacing user TLV

Joakim Tjernlund <[email protected]>
ALSA: usb-audio: Add delay quirk for H650e/Jabra 550a USB headsets

Paolo Bonzini <[email protected]>
KVM: x86: block guest protection keys unless the host has them enabled

Heiko Carstens <[email protected]>
KVM: s390: sthyi: fix specification exception detection

Heiko Carstens <[email protected]>
KVM: s390: sthyi: fix sthyi inline assembly

Masaki Ota <[email protected]>
Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad

KT Liao <[email protected]>
Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310

Aaron Ma <[email protected]>
Input: trackpoint - add new trackpoint firmware ID

Edward Cree <[email protected]>
bpf/verifier: fix min/max handling in BPF_SUB

Daniel Borkmann <[email protected]>
bpf: fix mixed signed/unsigned derived min/max value bounds

Daniel Borkmann <[email protected]>
bpf, verifier: fix alu ops against map_value{, _adj} register types

Daniel Borkmann <[email protected]>
bpf: adjust verifier heuristics

John Fastabend <[email protected]>
bpf, verifier: add additional patterns to evaluate_reg_imm_alu

Konstantin Khlebnikov <[email protected]>
net_sched: fix order of queue length updates in qdisc_replace()

Xin Long <[email protected]>
net: sched: fix NULL pointer dereference when action calls some targets

Colin Ian King <[email protected]>
irda: do not leak initialized list.dev to userspace

Huy Nguyen <[email protected]>
net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled

Neal Cardwell <[email protected]>
tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP

Wei Wang <[email protected]>
ipv6: repair fib6 tree in failure case

Wei Wang <[email protected]>
ipv6: reset fn->rr_ptr when replacing route

Eric Dumazet <[email protected]>
tipc: fix use-after-free

Alexander Potapenko <[email protected]>
sctp: fully initialize the IPv6 address in sctp_v6_to_addr()

Colin Ian King <[email protected]>
nfp: fix infinite loop on umapping cleanup

Eric Dumazet <[email protected]>
ipv4: better IP_MAX_MTU enforcement

Eric Dumazet <[email protected]>
ptr_ring: use kmalloc_array()

Liping Zhang <[email protected]>
openvswitch: fix skb_panic due to the incorrect actions attrlen

Daniel Borkmann <[email protected]>
bpf: fix bpf_trace_printk on 32 bit archs

Konstantin Khlebnikov <[email protected]>
net_sched: remove warning from qdisc_hash_add

Konstantin Khlebnikov <[email protected]>
net_sched/sfq: update hierarchical backlog when drop packet

Eric Dumazet <[email protected]>
ipv4: fix NULL dereference in free_fib_info_rcu()

Eric Dumazet <[email protected]>
dccp: defer ccid_hc_tx_delete() at dismantle time

Eric Dumazet <[email protected]>
dccp: purge write queue in dccp_destroy_sock()

Eric Dumazet <[email protected]>
af_key: do not use GFP_KERNEL in atomic contexts

Tushar Dave <[email protected]>
sparc64: remove unnecessary log message


-------------

Diffstat:

Makefile | 4 +-
arch/arc/include/asm/cache.h | 2 +
arch/arc/mm/cache.c | 13 +-
arch/powerpc/include/asm/mmu_context.h | 20 +-
arch/powerpc/include/asm/pgtable-be-types.h | 1 +
arch/powerpc/include/asm/pgtable-types.h | 1 +
arch/s390/kvm/sthyi.c | 7 +-
arch/sparc/kernel/pci_sun4v.c | 2 -
arch/x86/events/intel/rapl.c | 58 +++---
arch/x86/include/asm/mmu_context.h | 4 +-
arch/x86/kvm/cpuid.c | 2 +-
drivers/acpi/apei/ghes.c | 1 +
drivers/acpi/ec.c | 17 +-
drivers/acpi/internal.h | 1 -
drivers/acpi/ioapic.c | 6 +
drivers/acpi/scan.c | 1 -
drivers/android/binder.c | 19 +-
drivers/gpu/drm/drm_atomic.c | 5 +-
drivers/gpu/drm/drm_gem.c | 6 +-
drivers/gpu/drm/rcar-du/rcar_du_crtc.c | 6 +-
drivers/gpu/drm/rcar-du/rcar_du_kms.c | 10 +-
drivers/i2c/busses/i2c-designware-platdrv.c | 14 +-
.../iio/common/hid-sensors/hid-sensor-trigger.c | 8 +-
drivers/iio/imu/adis16480.c | 2 +-
drivers/input/mouse/alps.c | 41 +++-
drivers/input/mouse/alps.h | 8 +
drivers/input/mouse/elan_i2c_core.c | 1 +
drivers/input/mouse/trackpoint.c | 3 +-
drivers/input/mouse/trackpoint.h | 3 +-
drivers/leds/trigger/ledtrig-heartbeat.c | 31 ----
drivers/net/ethernet/mellanox/mlx4/main.c | 4 +-
.../net/ethernet/netronome/nfp/nfp_net_common.c | 3 +-
drivers/ntb/ntb_transport.c | 62 ++-----
drivers/staging/rtl8188eu/os_dep/usb_intf.c | 1 +
fs/cifs/dir.c | 18 +-
fs/cifs/smb2pdu.c | 4 +-
fs/nfsd/nfs4xdr.c | 6 +-
include/asm-generic/vmlinux.lds.h | 38 ++--
include/linux/bpf_verifier.h | 1 +
include/linux/cpuhotplug.h | 1 -
include/linux/fs.h | 4 +-
include/linux/ptr_ring.h | 9 +-
include/linux/skb_array.h | 3 +-
include/net/ip.h | 4 +-
include/net/sch_generic.h | 5 +-
kernel/bpf/verifier.c | 206 ++++++++++++++++++---
kernel/events/core.c | 39 ++--
kernel/fork.c | 1 +
kernel/time/timer.c | 50 ++++-
kernel/trace/bpf_trace.c | 34 +++-
kernel/trace/ftrace.c | 4 +
kernel/trace/trace.c | 2 +-
kernel/trace/trace_events_filter.c | 4 +
kernel/trace/tracing_map.c | 11 +-
mm/madvise.c | 2 +-
mm/memblock.c | 2 +-
mm/shmem.c | 4 +-
net/bluetooth/bnep/core.c | 11 +-
net/bluetooth/cmtp/core.c | 17 +-
net/bluetooth/hidp/core.c | 33 ++--
net/dccp/proto.c | 19 +-
net/ipv4/fib_semantics.c | 12 +-
net/ipv4/route.c | 2 +-
net/ipv4/tcp_input.c | 3 +-
net/ipv6/ip6_fib.c | 26 +--
net/irda/af_irda.c | 2 +-
net/key/af_key.c | 48 ++---
net/netfilter/nf_nat_core.c | 17 +-
net/openvswitch/actions.c | 1 +
net/openvswitch/datapath.c | 7 +-
net/openvswitch/datapath.h | 2 +
net/sched/act_ipt.c | 2 +
net/sched/sch_api.c | 3 -
net/sched/sch_sfq.c | 5 +-
net/sctp/ipv6.c | 2 +
net/sunrpc/svcsock.c | 22 ++-
net/tipc/netlink_compat.c | 6 +-
sound/core/control.c | 2 +-
sound/firewire/iso-resources.c | 7 +-
sound/pci/hda/patch_conexant.c | 1 +
sound/usb/quirks.c | 9 +-
tools/perf/util/probe-event.c | 25 +--
tools/testing/selftests/ntb/ntb_test.sh | 6 +-
83 files changed, 711 insertions(+), 398 deletions(-)



2017-08-28 08:10:25

by Greg Kroah-Hartman

Subject: [PATCH 4.9 03/84] dccp: purge write queue in dccp_destroy_sock()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 7749d4ff88d31b0be17c8683143135adaaadc6a7 ]

syzkaller reported that DCCP could have a non-empty
write queue at dismantle time.

WARNING: CPU: 1 PID: 2953 at net/core/stream.c:199 sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 2953 Comm: syz-executor0 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
panic+0x1e4/0x417 kernel/panic.c:180
__warn+0x1c4/0x1d9 kernel/panic.c:541
report_bug+0x211/0x2d0 lib/bug.c:183
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
RIP: 0010:sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
RSP: 0018:ffff8801d182f108 EFLAGS: 00010297
RAX: ffff8801d1144140 RBX: ffff8801d13cb280 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff85137b00 RDI: ffff8801d13cb280
RBP: ffff8801d182f148 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d13cb4d0
R13: ffff8801d13cb3b8 R14: ffff8801d13cb300 R15: ffff8801d13cb3b8
inet_csk_destroy_sock+0x175/0x3f0 net/ipv4/inet_connection_sock.c:835
dccp_close+0x84d/0xc10 net/dccp/proto.c:1067
inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
sock_release+0x8d/0x1e0 net/socket.c:597
sock_close+0x16/0x20 net/socket.c:1126
__fput+0x327/0x7e0 fs/file_table.c:210
____fput+0x15/0x20 fs/file_table.c:246
task_work_run+0x18a/0x260 kernel/task_work.c:116
exit_task_work include/linux/task_work.h:21 [inline]
do_exit+0xa32/0x1b10 kernel/exit.c:865
do_group_exit+0x149/0x400 kernel/exit.c:969
get_signal+0x7e8/0x17e0 kernel/signal.c:2330
do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808
exit_to_usermode_loop+0x21c/0x2d0 arch/x86/entry/common.c:157
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath+0x3a7/0x450 arch/x86/entry/common.c:263

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dccp/proto.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -201,10 +201,7 @@ void dccp_destroy_sock(struct sock *sk)
{
struct dccp_sock *dp = dccp_sk(sk);

- /*
- * DCCP doesn't use sk_write_queue, just sk_send_head
- * for retransmissions
- */
+ __skb_queue_purge(&sk->sk_write_queue);
if (sk->sk_send_head != NULL) {
kfree_skb(sk->sk_send_head);
sk->sk_send_head = NULL;


2017-08-28 08:10:29

by Greg Kroah-Hartman

Subject: [PATCH 4.9 04/84] dccp: defer ccid_hc_tx_delete() at dismantle time

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 120e9dabaf551c6dc03d3a10a1f026376cb1811c ]

The syzkaller team reported another problem in DCCP [1].

The problem here is that the structure holding the RTO timer
(the ccid2_hc_tx_rto_expire() handler) is freed too soon.

We cannot use del_timer_sync() to cancel the timer, since the timer
wants to grab the socket lock (that would risk a deadlock).

The solution is to defer freeing the memory until all references to
the socket have been released. Socket timers do own a reference, so
this should fix the issue.

[1]

==================================================================
BUG: KASAN: use-after-free in ccid2_hc_tx_rto_expire+0x51c/0x5c0 net/dccp/ccids/ccid2.c:144
Read of size 4 at addr ffff8801d2660540 by task kworker/u4:7/3365

CPU: 1 PID: 3365 Comm: kworker/u4:7 Not tainted 4.13.0-rc4+ #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound call_usermodehelper_exec_work
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x24e/0x340 mm/kasan/report.c:409
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
ccid2_hc_tx_rto_expire+0x51c/0x5c0 net/dccp/ccids/ccid2.c:144
call_timer_fn+0x233/0x830 kernel/time/timer.c:1268
expire_timers kernel/time/timer.c:1307 [inline]
__run_timers+0x7fd/0xb90 kernel/time/timer.c:1601
run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
__do_softirq+0x2f5/0xba3 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364 [inline]
irq_exit+0x1cc/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:638 [inline]
smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:1044
apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:702
RIP: 0010:arch_local_irq_enable arch/x86/include/asm/paravirt.h:824 [inline]
RIP: 0010:__raw_write_unlock_irq include/linux/rwlock_api_smp.h:267 [inline]
RIP: 0010:_raw_write_unlock_irq+0x56/0x70 kernel/locking/spinlock.c:343
RSP: 0018:ffff8801cd50eaa8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
RAX: dffffc0000000000 RBX: ffffffff85a090c0 RCX: 0000000000000006
RDX: 1ffffffff0b595f3 RSI: 1ffff1003962f989 RDI: ffffffff85acaf98
RBP: ffff8801cd50eab0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801cc96ea60
R13: dffffc0000000000 R14: ffff8801cc96e4c0 R15: ffff8801cc96e4c0
</IRQ>
release_task+0xe9e/0x1a40 kernel/exit.c:220
wait_task_zombie kernel/exit.c:1162 [inline]
wait_consider_task+0x29b8/0x33c0 kernel/exit.c:1389
do_wait_thread kernel/exit.c:1452 [inline]
do_wait+0x441/0xa90 kernel/exit.c:1523
kernel_wait4+0x1f5/0x370 kernel/exit.c:1665
SYSC_wait4+0x134/0x140 kernel/exit.c:1677
SyS_wait4+0x2c/0x40 kernel/exit.c:1673
call_usermodehelper_exec_sync kernel/kmod.c:286 [inline]
call_usermodehelper_exec_work+0x1a0/0x2c0 kernel/kmod.c:323
process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2097
worker_thread+0x223/0x1860 kernel/workqueue.c:2231
kthread+0x35e/0x430 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:425

Allocated by task 21267:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
kmem_cache_alloc+0x127/0x750 mm/slab.c:3561
ccid_new+0x20e/0x390 net/dccp/ccid.c:151
dccp_hdlr_ccid+0x27/0x140 net/dccp/feat.c:44
__dccp_feat_activate+0x142/0x2a0 net/dccp/feat.c:344
dccp_feat_activate_values+0x34e/0xa90 net/dccp/feat.c:1538
dccp_rcv_request_sent_state_process net/dccp/input.c:472 [inline]
dccp_rcv_state_process+0xed1/0x1620 net/dccp/input.c:677
dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:679
sk_backlog_rcv include/net/sock.h:911 [inline]
__release_sock+0x124/0x360 net/core/sock.c:2269
release_sock+0xa4/0x2a0 net/core/sock.c:2784
inet_wait_for_connect net/ipv4/af_inet.c:557 [inline]
__inet_stream_connect+0x671/0xf00 net/ipv4/af_inet.c:643
inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:682
SYSC_connect+0x204/0x470 net/socket.c:1642
SyS_connect+0x24/0x30 net/socket.c:1623
entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 3049:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kmem_cache_free+0x77/0x280 mm/slab.c:3763
ccid_hc_tx_delete+0xc5/0x100 net/dccp/ccid.c:190
dccp_destroy_sock+0x1d1/0x2b0 net/dccp/proto.c:225
inet_csk_destroy_sock+0x166/0x3f0 net/ipv4/inet_connection_sock.c:833
dccp_done+0xb7/0xd0 net/dccp/proto.c:145
dccp_time_wait+0x13d/0x300 net/dccp/minisocks.c:72
dccp_rcv_reset+0x1d1/0x5b0 net/dccp/input.c:160
dccp_rcv_state_process+0x8fc/0x1620 net/dccp/input.c:663
dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:679
sk_backlog_rcv include/net/sock.h:911 [inline]
__sk_receive_skb+0x33e/0xc00 net/core/sock.c:521
dccp_v4_rcv+0xef1/0x1c00 net/dccp/ipv4.c:871
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:248 [inline]
ip_local_deliver+0x1ce/0x6d0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:477 [inline]
ip_rcv_finish+0x8db/0x19c0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:248 [inline]
ip_rcv+0xc3f/0x17d0 net/ipv4/ip_input.c:488
__netif_receive_skb_core+0x19af/0x33d0 net/core/dev.c:4417
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4455
process_backlog+0x203/0x740 net/core/dev.c:5130
napi_poll net/core/dev.c:5527 [inline]
net_rx_action+0x792/0x1910 net/core/dev.c:5593
__do_softirq+0x2f5/0xba3 kernel/softirq.c:284

The buggy address belongs to the object at ffff8801d2660100
which belongs to the cache ccid2_hc_tx_sock of size 1240
The buggy address is located 1088 bytes inside of
1240-byte region [ffff8801d2660100, ffff8801d26605d8)
The buggy address belongs to the page:
page:ffffea0007499800 count:1 mapcount:0 mapping:ffff8801d2660100 index:0x0 compound_mapcount: 0
flags: 0x200000000008100(slab|head)
raw: 0200000000008100 ffff8801d2660100 0000000000000000 0000000100000005
raw: ffffea00075271a0 ffffea0007538820 ffff8801d3aef9c0 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8801d2660400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801d2660480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801d2660500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8801d2660580: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
ffff8801d2660600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Cc: Gerrit Renker <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dccp/proto.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -24,6 +24,7 @@
#include <net/checksum.h>

#include <net/inet_sock.h>
+#include <net/inet_common.h>
#include <net/sock.h>
#include <net/xfrm.h>

@@ -170,6 +171,15 @@ const char *dccp_packet_name(const int t

EXPORT_SYMBOL_GPL(dccp_packet_name);

+static void dccp_sk_destruct(struct sock *sk)
+{
+ struct dccp_sock *dp = dccp_sk(sk);
+
+ ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk);
+ dp->dccps_hc_tx_ccid = NULL;
+ inet_sock_destruct(sk);
+}
+
int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized)
{
struct dccp_sock *dp = dccp_sk(sk);
@@ -179,6 +189,7 @@ int dccp_init_sock(struct sock *sk, cons
icsk->icsk_syn_retries = sysctl_dccp_request_retries;
sk->sk_state = DCCP_CLOSED;
sk->sk_write_space = dccp_write_space;
+ sk->sk_destruct = dccp_sk_destruct;
icsk->icsk_sync_mss = dccp_sync_mss;
dp->dccps_mss_cache = 536;
dp->dccps_rate_last = jiffies;
@@ -219,8 +230,7 @@ void dccp_destroy_sock(struct sock *sk)
dp->dccps_hc_rx_ackvec = NULL;
}
ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk);
- ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk);
- dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL;
+ dp->dccps_hc_rx_ccid = NULL;

/* clean up feature negotiation state */
dccp_feat_list_purge(&dp->dccps_featneg);


2017-08-28 08:10:35

by Greg Kroah-Hartman

Subject: [PATCH 4.9 06/84] net_sched/sfq: update hierarchical backlog when drop packet

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <[email protected]>


[ Upstream commit 325d5dc3f7e7c2840b65e4a2988c082c2c0025c5 ]

When sfq_enqueue() drops the head packet or a packet from another queue,
it has to update the backlog at upper qdiscs too.

Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sched/sch_sfq.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -434,6 +434,7 @@ congestion_drop:
qdisc_drop(head, sch, to_free);

slot_queue_add(slot, skb);
+ qdisc_tree_reduce_backlog(sch, 0, delta);
return NET_XMIT_CN;
}

@@ -465,8 +466,10 @@ enqueue:
/* Return Congestion Notification only if we dropped a packet
* from this flow.
*/
- if (qlen != slot->qlen)
+ if (qlen != slot->qlen) {
+ qdisc_tree_reduce_backlog(sch, 0, dropped - qdisc_pkt_len(skb));
return NET_XMIT_CN;
+ }

/* As we dropped a packet, better let upper stack know this */
qdisc_tree_reduce_backlog(sch, 1, dropped);


2017-08-28 08:10:42

by Greg Kroah-Hartman

Subject: [PATCH 4.9 09/84] openvswitch: fix skb_panic due to the incorrect actions attrlen

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Liping Zhang <[email protected]>


[ Upstream commit 494bea39f3201776cdfddc232705f54a0bd210c4 ]

For sw_flow_actions, actions_len only represents the kernel part's
size; when we dump the actions to userspace we perform conversions,
so the true size may become bigger than actions_len.

But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use actions_len
to allocate the skbuff, so the user_skb's size may become insufficient
and an oops will happen like this:
skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
<IRQ>
[<ffffffff8148be82>] skb_put+0x43/0x44
[<ffffffff8148fabf>] skb_zerocopy+0x6c/0x1f4
[<ffffffffa0290d36>] queue_userspace_packet+0x3a3/0x448 [openvswitch]
[<ffffffffa0292023>] ovs_dp_upcall+0x30/0x5c [openvswitch]
[<ffffffffa028d435>] output_userspace+0x132/0x158 [openvswitch]
[<ffffffffa01e6890>] ? ip6_rcv_finish+0x74/0x77 [ipv6]
[<ffffffffa028e277>] do_execute_actions+0xcc1/0xdc8 [openvswitch]
[<ffffffffa028e3f2>] ovs_execute_actions+0x74/0x106 [openvswitch]
[<ffffffffa0292130>] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
[<ffffffffa0292b77>] ? key_extract+0x63c/0x8d5 [openvswitch]
[<ffffffffa029848b>] ovs_vport_receive+0xa1/0xc3 [openvswitch]
[...]

We can also see that the actions_len is much smaller than the orig_len:
crash> struct sw_flow_actions 0xffff8812f539d000
struct sw_flow_actions {
rcu = {
next = 0xffff8812f5398800,
func = 0xffffe3b00035db32
},
orig_len = 1384,
actions_len = 592,
actions = 0xffff8812f539d01c
}

So as a quick fix, use orig_len instead of actions_len to allocate
the user_skb.

Finally, this oops happened on our system running a relatively old
kernel, but the same risk still exists in mainline, since we have
used the wrong actions_len from the beginning.

Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
Cc: Neil McKee <[email protected]>
Signed-off-by: Liping Zhang <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/openvswitch/actions.c | 1 +
net/openvswitch/datapath.c | 7 ++++---
net/openvswitch/datapath.h | 2 ++
3 files changed, 7 insertions(+), 3 deletions(-)

--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1240,6 +1240,7 @@ int ovs_execute_actions(struct datapath
goto out;
}

+ OVS_CB(skb)->acts_origlen = acts->orig_len;
err = do_execute_actions(dp, skb, key,
acts->actions, acts->actions_len);

--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -383,7 +383,7 @@ static int queue_gso_packets(struct data
}

static size_t upcall_msg_size(const struct dp_upcall_info *upcall_info,
- unsigned int hdrlen)
+ unsigned int hdrlen, int actions_attrlen)
{
size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
+ nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
@@ -400,7 +400,7 @@ static size_t upcall_msg_size(const stru

/* OVS_PACKET_ATTR_ACTIONS */
if (upcall_info->actions_len)
- size += nla_total_size(upcall_info->actions_len);
+ size += nla_total_size(actions_attrlen);

/* OVS_PACKET_ATTR_MRU */
if (upcall_info->mru)
@@ -467,7 +467,8 @@ static int queue_userspace_packet(struct
else
hlen = skb->len;

- len = upcall_msg_size(upcall_info, hlen - cutlen);
+ len = upcall_msg_size(upcall_info, hlen - cutlen,
+ OVS_CB(skb)->acts_origlen);
user_skb = genlmsg_new(len, GFP_ATOMIC);
if (!user_skb) {
err = -ENOMEM;
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -100,12 +100,14 @@ struct datapath {
* @input_vport: The original vport packet came in on. This value is cached
* when a packet is received by OVS.
* @mru: The maximum received fragement size; 0 if the packet is not
+ * @acts_origlen: The netlink size of the flow actions applied to this skb.
* @cutlen: The number of bytes from the packet end to be removed.
* fragmented.
*/
struct ovs_skb_cb {
struct vport *input_vport;
u16 mru;
+ u16 acts_origlen;
u32 cutlen;
};
#define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)


2017-08-28 08:10:38

by Greg Kroah-Hartman

Subject: [PATCH 4.9 08/84] bpf: fix bpf_trace_printk on 32 bit archs

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>


[ Upstream commit 88a5c690b66110ad255380d8f629c629cf6ca559 ]

James reported that on MIPS32 bpf_trace_printk() is currently
broken while MIPS64 works fine:

bpf_trace_printk() uses conditional operators to attempt to
pass different types to __trace_printk() depending on the
format operators. This doesn't work as intended on 32-bit
architectures where u32 and long are passed differently to
u64, since the result of C conditional operators follows the
"usual arithmetic conversions" rules, such that the values
passed to __trace_printk() will always be u64 [causing issues
later in the va_list handling for vscnprintf()].

For example the samples/bpf/tracex5 test printed lines like
below on MIPS32, where the fd and buf have come from the u64
fd argument, and the size from the buf argument:

[...] 1180.941542: 0x00000001: write(fd=1, buf= (null), size=6258688)

Instead of this:

[...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512)
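
The following standalone snippet is not part of the patch; it is a
minimal sketch of the promotion rule described above. Because one
branch of the conditional chain has type u64, the whole ?: expression
has type u64 regardless of which branch is taken, so on a 32-bit build
the argument is still pushed as 8 bytes onto the va_list:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t arg = 0x1234;
        int mod = 0;    /* pretend the format asked for a 32-bit argument */

        /* The usual arithmetic conversions give the whole conditional
         * expression type uint64_t, so this prints 8 even though the
         * selected branch casts to uint32_t. */
        printf("promoted size: %zu\n",
               sizeof(mod == 2 ? arg : mod == 1 ? (long)arg : (uint32_t)arg));
        return 0;
}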

One way to get it working is to expand the various combinations of
argument types into 8 different combinations for 32-bit and 64-bit
kernels. James tested the fix on MIPS32 and MIPS64 and confirmed
that it resolves the issue.

Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
Reported-by: James Hogan <[email protected]>
Tested-by: James Hogan <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/trace/bpf_trace.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)

--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -203,10 +203,36 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt
fmt_cnt++;
}

- return __trace_printk(1/* fake ip will not be printed */, fmt,
- mod[0] == 2 ? arg1 : mod[0] == 1 ? (long) arg1 : (u32) arg1,
- mod[1] == 2 ? arg2 : mod[1] == 1 ? (long) arg2 : (u32) arg2,
- mod[2] == 2 ? arg3 : mod[2] == 1 ? (long) arg3 : (u32) arg3);
+/* Horrid workaround for getting va_list handling working with different
+ * argument type combinations generically for 32 and 64 bit archs.
+ */
+#define __BPF_TP_EMIT() __BPF_ARG3_TP()
+#define __BPF_TP(...) \
+ __trace_printk(1 /* Fake ip will not be printed. */, \
+ fmt, ##__VA_ARGS__)
+
+#define __BPF_ARG1_TP(...) \
+ ((mod[0] == 2 || (mod[0] == 1 && __BITS_PER_LONG == 64)) \
+ ? __BPF_TP(arg1, ##__VA_ARGS__) \
+ : ((mod[0] == 1 || (mod[0] == 0 && __BITS_PER_LONG == 32)) \
+ ? __BPF_TP((long)arg1, ##__VA_ARGS__) \
+ : __BPF_TP((u32)arg1, ##__VA_ARGS__)))
+
+#define __BPF_ARG2_TP(...) \
+ ((mod[1] == 2 || (mod[1] == 1 && __BITS_PER_LONG == 64)) \
+ ? __BPF_ARG1_TP(arg2, ##__VA_ARGS__) \
+ : ((mod[1] == 1 || (mod[1] == 0 && __BITS_PER_LONG == 32)) \
+ ? __BPF_ARG1_TP((long)arg2, ##__VA_ARGS__) \
+ : __BPF_ARG1_TP((u32)arg2, ##__VA_ARGS__)))
+
+#define __BPF_ARG3_TP(...) \
+ ((mod[2] == 2 || (mod[2] == 1 && __BITS_PER_LONG == 64)) \
+ ? __BPF_ARG2_TP(arg3, ##__VA_ARGS__) \
+ : ((mod[2] == 1 || (mod[2] == 0 && __BITS_PER_LONG == 32)) \
+ ? __BPF_ARG2_TP((long)arg3, ##__VA_ARGS__) \
+ : __BPF_ARG2_TP((u32)arg3, ##__VA_ARGS__)))
+
+ return __BPF_TP_EMIT();
}

static const struct bpf_func_proto bpf_trace_printk_proto = {


2017-08-28 08:10:48

by Greg Kroah-Hartman

Subject: [PATCH 4.9 11/84] ipv4: better IP_MAX_MTU enforcement

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit c780a049f9bf442314335372c9abc4548bfe3e44 ]

While working on yet another syzkaller report, I found
that our IP_MAX_MTU enforcements were not properly done.

gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and the
final result can be bigger than IP_MAX_MTU :/

This is a problem because the device mtu can be changed by other cpus
or threads.

While this patch does not fix the issue I am working on, it is
probably worth addressing it.
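
A purely illustrative userspace sketch of the difference (not from the
patch; READ_ONCE() is approximated with a volatile cast and the MTU
constant is only for illustration): without forcing a single load, the
comparison and the returned value may come from two separate reads of
the shared field, so a concurrent writer can make the "clamped" result
exceed the limit.

#include <stdio.h>

#define IP_MAX_MTU      0xFFF0U         /* illustrative value only */
#define READ_ONCE(x)    (*(volatile __typeof__(x) *)&(x))

static unsigned int shared_mtu = 1500; /* updated concurrently elsewhere */

static unsigned int clamp_racy(void)
{
        /* Two references to shared_mtu: the compiler may load it once
         * for the comparison and again for the result, so a store in
         * between can yield a value larger than IP_MAX_MTU. */
        return shared_mtu < IP_MAX_MTU ? shared_mtu : IP_MAX_MTU;
}

static unsigned int clamp_safe(void)
{
        unsigned int mtu = READ_ONCE(shared_mtu);      /* exactly one load */

        return mtu < IP_MAX_MTU ? mtu : IP_MAX_MTU;
}

int main(void)
{
        printf("%u %u\n", clamp_racy(), clamp_safe());
        return 0;
}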

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/ip.h | 4 ++--
net/ipv4/route.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -339,7 +339,7 @@ static inline unsigned int ip_dst_mtu_ma
!forwarding)
return dst_mtu(dst);

- return min(dst->dev->mtu, IP_MAX_MTU);
+ return min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);
}

static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
@@ -351,7 +351,7 @@ static inline unsigned int ip_skb_dst_mt
return ip_dst_mtu_maybe_forward(skb_dst(skb), forwarding);
}

- return min(skb_dst(skb)->dev->mtu, IP_MAX_MTU);
+ return min(READ_ONCE(skb_dst(skb)->dev->mtu), IP_MAX_MTU);
}

u32 ip_idents_reserve(u32 hash, int segs);
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1247,7 +1247,7 @@ static unsigned int ipv4_mtu(const struc
if (mtu)
return mtu;

- mtu = dst->dev->mtu;
+ mtu = READ_ONCE(dst->dev->mtu);

if (unlikely(dst_metric_locked(dst, RTAX_MTU))) {
if (rt->rt_uses_gateway && mtu > 576)


2017-08-28 08:10:52

by Greg Kroah-Hartman

Subject: [PATCH 4.9 21/84] net_sched: fix order of queue length updates in qdisc_replace()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <[email protected]>


[ Upstream commit 68a66d149a8c78ec6720f268597302883e48e9fa ]

It is important to call qdisc_tree_reduce_backlog() after changing the
queue length. The parent qdisc should deactivate the class in
->qlen_notify(), called from qdisc_tree_reduce_backlog(), but this
happens only if qdisc->q.qlen is zero.

Missed class deactivations lead to crashes/warnings when picking packets
from an empty qdisc and to corrupted state when reactivating this class
in the future.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Fixes: 86a7996cc8a0 ("net_sched: introduce qdisc_replace() helper")
Acked-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/sch_generic.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -768,8 +768,11 @@ static inline struct Qdisc *qdisc_replac
old = *pold;
*pold = new;
if (old != NULL) {
- qdisc_tree_reduce_backlog(old, old->q.qlen, old->qstats.backlog);
+ unsigned int qlen = old->q.qlen;
+ unsigned int backlog = old->qstats.backlog;
+
qdisc_reset(old);
+ qdisc_tree_reduce_backlog(old, qlen, backlog);
}
sch_tree_unlock(sch);



2017-08-28 08:10:58

by Greg Kroah-Hartman

Subject: [PATCH 4.9 23/84] bpf: adjust verifier heuristics

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>


[ Upstream commit 3c2ce60bdd3d57051bf85615deec04a694473840 ]

The current limits on processing program paths no longer really
reflect today's needs, because programs have become more complex and
the verifier smarter, keeping track of more data such as const ALU
operations, alignment tracking, spilling of PTR_TO_MAP_VALUE_ADJ
registers, and other features allowing for smarter matching of what
LLVM generates.

This also comes with the side effect that there are fewer
opportunities to prune search states, so we often need to do more
work to prove safety than in the past due to different register
states and stack layouts where we mismatch. It is generally quite
hard to determine what caused a sudden increase in complexity; it
could be something as trivial as a single branch at the beginning of
the program where LLVM assigned a stack slot that is marked
differently throughout other branches, causing a mismatch, after
which the verifier needs to prove safety for the whole rest of the
program. As a result, programs with even less than half the insn
size limit can get rejected. We noticed that while some programs
load fine on pre-4.11 kernels, they get rejected due to hitting
limits on more recent ones. In the vast majority of cases (90+%)
pruning failed due to register mismatches. In case of stack
mismatches, the majority of cases failed due to different stack slot
types (invalid, spill, misc) rather than differences in spilled
registers.

This patch makes pruning more aggressive by also adding markers at
conditional jumps. Currently, we only mark jump targets for pruning;
for example, in direct packet access these are usually error paths
where we bail out. We found that adding these markers can reduce the
number of processed insns by up to 30%. Another option is to ignore
reg->id when probing PTR_TO_MAP_VALUE_OR_NULL registers, which also
helps pruning slightly, with up to 7% observed complexity reduction
stand-alone: if a previous path with a PTR_TO_MAP_VALUE_OR_NULL
register for map X was found to be safe, then in the current state a
PTR_TO_MAP_VALUE_OR_NULL register for the same map X must be safe as
well. Last but not least, the patch also adds a scheduling point and
bumps the current limit for instructions to be processed to a more
adequate value.

Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/bpf/verifier.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -139,7 +139,7 @@ struct bpf_verifier_stack_elem {
struct bpf_verifier_stack_elem *next;
};

-#define BPF_COMPLEXITY_LIMIT_INSNS 65536
+#define BPF_COMPLEXITY_LIMIT_INSNS 98304
#define BPF_COMPLEXITY_LIMIT_STACK 1024

struct bpf_call_arg_meta {
@@ -2452,6 +2452,7 @@ peek_stack:
env->explored_states[t + 1] = STATE_LIST_MARK;
} else {
/* conditional jump with two edges */
+ env->explored_states[t] = STATE_LIST_MARK;
ret = push_insn(t, t + 1, FALLTHROUGH, env);
if (ret == 1)
goto peek_stack;
@@ -2610,6 +2611,12 @@ static bool states_equal(struct bpf_veri
rcur->type != NOT_INIT))
continue;

+ /* Don't care about the reg->id in this case. */
+ if (rold->type == PTR_TO_MAP_VALUE_OR_NULL &&
+ rcur->type == PTR_TO_MAP_VALUE_OR_NULL &&
+ rold->map_ptr == rcur->map_ptr)
+ continue;
+
if (rold->type == PTR_TO_PACKET && rcur->type == PTR_TO_PACKET &&
compare_ptrs_to_packet(rold, rcur))
continue;
@@ -2744,6 +2751,9 @@ static int do_check(struct bpf_verifier_
goto process_bpf_exit;
}

+ if (need_resched())
+ cond_resched();
+
if (log_level && do_print_state) {
verbose("\nfrom %d to %d:", prev_insn_idx, insn_idx);
print_verifier_state(&env->cur_state);


2017-08-28 08:11:01

by Greg Kroah-Hartman

Subject: [PATCH 4.9 25/84] bpf: fix mixed signed/unsigned derived min/max value bounds

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>


[ Upstream commit 4cabc5b186b5427b9ee5a7495172542af105f02b ]

Edward reported that there's an issue in min/max value bounds
tracking when signed and unsigned compares both provide hints on
limits for unknown variables. E.g., a program such as the following
should have been rejected:

0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (18) r1 = 0xffff8a94cda93400
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+7
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
7: (7a) *(u64 *)(r10 -16) = -8
8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = -1
10: (2d) if r1 > r2 goto pc+3
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0
R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
11: (65) if r1 s> 0x1 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1
R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
12: (0f) r0 += r1
13: (72) *(u8 *)(r0 +0) = 0
R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1
R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
14: (b7) r0 = 0
15: (95) exit

What happens is that in the first part ...

8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = -1
10: (2d) if r1 > r2 goto pc+3

... r1 carries an unsigned value, and is compared as unsigned
against a register carrying an immediate. The verifier deduces in
reg_set_min_max() that since the compare is unsigned and the
operation is greater than (>), in the fall-through/false case r1's
minimum bound must be 0 and its maximum bound must be r2. The latter
is larger than the bound, and thus the max value is reset back to
being 'invalid', aka BPF_REGISTER_MAX_RANGE. Thus, r1's state is now
'R1=inv,min_value=0'. The subsequent test ...

11: (65) if r1 s> 0x1 goto pc+2

... is a signed compare of r1 with the immediate value 1. Here, the
verifier deduces in reg_set_min_max() that since the compare is
signed this time and the operation is greater than (>), in the
fall-through/false case r1's maximum bound must be 1. Combined with
the prior test, r1 ends up with the following state:
R1=inv,min_value=0,max_value=1. Given that the actual value it holds
is -8, the bounds are wrongly deduced. When this is added to r0,
which holds the map_value(_adj) type, the subsequent store access in
the above case will go through check_mem_access(), which invokes
check_map_access_adj(); that will then probe whether the map memory
is in bounds based on the min_value and max_value as well as the
access size, since the actual unknown value is
min_value <= x <= max_value; commit fce366a9dd0d ("bpf, verifier:
fix alu ops against map_value{, _adj} register types") provides some
more explanation of the semantics.

It's worth noting in this context that in the current code,
min_value and max_value tracking are used for two things: i) dynamic
map value access via check_map_access_adj(), and, since commit
06c1c049721a ("bpf: allow helpers access to variable memory"), ii)
enforcement at check_helper_mem_access() when passing a memory
address (pointer to packet, map value, stack) and length pair to a
helper, where the length in this case is an unknown value defining
an access range through min_value/max_value. The min_value/max_value
tracking is /not/ used in the direct packet access case to track
ranges. However, the issue also affects case ii); for example, the
following crafted program based on the same principle must be
rejected as well:

0: (b7) r2 = 0
1: (bf) r3 = r10
2: (07) r3 += -512
3: (7a) *(u64 *)(r10 -16) = -8
4: (79) r4 = *(u64 *)(r10 -16)
5: (b7) r6 = -1
6: (2d) if r4 > r6 goto pc+5
R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
7: (65) if r4 s> 0x1 goto pc+4
R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1
R10=fp
8: (07) r4 += 1
9: (b7) r5 = 0
10: (6a) *(u16 *)(r10 -512) = 0
11: (85) call bpf_skb_load_bytes#26
12: (b7) r0 = 0
13: (95) exit

Meaning, while we initialize the max_value stack slot that the
verifier thinks we access in the [1,2] range, in reality we pass -7
as the length, which is interpreted as u32 in the helper. Thus, this
issue is also relevant for the case of helper ranges. Resetting both
bounds in check_reg_overflow() in case only one of them exceeds the
limits is not enough either, as a similar test can be created that
uses values which are within range; here too, the learned min value
in r1 is incorrect when mixed with a later signed test to create a
range:

0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (18) r1 = 0xffff880ad081fa00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+7
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
7: (7a) *(u64 *)(r10 -16) = -8
8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = 2
10: (3d) if r2 >= r1 goto pc+3
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
11: (65) if r1 s> 0x4 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
12: (0f) r0 += r1
13: (72) *(u8 *)(r0 +0) = 0
R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4
R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
14: (b7) r0 = 0
15: (95) exit

This leaves us with two options for fixing this: i) invalidate all
prior learned information once we switch signed context, or ii)
track min/max signed and unsigned boundaries separately, as done in
[0]. (Given that the latter introduces major changes throughout the
whole verifier, it's rather net-next material, so this patch follows
option i), meaning we can derive bounds either from only signed
tests or only unsigned tests.) There is still the case of
adjust_reg_min_max_vals(), where we adjust bounds on ALU operations;
programs like the following, where boundaries on the reg get mixed
in context later on when bounds are merged on the dst reg, must get
rejected too:

0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (18) r1 = 0xffff89b2bf87ce00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+6
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
7: (7a) *(u64 *)(r10 -16) = -8
8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = 2
10: (3d) if r2 >= r1 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
11: (b7) r7 = 1
12: (65) if r7 s> 0x0 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp
13: (b7) r0 = 0
14: (95) exit

from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp
15: (0f) r7 += r1
16: (65) if r7 s> 0x4 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
17: (0f) r0 += r7
18: (72) *(u8 *)(r0 +0) = 0
R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
19: (b7) r0 = 0
20: (95) exit

Meaning, in adjust_reg_min_max_vals() we must also reset range
values on the dst when src/dst registers have mixed signed/unsigned
derived min/max value bounds with one unbounded value, as otherwise
they can be added together, deducing false boundaries. Once both
boundaries are established from either ALU ops or compare operations
without mixing signed/unsigned insns, they can safely be added to
other regs that also have both boundaries established. Adding regs
with one unbounded side to a map value where the bounded side has
been learned without mixing ops is possible, but the resulting map
value won't recover from that, meaning such an op is considered
invalid at the time of the actual access. Invalid bounds are set on
the dst reg in case i) the src reg, or ii) the dst reg, already had
them. The only way to recover would be to perform i) ALU ops, but
only 'add' is allowed on map value types, or ii) comparisons, but
these are disallowed on pointers in case they span a range. This is
fine, as only BPF_JEQ and BPF_JNE may be performed on
PTR_TO_MAP_VALUE_OR_NULL registers, which potentially turn them into
PTR_TO_MAP_VALUE type depending on the branch, so only here min/max
values cannot be invalidated for them.

In terms of state pruning, value_from_signed is also considered in
states_equal() when dealing with adjusted map values. With regard to
breaking existing programs, there is a small risk, but the use-cases
where this could occur are rather narrow, and mixing compares is
probably unlikely.

Joint work with Josef and Edward.

[0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html

Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Reported-by: Edward Cree <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Edward Cree <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/bpf_verifier.h | 1
kernel/bpf/verifier.c | 110 +++++++++++++++++++++++++++++++++++++------
2 files changed, 97 insertions(+), 14 deletions(-)

--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -40,6 +40,7 @@ struct bpf_reg_state {
*/
s64 min_value;
u64 max_value;
+ bool value_from_signed;
};

enum bpf_stack_slot_type {
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -682,12 +682,13 @@ static int check_ctx_access(struct bpf_v
return -EACCES;
}

-static bool is_pointer_value(struct bpf_verifier_env *env, int regno)
+static bool __is_pointer_value(bool allow_ptr_leaks,
+ const struct bpf_reg_state *reg)
{
- if (env->allow_ptr_leaks)
+ if (allow_ptr_leaks)
return false;

- switch (env->cur_state.regs[regno].type) {
+ switch (reg->type) {
case UNKNOWN_VALUE:
case CONST_IMM:
return false;
@@ -696,6 +697,11 @@ static bool is_pointer_value(struct bpf_
}
}

+static bool is_pointer_value(struct bpf_verifier_env *env, int regno)
+{
+ return __is_pointer_value(env->allow_ptr_leaks, &env->cur_state.regs[regno]);
+}
+
static int check_ptr_alignment(struct bpf_verifier_env *env,
struct bpf_reg_state *reg, int off, int size)
{
@@ -1592,10 +1598,24 @@ static void adjust_reg_min_max_vals(stru
}

/* We don't know anything about what was done to this register, mark it
- * as unknown.
+ * as unknown. Also, if both derived bounds came from signed/unsigned
+ * mixed compares and one side is unbounded, we cannot really do anything
+ * with them as boundaries cannot be trusted. Thus, arithmetic of two
+ * regs of such kind will get invalidated bounds on the dst side.
*/
- if (min_val == BPF_REGISTER_MIN_RANGE &&
- max_val == BPF_REGISTER_MAX_RANGE) {
+ if ((min_val == BPF_REGISTER_MIN_RANGE &&
+ max_val == BPF_REGISTER_MAX_RANGE) ||
+ (BPF_SRC(insn->code) == BPF_X &&
+ ((min_val != BPF_REGISTER_MIN_RANGE &&
+ max_val == BPF_REGISTER_MAX_RANGE) ||
+ (min_val == BPF_REGISTER_MIN_RANGE &&
+ max_val != BPF_REGISTER_MAX_RANGE) ||
+ (dst_reg->min_value != BPF_REGISTER_MIN_RANGE &&
+ dst_reg->max_value == BPF_REGISTER_MAX_RANGE) ||
+ (dst_reg->min_value == BPF_REGISTER_MIN_RANGE &&
+ dst_reg->max_value != BPF_REGISTER_MAX_RANGE)) &&
+ regs[insn->dst_reg].value_from_signed !=
+ regs[insn->src_reg].value_from_signed)) {
reset_reg_range_values(regs, insn->dst_reg);
return;
}
@@ -1939,38 +1959,63 @@ static void reg_set_min_max(struct bpf_r
struct bpf_reg_state *false_reg, u64 val,
u8 opcode)
{
+ bool value_from_signed = true;
+ bool is_range = true;
+
switch (opcode) {
case BPF_JEQ:
/* If this is false then we know nothing Jon Snow, but if it is
* true then we know for sure.
*/
true_reg->max_value = true_reg->min_value = val;
+ is_range = false;
break;
case BPF_JNE:
/* If this is true we know nothing Jon Snow, but if it is false
* we know the value for sure;
*/
false_reg->max_value = false_reg->min_value = val;
+ is_range = false;
break;
case BPF_JGT:
- /* Unsigned comparison, the minimum value is 0. */
- false_reg->min_value = 0;
+ value_from_signed = false;
+ /* fallthrough */
case BPF_JSGT:
+ if (true_reg->value_from_signed != value_from_signed)
+ reset_reg_range_values(true_reg, 0);
+ if (false_reg->value_from_signed != value_from_signed)
+ reset_reg_range_values(false_reg, 0);
+ if (opcode == BPF_JGT) {
+ /* Unsigned comparison, the minimum value is 0. */
+ false_reg->min_value = 0;
+ }
/* If this is false then we know the maximum val is val,
* otherwise we know the min val is val+1.
*/
false_reg->max_value = val;
+ false_reg->value_from_signed = value_from_signed;
true_reg->min_value = val + 1;
+ true_reg->value_from_signed = value_from_signed;
break;
case BPF_JGE:
- /* Unsigned comparison, the minimum value is 0. */
- false_reg->min_value = 0;
+ value_from_signed = false;
+ /* fallthrough */
case BPF_JSGE:
+ if (true_reg->value_from_signed != value_from_signed)
+ reset_reg_range_values(true_reg, 0);
+ if (false_reg->value_from_signed != value_from_signed)
+ reset_reg_range_values(false_reg, 0);
+ if (opcode == BPF_JGE) {
+ /* Unsigned comparison, the minimum value is 0. */
+ false_reg->min_value = 0;
+ }
/* If this is false then we know the maximum value is val - 1,
* otherwise we know the mimimum value is val.
*/
false_reg->max_value = val - 1;
+ false_reg->value_from_signed = value_from_signed;
true_reg->min_value = val;
+ true_reg->value_from_signed = value_from_signed;
break;
default:
break;
@@ -1978,6 +2023,12 @@ static void reg_set_min_max(struct bpf_r

check_reg_overflow(false_reg);
check_reg_overflow(true_reg);
+ if (is_range) {
+ if (__is_pointer_value(false, false_reg))
+ reset_reg_range_values(false_reg, 0);
+ if (__is_pointer_value(false, true_reg))
+ reset_reg_range_values(true_reg, 0);
+ }
}

/* Same as above, but for the case that dst_reg is a CONST_IMM reg and src_reg
@@ -1987,39 +2038,64 @@ static void reg_set_min_max_inv(struct b
struct bpf_reg_state *false_reg, u64 val,
u8 opcode)
{
+ bool value_from_signed = true;
+ bool is_range = true;
+
switch (opcode) {
case BPF_JEQ:
/* If this is false then we know nothing Jon Snow, but if it is
* true then we know for sure.
*/
true_reg->max_value = true_reg->min_value = val;
+ is_range = false;
break;
case BPF_JNE:
/* If this is true we know nothing Jon Snow, but if it is false
* we know the value for sure;
*/
false_reg->max_value = false_reg->min_value = val;
+ is_range = false;
break;
case BPF_JGT:
- /* Unsigned comparison, the minimum value is 0. */
- true_reg->min_value = 0;
+ value_from_signed = false;
+ /* fallthrough */
case BPF_JSGT:
+ if (true_reg->value_from_signed != value_from_signed)
+ reset_reg_range_values(true_reg, 0);
+ if (false_reg->value_from_signed != value_from_signed)
+ reset_reg_range_values(false_reg, 0);
+ if (opcode == BPF_JGT) {
+ /* Unsigned comparison, the minimum value is 0. */
+ true_reg->min_value = 0;
+ }
/*
* If this is false, then the val is <= the register, if it is
* true the register <= to the val.
*/
false_reg->min_value = val;
+ false_reg->value_from_signed = value_from_signed;
true_reg->max_value = val - 1;
+ true_reg->value_from_signed = value_from_signed;
break;
case BPF_JGE:
- /* Unsigned comparison, the minimum value is 0. */
- true_reg->min_value = 0;
+ value_from_signed = false;
+ /* fallthrough */
case BPF_JSGE:
+ if (true_reg->value_from_signed != value_from_signed)
+ reset_reg_range_values(true_reg, 0);
+ if (false_reg->value_from_signed != value_from_signed)
+ reset_reg_range_values(false_reg, 0);
+ if (opcode == BPF_JGE) {
+ /* Unsigned comparison, the minimum value is 0. */
+ true_reg->min_value = 0;
+ }
/* If this is false then constant < register, if it is true then
* the register < constant.
*/
false_reg->min_value = val + 1;
+ false_reg->value_from_signed = value_from_signed;
true_reg->max_value = val;
+ true_reg->value_from_signed = value_from_signed;
break;
default:
break;
@@ -2027,6 +2103,12 @@ static void reg_set_min_max_inv(struct b

check_reg_overflow(false_reg);
check_reg_overflow(true_reg);
+ if (is_range) {
+ if (__is_pointer_value(false, false_reg))
+ reset_reg_range_values(false_reg, 0);
+ if (__is_pointer_value(false, true_reg))
+ reset_reg_range_values(true_reg, 0);
+ }
}

static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id,


2017-08-28 08:11:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 26/84] bpf/verifier: fix min/max handling in BPF_SUB

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Edward Cree <[email protected]>


[ Upstream commit 9305706c2e808ae59f1eb201867f82f1ddf6d7a6 ]

We have to subtract the src max from the dst min, and vice-versa, since
(e.g.) the smallest result comes from the largest subtrahend.
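
As a hedged illustration (a standalone userspace sketch, not the verifier
code; names and numbers are made up):

	#include <stdio.h>
	#include <stdint.h>

	/* Subtracting the interval [smin, smax] from [dmin, dmax] yields
	 * [dmin - smax, dmax - smin]: the smallest result pairs the
	 * smallest minuend with the largest subtrahend.
	 */
	static void interval_sub(int64_t dmin, int64_t dmax,
				 int64_t smin, int64_t smax)
	{
		printf("[%lld, %lld]\n",
		       (long long)(dmin - smax), (long long)(dmax - smin));
	}

	int main(void)
	{
		interval_sub(0, 10, 2, 5);	/* prints [-5, 8] */
		return 0;
	}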

Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Edward Cree <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/bpf/verifier.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1624,10 +1624,12 @@ static void adjust_reg_min_max_vals(stru
* do our normal operations to the register, we need to set the values
* to the min/max since they are undefined.
*/
- if (min_val == BPF_REGISTER_MIN_RANGE)
- dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
- if (max_val == BPF_REGISTER_MAX_RANGE)
- dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+ if (opcode != BPF_SUB) {
+ if (min_val == BPF_REGISTER_MIN_RANGE)
+ dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+ if (max_val == BPF_REGISTER_MAX_RANGE)
+ dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+ }

switch (opcode) {
case BPF_ADD:
@@ -1637,10 +1639,17 @@ static void adjust_reg_min_max_vals(stru
dst_reg->max_value += max_val;
break;
case BPF_SUB:
+ /* If one of our values was at the end of our ranges, then the
+ * _opposite_ value in the dst_reg goes to the end of our range.
+ */
+ if (min_val == BPF_REGISTER_MIN_RANGE)
+ dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+ if (max_val == BPF_REGISTER_MAX_RANGE)
+ dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
- dst_reg->min_value -= min_val;
+ dst_reg->min_value -= max_val;
if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
- dst_reg->max_value -= max_val;
+ dst_reg->max_value -= min_val;
break;
case BPF_MUL:
if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)


2017-08-28 08:11:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 31/84] KVM: s390: sthyi: fix specification exception detection

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Heiko Carstens <[email protected]>

commit 857b8de96795646c5891cf44ae6fb19b9ff74bf9 upstream.

sthyi should only generate a specification exception if the function
code is zero and the response buffer is not on a 4k boundary.

Even for unknown function codes, the current code would still check whether
the response buffer, which is currently only defined for function code 0,
is on a 4k boundary, and would incorrectly inject a specification
exception instead of returning with condition code 3 and return code 4
(unsupported function code).

Fix this by moving the boundary check.

Fixes: 95ca2cb57985 ("KVM: s390: Add sthyi emulation")
Reviewed-by: Janosch Frank <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/s390/kvm/sthyi.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/arch/s390/kvm/sthyi.c
+++ b/arch/s390/kvm/sthyi.c
@@ -422,7 +422,7 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
VCPU_EVENT(vcpu, 3, "STHYI: fc: %llu addr: 0x%016llx", code, addr);
trace_kvm_s390_handle_sthyi(vcpu, code, addr);

- if (reg1 == reg2 || reg1 & 1 || reg2 & 1 || addr & ~PAGE_MASK)
+ if (reg1 == reg2 || reg1 & 1 || reg2 & 1)
return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);

if (code & 0xffff) {
@@ -430,6 +430,9 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
goto out;
}

+ if (addr & ~PAGE_MASK)
+ return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
+
/*
* If the page has not yet been faulted in, we want to do that
* now and not after all the expensive calculations.


2017-08-28 08:11:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 12/84] nfp: fix infinite loop on unmapping cleanup

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Colin Ian King <[email protected]>


[ Upstream commit eac2c68d663effb077210218788952b5a0c1f60e ]

The while loop that performs the dma page unmapping never decrements
index counter f and hence loops forever. Fix this with a pre-decrement
on f.
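
A minimal userspace illustration of the fixed loop shape (illustrative
only, not the driver code):

	#include <stdio.h>

	int main(void)
	{
		int f = 3;	/* index of the frag that failed to map */

		/* The pre-decrement walks f-1, f-2, ..., 0 and then stops;
		 * without it, f would never change and the loop would spin
		 * forever, which is the bug described above.
		 */
		while (--f >= 0)
			printf("unmap frag %d\n", f);	/* prints 2, 1, 0 */

		return 0;
	}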

Detected by CoverityScan, CID#1357309 ("Infinite loop")

Fixes: 4c3523623dc0 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -871,8 +871,7 @@ static int nfp_net_tx(struct sk_buff *sk
return NETDEV_TX_OK;

err_unmap:
- --f;
- while (f >= 0) {
+ while (--f >= 0) {
frag = &skb_shinfo(skb)->frags[f];
dma_unmap_page(&nn->pdev->dev,
tx_ring->txbufs[wr_idx].dma_addr,


2017-08-28 08:11:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 33/84] ALSA: usb-audio: Add delay quirk for H650e/Jabra 550a USB headsets

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Joakim Tjernlund <[email protected]>

commit 07b3b5e9ed807a0d2077319b8e43a42e941db818 upstream.

These headsets report a lot of: cannot set freq 44100 to ep 0x81
and need a small delay between sample rate settings, just like
Zoom R16/24. Add both headsets to the Zoom R16/24 quirk for
a 1 ms delay between control msgs.

Signed-off-by: Joakim Tjernlund <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/usb/quirks.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

--- a/sound/usb/quirks.c
+++ b/sound/usb/quirks.c
@@ -1309,10 +1309,13 @@ void snd_usb_ctl_msg_quirk(struct usb_de
&& (requesttype & USB_TYPE_MASK) == USB_TYPE_CLASS)
mdelay(20);

- /* Zoom R16/24 needs a tiny delay here, otherwise requests like
- * get/set frequency return as failed despite actually succeeding.
+ /* Zoom R16/24, Logitech H650e, Jabra 550a needs a tiny delay here,
+ * otherwise requests like get/set frequency return as failed despite
+ * actually succeeding.
*/
- if (chip->usb_id == USB_ID(0x1686, 0x00dd) &&
+ if ((chip->usb_id == USB_ID(0x1686, 0x00dd) ||
+ chip->usb_id == USB_ID(0x046d, 0x0a46) ||
+ chip->usb_id == USB_ID(0x0b0e, 0x0349)) &&
(requesttype & USB_TYPE_MASK) == USB_TYPE_CLASS)
mdelay(1);
}


2017-08-28 08:11:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 39/84] i2c: designware: Fix system suspend

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ulf Hansson <[email protected]>

commit a23318feeff662c8d25d21623daebdd2e55ec221 upstream.

The commit 8503ff166504 ("i2c: designware: Avoid unnecessary resuming
during system suspend") may suggest to the PM core to try out the
so-called direct_complete path for system sleep. In this path, the PM core
treats a runtime suspended device as if it's already in a proper low power
state for system sleep, which makes it skip calling the system sleep
callbacks for the device, except for the ->prepare() and the ->complete()
callbacks.

However, the PM core may unset the direct_complete flag for a parent
device, in case its child devices are being system suspended before it. In
this scenario, the PM core invokes the system sleep callbacks regardless of
whether the device is runtime suspended or not.

Particularly in cases of an existing i2c slave device, the above path is
triggered, which breaks the assumption that the i2c device is always
runtime resumed whenever the dw_i2c_plat_suspend() is being called.

More precisely, dw_i2c_plat_suspend() calls clk_core_disable() and
clk_core_unprepare(), for an already disabled/unprepared clock, leading to
a splat in the log about clock calls being wrongly balanced and breaking
system sleep.

To still allow the direct_complete path in cases when it's possible, but
also to keep the fix simple, let's runtime resume the i2c device in the
->suspend() callback, before continuing to put the device into low power
state.

Note, in cases when the i2c device is attached to the ACPI PM domain, this
problem doesn't occur, because ACPI's ->suspend() callback, assigned to
acpi_subsys_suspend(), already calls pm_runtime_resume() for the device.

It should also be noted that this change does not fix commit 8503ff166504
("i2c: designware: Avoid unnecessary resuming during system suspend"),
because for the non-ACPI case the system sleep support was already broken
prior to that point.

Signed-off-by: Ulf Hansson <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: John Stultz <[email protected]>
Tested-by: Jarkko Nikula <[email protected]>
Acked-by: Jarkko Nikula <[email protected]>
Reviewed-by: Mika Westerberg <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/i2c/busses/i2c-designware-platdrv.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

--- a/drivers/i2c/busses/i2c-designware-platdrv.c
+++ b/drivers/i2c/busses/i2c-designware-platdrv.c
@@ -319,7 +319,7 @@ static void dw_i2c_plat_complete(struct
#endif

#ifdef CONFIG_PM
-static int dw_i2c_plat_suspend(struct device *dev)
+static int dw_i2c_plat_runtime_suspend(struct device *dev)
{
struct platform_device *pdev = to_platform_device(dev);
struct dw_i2c_dev *i_dev = platform_get_drvdata(pdev);
@@ -343,11 +343,21 @@ static int dw_i2c_plat_resume(struct dev
return 0;
}

+#ifdef CONFIG_PM_SLEEP
+static int dw_i2c_plat_suspend(struct device *dev)
+{
+ pm_runtime_resume(dev);
+ return dw_i2c_plat_runtime_suspend(dev);
+}
+#endif
+
static const struct dev_pm_ops dw_i2c_dev_pm_ops = {
.prepare = dw_i2c_plat_prepare,
.complete = dw_i2c_plat_complete,
SET_SYSTEM_SLEEP_PM_OPS(dw_i2c_plat_suspend, dw_i2c_plat_resume)
- SET_RUNTIME_PM_OPS(dw_i2c_plat_suspend, dw_i2c_plat_resume, NULL)
+ SET_RUNTIME_PM_OPS(dw_i2c_plat_runtime_suspend,
+ dw_i2c_plat_resume,
+ NULL)
};

#define DW_I2C_DEV_PMOPS (&dw_i2c_dev_pm_ops)


2017-08-28 08:11:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 40/84] mm/madvise.c: fix freeing of locked page with MADV_FREE

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <[email protected]>

commit 263630e8d176d87308481ebdcd78ef9426739c6b upstream.

If madvise(..., MADV_FREE) split a transparent hugepage, it called
put_page() before unlock_page().

This was wrong because put_page() can free the page, e.g. if a
concurrent madvise(..., MADV_DONTNEED) has removed it from the memory
mapping. put_page() then rightfully complained about freeing a locked
page.

Fix this by moving the unlock_page() before put_page().

This bug was found by syzkaller, which encountered the following splat:

BUG: Bad page state in process syzkaller412798 pfn:1bd800
page:ffffea0006f60000 count:0 mapcount:0 mapping: (null) index:0x20a00
flags: 0x200000000040019(locked|uptodate|dirty|swapbacked)
raw: 0200000000040019 0000000000000000 0000000000020a00 00000000ffffffff
raw: ffffea0006f60020 ffffea0006f60020 0000000000000000 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x1(locked)
Modules linked in:
CPU: 1 PID: 3037 Comm: syzkaller412798 Not tainted 4.13.0-rc5+ #35
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
bad_page+0x230/0x2b0 mm/page_alloc.c:565
free_pages_check_bad+0x1f0/0x2e0 mm/page_alloc.c:943
free_pages_check mm/page_alloc.c:952 [inline]
free_pages_prepare mm/page_alloc.c:1043 [inline]
free_pcp_prepare mm/page_alloc.c:1068 [inline]
free_hot_cold_page+0x8cf/0x12b0 mm/page_alloc.c:2584
__put_single_page mm/swap.c:79 [inline]
__put_page+0xfb/0x160 mm/swap.c:113
put_page include/linux/mm.h:814 [inline]
madvise_free_pte_range+0x137a/0x1ec0 mm/madvise.c:371
walk_pmd_range mm/pagewalk.c:50 [inline]
walk_pud_range mm/pagewalk.c:108 [inline]
walk_p4d_range mm/pagewalk.c:134 [inline]
walk_pgd_range mm/pagewalk.c:160 [inline]
__walk_page_range+0xc3a/0x1450 mm/pagewalk.c:249
walk_page_range+0x200/0x470 mm/pagewalk.c:326
madvise_free_page_range.isra.9+0x17d/0x230 mm/madvise.c:444
madvise_free_single_vma+0x353/0x580 mm/madvise.c:471
madvise_dontneed_free mm/madvise.c:555 [inline]
madvise_vma mm/madvise.c:664 [inline]
SYSC_madvise mm/madvise.c:832 [inline]
SyS_madvise+0x7d3/0x13c0 mm/madvise.c:760
entry_SYSCALL_64_fastpath+0x1f/0xbe

Here is a C reproducer:

#define _GNU_SOURCE
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

#define MADV_FREE 8
#define PAGE_SIZE 4096

static void *mapping;
static const size_t mapping_size = 0x1000000;

static void *madvise_thrproc(void *arg)
{
	madvise(mapping, mapping_size, (long)arg);
}

int main(void)
{
	pthread_t t[2];

	for (;;) {
		mapping = mmap(NULL, mapping_size, PROT_WRITE,
			       MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);

		munmap(mapping + mapping_size / 2, PAGE_SIZE);

		pthread_create(&t[0], 0, madvise_thrproc, (void*)MADV_DONTNEED);
		pthread_create(&t[1], 0, madvise_thrproc, (void*)MADV_FREE);
		pthread_join(t[0], NULL);
		pthread_join(t[1], NULL);
		munmap(mapping, mapping_size);
	}
}
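
(The reproducer can be built with something like "gcc -pthread repro.c -o
repro"; the file name and flags here are only illustrative.)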

Note: to see the splat, CONFIG_TRANSPARENT_HUGEPAGE=y and
CONFIG_DEBUG_VM=y are needed.

Google Bug Id: 64696096

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)")
Signed-off-by: Eric Biggers <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/madvise.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -331,8 +331,8 @@ static int madvise_free_pte_range(pmd_t
pte_offset_map_lock(mm, pmd, addr, &ptl);
goto out;
}
- put_page(page);
unlock_page(page);
+ put_page(page);
pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
pte--;
addr -= PAGE_SIZE;


2017-08-28 08:11:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 44/84] drm/atomic: If the atomic check fails, return its value first

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Maarten Lankhorst <[email protected]>

commit a0ffc51e20e90e0c1c2491de2b4b03f48b6caaba upstream.

The last part of drm_atomic_check_only is testing whether we need to
fail with -EINVAL when modeset is not allowed, but forgets to return
the value when atomic_check() fails first.

This results in -EDEADLK being replaced by -EINVAL, and the sanity
check in drm_modeset_drop_locks kicks in:

[ 308.531734] ------------[ cut here ]------------
[ 308.531791] WARNING: CPU: 0 PID: 1886 at drivers/gpu/drm/drm_modeset_lock.c:217 drm_modeset_drop_locks+0x33/0xc0 [drm]
[ 308.531828] Modules linked in:
[ 308.532050] CPU: 0 PID: 1886 Comm: kms_atomic Tainted: G U W 4.13.0-rc5-patser+ #5225
[ 308.532082] Hardware name: NUC5i7RYB, BIOS RYBDWi35.86A.0246.2015.0309.1355 03/09/2015
[ 308.532124] task: ffff8800cd9dae00 task.stack: ffff8800ca3b8000
[ 308.532168] RIP: 0010:drm_modeset_drop_locks+0x33/0xc0 [drm]
[ 308.532189] RSP: 0018:ffff8800ca3bf980 EFLAGS: 00010282
[ 308.532211] RAX: dffffc0000000000 RBX: ffff8800ca3bfaf8 RCX: 0000000013a171e6
[ 308.532235] RDX: 1ffff10019477f69 RSI: ffffffffa8ba4fa0 RDI: ffff8800ca3bfb48
[ 308.532258] RBP: ffff8800ca3bf998 R08: 0000000000000000 R09: 0000000000000003
[ 308.532281] R10: 0000000079dbe066 R11: 00000000f760b34b R12: 0000000000000001
[ 308.532304] R13: dffffc0000000000 R14: 00000000ffffffea R15: ffff880096889680
[ 308.532328] FS: 00007ff00959cec0(0000) GS:ffff8800d4e00000(0000) knlGS:0000000000000000
[ 308.532359] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 308.532380] CR2: 0000000000000008 CR3: 00000000ca2e3000 CR4: 00000000003406f0
[ 308.532402] Call Trace:
[ 308.532440] drm_mode_atomic_ioctl+0x19fa/0x1c00 [drm]
[ 308.532488] ? drm_atomic_set_property+0x1220/0x1220 [drm]
[ 308.532565] ? avc_has_extended_perms+0xc39/0xff0
[ 308.532593] ? lock_downgrade+0x610/0x610
[ 308.532640] ? drm_atomic_set_property+0x1220/0x1220 [drm]
[ 308.532680] drm_ioctl_kernel+0x154/0x1a0 [drm]
[ 308.532755] drm_ioctl+0x624/0x8f0 [drm]
[ 308.532858] ? drm_atomic_set_property+0x1220/0x1220 [drm]
[ 308.532976] ? drm_getunique+0x210/0x210 [drm]
[ 308.533061] do_vfs_ioctl+0xd92/0xe40
[ 308.533121] ? ioctl_preallocate+0x1b0/0x1b0
[ 308.533160] ? selinux_capable+0x20/0x20
[ 308.533191] ? do_fcntl+0x1b1/0xbf0
[ 308.533219] ? kasan_slab_free+0xa2/0xb0
[ 308.533249] ? f_getown+0x4b/0xa0
[ 308.533278] ? putname+0xcf/0xe0
[ 308.533309] ? security_file_ioctl+0x57/0x90
[ 308.533342] SyS_ioctl+0x4e/0x80
[ 308.533374] entry_SYSCALL_64_fastpath+0x18/0xad
[ 308.533405] RIP: 0033:0x7ff00779e4d7
[ 308.533431] RSP: 002b:00007fff66a043d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 308.533481] RAX: ffffffffffffffda RBX: 000000e7c7ca5910 RCX: 00007ff00779e4d7
[ 308.533560] RDX: 00007fff66a04430 RSI: 00000000c03864bc RDI: 0000000000000003
[ 308.533608] RBP: 00007ff007a5fb00 R08: 000000e7c7ca4620 R09: 000000e7c7ca5e60
[ 308.533647] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000070
[ 308.533685] R13: 0000000000000000 R14: 0000000000000000 R15: 000000e7c7ca5930
[ 308.533770] Code: ff df 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 c7
50 48 89 fa 48 c1 ea 03 80 3c 02 00 74 05 e8 94 d4 16 e7 48 83 7b 50 00
74 02 <0f> ff 4c 8d 6b 58 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1
[ 308.534086] ---[ end trace 77f11e53b1df44ad ]---

Solve this by adding the missing return.

This is also a bugfix because we could end up rejecting updates with
-EINVAL because of an early -EDEADLK, while if atomic_check ran to
completion it might have downgraded the modeset to a fastset.

Signed-off-by: Maarten Lankhorst <[email protected]>
Testcase: kms_atomic
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Fixes: d34f20d6e2f2 ("drm: Atomic modeset ioctl")
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/drm_atomic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -1386,6 +1386,9 @@ int drm_atomic_check_only(struct drm_ato
if (config->funcs->atomic_check)
ret = config->funcs->atomic_check(state->dev, state);

+ if (ret)
+ return ret;
+
if (!state->allow_modeset) {
for_each_crtc_in_state(state, crtc, crtc_state, i) {
if (drm_atomic_crtc_needs_modeset(crtc_state)) {
@@ -1396,7 +1399,7 @@ int drm_atomic_check_only(struct drm_ato
}
}

- return ret;
+ return 0;
}
EXPORT_SYMBOL(drm_atomic_check_only);



2017-08-28 08:12:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 46/84] drm: rcar-du: Fix display timing controller parameter

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Koji Matsuoka <[email protected]>

commit 9cdced8a39c04cf798ddb2a27cb5952f7d39f633 upstream.

There is a bug in the setting of the DES (Display Enable Signal)
register. The current setting shifts the display one dot to the left.
According to the hardware specification, the DES register should be
programmed with one less than the specified value. This patch corrects it.
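
As a worked illustration with made-up timings: with htotal = 800 and
hsync_start = 656, DESR is now programmed with 800 - 656 - 1 = 143
instead of 144.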

Signed-off-by: Koji Matsuoka <[email protected]>
Signed-off-by: Laurent Pinchart <[email protected]>
Signed-off-by: Thong Ho <[email protected]>
Signed-off-by: Nhan Nguyen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/rcar-du/rcar_du_crtc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
@@ -172,7 +172,7 @@ static void rcar_du_crtc_set_display_tim
mode->crtc_vsync_start - 1);
rcar_du_crtc_write(rcrtc, VCR, mode->crtc_vtotal - 1);

- rcar_du_crtc_write(rcrtc, DESR, mode->htotal - mode->hsync_start);
+ rcar_du_crtc_write(rcrtc, DESR, mode->htotal - mode->hsync_start - 1);
rcar_du_crtc_write(rcrtc, DEWR, mode->hdisplay);
}



2017-08-28 08:11:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 42/84] mm/memblock.c: reversed logic in memblock_discard()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Pavel Tatashin <[email protected]>

commit 91b540f98872a206ea1c49e4aa6ea8eed0886644 upstream.

In the recently introduced memblock_discard() there is a reversed logic bug:
the static array is freed instead of the dynamically allocated one.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 3010f876500f ("mm: discard memblock data later")
Signed-off-by: Pavel Tatashin <[email protected]>
Reported-by: Woody Suwalski <[email protected]>
Tested-by: Woody Suwalski <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/memblock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -311,7 +311,7 @@ void __init memblock_discard(void)
__memblock_free_late(addr, size);
}

- if (memblock.memory.regions == memblock_memory_init_regions) {
+ if (memblock.memory.regions != memblock_memory_init_regions) {
addr = __pa(memblock.memory.regions);
size = PAGE_ALIGN(sizeof(struct memblock_region) *
memblock.memory.max);


2017-08-28 08:12:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 48/84] tracing: Call clear_boot_tracer() at lateinit_sync

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (VMware) <[email protected]>

commit 4bb0f0e73c8c30917d169c4a0f1ac083690c545b upstream.

The clear_boot_tracer function is used to reset the default_bootup_tracer
string to prevent it from being accessed after boot, as it originally points
to init data. But since clear_boot_tracer() is called via the
init_lateinit() call, it races with the initcall for registering the hwlat
tracer. If someone adds "ftrace=hwlat" to the kernel command line, depending
on how the linker sets up the text, the saved command line may be cleared,
and the hwlat tracer is never initialized.

Simply have clear_boot_tracer() be called by late_initcall_sync(), as that
level is for tasks to be run after lateinit.
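
For reference, a sketch of the two initcall levels involved (the macros
are defined in include/linux/init.h):

	late_initcall(clear_boot_tracer);	/* initcall level 7 ("lateinit") */
	late_initcall_sync(clear_boot_tracer);	/* level 7s, runs after all level-7 initcalls */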

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196551

Fixes: e7c15cd8a ("tracing: Added hardware latency tracer")
Reported-by: Zamir SUN <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7767,4 +7767,4 @@ __init static int clear_boot_tracer(void
}

fs_initcall(tracer_init_tracefs);
-late_initcall(clear_boot_tracer);
+late_initcall_sync(clear_boot_tracer);


2017-08-28 08:12:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 51/84] kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Piggin <[email protected]>

commit cb87481ee89dbd6609e227afbf64900fb4e5c930 upstream.

The .data and .bss sections were modified in the generic linker script to
pull in sections named .data.<C identifier>, which are generated by gcc with
-ffunction-sections and -fdata-sections options.

The problem with this pattern is it can also match section names that Linux
defines explicitly, e.g., .data.unlikely. This can cause Linux sections to
get moved into the wrong place.

The way to avoid this is to use ".." separators for explicit section names
(the dot character is valid in a section name but not a C identifier).
However currently there are sections which don't follow this rule, so for
now just disable the wild card by default.

Example: http://marc.info/?l=linux-arm-kernel&m=150106824024221&w=2
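
As a hedged illustration of the clash (the variable names below are made
up; the section names are the ones discussed above):

	/* With -fdata-sections, gcc places each object in its own section,
	 * named after the C identifier:
	 */
	static int rx_count = 1;		/* goes into ".data.rx_count" */

	/* Linux also names some sections explicitly. "unlikely" is itself a
	 * valid C identifier, so this section is matched by the
	 * .data.[0-9a-zA-Z_]* glob as well, which is the clash described above:
	 */
	static int once_flag __attribute__((section(".data.unlikely"))) = 1;

	/* A ".." separator cannot occur in a C identifier, so a name like
	 * ".data..shared_aligned" is never matched by that glob:
	 */
	static int stats __attribute__((section(".data..shared_aligned"))) = 1;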

Fixes: b67067f1176df ("kbuild: allow archs to select link dead code/data elimination")
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/asm-generic/vmlinux.lds.h | 38 ++++++++++++++++++++++++++------------
1 file changed, 26 insertions(+), 12 deletions(-)

--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -60,6 +60,22 @@
#define ALIGN_FUNCTION() . = ALIGN(8)

/*
+ * LD_DEAD_CODE_DATA_ELIMINATION option enables -fdata-sections, which
+ * generates .data.identifier sections, which need to be pulled in with
+ * .data. We don't want to pull in .data..other sections, which Linux
+ * has defined. Same for text and bss.
+ */
+#ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
+#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
+#define DATA_MAIN .data .data.[0-9a-zA-Z_]*
+#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]*
+#else
+#define TEXT_MAIN .text
+#define DATA_MAIN .data
+#define BSS_MAIN .bss
+#endif
+
+/*
* Align to a 32 byte boundary equal to the
* alignment gcc 4.5 uses for a struct
*/
@@ -198,12 +214,9 @@

/*
* .data section
- * LD_DEAD_CODE_DATA_ELIMINATION option enables -fdata-sections generates
- * .data.identifier which needs to be pulled in with .data, but don't want to
- * pull in .data..stuff which has its own requirements. Same for bss.
*/
#define DATA_DATA \
- *(.data .data.[0-9a-zA-Z_]*) \
+ *(DATA_MAIN) \
*(.ref.data) \
*(.data..shared_aligned) /* percpu related */ \
MEM_KEEP(init.data) \
@@ -436,16 +449,17 @@
VMLINUX_SYMBOL(__security_initcall_end) = .; \
}

-/* .text section. Map to function alignment to avoid address changes
+/*
+ * .text section. Map to function alignment to avoid address changes
* during second ld run in second ld pass when generating System.map
- * LD_DEAD_CODE_DATA_ELIMINATION option enables -ffunction-sections generates
- * .text.identifier which needs to be pulled in with .text , but some
- * architectures define .text.foo which is not intended to be pulled in here.
- * Those enabling LD_DEAD_CODE_DATA_ELIMINATION must ensure they don't have
- * conflicting section names, and must pull in .text.[0-9a-zA-Z_]* */
+ *
+ * TEXT_MAIN here will match .text.fixup and .text.unlikely if dead
+ * code elimination is enabled, so these sections should be converted
+ * to use ".." first.
+ */
#define TEXT_TEXT \
ALIGN_FUNCTION(); \
- *(.text.hot .text .text.fixup .text.unlikely) \
+ *(.text.hot TEXT_MAIN .text.fixup .text.unlikely) \
*(.ref.text) \
MEM_KEEP(init.text) \
MEM_KEEP(exit.text) \
@@ -613,7 +627,7 @@
BSS_FIRST_SECTIONS \
*(.bss..page_aligned) \
*(.dynbss) \
- *(.bss .bss.[0-9a-zA-Z_]*) \
+ *(BSS_MAIN) \
*(COMMON) \
}



2017-08-28 08:12:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 14/84] tipc: fix use-after-free

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 5bfd37b4de5c98e86b12bd13be5aa46c7484a125 ]

syzkaller reported a use-after-free in tipc [1]

When the msg->rep skb is freed, set the pointer to NULL
so that the caller does not free it again.

[1]

==================================================================
BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/core/skbuff.c:1466
Read of size 8 at addr ffff8801c6e71e90 by task syz-executor5/4115

CPU: 1 PID: 4115 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x24e/0x340 mm/kasan/report.c:409
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
skb_push+0xd4/0xe0 net/core/skbuff.c:1466
tipc_nl_compat_recv+0x833/0x18f0 net/tipc/netlink_compat.c:1209
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:00007f3bc8184c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004512e9
RDX: 0000000000000020 RSI: 0000000020fdb000 RDI: 0000000000000006
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b5e76
R13: 00007f3bc8184b48 R14: 00000000004b5e86 R15: 0000000000000000

Allocated by task 4115:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
kmem_cache_alloc_node+0x13d/0x750 mm/slab.c:3651
__alloc_skb+0xf1/0x740 net/core/skbuff.c:219
alloc_skb include/linux/skbuff.h:903 [inline]
tipc_tlv_alloc+0x26/0xb0 net/tipc/netlink_compat.c:148
tipc_nl_compat_dumpit+0xf2/0x3c0 net/tipc/netlink_compat.c:248
tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 4115:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kmem_cache_free+0x77/0x280 mm/slab.c:3763
kfree_skbmem+0x1a1/0x1d0 net/core/skbuff.c:622
__kfree_skb net/core/skbuff.c:682 [inline]
kfree_skb+0x165/0x4c0 net/core/skbuff.c:699
tipc_nl_compat_dumpit+0x36a/0x3c0 net/tipc/netlink_compat.c:260
tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe

The buggy address belongs to the object at ffff8801c6e71dc0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 208 bytes inside of
224-byte region [ffff8801c6e71dc0, ffff8801c6e71ea0)
The buggy address belongs to the page:
page:ffffea00071b9c40 count:1 mapcount:0 mapping:ffff8801c6e71000 index:0x0
flags: 0x200000000000100(slab)
raw: 0200000000000100 ffff8801c6e71000 0000000000000000 000000010000000c
raw: ffffea0007224a20 ffff8801d98caf48 ffff8801d9e79040 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8801c6e71d80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff8801c6e71e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801c6e71e80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff8801c6e71f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8801c6e71f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Cc: Jon Maloy <[email protected]>
Cc: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/tipc/netlink_compat.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/net/tipc/netlink_compat.c
+++ b/net/tipc/netlink_compat.c
@@ -258,13 +258,15 @@ static int tipc_nl_compat_dumpit(struct
arg = nlmsg_new(0, GFP_KERNEL);
if (!arg) {
kfree_skb(msg->rep);
+ msg->rep = NULL;
return -ENOMEM;
}

err = __tipc_nl_compat_dumpit(cmd, msg, arg);
- if (err)
+ if (err) {
kfree_skb(msg->rep);
-
+ msg->rep = NULL;
+ }
kfree_skb(arg);

return err;


2017-08-28 08:12:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 17/84] tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <[email protected]>


[ Upstream commit cdbeb633ca71a02b7b63bfeb94994bf4e1a0b894 ]

In some situations tcp_send_loss_probe() can realize that it's unable
to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
realizes that the RTO was eligible to fire immediately or at some
point in the past (delta_us <= 0). Previously in such cases
tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
icsk_rto, which caused needless delays of hundreds of milliseconds
(and non-linear behavior that made reproducible testing
difficult). This commit changes the logic to schedule "overdue" RTOs
ASAP, rather than at now + icsk_rto.
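
As a worked example (numbers made up): if the RTO was already due 5 ms ago
(delta_us <= 0), the old code re-armed the timer a full icsk_rto (e.g.
200 ms) into the future; with this change the timer is armed to fire at the
next opportunity instead.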

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Suggested-by: Yuchung Cheng <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3036,8 +3036,7 @@ void tcp_rearm_rto(struct sock *sk)
/* delta may not be positive if the socket is locked
* when the retrans timer fires and is rescheduled.
*/
- if (delta > 0)
- rto = delta;
+ rto = max(delta, 1);
}
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, rto,
TCP_RTO_MAX);


2017-08-28 08:12:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 16/84] ipv6: repair fib6 tree in failure case

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Wei Wang <[email protected]>


[ Upstream commit 348a4002729ccab8b888b38cbc099efa2f2a2036 ]

In fib6_add(), it is possible that fib6_add_1() picks an intermediate
node and sets the node's fn->leaf to NULL in order to add this new
route. However, if fib6_add_rt2node() fails to add the new
route for some reason, fn->leaf will be left as NULL and could
potentially cause a crash when fn->leaf is accessed in fib6_locate().
This patch makes sure fib6_repair_tree() is called to properly repair
fn->leaf in the above failure case.

Here is the syzkaller reported general protection fault in fib6_locate:
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline]
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] ipv6_prefix_equal include/net/ipv6.h:492 [inline]
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline]
RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233
RSP: 0018:ffff8801d01a36a8 EFLAGS: 00010202
RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000
RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100
RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000
FS: 00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0
ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0
ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988
Call Trace:
[<ffffffff82a223d6>] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109
[<ffffffff82a23f9d>] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075
[<ffffffff82621359>] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450
[<ffffffff8274c1d1>] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281
[<ffffffff82613ddf>] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456
[<ffffffff8274ad38>] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline]
[<ffffffff8274ad38>] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232
[<ffffffff8274b83e>] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778
[<ffffffff82564aff>] sock_sendmsg_nosec net/socket.c:609 [inline]
[<ffffffff82564aff>] sock_sendmsg+0xcf/0x110 net/socket.c:619
[<ffffffff82564d62>] sock_write_iter+0x222/0x3a0 net/socket.c:834
[<ffffffff8178523d>] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478
[<ffffffff817853f4>] __vfs_write+0xe4/0x110 fs/read_write.c:491
[<ffffffff81786c38>] vfs_write+0x178/0x4b0 fs/read_write.c:538
[<ffffffff817892a9>] SYSC_write fs/read_write.c:585 [inline]
[<ffffffff817892a9>] SyS_write+0xd9/0x1b0 fs/read_write.c:577
[<ffffffff82c71e32>] entry_SYSCALL_64_fastpath+0x12/0x17

Note: there is no "Fixes" tag as this seems to be a bug introduced
very early.

Signed-off-by: Wei Wang <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_fib.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1001,7 +1001,7 @@ int fib6_add(struct fib6_node *root, str
/* Create subtree root node */
sfn = node_alloc();
if (!sfn)
- goto st_failure;
+ goto failure;

sfn->leaf = info->nl_net->ipv6.ip6_null_entry;
atomic_inc(&info->nl_net->ipv6.ip6_null_entry->rt6i_ref);
@@ -1017,12 +1017,12 @@ int fib6_add(struct fib6_node *root, str

if (IS_ERR(sn)) {
/* If it is failed, discard just allocated
- root, and then (in st_failure) stale node
+ root, and then (in failure) stale node
in main tree.
*/
node_free(sfn);
err = PTR_ERR(sn);
- goto st_failure;
+ goto failure;
}

/* Now link new subtree to main tree */
@@ -1036,7 +1036,7 @@ int fib6_add(struct fib6_node *root, str

if (IS_ERR(sn)) {
err = PTR_ERR(sn);
- goto st_failure;
+ goto failure;
}
}

@@ -1078,22 +1078,22 @@ out:
atomic_inc(&pn->leaf->rt6i_ref);
}
#endif
- if (!(rt->dst.flags & DST_NOCACHE))
- dst_free(&rt->dst);
+ goto failure;
}
return err;

-#ifdef CONFIG_IPV6_SUBTREES
- /* Subtree creation failed, probably main tree node
- is orphan. If it is, shoot it.
+failure:
+ /* fn->leaf could be NULL if fn is an intermediate node and we
+ * failed to add the new route to it in both subtree creation
+ * failure and fib6_add_rt2node() failure case.
+ * In both cases, fib6_repair_tree() should be called to fix
+ * fn->leaf.
*/
-st_failure:
if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT)))
fib6_repair_tree(info->nl_net, fn);
if (!(rt->dst.flags & DST_NOCACHE))
dst_free(&rt->dst);
return err;
-#endif
}

/*


2017-08-28 08:12:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 63/84] netfilter: nat: fix src map lookup

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Westphal <[email protected]>

commit 97772bcd56efa21d9d8976db6f205574ea602f51 upstream.

When doing initial conversion to rhashtable I replaced the bucket
walk with a single rhashtable_lookup_fast().

When moving to rhlist I failed to properly walk the list of identical
tuples, but that is what is needed for this to work correctly.
The table contains the original tuples, so the reply tuples are all
distinct.

We currently decide whether the mapping is in range based only on the
first entry, but in case it's not, we need to try the reply tuple of the
next entry until we either find an in-range mapping or have checked
all the entries.

This bug makes the nat core attempt collision resolution even though it
might be able to use the mapping as-is.

Fixes: 870190a9ec90 ("netfilter: nat: convert nat bysrc hash to rhashtable")
Reported-by: Jaco Kroon <[email protected]>
Tested-by: Jaco Kroon <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/netfilter/nf_nat_core.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -225,20 +225,21 @@ find_appropriate_src(struct net *net,
.tuple = tuple,
.zone = zone
};
- struct rhlist_head *hl;
+ struct rhlist_head *hl, *h;

hl = rhltable_lookup(&nf_nat_bysource_table, &key,
nf_nat_bysource_params);
- if (!hl)
- return 0;

- ct = container_of(hl, typeof(*ct), nat_bysource);
+ rhl_for_each_entry_rcu(ct, h, hl, nat_bysource) {
+ nf_ct_invert_tuplepr(result,
+ &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+ result->dst = tuple->dst;

- nf_ct_invert_tuplepr(result,
- &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
- result->dst = tuple->dst;
+ if (in_range(l3proto, l4proto, result, range))
+ return 1;
+ }

- return in_range(l3proto, l4proto, result, range);
+ return 0;
}

/* For [FUTURE] fragmentation handling, we want the least-used


2017-08-28 08:12:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 62/84] Revert "leds: handle suspend/resume in heartbeat trigger"

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Zhang Bo <[email protected]>

commit 436c4c45b5b9562b59cedbb51b7343ab4a6dd8cc upstream.

This reverts commit 5ab92a7cb82c66bf30685583a38a18538e3807db.

The system cannot enter suspend mode because of the heartbeat led trigger.
In autosleep_wq, the try_to_suspend function periodically tries to enter
suspend mode: it reads wakeup_count, then calls the pm_notifier chain
callbacks and freezes processes.
heartbeat_pm_notifier is called and calls led_trigger_unregister to
change the trigger of the led device to none. That sends a uevent message
and changes the wakeup source count. Since wakeup_count changed, the
suspend attempt aborts.

Fixes: 5ab92a7cb82c ("leds: handle suspend/resume in heartbeat trigger")
Signed-off-by: Zhang Bo <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Jacek Anaszewski <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/leds/trigger/ledtrig-heartbeat.c | 31 -------------------------------
1 file changed, 31 deletions(-)

--- a/drivers/leds/trigger/ledtrig-heartbeat.c
+++ b/drivers/leds/trigger/ledtrig-heartbeat.c
@@ -19,7 +19,6 @@
#include <linux/sched.h>
#include <linux/leds.h>
#include <linux/reboot.h>
-#include <linux/suspend.h>
#include "../leds.h"

static int panic_heartbeats;
@@ -155,30 +154,6 @@ static struct led_trigger heartbeat_led_
.deactivate = heartbeat_trig_deactivate,
};

-static int heartbeat_pm_notifier(struct notifier_block *nb,
- unsigned long pm_event, void *unused)
-{
- int rc;
-
- switch (pm_event) {
- case PM_SUSPEND_PREPARE:
- case PM_HIBERNATION_PREPARE:
- case PM_RESTORE_PREPARE:
- led_trigger_unregister(&heartbeat_led_trigger);
- break;
- case PM_POST_SUSPEND:
- case PM_POST_HIBERNATION:
- case PM_POST_RESTORE:
- rc = led_trigger_register(&heartbeat_led_trigger);
- if (rc)
- pr_err("could not re-register heartbeat trigger\n");
- break;
- default:
- break;
- }
- return NOTIFY_DONE;
-}
-
static int heartbeat_reboot_notifier(struct notifier_block *nb,
unsigned long code, void *unused)
{
@@ -193,10 +168,6 @@ static int heartbeat_panic_notifier(stru
return NOTIFY_DONE;
}

-static struct notifier_block heartbeat_pm_nb = {
- .notifier_call = heartbeat_pm_notifier,
-};
-
static struct notifier_block heartbeat_reboot_nb = {
.notifier_call = heartbeat_reboot_notifier,
};
@@ -213,14 +184,12 @@ static int __init heartbeat_trig_init(vo
atomic_notifier_chain_register(&panic_notifier_list,
&heartbeat_panic_nb);
register_reboot_notifier(&heartbeat_reboot_nb);
- register_pm_notifier(&heartbeat_pm_nb);
}
return rc;
}

static void __exit heartbeat_trig_exit(void)
{
- unregister_pm_notifier(&heartbeat_pm_nb);
unregister_reboot_notifier(&heartbeat_reboot_nb);
atomic_notifier_chain_unregister(&panic_notifier_list,
&heartbeat_panic_nb);


2017-08-28 08:12:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 68/84] binder: use group leader instead of open thread

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Todd Kjos <[email protected]>

commit c4ea41ba195d01c9af66fb28711a16cc97caa9c5 upstream.

The binder allocator assumes that the thread that
called binder_open will never die for the lifetime of
that proc. That thread is normally the group_leader,
however it may not be. Use the group_leader instead
of current.

Signed-off-by: Todd Kjos <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/android/binder.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2972,8 +2972,8 @@ static int binder_open(struct inode *nod
proc = kzalloc(sizeof(*proc), GFP_KERNEL);
if (proc == NULL)
return -ENOMEM;
- get_task_struct(current);
- proc->tsk = current;
+ get_task_struct(current->group_leader);
+ proc->tsk = current->group_leader;
INIT_LIST_HEAD(&proc->todo);
init_waitqueue_head(&proc->wait);
proc->default_priority = task_nice(current);


2017-08-28 08:12:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 71/84] iio: imu: adis16480: Fix acceleration scale factor for adis16480

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dragos Bogdan <[email protected]>

commit fdd0d32eb95f135041236a6885d9006315aa9a1d upstream.

According to the datasheet, the range of the acceleration is [-10 g, + 10 g],
so the scale factor should be 10 instead of 5.

Signed-off-by: Dragos Bogdan <[email protected]>
Acked-by: Lars-Peter Clausen <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/iio/imu/adis16480.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/iio/imu/adis16480.c
+++ b/drivers/iio/imu/adis16480.c
@@ -696,7 +696,7 @@ static const struct adis16480_chip_info
.gyro_max_val = IIO_RAD_TO_DEGREE(22500),
.gyro_max_scale = 450,
.accel_max_val = IIO_M_S_2_TO_G(12500),
- .accel_max_scale = 5,
+ .accel_max_scale = 10,
},
[ADIS16485] = {
.channels = adis16485_channels,


2017-08-28 08:13:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 54/84] nfsd: Limit end of page list when decoding NFSv4 WRITE

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chuck Lever <[email protected]>

commit fc788f64f1f3eb31e87d4f53bcf1ab76590d5838 upstream.

When processing an NFSv4 WRITE operation, argp->end should never
point past the end of the data in the final page of the page list.
Otherwise, nfsd4_decode_compound can walk into uninitialized memory.

More critically, nfsd4_decode_write fails to increment argp->pagelen
when it increments argp->pagelist. This can cause later xdr decoders
to assume more data is available than really is, which can cause server
crashes on malformed requests.

Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/nfsd/nfs4xdr.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -129,7 +129,7 @@ static void next_decode_page(struct nfsd
argp->p = page_address(argp->pagelist[0]);
argp->pagelist++;
if (argp->pagelen < PAGE_SIZE) {
- argp->end = argp->p + (argp->pagelen>>2);
+ argp->end = argp->p + XDR_QUADLEN(argp->pagelen);
argp->pagelen = 0;
} else {
argp->end = argp->p + (PAGE_SIZE>>2);
@@ -1246,9 +1246,7 @@ nfsd4_decode_write(struct nfsd4_compound
argp->pagelen -= pages * PAGE_SIZE;
len -= pages * PAGE_SIZE;

- argp->p = (__be32 *)page_address(argp->pagelist[0]);
- argp->pagelist++;
- argp->end = argp->p + XDR_QUADLEN(PAGE_SIZE);
+ next_decode_page(argp);
}
argp->p += XDR_QUADLEN(len);



2017-08-28 08:13:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 73/84] staging: rtl8188eu: add RNX-N150NUB support

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Charles Milette <[email protected]>

commit f299aec6ebd747298e35934cff7709c6b119ca52 upstream.

Add support for USB Device Rosewill RNX-N150NUB.
VendorID: 0x0bda, ProductID: 0xffef

Signed-off-by: Charles Milette <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/staging/rtl8188eu/os_dep/usb_intf.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/staging/rtl8188eu/os_dep/usb_intf.c
+++ b/drivers/staging/rtl8188eu/os_dep/usb_intf.c
@@ -45,6 +45,7 @@ static struct usb_device_id rtw_usb_id_t
{USB_DEVICE(0x2001, 0x3311)}, /* DLink GO-USB-N150 REV B1 */
{USB_DEVICE(0x2357, 0x010c)}, /* TP-Link TL-WN722N v2 */
{USB_DEVICE(0x0df6, 0x0076)}, /* Sitecom N150 v2 */
+ {USB_DEVICE(USB_VENDER_ID_REALTEK, 0xffef)}, /* Rosewill RNX-N150NUB */
{} /* Terminating entry */
};



2017-08-28 08:13:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 76/84] ntb_transport: fix bug calculating num_qps_mw

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Logan Gunthorpe <[email protected]>

commit 8e8496e0e9564b66165f5219a4e8ed20b0d3fc6b upstream.

A divide by zero error occurs if qp_count is less than mw_count because
num_qps_mw is calculated to be zero. The calculation appears to be
incorrect.

The requirement is for num_qps_mw to be set to qp_count / mw_count
with any remainder divided among the earlier mws.

For example, if mw_count is 5 and qp_count is 12 then mws 0 and 1
will have 3 qps per window and mws 2 through 4 will have 2 qps per window.
Thus, when mw_num < qp_count % mw_count, num_qps_mw is 1 higher
than when mw_num >= qp_count % mw_count.
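
A small standalone sketch of the fixed distribution (userspace, for
illustration only; not the driver code):

	#include <stdio.h>

	int main(void)
	{
		unsigned int qp_count = 12, mw_count = 5, mw_num;

		for (mw_num = 0; mw_num < mw_count; mw_num++) {
			unsigned int num_qps_mw;

			if (mw_num < qp_count % mw_count)
				num_qps_mw = qp_count / mw_count + 1;
			else
				num_qps_mw = qp_count / mw_count;

			printf("mw %u: %u qps\n", mw_num, num_qps_mw);
		}
		return 0;	/* mws 0-1 get 3 qps, mws 2-4 get 2 qps */
	}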

Signed-off-by: Logan Gunthorpe <[email protected]>
Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
Acked-by: Allen Hubbe <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/ntb/ntb_transport.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -625,7 +625,7 @@ static int ntb_transport_setup_qp_mw(str
if (!mw->virt_addr)
return -ENOMEM;

- if (qp_count % mw_count && mw_num + 1 < qp_count / mw_count)
+ if (mw_num < qp_count % mw_count)
num_qps_mw = qp_count / mw_count + 1;
else
num_qps_mw = qp_count / mw_count;
@@ -1002,7 +1002,7 @@ static int ntb_transport_init_queue(stru
qp->event_handler = NULL;
ntb_qp_link_down_reset(qp);

- if (qp_count % mw_count && mw_num + 1 < qp_count / mw_count)
+ if (mw_num < qp_count % mw_count)
num_qps_mw = qp_count / mw_count + 1;
else
num_qps_mw = qp_count / mw_count;


2017-08-28 08:13:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 80/84] ntb: transport shouldn't disable link due to bogus values in SPADs

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Jiang <[email protected]>

commit f3fd2afed8eee91620d05b69ab94c14793c849d7 upstream.

It seems that under certain scenarios the SPAD can have bogus values caused
by an agent (i.e. BIOS or other software) that is not the kernel driver, and
that causes memory window setup failure. This should not cause the link to
be disabled because if we do that, the driver will never recover again. We
have verified in testing that this issue happens and prevents proper link
recovery.

Signed-off-by: Dave Jiang <[email protected]>
Acked-by: Allen Hubbe <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
Fixes: 84f766855f61 ("ntb: stop link work when we do not have memory")
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/ntb/ntb_transport.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -921,10 +921,8 @@ out1:
ntb_free_mw(nt, i);

/* if there's an actual failure, we should just bail */
- if (rc < 0) {
- ntb_link_disable(ndev);
+ if (rc < 0)
return;
- }

out:
if (ntb_link_is_up(ndev, NULL, NULL) == 1)


2017-08-28 08:13:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 84/84] powerpc/mm: Ensure cpumask update is ordered

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Benjamin Herrenschmidt <[email protected]>

commit 1a92a80ad386a1a6e3b36d576d52a1a456394b70 upstream.

There is no guarantee that the various isync's involved with
the context switch will order the update of the CPU mask with
the first TLB entry for the new context being loaded by the HW.

Be safe here and add a memory barrier to order any subsequent
load/store which may bring entries into the TLB.

The corresponding barrier on the other side already exists as
pte updates use pte_xchg() which uses __cmpxchg_u64 which has
a sync after the atomic operation.

Cc: [email protected]
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Add comments in the code]
[mpe: Backport to 4.12, minor context change]
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/include/asm/mmu_context.h | 20 +++++++++++++++++++-
arch/powerpc/include/asm/pgtable-be-types.h | 1 +
arch/powerpc/include/asm/pgtable-types.h | 1 +
3 files changed, 21 insertions(+), 1 deletion(-)

--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -75,9 +75,27 @@ static inline void switch_mm_irqs_off(st
struct task_struct *tsk)
{
/* Mark this context has been used on the new CPU */
- if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next)))
+ if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));

+ /*
+ * This full barrier orders the store to the cpumask above vs
+ * a subsequent operation which allows this CPU to begin loading
+ * translations for next.
+ *
+ * When using the radix MMU that operation is the load of the
+ * MMU context id, which is then moved to SPRN_PID.
+ *
+ * For the hash MMU it is either the first load from slb_cache
+ * in switch_slb(), and/or the store of paca->mm_ctx_id in
+ * copy_mm_to_paca().
+ *
+ * On the read side the barrier is in pte_xchg(), which orders
+ * the store to the PTE vs the load of mm_cpumask.
+ */
+ smp_mb();
+ }
+
/* 32-bit keeps track of the current PGDIR in the thread struct */
#ifdef CONFIG_PPC32
tsk->thread.pgdir = next->pgd;
--- a/arch/powerpc/include/asm/pgtable-be-types.h
+++ b/arch/powerpc/include/asm/pgtable-be-types.h
@@ -87,6 +87,7 @@ static inline bool pte_xchg(pte_t *ptep,
unsigned long *p = (unsigned long *)ptep;
__be64 prev;

+ /* See comment in switch_mm_irqs_off() */
prev = (__force __be64)__cmpxchg_u64(p, (__force unsigned long)pte_raw(old),
(__force unsigned long)pte_raw(new));

--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -62,6 +62,7 @@ static inline bool pte_xchg(pte_t *ptep,
{
unsigned long *p = (unsigned long *)ptep;

+ /* See comment in switch_mm_irqs_off() */
return pte_val(old) == __cmpxchg_u64(p, pte_val(old), pte_val(new));
}
#endif


2017-08-28 08:13:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 60/84] x86/mm: Fix use-after-free of ldt_struct

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <[email protected]>

commit ccd5b3235180eef3cfec337df1c8554ab151b5cc upstream.

The following commit:

39a0526fb3f7 ("x86/mm: Factor out LDT init from context init")

renamed init_new_context() to init_new_context_ldt() and added a new
init_new_context() which calls init_new_context_ldt(). However, the
error code of init_new_context_ldt() was ignored. Consequently, if a
memory allocation in alloc_ldt_struct() failed during a fork(), the
->context.ldt of the new task remained the same as that of the old task
(due to the memcpy() in dup_mm()). ldt_struct's are not intended to be
shared, so a use-after-free occurred after one task exited.

Fix the bug by making init_new_context() pass through the error code of
init_new_context_ldt().

This bug was found by syzkaller, which encountered the following splat:

BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710

CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x24e/0x340 mm/kasan/report.c:409
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
__mmdrop+0xe9/0x530 kernel/fork.c:889
mmdrop include/linux/sched/mm.h:42 [inline]
exec_mmap fs/exec.c:1061 [inline]
flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291
load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855
search_binary_handler+0x142/0x6b0 fs/exec.c:1652
exec_binprm fs/exec.c:1694 [inline]
do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816
do_execve+0x31/0x40 fs/exec.c:1860
call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

Allocated by task 3700:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627
kmalloc include/linux/slab.h:493 [inline]
alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67
write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277
sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307
entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 3700:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kfree+0xca/0x250 mm/slab.c:3820
free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121
free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
__mmdrop+0xe9/0x530 kernel/fork.c:889
mmdrop include/linux/sched/mm.h:42 [inline]
__mmput kernel/fork.c:916 [inline]
mmput+0x541/0x6e0 kernel/fork.c:927
copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931
copy_process kernel/fork.c:1546 [inline]
_do_fork+0x1ef/0xfb0 kernel/fork.c:2025
SYSC_clone kernel/fork.c:2135 [inline]
SyS_clone+0x37/0x50 kernel/fork.c:2129
do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287
return_from_SYSCALL_64+0x0/0x7a

Here is a C reproducer:

#include <asm/ldt.h>
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static void *fork_thread(void *_arg)
{
fork();
}

int main(void)
{
struct user_desc desc = { .entry_number = 8191 };

syscall(__NR_modify_ldt, 1, &desc, sizeof(desc));

for (;;) {
if (fork() == 0) {
pthread_t t;

srand(getpid());
pthread_create(&t, NULL, fork_thread, NULL);
usleep(rand() % 10000);
syscall(__NR_exit_group, 0);
}
wait(NULL);
}
}

Note: the reproducer takes advantage of the fact that alloc_ldt_struct()
may use vmalloc() to allocate a large ->entries array, and after
commit:

5d17a73a2ebe ("vmalloc: back off when the current task is killed")

it is possible for userspace to fail a task's vmalloc() by
sending a fatal signal, e.g. via exit_group(). It would be more
difficult to reproduce this bug on kernels without that commit.

This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y.

Signed-off-by: Eric Biggers <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/mmu_context.h | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -116,9 +116,7 @@ static inline int init_new_context(struc
mm->context.execute_only_pkey = -1;
}
#endif
- init_new_context_ldt(tsk, mm);
-
- return 0;
+ return init_new_context_ldt(tsk, mm);
}
static inline void destroy_context(struct mm_struct *mm)
{


2017-08-28 08:13:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 61/84] net: sunrpc: svcsock: fix NULL-pointer exception

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Vadim Lomovtsev <[email protected]>

commit eebe53e87f97975ee58a21693e44797608bf679c upstream.

While running nfs/connectathon tests, a kernel NULL-pointer exception
has been observed due to races in svcsock.c.

The race appears when the kernel accepts a connection via kernel_accept()
(which creates a new socket) and starts queuing ingress packets to the
new socket. This happens in softirq context, which can run concurrently
on a different core while the new socket setup is not yet complete.

The fix is to reorder the socket user-data initialization sequence and
add write/read memory barrier calls to ensure that the callback pointers
hold proper values before they are actually called.
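
For readers unfamiliar with this publish pattern, here is a minimal
userspace analogue (illustrative only, with made-up names; C11
release/acquire stands in for the wmb()/rmb() pair used in the patch):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* stands in for struct svc_sock and its saved callback pointers */
struct svsk_demo {
	void (*saved_data_ready)(void);
};

static void orig_data_ready(void) { puts("original callback"); }

static struct svsk_demo demo;
static _Atomic(struct svsk_demo *) sk_user_data;	/* plays inet->sk_user_data */

static void *setup_thread(void *arg)
{
	(void)arg;
	demo.saved_data_ready = orig_data_ready;	/* init fields first */
	/* release ~ wmb(): publish the pointer only after the fields are set */
	atomic_store_explicit(&sk_user_data, &demo, memory_order_release);
	return NULL;
}

static void *softirq_thread(void *arg)
{
	struct svsk_demo *svsk;

	(void)arg;
	/* acquire ~ rmb(): once the pointer is seen, the fields are valid */
	while (!(svsk = atomic_load_explicit(&sk_user_data, memory_order_acquire)))
		;
	svsk->saved_data_ready();	/* never a NULL callback pointer */
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&b, NULL, softirq_thread, NULL);
	pthread_create(&a, NULL, setup_thread, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}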

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 6708.736446] ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G W OE 4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: http://www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700: ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [< (null)>] (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0: 0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/sunrpc/svcsock.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)

--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -408,6 +408,9 @@ static void svc_data_ready(struct sock *
dprintk("svc: socket %p(inet %p), busy=%d\n",
svsk, sk,
test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+
+ /* Refer to svc_setup_socket() for details. */
+ rmb();
svsk->sk_odata(sk);
if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
svc_xprt_enqueue(&svsk->sk_xprt);
@@ -424,6 +427,9 @@ static void svc_write_space(struct sock
if (svsk) {
dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+
+ /* Refer to svc_setup_socket() for details. */
+ rmb();
svsk->sk_owspace(sk);
svc_xprt_enqueue(&svsk->sk_xprt);
}
@@ -748,8 +754,12 @@ static void svc_tcp_listen_data_ready(st
dprintk("svc: socket %p TCP (listen) state change %d\n",
sk, sk->sk_state);

- if (svsk)
+ if (svsk) {
+ /* Refer to svc_setup_socket() for details. */
+ rmb();
svsk->sk_odata(sk);
+ }
+
/*
* This callback may called twice when a new connection
* is established as a child socket inherits everything
@@ -782,6 +792,8 @@ static void svc_tcp_state_change(struct
if (!svsk)
printk("svc: socket %p: no user data\n", sk);
else {
+ /* Refer to svc_setup_socket() for details. */
+ rmb();
svsk->sk_ostate(sk);
if (sk->sk_state != TCP_ESTABLISHED) {
set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
@@ -1368,12 +1380,18 @@ static struct svc_sock *svc_setup_socket
return ERR_PTR(err);
}

- inet->sk_user_data = svsk;
svsk->sk_sock = sock;
svsk->sk_sk = inet;
svsk->sk_ostate = inet->sk_state_change;
svsk->sk_odata = inet->sk_data_ready;
svsk->sk_owspace = inet->sk_write_space;
+ /*
+ * This barrier is necessary in order to prevent race condition
+ * with svc_data_ready(), svc_listen_data_ready() and others
+ * when calling callbacks above.
+ */
+ wmb();
+ inet->sk_user_data = svsk;

/* Initialize the socket */
if (sock->type == SOCK_DGRAM)


2017-08-28 08:40:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 52/84] cifs: Fix df output for users with quota limits

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sachin Prabhu <[email protected]>

commit 42bec214d8bd432be6d32a1acb0a9079ecd4d142 upstream.

The df for a SMB2 share triggers a GetInfo call for
FS_FULL_SIZE_INFORMATION. The values returned are used to populate
struct statfs.

The problem is that none of the information returned by the call
contains the total blocks available on the filesystem. Instead we use
the blocks available to the user, i.e. the quota limitation, when filling out
statfs.f_blocks. The information returned does contain Actual free units
on the filesystem and is used to populate statfs.f_bfree. For users with
quota enabled, it can lead to situations where the total free space
reported is more than the total blocks on the system, ending up with df
reports like the following:

# df -h /mnt/a
Filesystem          Size   Used  Avail  Use%  Mounted on
//192.168.22.10/a   2.5G  -2.3G   2.5G     -  /mnt/a

To fix this problem, we instead populate statfs.f_bfree with the
same value as statfs.f_bavail, i.e. CallerAvailableAllocationUnits. This
is similar to what is done already in the code for cifs and df now
reports the quota information for the user used to mount the share.

# df --si /mnt/a
Filesystem          Size   Used  Avail  Use%  Mounted on
//192.168.22.10/a   2.7G   101M   2.6G    4%  /mnt/a

Signed-off-by: Sachin Prabhu <[email protected]>
Signed-off-by: Pierguido Lambri <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/cifs/smb2pdu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2930,8 +2930,8 @@ copy_fs_info_to_kstatfs(struct smb2_fs_f
kst->f_bsize = le32_to_cpu(pfs_inf->BytesPerSector) *
le32_to_cpu(pfs_inf->SectorsPerAllocationUnit);
kst->f_blocks = le64_to_cpu(pfs_inf->TotalAllocationUnits);
- kst->f_bfree = le64_to_cpu(pfs_inf->ActualAvailableAllocationUnits);
- kst->f_bavail = le64_to_cpu(pfs_inf->CallerAvailableAllocationUnits);
+ kst->f_bfree = kst->f_bavail =
+ le64_to_cpu(pfs_inf->CallerAvailableAllocationUnits);
return;
}



2017-08-28 08:41:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 59/84] timers: Fix excessive granularity of new timers after a nohz idle

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Piggin <[email protected]>

commit 2fe59f507a65dbd734b990a11ebc7488f6f87a24 upstream.

When a timer base is idle, it is forwarded when a new timer is added
to ensure that granularity does not become excessive. When not idle,
the timer tick is expected to increment the base.

However there are several problems:

- If an existing timer is modified, the base is forwarded only after
the index is calculated.

- The base is not forwarded by add_timer_on.

- There is a window after a timer is restarted from a nohz idle, after
it is marked not-idle and before the timer tick on this CPU, where a
timer may be added but the ancient base does not get forwarded.

These result in excessive granularity (a 1 jiffy timeout can blow out
to 100s of jiffies), which cause the rcu lockup detector to trigger,
among other things.

Fix this by keeping track of whether the timer base has been idle
since it was last run or forwarded, and if so then forward it before
adding a new timer.

There is still a case where mod_timer optimises the case of a pending
timer mod with the same expiry time, where the timer can see excessive
granularity relative to the new, shorter interval. A comment is added,
but it's not changed because it is an important fastpath for
networking.

This has been tested and found to fix the RCU softlockup messages.

Testing was also done with tracing to measure requested versus
achieved wakeup latencies for all non-deferrable timers in an idle
system (with no lockup watchdogs running). Wakeup latency relative to
absolute latency is calculated (note this suffers from round-up skew
at low absolute times) and analysed:

             max    avg    std
upstream   506.0   1.20   4.68
patched      2.0   1.08   0.15

The bug was noticed due to the lockup detector Kconfig changes
dropping it out of people's .configs and resulting in larger base
clk skew. When the lockup detectors are enabled, no CPU can go idle for
longer than 4 seconds, which limits the granularity errors.
Sub-optimal timer behaviour is observable on a smaller scale in that
case:

             max    avg    std
upstream     9.0   1.05   0.19
patched      2.0   1.04   0.11

Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Tested-by: David Miller <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Stephen Boyd <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: John Stultz <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/time/timer.c | 50 +++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 41 insertions(+), 9 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -201,6 +201,7 @@ struct timer_base {
bool migration_enabled;
bool nohz_active;
bool is_idle;
+ bool must_forward_clk;
DECLARE_BITMAP(pending_map, WHEEL_SIZE);
struct hlist_head vectors[WHEEL_SIZE];
} ____cacheline_aligned;
@@ -891,13 +892,19 @@ get_target_base(struct timer_base *base,

static inline void forward_timer_base(struct timer_base *base)
{
- unsigned long jnow = READ_ONCE(jiffies);
+ unsigned long jnow;

/*
- * We only forward the base when it's idle and we have a delta between
- * base clock and jiffies.
+ * We only forward the base when we are idle or have just come out of
+ * idle (must_forward_clk logic), and have a delta between base clock
+ * and jiffies. In the common case, run_timers will take care of it.
*/
- if (!base->is_idle || (long) (jnow - base->clk) < 2)
+ if (likely(!base->must_forward_clk))
+ return;
+
+ jnow = READ_ONCE(jiffies);
+ base->must_forward_clk = base->is_idle;
+ if ((long)(jnow - base->clk) < 2)
return;

/*
@@ -973,6 +980,11 @@ __mod_timer(struct timer_list *timer, un
* same array bucket then just return:
*/
if (timer_pending(timer)) {
+ /*
+ * The downside of this optimization is that it can result in
+ * larger granularity than you would get from adding a new
+ * timer with this expiry.
+ */
if (timer->expires == expires)
return 1;

@@ -983,6 +995,7 @@ __mod_timer(struct timer_list *timer, un
* dequeue/enqueue dance.
*/
base = lock_timer_base(timer, &flags);
+ forward_timer_base(base);

clk = base->clk;
idx = calc_wheel_index(expires, clk);
@@ -999,6 +1012,7 @@ __mod_timer(struct timer_list *timer, un
}
} else {
base = lock_timer_base(timer, &flags);
+ forward_timer_base(base);
}

timer_stats_timer_set_start_info(timer);
@@ -1028,12 +1042,10 @@ __mod_timer(struct timer_list *timer, un
spin_lock(&base->lock);
WRITE_ONCE(timer->flags,
(timer->flags & ~TIMER_BASEMASK) | base->cpu);
+ forward_timer_base(base);
}
}

- /* Try to forward a stale timer base clock */
- forward_timer_base(base);
-
timer->expires = expires;
/*
* If 'idx' was calculated above and the base time did not advance
@@ -1150,6 +1162,7 @@ void add_timer_on(struct timer_list *tim
WRITE_ONCE(timer->flags,
(timer->flags & ~TIMER_BASEMASK) | cpu);
}
+ forward_timer_base(base);

debug_activate(timer, timer->expires);
internal_add_timer(base, timer);
@@ -1538,10 +1551,16 @@ u64 get_next_timer_interrupt(unsigned lo
if (!is_max_delta)
expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
/*
- * If we expect to sleep more than a tick, mark the base idle:
+ * If we expect to sleep more than a tick, mark the base idle.
+ * Also the tick is stopped so any added timer must forward
+ * the base clk itself to keep granularity small. This idle
+ * logic is only maintained for the BASE_STD base, deferrable
+ * timers may still see large granularity skew (by design).
*/
- if ((expires - basem) > TICK_NSEC)
+ if ((expires - basem) > TICK_NSEC) {
+ base->must_forward_clk = true;
base->is_idle = true;
+ }
}
spin_unlock(&base->lock);

@@ -1651,6 +1670,19 @@ static __latent_entropy void run_timer_s
{
struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);

+ /*
+ * must_forward_clk must be cleared before running timers so that any
+ * timer functions that call mod_timer will not try to forward the
+ * base. idle trcking / clock forwarding logic is only used with
+ * BASE_STD timers.
+ *
+ * The deferrable base does not do idle tracking at all, so we do
+ * not forward it. This can result in very large variations in
+ * granularity for deferrable timers, but they can be deferred for
+ * long periods due to idle.
+ */
+ base->must_forward_clk = false;
+
__run_timers(base);
if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));


2017-08-28 08:13:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 57/84] perf probe: Fix --funcs to show correct symbols for offline module

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Masami Hiramatsu <[email protected]>

commit eebc509b20881b92d62e317b2c073e57c5f200f0 upstream.

Fix --funcs (-F) option to show correct symbols for offline module.
Previously, perf-probe used machine__findnew_module_map() for offline
modules, so even if the user passed a module file (with a full path) built
for another architecture, perf-probe always tried to load the symbol map
for the currently loaded kernel module.

This fix uses dso__new_map() to load the map from the given binary, the
same as is done for user application maps.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/148350053478.19001.15435255244512631545.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Krister Johansen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
tools/perf/util/probe-event.c | 25 ++++++-------------------
1 file changed, 6 insertions(+), 19 deletions(-)

--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -163,7 +163,7 @@ static struct map *kernel_get_module_map

/* A file path -- this is an offline module */
if (module && strchr(module, '/'))
- return machine__findnew_module_map(host_machine, 0, module);
+ return dso__new_map(module);

if (!module)
module = "kernel";
@@ -173,6 +173,7 @@ static struct map *kernel_get_module_map
if (strncmp(pos->dso->short_name + 1, module,
pos->dso->short_name_len - 2) == 0 &&
module[pos->dso->short_name_len - 2] == '\0') {
+ map__get(pos);
return pos;
}
}
@@ -188,15 +189,6 @@ struct map *get_target_map(const char *t
return kernel_get_module_map(target);
}

-static void put_target_map(struct map *map, bool user)
-{
- if (map && user) {
- /* Only the user map needs to be released */
- map__put(map);
- }
-}
-
-
static int convert_exec_to_group(const char *exec, char **result)
{
char *ptr1, *ptr2, *exec_copy;
@@ -412,7 +404,7 @@ static int find_alternative_probe_point(
}

out:
- put_target_map(map, uprobes);
+ map__put(map);
return ret;

}
@@ -2944,7 +2936,7 @@ static int find_probe_trace_events_from_
}

out:
- put_target_map(map, pev->uprobes);
+ map__put(map);
free(syms);
return ret;

@@ -3437,10 +3429,7 @@ int show_available_funcs(const char *tar
return ret;

/* Get a symbol map */
- if (user)
- map = dso__new_map(target);
- else
- map = kernel_get_module_map(target);
+ map = get_target_map(target, user);
if (!map) {
pr_err("Failed to get a map for %s\n", (target) ? : "kernel");
return -EINVAL;
@@ -3472,9 +3461,7 @@ int show_available_funcs(const char *tar
}

end:
- if (user) {
- map__put(map);
- }
+ map__put(map);
exit_probe_symbol_maps();

return ret;


2017-08-28 08:41:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 58/84] perf/x86/intel/rapl: Make package handling more robust

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit dd86e373e09fb16b83e8adf5c48c421a4ca76468 upstream.

The package management code in RAPL relies on package mapping being
available before a CPU is started. This changed with:

9d85eb9119f4 ("x86/smpboot: Make logical package management more robust")

because the ACPI/BIOS information turned out to be unreliable, but that
left RAPL in a broken state. This was not noticed because on a regular boot
all CPUs are online before RAPL is initialized.

A possible fix would be to reintroduce the mess which allocates a package
data structure in CPU prepare and, when it turns out to already exist at
CPU starting time, throws it away later in the CPU online callback. But that's a
horrible hack and not required at all because RAPL becomes functional for
perf only in the CPU online callback. That's correct because user space is
not yet informed about the CPU being onlined, so nothing can rely on RAPL
being available on that particular CPU.

Move the allocation to the CPU online callback and simplify the hotplug
handling. At this point the package mapping is established and correct.

This also adds a missing check for available package data in the
event_init() function.

Reported-by: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Vince Weaver <[email protected]>
Fixes: 9d85eb9119f4 ("x86/smpboot: Make logical package management more robust")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
[ jwang: backport to 4.9, fix NULL pointer deref during cpu hotplug ]
Signed-off-by: Jack Wang <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/events/intel/rapl.c | 58 ++++++++++++++++++-------------------------
include/linux/cpuhotplug.h | 1
2 files changed, 25 insertions(+), 34 deletions(-)

--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -161,7 +161,13 @@ static u64 rapl_timer_ms;

static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
{
- return rapl_pmus->pmus[topology_logical_package_id(cpu)];
+ unsigned int pkgid = topology_logical_package_id(cpu);
+
+ /*
+ * The unsigned check also catches the '-1' return value for non
+ * existent mappings in the topology map.
+ */
+ return pkgid < rapl_pmus->maxpkg ? rapl_pmus->pmus[pkgid] : NULL;
}

static inline u64 rapl_read_counter(struct perf_event *event)
@@ -402,6 +408,8 @@ static int rapl_pmu_event_init(struct pe

/* must be done before validate_group */
pmu = cpu_to_rapl_pmu(event->cpu);
+ if (!pmu)
+ return -EINVAL;
event->cpu = pmu->cpu;
event->pmu_private = pmu;
event->hw.event_base = msr;
@@ -585,6 +593,19 @@ static int rapl_cpu_online(unsigned int
struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
int target;

+ if (!pmu) {
+ pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
+ if (!pmu)
+ return -ENOMEM;
+
+ raw_spin_lock_init(&pmu->lock);
+ INIT_LIST_HEAD(&pmu->active_list);
+ pmu->pmu = &rapl_pmus->pmu;
+ pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
+ rapl_hrtimer_init(pmu);
+
+ rapl_pmus->pmus[topology_logical_package_id(cpu)] = pmu;
+ }
/*
* Check if there is an online cpu in the package which collects rapl
* events already.
@@ -598,27 +619,6 @@ static int rapl_cpu_online(unsigned int
return 0;
}

-static int rapl_cpu_prepare(unsigned int cpu)
-{
- struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
-
- if (pmu)
- return 0;
-
- pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
- if (!pmu)
- return -ENOMEM;
-
- raw_spin_lock_init(&pmu->lock);
- INIT_LIST_HEAD(&pmu->active_list);
- pmu->pmu = &rapl_pmus->pmu;
- pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
- pmu->cpu = -1;
- rapl_hrtimer_init(pmu);
- rapl_pmus->pmus[topology_logical_package_id(cpu)] = pmu;
- return 0;
-}
-
static int rapl_check_hw_unit(bool apply_quirk)
{
u64 msr_rapl_power_unit_bits;
@@ -804,28 +804,21 @@ static int __init rapl_pmu_init(void)
* Install callbacks. Core will call them for each online cpu.
*/

- ret = cpuhp_setup_state(CPUHP_PERF_X86_RAPL_PREP, "PERF_X86_RAPL_PREP",
- rapl_cpu_prepare, NULL);
- if (ret)
- goto out;
-
ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_RAPL_ONLINE,
"AP_PERF_X86_RAPL_ONLINE",
rapl_cpu_online, rapl_cpu_offline);
if (ret)
- goto out1;
+ goto out;

ret = perf_pmu_register(&rapl_pmus->pmu, "power", -1);
if (ret)
- goto out2;
+ goto out1;

rapl_advertise();
return 0;

-out2:
- cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE);
out1:
- cpuhp_remove_state(CPUHP_PERF_X86_RAPL_PREP);
+ cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE);
out:
pr_warn("Initialization failed (%d), disabled\n", ret);
cleanup_rapl_pmus();
@@ -836,7 +829,6 @@ module_init(rapl_pmu_init);
static void __exit intel_rapl_exit(void)
{
cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
- cpuhp_remove_state_nocalls(CPUHP_PERF_X86_RAPL_PREP);
perf_pmu_unregister(&rapl_pmus->pmu);
cleanup_rapl_pmus();
}
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -10,7 +10,6 @@ enum cpuhp_state {
CPUHP_PERF_X86_PREPARE,
CPUHP_PERF_X86_UNCORE_PREP,
CPUHP_PERF_X86_AMD_UNCORE_PREP,
- CPUHP_PERF_X86_RAPL_PREP,
CPUHP_PERF_BFIN,
CPUHP_PERF_POWER,
CPUHP_PERF_SUPERH,


2017-08-28 08:41:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 56/84] perf/core: Fix group {cpu,task} validation

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mark Rutland <[email protected]>

commit 64aee2a965cf2954a038b5522f11d2cd2f0f8f3e upstream.

Regardless of which events form a group, it does not make sense for the
events to target different tasks and/or CPUs, as this leaves the group
inconsistent and impossible to schedule. The core perf code assumes that
these are consistent across (successfully initialised) groups.

Core perf code only verifies this when moving SW events into a HW
context. Thus, we can violate this requirement for pure SW groups and
pure HW groups, unless the relevant PMU driver happens to perform this
verification itself. These mismatched groups subsequently wreak havoc
elsewhere.

For example, we handle watchpoints as SW events, and reserve watchpoint
HW on a per-CPU basis at pmu::event_init() time to ensure that any event
that is initialised is guaranteed to have a slot at pmu::add() time.
However, the core code only checks the group leader's cpu filter (via
event_filter_match()), and can thus install follower events onto CPUs
violating their (mismatched) CPU filters, potentially installing them
into a CPU without sufficient reserved slots.

This can be triggered with the below test case, resulting in warnings
from arch backends.

#define _GNU_SOURCE
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <sched.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
int group_fd, unsigned long flags)
{
return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

char watched_char;

struct perf_event_attr wp_attr = {
.type = PERF_TYPE_BREAKPOINT,
.bp_type = HW_BREAKPOINT_RW,
.bp_addr = (unsigned long)&watched_char,
.bp_len = 1,
.size = sizeof(wp_attr),
};

int main(int argc, char *argv[])
{
int leader, ret;
cpu_set_t cpus;

/*
* Force use of CPU0 to ensure our CPU0-bound events get scheduled.
*/
CPU_ZERO(&cpus);
CPU_SET(0, &cpus);
ret = sched_setaffinity(0, sizeof(cpus), &cpus);
if (ret) {
printf("Unable to set cpu affinity\n");
return 1;
}

/* open leader event, bound to this task, CPU0 only */
leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
if (leader < 0) {
printf("Couldn't open leader: %d\n", leader);
return 1;
}

/*
* Open a follower event that is bound to the same task, but a
* different CPU. This means that the group should never be possible to
* schedule.
*/
ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
if (ret < 0) {
printf("Couldn't open mismatched follower: %d\n", ret);
return 1;
} else {
printf("Opened leader/follower with mismastched CPUs\n");
}

/*
* Open as many independent events as we can, all bound to the same
* task, CPU0 only.
*/
do {
ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
} while (ret >= 0);

/*
* Force enable/disable all events to trigger the erroneous
* installation of the follower event.
*/
printf("Opened all events. Toggling..\n");
for (;;) {
prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
}

return 0;
}

Fix this by validating this requirement regardless of whether we're
moving events.

Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Zhou Chengming <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/events/core.c | 39 +++++++++++++++++++--------------------
1 file changed, 19 insertions(+), 20 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9786,28 +9786,27 @@ SYSCALL_DEFINE5(perf_event_open,
goto err_context;

/*
- * Do not allow to attach to a group in a different
- * task or CPU context:
+ * Make sure we're both events for the same CPU;
+ * grouping events for different CPUs is broken; since
+ * you can never concurrently schedule them anyhow.
*/
- if (move_group) {
- /*
- * Make sure we're both on the same task, or both
- * per-cpu events.
- */
- if (group_leader->ctx->task != ctx->task)
- goto err_context;
+ if (group_leader->cpu != event->cpu)
+ goto err_context;
+
+ /*
+ * Make sure we're both on the same task, or both
+ * per-CPU events.
+ */
+ if (group_leader->ctx->task != ctx->task)
+ goto err_context;

- /*
- * Make sure we're both events for the same CPU;
- * grouping events for different CPUs is broken; since
- * you can never concurrently schedule them anyhow.
- */
- if (group_leader->cpu != event->cpu)
- goto err_context;
- } else {
- if (group_leader->ctx != ctx)
- goto err_context;
- }
+ /*
+ * Do not allow to attach to a group in a different task
+ * or CPU context. If we're moving SW events, we'll fix
+ * this up later, so allow that.
+ */
+ if (!move_group && group_leader->ctx != ctx)
+ goto err_context;

/*
* Only a group leader can be exclusive or pinned


2017-08-28 08:13:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 81/84] ACPI: ioapic: Clear on-stack resource before using it

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Joerg Roedel <[email protected]>

commit e3d5092b6756b9e0b08f94bbeafcc7afe19f0996 upstream.

The on-stack resource-window 'win' in setup_res() is not
properly initialized. This causes the pointers in the
embedded 'struct resource' to contain stale addresses.

These pointers (in my case the ->child pointer) later get
propagated to the global iomem_resources list, causing a #GP
exception when the list is traversed in
iomem_map_sanity_check().

Fixes: c183619b63ec (x86/irq, ACPI: Implement ACPI driver to support IOAPIC hotplug)
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/acpi/ioapic.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/drivers/acpi/ioapic.c
+++ b/drivers/acpi/ioapic.c
@@ -45,6 +45,12 @@ static acpi_status setup_res(struct acpi
struct resource *res = data;
struct resource_win win;

+ /*
+ * We might assign this to 'res' later, make sure all pointers are
+ * cleared before the resource is added to the global list
+ */
+ memset(&win, 0, sizeof(win));
+
res->flags = 0;
if (acpi_dev_filter_resource_type(acpi_res, IORESOURCE_MEM))
return AE_OK;


2017-08-28 08:42:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 83/84] ACPI: EC: Fix regression related to wrong ECDT initialization order

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Lv Zheng <[email protected]>

commit 98529b9272e06a7767034fb8a32e43cdecda240a upstream.

Commit 2a5708409e4e (ACPI / EC: Fix a gap that ECDT EC cannot handle
EC events) introduced acpi_ec_ecdt_start(), but that function is
invoked before acpi_ec_query_init(), which is too early. This causes
the kernel to crash if an EC event occurs after boot, when ec_query_wq
is not valid:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
...
Workqueue: events acpi_ec_event_handler
task: ffff9f539790dac0 task.stack: ffffb437c0e10000
RIP: 0010:__queue_work+0x32/0x430

Normally, the DSDT EC should always be valid, so acpi_ec_ecdt_start()
is actually a no-op in the majority of cases. However, commit
c712bb58d827 (ACPI / EC: Add support to skip boot stage DSDT probe)
caused the probing of the DSDT EC as the "boot EC" to be skipped when
the ECDT EC is valid and uncovered the bug.

Fix this issue by invoking acpi_ec_ecdt_start() after acpi_ec_query_init()
in acpi_ec_init().

Link: https://jira01.devtools.intel.com/browse/LCK-4348
Fixes: 2a5708409e4e (ACPI / EC: Fix a gap that ECDT EC cannot handle EC events)
Fixes: c712bb58d827 (ACPI / EC: Add support to skip boot stage DSDT probe)
Reported-by: Wang Wendy <[email protected]>
Tested-by: Feng Chenzhou <[email protected]>
Signed-off-by: Lv Zheng <[email protected]>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/acpi/ec.c | 17 +++++++----------
drivers/acpi/internal.h | 1 -
drivers/acpi/scan.c | 1 -
3 files changed, 7 insertions(+), 12 deletions(-)

--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -1728,7 +1728,7 @@ error:
* functioning ECDT EC first in order to handle the events.
* https://bugzilla.kernel.org/show_bug.cgi?id=115021
*/
-int __init acpi_ec_ecdt_start(void)
+static int __init acpi_ec_ecdt_start(void)
{
acpi_handle handle;

@@ -1959,20 +1959,17 @@ static inline void acpi_ec_query_exit(vo
int __init acpi_ec_init(void)
{
int result;
+ int ecdt_fail, dsdt_fail;

/* register workqueue for _Qxx evaluations */
result = acpi_ec_query_init();
if (result)
- goto err_exit;
- /* Now register the driver for the EC */
- result = acpi_bus_register_driver(&acpi_ec_driver);
- if (result)
- goto err_exit;
+ return result;

-err_exit:
- if (result)
- acpi_ec_query_exit();
- return result;
+ /* Drivers must be started after acpi_ec_query_init() */
+ ecdt_fail = acpi_ec_ecdt_start();
+ dsdt_fail = acpi_bus_register_driver(&acpi_ec_driver);
+ return ecdt_fail && dsdt_fail ? -ENODEV : 0;
}

/* EC driver currently not unloadable */
--- a/drivers/acpi/internal.h
+++ b/drivers/acpi/internal.h
@@ -185,7 +185,6 @@ typedef int (*acpi_ec_query_func) (void
int acpi_ec_init(void);
int acpi_ec_ecdt_probe(void);
int acpi_ec_dsdt_probe(void);
-int acpi_ec_ecdt_start(void);
void acpi_ec_block_transactions(void);
void acpi_ec_unblock_transactions(void);
int acpi_ec_add_query_handler(struct acpi_ec *ec, u8 query_bit,
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -2051,7 +2051,6 @@ int __init acpi_scan_init(void)

acpi_gpe_apply_masked_gpes();
acpi_update_all_gpes();
- acpi_ec_ecdt_start();

acpi_scan_initialized = true;



2017-08-28 08:13:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 55/84] ftrace: Check for null ret_stack on profile function graph entry function

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (VMware) <[email protected]>

commit a8f0f9e49956a74718874b800251455680085600 upstream.

There's a small race when function graph shutsdown and the calling of the
registered function graph entry callback. The callback must not reference
the task's ret_stack without first checking that it is not NULL. Note, when
a ret_stack is allocated for a task, it stays allocated until the task exits.
The problem here, is that function_graph is shutdown, and a new task was
created, which doesn't have its ret_stack allocated. But since some of the
functions are still being traced, the callbacks can still be called.

The normal function_graph code handles this, but starting with commit
8861dd303c ("ftrace: Access ret_stack->subtime only in the function
profiler") the profiler code references the ret_stack on function entry, but
doesn't check if it is NULL first.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196611

Fixes: 8861dd303c ("ftrace: Access ret_stack->subtime only in the function profiler")
Reported-by: [email protected]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/ftrace.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -876,6 +876,10 @@ static int profile_graph_entry(struct ft

function_profile_call(trace->func, 0, NULL, NULL);

+ /* If function graph is shutting down, ret_stack can be NULL */
+ if (!current->ret_stack)
+ return 0;
+
if (index >= 0 && index < FTRACE_RETFUNC_DEPTH)
current->ret_stack[index].subtime = 0;



2017-08-28 08:43:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 82/84] ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: James Morse <[email protected]>

commit 7d64f82cceb21e6d95db312d284f5f195e120154 upstream.

When removing a GHES device notified by SCI, list_del_rcu() is used, so
ghes_remove() should call synchronize_rcu() before it goes on to call
kfree(ghes); otherwise concurrent RCU readers may still hold this list
entry after it has been freed.
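
The writer-side sequence the fix restores is the usual RCU removal
pattern; a generic kernel-style sketch follows (illustrative only:
'item', 'item_list' and 'item_lock' are made-up names, not from the
driver):

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct item {
	struct list_head list;
};

static LIST_HEAD(item_list);
static DEFINE_MUTEX(item_lock);

static void remove_item(struct item *it)
{
	mutex_lock(&item_lock);
	list_del_rcu(&it->list);	/* unlink; readers may still see it */
	mutex_unlock(&item_lock);

	synchronize_rcu();		/* wait for pre-existing RCU readers */
	kfree(it);			/* now no reader can reference it */
}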

Signed-off-by: James Morse <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Fixes: 81e88fdc432a (ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support)
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/acpi/apei/ghes.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1072,6 +1072,7 @@ static int ghes_remove(struct platform_d
if (list_empty(&ghes_sci))
unregister_acpi_hed_notifier(&ghes_notifier_sci);
mutex_unlock(&ghes_list_mutex);
+ synchronize_rcu();
break;
case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_remove(ghes);


2017-08-28 08:13:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 78/84] ntb: no sleep in ntb_async_tx_submit

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Allen Hubbe <[email protected]>

commit 88931ec3dc11e7dbceb3b0df455693873b508fbe upstream.

Do not sleep in ntb_async_tx_submit, which could deadlock.
This reverts commit "8c874cc140d667f84ae4642bb5b5e0d6396d2ca4"

Fixes: 8c874cc140d6 ("NTB: Address out of DMA descriptor issue with NTB")
Reported-by: Jia-Ju Bai <[email protected]>
Signed-off-by: Allen Hubbe <[email protected]>
Acked-by: Dave Jiang <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/ntb/ntb_transport.c | 50 ++++++--------------------------------------
1 file changed, 7 insertions(+), 43 deletions(-)

--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -176,14 +176,12 @@ struct ntb_transport_qp {
u64 rx_err_ver;
u64 rx_memcpy;
u64 rx_async;
- u64 dma_rx_prep_err;
u64 tx_bytes;
u64 tx_pkts;
u64 tx_ring_full;
u64 tx_err_no_buf;
u64 tx_memcpy;
u64 tx_async;
- u64 dma_tx_prep_err;
};

struct ntb_transport_mw {
@@ -256,8 +254,6 @@ enum {
#define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
#define NTB_QP_DEF_NUM_ENTRIES 100
#define NTB_LINK_DOWN_TIMEOUT 10
-#define DMA_RETRIES 20
-#define DMA_OUT_RESOURCE_TO msecs_to_jiffies(50)

static void ntb_transport_rxc_db(unsigned long data);
static const struct ntb_ctx_ops ntb_transport_ops;
@@ -518,12 +514,6 @@ static ssize_t debugfs_read(struct file
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"free tx - \t%u\n",
ntb_transport_tx_free_entry(qp));
- out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "DMA tx prep err - \t%llu\n",
- qp->dma_tx_prep_err);
- out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "DMA rx prep err - \t%llu\n",
- qp->dma_rx_prep_err);

out_offset += snprintf(buf + out_offset, out_count - out_offset,
"\n");
@@ -770,8 +760,6 @@ static void ntb_qp_link_down_reset(struc
qp->tx_err_no_buf = 0;
qp->tx_memcpy = 0;
qp->tx_async = 0;
- qp->dma_tx_prep_err = 0;
- qp->dma_rx_prep_err = 0;
}

static void ntb_qp_link_cleanup(struct ntb_transport_qp *qp)
@@ -1314,7 +1302,6 @@ static int ntb_async_rx_submit(struct nt
struct dmaengine_unmap_data *unmap;
dma_cookie_t cookie;
void *buf = entry->buf;
- int retries = 0;

len = entry->len;
device = chan->device;
@@ -1343,22 +1330,11 @@ static int ntb_async_rx_submit(struct nt

unmap->from_cnt = 1;

- for (retries = 0; retries < DMA_RETRIES; retries++) {
- txd = device->device_prep_dma_memcpy(chan,
- unmap->addr[1],
- unmap->addr[0], len,
- DMA_PREP_INTERRUPT);
- if (txd)
- break;
-
- set_current_state(TASK_INTERRUPTIBLE);
- schedule_timeout(DMA_OUT_RESOURCE_TO);
- }
-
- if (!txd) {
- qp->dma_rx_prep_err++;
+ txd = device->device_prep_dma_memcpy(chan, unmap->addr[1],
+ unmap->addr[0], len,
+ DMA_PREP_INTERRUPT);
+ if (!txd)
goto err_get_unmap;
- }

txd->callback_result = ntb_rx_copy_callback;
txd->callback_param = entry;
@@ -1603,7 +1579,6 @@ static int ntb_async_tx_submit(struct nt
struct dmaengine_unmap_data *unmap;
dma_addr_t dest;
dma_cookie_t cookie;
- int retries = 0;

device = chan->device;
dest = qp->tx_mw_phys + qp->tx_max_frame * entry->tx_index;
@@ -1625,21 +1600,10 @@ static int ntb_async_tx_submit(struct nt

unmap->to_cnt = 1;

- for (retries = 0; retries < DMA_RETRIES; retries++) {
- txd = device->device_prep_dma_memcpy(chan, dest,
- unmap->addr[0], len,
- DMA_PREP_INTERRUPT);
- if (txd)
- break;
-
- set_current_state(TASK_INTERRUPTIBLE);
- schedule_timeout(DMA_OUT_RESOURCE_TO);
- }
-
- if (!txd) {
- qp->dma_tx_prep_err++;
+ txd = device->device_prep_dma_memcpy(chan, dest, unmap->addr[0], len,
+ DMA_PREP_INTERRUPT);
+ if (!txd)
goto err_get_unmap;
- }

txd->callback_result = ntb_tx_copy_callback;
txd->callback_param = entry;


2017-08-28 08:44:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 79/84] ntb: ntb_test: ensure the link is up before trying to configure the mws

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Logan Gunthorpe <[email protected]>

commit 0eb46345364d7318b11068c46e8a68d5dc10f65e upstream.

After the link tests, there is a race on one side of the test for
the link coming up. It's possible, in some cases, for the test script
to write to the 'peer_trans' files before the link has come up.

To fix this, we simply use the link event file to ensure both sides
see the link as up before continuing.

Signed-off-by: Logan Gunthorpe <[email protected]>
Acked-by: Allen Hubbe <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
Fixes: a9c59ef77458 ("ntb_test: Add a selftest script for the NTB subsystem")
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
tools/testing/selftests/ntb/ntb_test.sh | 4 ++++
1 file changed, 4 insertions(+)

--- a/tools/testing/selftests/ntb/ntb_test.sh
+++ b/tools/testing/selftests/ntb/ntb_test.sh
@@ -326,6 +326,10 @@ function ntb_tool_tests()
link_test $LOCAL_TOOL $REMOTE_TOOL
link_test $REMOTE_TOOL $LOCAL_TOOL

+ #Ensure the link is up on both sides before continuing
+ write_file Y $LOCAL_TOOL/link_event
+ write_file Y $REMOTE_TOOL/link_event
+
for PEER_TRANS in $(ls $LOCAL_TOOL/peer_trans*); do
PT=$(basename $PEER_TRANS)
write_file $MW_SIZE $LOCAL_TOOL/$PT


2017-08-28 08:44:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 77/84] NTB: ntb_test: fix bug printing ntb_perf results

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Logan Gunthorpe <[email protected]>

commit 07b0b22b3e58824f70b9188d085d400069ca3240 upstream.

The code mistakenly prints the local perf results for the remote test
so the script reports identical results for both directions. Fix this
by ensuring we print the remote result.

Signed-off-by: Logan Gunthorpe <[email protected]>
Fixes: a9c59ef77458 ("ntb_test: Add a selftest script for the NTB subsystem")
Acked-by: Allen Hubbe <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
tools/testing/selftests/ntb/ntb_test.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/testing/selftests/ntb/ntb_test.sh
+++ b/tools/testing/selftests/ntb/ntb_test.sh
@@ -305,7 +305,7 @@ function perf_test()
echo "Running remote perf test $WITH DMA"
write_file "" $REMOTE_PERF/run
echo -n " "
- read_file $LOCAL_PERF/run
+ read_file $REMOTE_PERF/run
echo " Passed"

_modprobe -r ntb_perf


2017-08-28 08:13:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 75/84] ntb_transport: fix qp count bug

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Logan Gunthorpe <[email protected]>

commit cb827ee6ccc3e480f0d9c0e8e53eef55be5b0414 upstream.

In cases where there are more mw's than spads/2-2, the mw count gets
reduced to match the limitation. ntb_transport also tries to ensure that
there are fewer qps than mws but uses the full mw count instead of
the reduced one. When this happens, the math in
'ntb_transport_setup_qp_mw' will get confused and result in a kernel
paging request bug.

This patch fixes the bug by reducing qp_count to the reduced mw count
instead of the full mw count.
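
As a worked example with assumed numbers (not taken from the changelog):
if the hardware exposes 4 memory windows but the scratchpad budget reduces
nt->mw_count to 2, and qp_bitmap yields qp_count = 3, the old code compared
qp_count against the full count of 4 (4 < 3 is false) and left qp_count at
3, while the fixed code compares it against nt->mw_count = 2 and clamps
qp_count to 2, matching the number of windows that were actually configured.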

Signed-off-by: Logan Gunthorpe <[email protected]>
Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
Acked-by: Allen Hubbe <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/ntb/ntb_transport.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -1125,8 +1125,8 @@ static int ntb_transport_probe(struct nt
qp_count = ilog2(qp_bitmap);
if (max_num_clients && max_num_clients < qp_count)
qp_count = max_num_clients;
- else if (mw_count < qp_count)
- qp_count = mw_count;
+ else if (nt->mw_count < qp_count)
+ qp_count = nt->mw_count;

qp_bitmap &= BIT_ULL(qp_count) - 1;



2017-08-28 08:45:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 74/84] Clarify (and fix) MAX_LFS_FILESIZE macros

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <[email protected]>

commit 0cc3b0ec23ce4c69e1e890ed2b8d2fa932b14aad upstream.

We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
filesystems (and other IO targets) that know they are 64-bit clean and
don't have any 32-bit limits in their IO path.

It turns out that our 32-bit value for that limit was bogus. On 32-bit,
the VM layer is limited by the page cache to only 32-bit index values,
but our logic for that was confusing and actually wrong. We used to
define that value to

(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)

which is actually odd in several ways: it limits the index to 31 bits,
and then it limits files so that they can't have data in that last byte
of a page that has the highest 31-bit index (ie page index 0x7fffffff).

Neither of those limitations makes sense. The index is actually the full
32 bit unsigned value, and we can use that whole full page. So the
maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".

However, we do want to avoid the maximum index, because we have code
that iterates over the page indexes, and we don't want that code to
overflow. So the maximum size of a file on a 32-bit host should
actually be one page less than the full 32-bit index.

So the actual limit is ULONG_MAX << PAGE_SHIFT. That means that we will
not actually be using the page of that last index (ULONG_MAX), but we
can grow a file up to that limit.

The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
volume. It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
byte less), but the actual true VM limit is one page less than 16TiB.
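
As a quick sanity check of that arithmetic, a small userspace sketch (an
illustration only, assuming PAGE_SIZE = 4096 / PAGE_SHIFT = 12 and a 32-bit
page index) reproduces both numbers:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	const uint64_t page_size = 4096;                        /* PAGE_SHIFT = 12 */
	const uint64_t old_limit = (page_size << 31) - 1;       /* old 32-bit macro */
	const uint64_t new_limit = (uint64_t)UINT32_MAX << 12;  /* ULONG_MAX << PAGE_SHIFT */

	printf("old limit: %llu bytes (8 TiB minus one byte)\n",
	       (unsigned long long)old_limit);
	printf("new limit: %llu bytes (16 TiB minus one page)\n",
	       (unsigned long long)new_limit);
	return 0;
}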

This was invisible until commit c2a9737f45e2 ("vfs,mm: fix a dead loop
in truncate_inode_pages_range()"), which started applying that
MAX_LFS_FILESIZE limit to block devices too.

NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
actually just the offset type itself (loff_t), which is signed. But for
clarity, on 64-bit, just use the maximum signed value, and don't make
people have to count the number of 'f' characters in the hex constant.

So just use LLONG_MAX for the 64-bit case. That was what the value had
been before too, just written out as a hex constant.

Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
Reported-and-tested-by: Doug Nazar <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Dave Kleikamp <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/fs.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -941,9 +941,9 @@ static inline struct file *get_file(stru
/* Page cache limit. The filesystems should put that into their s_maxbytes
limits, otherwise bad things can happen in VM. */
#if BITS_PER_LONG==32
-#define MAX_LFS_FILESIZE (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
+#define MAX_LFS_FILESIZE ((loff_t)ULONG_MAX << PAGE_SHIFT)
#elif BITS_PER_LONG==64
-#define MAX_LFS_FILESIZE ((loff_t)0x7fffffffffffffffLL)
+#define MAX_LFS_FILESIZE ((loff_t)LLONG_MAX)
#endif

#define FL_POSIX 1


2017-08-28 08:45:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 72/84] iio: hid-sensor-trigger: Fix the race with user space powering up sensors

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Srinivas Pandruvada <[email protected]>

commit f1664eaacec31035450132c46ed2915fd2b2049a upstream.

It has been reported for a while that with the iio-sensor-proxy service the
rotation only works after one suspend/resume cycle. This required a wait
in the systemd unit file to avoid the race. I found a Yoga 900 where I could
reproduce this.

The problem scenario is:
- During sensor driver init, runtime PM is enabled and an auto-suspend
delay of 3 seconds is set.
This results in one runtime resume, but a check avoids an actual powerup
in this sequence, so runtime PM ends up active with the sensors still off
- User space iio-sensor-proxy tries to power up the sensor. Since runtime
PM is active it will simply return. But the sensors were not actually
powered up in the prior sequence, so in fact the sensors will not work
- After 3 seconds the auto-suspend kicks in

If we add a wait in the systemd service file to fire iio-sensor-proxy after
3 seconds, then everything works, as the runtime resume will actually
power up the sensor because it is a user request.

To avoid this:
- Remove the check that matches the user-requested state; this causes a
brief powerup, but if iio-sensor-proxy starts immediately it will
still work as the sensors are ON.
- Also move the autosuspend delay setup to the point where the user requests
turning the sensors off, such as after the user finishes a raw read or
disables the buffer

Signed-off-by: Srinivas Pandruvada <[email protected]>
Tested-by: Bastien Nocera <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/iio/common/hid-sensors/hid-sensor-trigger.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/iio/common/hid-sensors/hid-sensor-trigger.c
+++ b/drivers/iio/common/hid-sensors/hid-sensor-trigger.c
@@ -36,8 +36,6 @@ static int _hid_sensor_power_state(struc
s32 poll_value = 0;

if (state) {
- if (!atomic_read(&st->user_requested_state))
- return 0;
if (sensor_hub_device_open(st->hsdev))
return -EIO;

@@ -86,6 +84,9 @@ static int _hid_sensor_power_state(struc
&report_val);
}

+ pr_debug("HID_SENSOR %s set power_state %d report_state %d\n",
+ st->pdev->name, state_val, report_val);
+
sensor_hub_get_feature(st->hsdev, st->power_state.report_id,
st->power_state.index,
sizeof(state_val), &state_val);
@@ -107,6 +108,7 @@ int hid_sensor_power_state(struct hid_se
ret = pm_runtime_get_sync(&st->pdev->dev);
else {
pm_runtime_mark_last_busy(&st->pdev->dev);
+ pm_runtime_use_autosuspend(&st->pdev->dev);
ret = pm_runtime_put_autosuspend(&st->pdev->dev);
}
if (ret < 0) {
@@ -201,8 +203,6 @@ int hid_sensor_setup_trigger(struct iio_
/* Default to 3 seconds, but can be changed from sysfs */
pm_runtime_set_autosuspend_delay(&attrb->pdev->dev,
3000);
- pm_runtime_use_autosuspend(&attrb->pdev->dev);
-
return ret;
error_unreg_trigger:
iio_trigger_unregister(trig);


2017-08-28 08:12:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 69/84] binder: Use wake up hint for synchronous transactions.

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Riley Andrews <[email protected]>

commit 00b40d613352c623aaae88a44e5ded7c912909d7 upstream.

Use wake_up_interruptible_sync() to hint to the scheduler that binder
transactions are synchronous wakeups. Disable preemption while waking
to avoid ping-ponging on the binder lock.

Signed-off-by: Todd Kjos <[email protected]>
Signed-off-by: Omprakash Dhyade <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/android/binder.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -1724,8 +1724,12 @@ static void binder_transaction(struct bi
list_add_tail(&t->work.entry, target_list);
tcomplete->type = BINDER_WORK_TRANSACTION_COMPLETE;
list_add_tail(&tcomplete->entry, &thread->todo);
- if (target_wait)
- wake_up_interruptible(target_wait);
+ if (target_wait) {
+ if (reply || !(t->flags & TF_ONE_WAY))
+ wake_up_interruptible_sync(target_wait);
+ else
+ wake_up_interruptible(target_wait);
+ }
return;

err_get_unused_fd_failed:


2017-08-28 08:46:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 70/84] ANDROID: binder: fix proc->tsk check.

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Martijn Coenen <[email protected]>

commit b2a6d1b999a4c13e5997bb864694e77172d45250 upstream.

Commit c4ea41ba195d ("binder: use group leader instead of open thread")
was incomplete and didn't update a check in binder_mmap(), causing all
mmap() calls into the binder driver to fail.

Signed-off-by: Martijn Coenen <[email protected]>
Tested-by: John Stultz <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/android/binder.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2875,7 +2875,7 @@ static int binder_mmap(struct file *filp
const char *failure_string;
struct binder_buffer *buffer;

- if (proc->tsk != current)
+ if (proc->tsk != current->group_leader)
return -EINVAL;

if ((vma->vm_end - vma->vm_start) > SZ_4M)


2017-08-28 08:47:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 67/84] Revert "android: binder: Sanity check at binder ioctl"

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Todd Kjos <[email protected]>

commit a2b18708ee14baec4ef9c0fba96070bba14d0081 upstream.

This reverts commit a906d6931f3ccaf7de805643190765ddd7378e27.

The patch introduced a race in the binder driver. An attempt to fix the
race was submitted in "[PATCH v2] android: binder: fix dangling pointer
comparison", however the conclusion in the discussion for that patch
was that the original patch should be reverted.

The reversion is being done as part of the fine-grained locking
patchset since the patch would need to be refactored when
proc->vma_vm_mm is removed from struct binder_proc and added
in the binder allocator.

Signed-off-by: Todd Kjos <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/android/binder.c | 5 -----
1 file changed, 5 deletions(-)

--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2760,10 +2760,6 @@ static long binder_ioctl(struct file *fi
/*pr_info("binder_ioctl: %d:%d %x %lx\n",
proc->pid, current->pid, cmd, arg);*/

- if (unlikely(current->mm != proc->vma_vm_mm)) {
- pr_err("current mm mismatch proc mm\n");
- return -EINVAL;
- }
trace_binder_ioctl(cmd, arg);

ret = wait_event_interruptible(binder_user_error_wait, binder_stop_on_user_error < 2);
@@ -2978,7 +2974,6 @@ static int binder_open(struct inode *nod
return -ENOMEM;
get_task_struct(current);
proc->tsk = current;
- proc->vma_vm_mm = current->mm;
INIT_LIST_HEAD(&proc->todo);
init_waitqueue_head(&proc->wait);
proc->default_priority = task_nice(current);


2017-08-28 08:12:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 64/84] Bluetooth: hidp: fix possible might sleep error in hidp_session_thread

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jeffy Chen <[email protected]>

commit 5da8e47d849d3d37b14129f038782a095b9ad049 upstream.

It looks like hidp_session_thread has the same pattern as the issue reported in
old rfcomm:

while (1) {
	set_current_state(TASK_INTERRUPTIBLE);
	if (condition)
		break;
	// may call might_sleep here
	schedule();
}
__set_current_state(TASK_RUNNING);

Which was fixed by:
dfb2fae Bluetooth: Fix nested sleeps

So let's fix it in the same way, also following the suggestion of:
https://lwn.net/Articles/628628/
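
For reference, the replacement idiom used below is the woken_wake_function /
wait_woken() pattern (a minimal sketch; 'wq' and 'condition' stand in for the
subsystem's wait queue and exit condition):

	DEFINE_WAIT_FUNC(wait, woken_wake_function);

	add_wait_queue(&wq, &wait);
	while (1) {
		if (condition)
			break;
		/* work that may call might_sleep() is safe here, since the
		 * task is not left in TASK_INTERRUPTIBLE between iterations */
		wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
	}
	remove_wait_queue(&wq, &wait);

and wakers call wake_up_interruptible(&wq) instead of wake_up_process().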

Signed-off-by: Jeffy Chen <[email protected]>
Tested-by: AL Yu-Chen Cho <[email protected]>
Tested-by: Rohit Vaswani <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
Cc: Jiri Slaby <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/bluetooth/hidp/core.c | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)

--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -36,6 +36,7 @@
#define VERSION "1.2"

static DECLARE_RWSEM(hidp_session_sem);
+static DECLARE_WAIT_QUEUE_HEAD(hidp_session_wq);
static LIST_HEAD(hidp_session_list);

static unsigned char hidp_keycode[256] = {
@@ -1068,12 +1069,12 @@ static int hidp_session_start_sync(struc
* Wake up session thread and notify it to stop. This is asynchronous and
* returns immediately. Call this whenever a runtime error occurs and you want
* the session to stop.
- * Note: wake_up_process() performs any necessary memory-barriers for us.
+ * Note: wake_up_interruptible() performs any necessary memory-barriers for us.
*/
static void hidp_session_terminate(struct hidp_session *session)
{
atomic_inc(&session->terminate);
- wake_up_process(session->task);
+ wake_up_interruptible(&hidp_session_wq);
}

/*
@@ -1180,7 +1181,9 @@ static void hidp_session_run(struct hidp
struct sock *ctrl_sk = session->ctrl_sock->sk;
struct sock *intr_sk = session->intr_sock->sk;
struct sk_buff *skb;
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);

+ add_wait_queue(&hidp_session_wq, &wait);
for (;;) {
/*
* This thread can be woken up two ways:
@@ -1188,12 +1191,10 @@ static void hidp_session_run(struct hidp
* session->terminate flag and wakes this thread up.
* - Via modifying the socket state of ctrl/intr_sock. This
* thread is woken up by ->sk_state_changed().
- *
- * Note: set_current_state() performs any necessary
- * memory-barriers for us.
*/
- set_current_state(TASK_INTERRUPTIBLE);

+ /* Ensure session->terminate is updated */
+ smp_mb__before_atomic();
if (atomic_read(&session->terminate))
break;

@@ -1227,11 +1228,22 @@ static void hidp_session_run(struct hidp
hidp_process_transmit(session, &session->ctrl_transmit,
session->ctrl_sock);

- schedule();
+ wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
+ remove_wait_queue(&hidp_session_wq, &wait);

atomic_inc(&session->terminate);
- set_current_state(TASK_RUNNING);
+
+ /* Ensure session->terminate is updated */
+ smp_mb__after_atomic();
+}
+
+static int hidp_session_wake_function(wait_queue_t *wait,
+ unsigned int mode,
+ int sync, void *key)
+{
+ wake_up_interruptible(&hidp_session_wq);
+ return false;
}

/*
@@ -1244,7 +1256,8 @@ static void hidp_session_run(struct hidp
static int hidp_session_thread(void *arg)
{
struct hidp_session *session = arg;
- wait_queue_t ctrl_wait, intr_wait;
+ DEFINE_WAIT_FUNC(ctrl_wait, hidp_session_wake_function);
+ DEFINE_WAIT_FUNC(intr_wait, hidp_session_wake_function);

BT_DBG("session %p", session);

@@ -1254,8 +1267,6 @@ static int hidp_session_thread(void *arg
set_user_nice(current, -15);
hidp_set_timer(session);

- init_waitqueue_entry(&ctrl_wait, current);
- init_waitqueue_entry(&intr_wait, current);
add_wait_queue(sk_sleep(session->ctrl_sock->sk), &ctrl_wait);
add_wait_queue(sk_sleep(session->intr_sock->sk), &intr_wait);
/* This memory barrier is paired with wq_has_sleeper(). See


2017-08-28 08:47:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 66/84] Bluetooth: bnep: fix possible might sleep error in bnep_session

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jeffy Chen <[email protected]>

commit 25717382c1dd0ddced2059053e3ca5088665f7a5 upstream.

It looks like bnep_session has the same pattern as the issue reported in
old rfcomm:

while (1) {
	set_current_state(TASK_INTERRUPTIBLE);
	if (condition)
		break;
	// may call might_sleep here
	schedule();
}
__set_current_state(TASK_RUNNING);

Which was fixed by:
dfb2fae Bluetooth: Fix nested sleeps

So let's fix it in the same way, also following the suggestion of:
https://lwn.net/Articles/628628/

Signed-off-by: Jeffy Chen <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
Reviewed-by: AL Yu-Chen Cho <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
Cc: Jiri Slaby <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/bluetooth/bnep/core.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

--- a/net/bluetooth/bnep/core.c
+++ b/net/bluetooth/bnep/core.c
@@ -484,16 +484,16 @@ static int bnep_session(void *arg)
struct net_device *dev = s->dev;
struct sock *sk = s->sock->sk;
struct sk_buff *skb;
- wait_queue_t wait;
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);

BT_DBG("");

set_user_nice(current, -15);

- init_waitqueue_entry(&wait, current);
add_wait_queue(sk_sleep(sk), &wait);
while (1) {
- set_current_state(TASK_INTERRUPTIBLE);
+ /* Ensure session->terminate is updated */
+ smp_mb__before_atomic();

if (atomic_read(&s->terminate))
break;
@@ -515,9 +515,8 @@ static int bnep_session(void *arg)
break;
netif_wake_queue(dev);

- schedule();
+ wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
- __set_current_state(TASK_RUNNING);
remove_wait_queue(sk_sleep(sk), &wait);

/* Cleanup session */
@@ -666,7 +665,7 @@ int bnep_del_connection(struct bnep_conn
s = __bnep_get_session(req->dst);
if (s) {
atomic_inc(&s->terminate);
- wake_up_process(s->task);
+ wake_up_interruptible(sk_sleep(s->sock->sk));
} else
err = -ENOENT;



2017-08-28 08:12:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 65/84] Bluetooth: cmtp: fix possible might sleep error in cmtp_session

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jeffy Chen <[email protected]>

commit f06d977309d09253c744e54e75c5295ecc52b7b4 upstream.

It looks like cmtp_session has the same pattern as the issue reported in
old rfcomm:

while (1) {
	set_current_state(TASK_INTERRUPTIBLE);
	if (condition)
		break;
	// may call might_sleep here
	schedule();
}
__set_current_state(TASK_RUNNING);

Which was fixed by:
dfb2fae Bluetooth: Fix nested sleeps

So let's fix it in the same way, also following the suggestion of:
https://lwn.net/Articles/628628/

Signed-off-by: Jeffy Chen <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
Reviewed-by: AL Yu-Chen Cho <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
Cc: Jiri Slaby <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/bluetooth/cmtp/core.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

--- a/net/bluetooth/cmtp/core.c
+++ b/net/bluetooth/cmtp/core.c
@@ -280,16 +280,16 @@ static int cmtp_session(void *arg)
struct cmtp_session *session = arg;
struct sock *sk = session->sock->sk;
struct sk_buff *skb;
- wait_queue_t wait;
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);

BT_DBG("session %p", session);

set_user_nice(current, -15);

- init_waitqueue_entry(&wait, current);
add_wait_queue(sk_sleep(sk), &wait);
while (1) {
- set_current_state(TASK_INTERRUPTIBLE);
+ /* Ensure session->terminate is updated */
+ smp_mb__before_atomic();

if (atomic_read(&session->terminate))
break;
@@ -306,9 +306,8 @@ static int cmtp_session(void *arg)

cmtp_process_transmit(session);

- schedule();
+ wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
- __set_current_state(TASK_RUNNING);
remove_wait_queue(sk_sleep(sk), &wait);

down_write(&cmtp_session_sem);
@@ -393,7 +392,7 @@ int cmtp_add_connection(struct cmtp_conn
err = cmtp_attach_device(session);
if (err < 0) {
atomic_inc(&session->terminate);
- wake_up_process(session->task);
+ wake_up_interruptible(sk_sleep(session->sock->sk));
up_write(&cmtp_session_sem);
return err;
}
@@ -431,7 +430,11 @@ int cmtp_del_connection(struct cmtp_conn

/* Stop session thread */
atomic_inc(&session->terminate);
- wake_up_process(session->task);
+
+ /* Ensure session->terminate is updated */
+ smp_mb__after_atomic();
+
+ wake_up_interruptible(sk_sleep(session->sock->sk));
} else
err = -ENOENT;



2017-08-28 08:12:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 18/84] net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Huy Nguyen <[email protected]>


[ Upstream commit ca3d89a3ebe79367bd41b6b8ba37664478ae2dba ]

The enable_4k_uar module parameter was added in the patch cited below to
address the backward compatibility issue in SRIOV when the VM has the
system's PAGE_SIZE uar implementation and the hypervisor has the 4k uar
implementation.

The above compatibility issue does not exist in the non-SRIOV case.
In this patch, we always enable the 4k uar implementation if SRIOV
is not enabled on mlx4's supported cards.

Fixes: 76e39ccf9c36 ("net/mlx4_core: Fix backward compatibility on VFs")
Signed-off-by: Huy Nguyen <[email protected]>
Reviewed-by: Daniel Jurgens <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -430,7 +430,7 @@ static int mlx4_dev_cap(struct mlx4_dev
/* Virtual PCI function needs to determine UAR page size from
* firmware. Only master PCI function can set the uar page size
*/
- if (enable_4k_uar)
+ if (enable_4k_uar || !dev->persist->num_vfs)
dev->uar_page_shift = DEFAULT_UAR_PAGE_SHIFT;
else
dev->uar_page_shift = PAGE_SHIFT;
@@ -2269,7 +2269,7 @@ static int mlx4_init_hca(struct mlx4_dev

dev->caps.max_fmr_maps = (1 << (32 - ilog2(dev->caps.num_mpts))) - 1;

- if (enable_4k_uar) {
+ if (enable_4k_uar || !dev->persist->num_vfs) {
init_hca.log_uar_sz = ilog2(dev->caps.num_uars) +
PAGE_SHIFT - DEFAULT_UAR_PAGE_SHIFT;
init_hca.uar_page_sz = DEFAULT_UAR_PAGE_SHIFT - 12;


2017-08-28 08:48:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 53/84] cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ronnie Sahlberg <[email protected]>

commit d3edede29f74d335f81d95a4588f5f136a9f7dcf upstream.

Add checking for the path component length and verify it is <= the maximum
that the server advertises via FileFsAttributeInformation.

With this patch cifs.ko will now return ENAMETOOLONG instead of ENOENT
when users try to access an overlong path.

To test this, try to cd into a (non-existing) directory on a CIFS share
that has a too long name:
cd /mnt/aaaaaaaaaaaaaaa...

and it now should show a good error message from the shell:
bash: cd: /mnt/aaaaaaaaaaaaaaaa...aaaaaa: File name too long

rh bz 1153996

Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/cifs/dir.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)

--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -183,15 +183,20 @@ cifs_bp_rename_retry:
}

/*
+ * Don't allow path components longer than the server max.
* Don't allow the separator character in a path component.
* The VFS will not allow "/", but "\" is allowed by posix.
*/
static int
-check_name(struct dentry *direntry)
+check_name(struct dentry *direntry, struct cifs_tcon *tcon)
{
struct cifs_sb_info *cifs_sb = CIFS_SB(direntry->d_sb);
int i;

+ if (unlikely(direntry->d_name.len >
+ tcon->fsAttrInfo.MaxPathNameComponentLength))
+ return -ENAMETOOLONG;
+
if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS)) {
for (i = 0; i < direntry->d_name.len; i++) {
if (direntry->d_name.name[i] == '\\') {
@@ -489,10 +494,6 @@ cifs_atomic_open(struct inode *inode, st
return finish_no_open(file, res);
}

- rc = check_name(direntry);
- if (rc)
- return rc;
-
xid = get_xid();

cifs_dbg(FYI, "parent inode = 0x%p name is: %pd and dentry = 0x%p\n",
@@ -505,6 +506,11 @@ cifs_atomic_open(struct inode *inode, st
}

tcon = tlink_tcon(tlink);
+
+ rc = check_name(direntry, tcon);
+ if (rc)
+ goto out_free_xid;
+
server = tcon->ses->server;

if (server->ops->new_lease_key)
@@ -765,7 +771,7 @@ cifs_lookup(struct inode *parent_dir_ino
}
pTcon = tlink_tcon(tlink);

- rc = check_name(direntry);
+ rc = check_name(direntry, pTcon);
if (rc)
goto lookup_out;



2017-08-28 08:12:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 19/84] irda: do not leak initialized list.dev to userspace

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Colin Ian King <[email protected]>


[ Upstream commit b024d949a3c24255a7ef1a470420eb478949aa4c ]

list.dev has not been initialized and so the copy_to_user is copying
data from the stack back to user space which is a potential
information leak. Fix this by ensuring all of list is initialized to
zero.

Detected by CoverityScan, CID#1357894 ("Uninitialized scalar variable")

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/irda/af_irda.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -2223,7 +2223,7 @@ static int irda_getsockopt(struct socket
{
struct sock *sk = sock->sk;
struct irda_sock *self = irda_sk(sk);
- struct irda_device_list list;
+ struct irda_device_list list = { 0 };
struct irda_device_info *discoveries;
struct irda_ias_set * ias_opt; /* IAS get/query params */
struct ias_object * ias_obj; /* Object in IAS */


2017-08-28 08:49:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 10/84] ptr_ring: use kmalloc_array()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 81fbfe8adaf38d4f5a98c19bebfd41c5d6acaee8 ]

As found by syzkaller, malicious users can set whatever tx_queue_len
on a tun device and eventually crash the kernel.

Let's remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
ring buffer is not fast anyway.
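
For background, kcalloc()/kmalloc_array() differ from an open-coded
kzalloc(n * size) in that the multiplication is checked for overflow; a
simplified sketch of the helper in include/linux/slab.h:

static inline void *kmalloc_array(size_t n, size_t size, gfp_t flags)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;            /* n * size would overflow */
	return __kmalloc(n * size, flags);
}

so a multiplication that would wrap fails cleanly with NULL instead of
silently producing an undersized buffer.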

Fixes: 2e0ab8ca83c1 ("ptr_ring: array based FIFO for pointers")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/ptr_ring.h | 9 +++++----
include/linux/skb_array.h | 3 ++-
2 files changed, 7 insertions(+), 5 deletions(-)

--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -340,9 +340,9 @@ static inline void *ptr_ring_consume_bh(
__PTR_RING_PEEK_CALL_v; \
})

-static inline void **__ptr_ring_init_queue_alloc(int size, gfp_t gfp)
+static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
{
- return kzalloc(ALIGN(size * sizeof(void *), SMP_CACHE_BYTES), gfp);
+ return kcalloc(size, sizeof(void *), gfp);
}

static inline int ptr_ring_init(struct ptr_ring *r, int size, gfp_t gfp)
@@ -417,7 +417,8 @@ static inline int ptr_ring_resize(struct
* In particular if you consume ring in interrupt or BH context, you must
* disable interrupts/BH when doing so.
*/
-static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
+static inline int ptr_ring_resize_multiple(struct ptr_ring **rings,
+ unsigned int nrings,
int size,
gfp_t gfp, void (*destroy)(void *))
{
@@ -425,7 +426,7 @@ static inline int ptr_ring_resize_multip
void ***queues;
int i;

- queues = kmalloc(nrings * sizeof *queues, gfp);
+ queues = kmalloc_array(nrings, sizeof(*queues), gfp);
if (!queues)
goto noqueues;

--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -162,7 +162,8 @@ static inline int skb_array_resize(struc
}

static inline int skb_array_resize_multiple(struct skb_array **rings,
- int nrings, int size, gfp_t gfp)
+ int nrings, unsigned int size,
+ gfp_t gfp)
{
BUILD_BUG_ON(offsetof(struct skb_array, ring));
return ptr_ring_resize_multiple((struct ptr_ring **)rings,


2017-08-28 08:50:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 15/84] ipv6: reset fn->rr_ptr when replacing route

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Wei Wang <[email protected]>


[ Upstream commit 383143f31d7d3525a1dbff733d52fff917f82f15 ]

syzkaller reported the following use-after-free issue in rt6_select():
BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
Read of size 4 by task syz-executor1/439628
CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
Call Trace:
[<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
[<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
[<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
[<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
[<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
[<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
[<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
[<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
[<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
[<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
[<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
[<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
[<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
[<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
[<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
[<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
[<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
[<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
[<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
[<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
[<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
[<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
[<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
Object at ffff8800bc699380, in cache ip6_dst_cache size: 384

The root cause of it is that in fib6_add_rt2node(), when it replaces an
existing route with the new one, it does not update fn->rr_ptr.
This commit resets fn->rr_ptr to NULL when it points to a route which is
replaced in fib6_add_rt2node().

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Wei Wang <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_fib.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -897,6 +897,8 @@ add:
}
nsiblings = iter->rt6i_nsiblings;
fib6_purge_rt(iter, fn, info->nl_net);
+ if (fn->rr_ptr == iter)
+ fn->rr_ptr = NULL;
rt6_release(iter);

if (nsiblings) {
@@ -909,6 +911,8 @@ add:
if (rt6_qualify_for_ecmp(iter)) {
*ins = iter->dst.rt6_next;
fib6_purge_rt(iter, fn, info->nl_net);
+ if (fn->rr_ptr == iter)
+ fn->rr_ptr = NULL;
rt6_release(iter);
nsiblings--;
} else {


2017-08-28 08:50:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 50/84] tracing: Fix freeing of filter in create_filter() when set_str is false

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (VMware) <[email protected]>

commit 8b0db1a5bdfcee0dbfa89607672598ae203c9045 upstream.

Performing the following task with kmemleak enabled:

# cd /sys/kernel/tracing/events/irq/irq_handler_entry/
# echo 'enable_event:kmem:kmalloc:3 if irq >' > trigger
# echo 'enable_event:kmem:kmalloc:3 if irq > 31' > trigger
# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800b9290308 (size 32):
comm "bash", pid 1114, jiffies 4294848451 (age 141.139s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81cef5aa>] kmemleak_alloc+0x4a/0xa0
[<ffffffff81357938>] kmem_cache_alloc_trace+0x158/0x290
[<ffffffff81261c09>] create_filter_start.constprop.28+0x99/0x940
[<ffffffff812639c9>] create_filter+0xa9/0x160
[<ffffffff81263bdc>] create_event_filter+0xc/0x10
[<ffffffff812655e5>] set_trigger_filter+0xe5/0x210
[<ffffffff812660c4>] event_enable_trigger_func+0x324/0x490
[<ffffffff812652e2>] event_trigger_write+0x1a2/0x260
[<ffffffff8138cf87>] __vfs_write+0xd7/0x380
[<ffffffff8138f421>] vfs_write+0x101/0x260
[<ffffffff8139187b>] SyS_write+0xab/0x130
[<ffffffff81cfd501>] entry_SYSCALL_64_fastpath+0x1f/0xbe
[<ffffffffffffffff>] 0xffffffffffffffff

The function create_filter() is passed a 'filterp' pointer that gets
allocated, and if "set_str" is true, it is up to the caller to free it, even
on error. The problem is that the pointer is not freed by create_filter()
when set_str is false. This is a bug, and it is not up to the caller to free
the filter on error if it doesn't care about the string.

Link: http://lkml.kernel.org/r/[email protected]

Fixes: 38b78eb85 ("tracing: Factorize filter creation")
Reported-by: Chunyu Hu <[email protected]>
Tested-by: Chunyu Hu <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/trace_events_filter.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -1926,6 +1926,10 @@ static int create_filter(struct trace_ev
if (err && set_str)
append_filter_err(ps, filter);
}
+ if (err && !set_str) {
+ free_event_filter(filter);
+ filter = NULL;
+ }
create_filter_finish(ps);

*filterp = filter;


2017-08-28 08:51:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 49/84] tracing: Fix kmemleak in tracing_map_array_free()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chunyu Hu <[email protected]>

commit 475bb3c69ab05df2a6ecef6acc2393703d134180 upstream.

kmemleak reported the leak below when I was clearing the hist
trigger. With this patch, the kmemleak report is gone.

unreferenced object 0xffff94322b63d760 (size 32):
comm "bash", pid 1522, jiffies 4403687962 (age 2442.311s)
hex dump (first 32 bytes):
00 01 00 00 04 00 00 00 08 00 00 00 ff 00 00 00 ................
10 00 00 00 00 00 00 00 80 a8 7a f2 31 94 ff ff ..........z.1...
backtrace:
[<ffffffff9e96c27a>] kmemleak_alloc+0x4a/0xa0
[<ffffffff9e424cba>] kmem_cache_alloc_trace+0xca/0x1d0
[<ffffffff9e377736>] tracing_map_array_alloc+0x26/0x140
[<ffffffff9e261be0>] kretprobe_trampoline+0x0/0x50
[<ffffffff9e38b935>] create_hist_data+0x535/0x750
[<ffffffff9e38bd47>] event_hist_trigger_func+0x1f7/0x420
[<ffffffff9e38893d>] event_trigger_write+0xfd/0x1a0
[<ffffffff9e44dfc7>] __vfs_write+0x37/0x170
[<ffffffff9e44f552>] vfs_write+0xb2/0x1b0
[<ffffffff9e450b85>] SyS_write+0x55/0xc0
[<ffffffff9e203857>] do_syscall_64+0x67/0x150
[<ffffffff9e977ce7>] return_from_SYSCALL_64+0x0/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff9431f27aa880 (size 128):
comm "bash", pid 1522, jiffies 4403687962 (age 2442.311s)
hex dump (first 32 bytes):
00 00 8c 2a 32 94 ff ff 00 f0 8b 2a 32 94 ff ff ...*2......*2...
00 e0 8b 2a 32 94 ff ff 00 d0 8b 2a 32 94 ff ff ...*2......*2...
backtrace:
[<ffffffff9e96c27a>] kmemleak_alloc+0x4a/0xa0
[<ffffffff9e425348>] __kmalloc+0xe8/0x220
[<ffffffff9e3777c1>] tracing_map_array_alloc+0xb1/0x140
[<ffffffff9e261be0>] kretprobe_trampoline+0x0/0x50
[<ffffffff9e38b935>] create_hist_data+0x535/0x750
[<ffffffff9e38bd47>] event_hist_trigger_func+0x1f7/0x420
[<ffffffff9e38893d>] event_trigger_write+0xfd/0x1a0
[<ffffffff9e44dfc7>] __vfs_write+0x37/0x170
[<ffffffff9e44f552>] vfs_write+0xb2/0x1b0
[<ffffffff9e450b85>] SyS_write+0x55/0xc0
[<ffffffff9e203857>] do_syscall_64+0x67/0x150
[<ffffffff9e977ce7>] return_from_SYSCALL_64+0x0/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff

Link: http://lkml.kernel.org/r/[email protected]

Fixes: 08d43a5fa063 ("tracing: Add lock-free tracing_map")
Signed-off-by: Chunyu Hu <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/tracing_map.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -221,16 +221,19 @@ void tracing_map_array_free(struct traci
if (!a)
return;

- if (!a->pages) {
- kfree(a);
- return;
- }
+ if (!a->pages)
+ goto free;

for (i = 0; i < a->n_pages; i++) {
if (!a->pages[i])
break;
free_page((unsigned long)a->pages[i]);
}
+
+ kfree(a->pages);
+
+ free:
+ kfree(a);
}

struct tracing_map_array *tracing_map_array_alloc(unsigned int n_elts,


2017-08-28 08:51:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 47/84] drm: rcar-du: Fix H/V sync signal polarity configuration

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Koji Matsuoka <[email protected]>

commit fd1adef3bff0663c5ac31b45bc4a05fafd43d19b upstream.

The VSL and HSL bits in the DSMR register set the corresponding
horizontal and vertical sync signal polarity to active high. The code
got it the wrong way around, fix it.

Signed-off-by: Koji Matsuoka <[email protected]>
Signed-off-by: Laurent Pinchart <[email protected]>
Signed-off-by: Thong Ho <[email protected]>
Signed-off-by: Nhan Nguyen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/rcar-du/rcar_du_crtc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
@@ -149,8 +149,8 @@ static void rcar_du_crtc_set_display_tim
rcar_du_group_write(rcrtc->group, rcrtc->index % 2 ? OTAR2 : OTAR, 0);

/* Signal polarities */
- value = ((mode->flags & DRM_MODE_FLAG_PVSYNC) ? 0 : DSMR_VSL)
- | ((mode->flags & DRM_MODE_FLAG_PHSYNC) ? 0 : DSMR_HSL)
+ value = ((mode->flags & DRM_MODE_FLAG_PVSYNC) ? DSMR_VSL : 0)
+ | ((mode->flags & DRM_MODE_FLAG_PHSYNC) ? DSMR_HSL : 0)
| DSMR_DIPM_DISP | DSMR_CSPM;
rcar_du_crtc_write(rcrtc, DSMR, value);



2017-08-28 08:52:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 45/84] drm: rcar-du: Fix crash in encoder failure error path

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Laurent Pinchart <[email protected]>

commit 05ee29e94acf0d4b3998c3f93374952de8f90176 upstream.

When an encoder fails to initialize the driver prints an error message
to the kernel log. The message contains the name of the encoder's DT
node, which is NULL for internal encoders. Use the of_node_full_name()
macro to avoid dereferencing a NULL pointer, print the output number to
add more context to the error, and make sure we still own a reference to
the encoder's DT node by delaying the of_node_put() call.

Signed-off-by: Laurent Pinchart <[email protected]>
Reviewed-by: Gustavo Padovan <[email protected]>
Signed-off-by: Thong Ho <[email protected]>
Signed-off-by: Nhan Nguyen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/rcar-du/rcar_du_kms.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
@@ -453,13 +453,13 @@ static int rcar_du_encoders_init_one(str
}

ret = rcar_du_encoder_init(rcdu, enc_type, output, encoder, connector);
- of_node_put(encoder);
- of_node_put(connector);
-
if (ret && ret != -EPROBE_DEFER)
dev_warn(rcdu->dev,
- "failed to initialize encoder %s (%d), skipping\n",
- encoder->full_name, ret);
+ "failed to initialize encoder %s on output %u (%d), skipping\n",
+ of_node_full_name(encoder), output, ret);
+
+ of_node_put(encoder);
+ of_node_put(connector);

return ret;
}


2017-08-28 08:53:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 41/84] fork: fix incorrect fput of ->exe_file causing use-after-free

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <[email protected]>

commit 2b7e8665b4ff51c034c55df3cff76518d1a9ee3a upstream.

Commit 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for
write killable") made it possible to kill a forking task while it is
waiting to acquire its ->mmap_sem for write, in dup_mmap().

However, it was overlooked that this introduced a new error path before
a reference is taken on the mm_struct's ->exe_file. Since the
->exe_file of the new mm_struct was already set to the old ->exe_file by
the memcpy() in dup_mm(), it was possible for the mmput() in the error
path of dup_mm() to drop a reference to ->exe_file which was never
taken.

This caused the struct file to later be freed prematurely.

Fix it by updating mm_init() to NULL out the ->exe_file, in the same
place it clears other things like the list of mmaps.

This bug was found by syzkaller. It can be reproduced using the
following C program:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static void *mmap_thread(void *_arg)
{
	for (;;) {
		mmap(NULL, 0x1000000, PROT_READ,
		     MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	}
}

static void *fork_thread(void *_arg)
{
	usleep(rand() % 10000);
	fork();
}

int main(void)
{
	fork();
	fork();
	fork();
	for (;;) {
		if (fork() == 0) {
			pthread_t t;

			pthread_create(&t, NULL, mmap_thread, NULL);
			pthread_create(&t, NULL, fork_thread, NULL);
			usleep(rand() % 10000);
			syscall(__NR_exit_group, 0);
		}
		wait(NULL);
	}
}

No special kernel config options are needed. It usually causes a NULL
pointer dereference in __remove_shared_vm_struct() during exit, or in
dup_mmap() (which is usually inlined into copy_process()) during fork.
Both are due to a vm_area_struct's ->vm_file being used after it's
already been freed.

Google Bug Id: 64772007

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
Signed-off-by: Eric Biggers <[email protected]>
Tested-by: Mark Rutland <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/fork.c | 1 +
1 file changed, 1 insertion(+)

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -766,6 +766,7 @@ static struct mm_struct *mm_init(struct
mm_init_cpumask(mm);
mm_init_aio(mm);
mm_init_owner(mm, p);
+ RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_mm_init(mm);
clear_tlb_flush_pending(mm);
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS


2017-08-28 08:11:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 38/84] mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kirill A. Shutemov <[email protected]>

commit 435c0b87d661da83771c30ed775f7c37eed193fb upstream.

/sys/kernel/mm/transparent_hugepage/shmem_enabled controls whether we want
to allocate huge pages when allocating pages for the private in-kernel shmem
mount.

Unfortunately, as Dan noticed, I've screwed it up and the only way to
make the kernel allocate huge pages for the mount is to use "force" there.
All other values are effectively ignored.
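
For context, the constants being compared are defined in mm/shmem.c, where the
"special" settings are negative sentinels (quoted from memory here, so verify
against the tree):

#define SHMEM_HUGE_NEVER	0
#define SHMEM_HUGE_ALWAYS	1
#define SHMEM_HUGE_WITHIN_SIZE	2
#define SHMEM_HUGE_ADVISE	3
/* special sysfs-only values */
#define SHMEM_HUGE_DENY		(-1)
#define SHMEM_HUGE_FORCE	(-2)

With that encoding the old test "shmem_huge < SHMEM_HUGE_DENY" is only true for
"force" (-2); the corrected "shmem_huge > SHMEM_HUGE_DENY" is true for the
regular policies (never/always/within_size/advise), which is what the commit
message describes.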

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 5a6e75f8110c ("shmem: prepare huge= mount option and sysfs knob")
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/shmem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3810,7 +3810,7 @@ int __init shmem_init(void)
}

#ifdef CONFIG_TRANSPARENT_HUGE_PAGECACHE
- if (has_transparent_hugepage() && shmem_huge < SHMEM_HUGE_DENY)
+ if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
else
shmem_huge = 0; /* just in case it was patched */
@@ -3871,7 +3871,7 @@ static ssize_t shmem_enabled_store(struc
return -EINVAL;

shmem_huge = huge;
- if (shmem_huge < SHMEM_HUGE_DENY)
+ if (shmem_huge > SHMEM_HUGE_DENY)
SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
return count;
}


2017-08-28 08:53:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 13/84] sctp: fully initialize the IPv6 address in sctp_v6_to_addr()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexander Potapenko <[email protected]>


[ Upstream commit 15339e441ec46fbc3bf3486bb1ae4845b0f1bb8d ]

KMSAN reported use of uninitialized sctp_addr->v4.sin_addr.s_addr and
sctp_addr->v6.sin6_scope_id in sctp_v6_cmp_addr() (see below).
Make sure all fields of an IPv6 address are initialized, which
guarantees that the IPv4 fields are also initialized.

==================================================================
BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
net/sctp/ipv6.c:517
CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
dump_stack+0x172/0x1c0 lib/dump_stack.c:42
is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg net/socket.c:643 [inline]
SYSC_sendto+0x608/0x710 net/socket.c:1696
SyS_sendto+0x8a/0xb0 net/socket.c:1664
entry_SYSCALL_64_fastpath+0x13/0x94
RIP: 0033:0x44b479
RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
origin description: ----dst_saddr@sctp_v6_get_dst
local variable created at:
sk_fullsock include/net/sock.h:2321 [inline]
inet6_sk include/linux/ipv6.h:309 [inline]
sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
==================================================================

Signed-off-by: Alexander Potapenko <[email protected]>
Reviewed-by: Xin Long <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sctp/ipv6.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -512,7 +512,9 @@ static void sctp_v6_to_addr(union sctp_a
{
addr->sa.sa_family = AF_INET6;
addr->v6.sin6_port = port;
+ addr->v6.sin6_flowinfo = 0;
addr->v6.sin6_addr = *saddr;
+ addr->v6.sin6_scope_id = 0;
}

/* Compare addresses exactly.


2017-08-28 08:11:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 36/84] ALSA: firewire: fix NULL pointer dereference when releasing uninitialized data of iso-resource

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Sakamoto <[email protected]>

commit 0c264af7be2013266c5b4c644f3f366399ee490a upstream.

When calling 'iso_resource_free()' for uninitialized data, this function
causes a NULL pointer dereference due to its 'unit' member. This occurs when
unplugging audio and music units on the IEEE 1394 bus after a failure of card
registration.

This commit fixes the bug. The bug has existed since kernel v4.5.

Fixes: 324540c4e05c ('ALSA: fireface: postpone sound card registration') at v4.12
Fixes: 8865a31e0fd8 ('ALSA: firewire-motu: postpone sound card registration') at v4.12
Fixes: b610386c8afb ('ALSA: firewire-tascam: deleyed registration of sound card') at v4.7
Fixes: 86c8dd7f4da3 ('ALSA: firewire-digi00x: delayed registration of sound card') at v4.7
Fixes: 6c29230e2a5f ('ALSA: oxfw: delayed registration of sound card') at v4.7
Fixes: 7d3c1d5901aa ('ALSA: fireworks: delayed registration of sound card') at v4.7
Fixes: 04a2c73c97eb ('ALSA: bebob: delayed registration of sound card') at v4.7
Fixes: b59fb1900b4f ('ALSA: dice: postpone card registration') at v4.5
Signed-off-by: Takashi Sakamoto <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/firewire/iso-resources.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/sound/firewire/iso-resources.c
+++ b/sound/firewire/iso-resources.c
@@ -210,9 +210,14 @@ EXPORT_SYMBOL(fw_iso_resources_update);
*/
void fw_iso_resources_free(struct fw_iso_resources *r)
{
- struct fw_card *card = fw_parent_device(r->unit)->card;
+ struct fw_card *card;
int bandwidth, channel;

+ /* Not initialized. */
+ if (r->unit == NULL)
+ return;
+ card = fw_parent_device(r->unit)->card;
+
mutex_lock(&r->mutex);

if (r->allocated) {


2017-08-28 08:54:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 37/84] ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexey Brodkin <[email protected]>

commit 7d79cee2c6540ea64dd917a14e2fd63d4ac3d3c0 upstream.

It is necessary to explicitly set both SLC_AUX_RGN_START1 and SLC_AUX_RGN_END1,
which hold the MSB bits of the physical addresses of the region start and end
respectively, otherwise the SLC region operation is executed in an
unpredictable manner.

Without this patch, SLC flushes on HSDK (IOC disabled) were taking
seconds.

Reported-by: Vladimir Kondratiev <[email protected]>
Signed-off-by: Alexey Brodkin <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
[vgupta: PAR40 regs only written if PAE40 exist]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arc/include/asm/cache.h | 2 ++
arch/arc/mm/cache.c | 13 +++++++++++--
2 files changed, 13 insertions(+), 2 deletions(-)

--- a/arch/arc/include/asm/cache.h
+++ b/arch/arc/include/asm/cache.h
@@ -89,7 +89,9 @@ extern unsigned long perip_base, perip_e
#define ARC_REG_SLC_FLUSH 0x904
#define ARC_REG_SLC_INVALIDATE 0x905
#define ARC_REG_SLC_RGN_START 0x914
+#define ARC_REG_SLC_RGN_START1 0x915
#define ARC_REG_SLC_RGN_END 0x916
+#define ARC_REG_SLC_RGN_END1 0x917

/* Bit val in SLC_CONTROL */
#define SLC_CTRL_IM 0x040
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -562,6 +562,7 @@ noinline void slc_op(phys_addr_t paddr,
static DEFINE_SPINLOCK(lock);
unsigned long flags;
unsigned int ctrl;
+ phys_addr_t end;

spin_lock_irqsave(&lock, flags);

@@ -591,8 +592,16 @@ noinline void slc_op(phys_addr_t paddr,
* END needs to be setup before START (latter triggers the operation)
* END can't be same as START, so add (l2_line_sz - 1) to sz
*/
- write_aux_reg(ARC_REG_SLC_RGN_END, (paddr + sz + l2_line_sz - 1));
- write_aux_reg(ARC_REG_SLC_RGN_START, paddr);
+ end = paddr + sz + l2_line_sz - 1;
+ if (is_pae40_enabled())
+ write_aux_reg(ARC_REG_SLC_RGN_END1, upper_32_bits(end));
+
+ write_aux_reg(ARC_REG_SLC_RGN_END, lower_32_bits(end));
+
+ if (is_pae40_enabled())
+ write_aux_reg(ARC_REG_SLC_RGN_START1, upper_32_bits(paddr));
+
+ write_aux_reg(ARC_REG_SLC_RGN_START, lower_32_bits(paddr));

while (read_aux_reg(ARC_REG_SLC_CTRL) & SLC_CTRL_BUSY);



2017-08-28 08:11:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 34/84] ALSA: core: Fix unexpected error at replacing user TLV

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit 88c54cdf61f508ebcf8da2d819f5dfc03e954d1d upstream.

When the user tries to replace a user-defined control TLV, the kernel
checks for a change of its content via memcmp(). The problem is that
the kernel passes the return value from memcmp() as is. memcmp()
can give a negative non-zero value depending on the comparison result,
and such a value would be recognized as an error code.

The patch covers that corner case by properly returning 1 for a changed
TLV.

Fixes: 8aa9b586e420 ("[ALSA] Control API - more robust TLV implementation")
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/core/control.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/sound/core/control.c
+++ b/sound/core/control.c
@@ -1156,7 +1156,7 @@ static int snd_ctl_elem_user_tlv(struct
mutex_lock(&ue->card->user_ctl_lock);
change = ue->tlv_data_size != size;
if (!change)
- change = memcmp(ue->tlv_data, new_data, size);
+ change = memcmp(ue->tlv_data, new_data, size) != 0;
kfree(ue->tlv_data);
ue->tlv_data = new_data;
ue->tlv_data_size = size;


2017-08-28 08:54:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 35/84] ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit bbba6f9d3da357bbabc6fda81e99ff5584500e76 upstream.

Lenovo G50-70 (17aa:3978) with a Conexant codec chip requires a
workaround for the inverted stereo dmic similar to the one used for other
Lenovo models.

Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1020657
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/pci/hda/patch_conexant.c | 1 +
1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_conexant.c
+++ b/sound/pci/hda/patch_conexant.c
@@ -854,6 +854,7 @@ static const struct snd_pci_quirk cxt506
SND_PCI_QUIRK(0x17aa, 0x390b, "Lenovo G50-80", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK(0x17aa, 0x3975, "Lenovo U300s", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK(0x17aa, 0x3977, "Lenovo IdeaPad U310", CXT_FIXUP_STEREO_DMIC),
+ SND_PCI_QUIRK(0x17aa, 0x3978, "Lenovo G50-70", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK(0x17aa, 0x397b, "Lenovo S205", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK_VENDOR(0x17aa, "Thinkpad", CXT_FIXUP_THINKPAD_ACPI),
SND_PCI_QUIRK(0x1c06, 0x2011, "Lemote A1004", CXT_PINCFG_LEMOTE_A1004),


2017-08-28 08:55:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 32/84] KVM: x86: block guest protection keys unless the host has them enabled

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <[email protected]>

commit c469268cd523245cc58255f6696e0c295485cb0b upstream.

If the host has protection keys disabled, we cannot read and write the
guest PKRU---RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0. Block
the PKU cpuid bit in that case.

This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1.

Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kvm/cpuid.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -456,7 +456,7 @@ static inline int __do_cpuid_ent(struct
entry->ecx &= kvm_cpuid_7_0_ecx_x86_features;
cpuid_mask(&entry->ecx, CPUID_7_ECX);
/* PKU is not yet implemented for shadow paging. */
- if (!tdp_enabled)
+ if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
entry->ecx &= ~F(PKU);
} else {
entry->ebx = 0;


2017-08-28 08:55:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 30/84] KVM: s390: sthyi: fix sthyi inline assembly

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Heiko Carstens <[email protected]>

commit 4a4eefcd0e49f9f339933324c1bde431186a0a7d upstream.

The sthyi inline assembly is missing register r3 in the clobber
list. The sthyi instruction will always write a return code to
register "R2+1", which in this case would be r3. Because of that we may
get register corruption and see host crashes or data corruption,
depending on how gcc decides to allocate and use registers at
compile time.
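
For readers unfamiliar with clobber lists, a hedged user-space sketch on
x86-64 (unrelated to the s390 sthyi code; double_it() is invented) of why a
register the asm statement scribbles on must be declared as clobbered:

#include <stdio.h>

static long double_it(long x)
{
	long out;

	asm volatile(
		"mov %[in], %%rcx\n\t"
		"add %%rcx, %%rcx\n\t"
		"mov %%rcx, %[out]\n\t"
		: [out] "=r" (out)
		: [in] "r" (x)
		: "rcx");	/* omit this and gcc may keep live data in %rcx across the asm */
	return out;
}

int main(void)
{
	printf("%ld\n", double_it(21));	/* 42 */
	return 0;
}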

Fixes: 95ca2cb57985 ("KVM: s390: Add sthyi emulation")
Reviewed-by: Janosch Frank <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/s390/kvm/sthyi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/s390/kvm/sthyi.c
+++ b/arch/s390/kvm/sthyi.c
@@ -394,7 +394,7 @@ static int sthyi(u64 vaddr)
"srl %[cc],28\n"
: [cc] "=d" (cc)
: [code] "d" (code), [addr] "a" (addr)
- : "memory", "cc");
+ : "3", "memory", "cc");
return cc;
}



2017-08-28 08:11:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 27/84] Input: trackpoint - add new trackpoint firmware ID

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Aaron Ma <[email protected]>

commit ec667683c532c93fb41e100e5d61a518971060e2 upstream.

Synaptics added new TP firmware IDs 0x2 and 0x3; for now both of the
lower 2 bits indicate a TP. Change the constant comparison to a bitwise
check.

This makes the trackpoint on the Lenovo Carbon X1 Gen5 be recognized as
such instead of being identified as a "PS/2 Generic Mouse".
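
A hedged user-space sketch of the identification change (TP_MAGIC_IDENT and
the IDs come from the patch; is_trackpoint() is a made-up helper, not the
driver function):

#include <stdbool.h>
#include <stdio.h>

#define TP_MAGIC_IDENT 0x03	/* either of the two low bits marks a TrackPoint */

static bool is_trackpoint(unsigned char id)
{
	return (id & TP_MAGIC_IDENT) != 0;	/* was: id == 0x01 */
}

int main(void)
{
	for (unsigned char id = 0; id <= 3; id++)
		printf("id 0x%x -> %s\n", id,
		       is_trackpoint(id) ? "TrackPoint" : "not a TrackPoint");
	return 0;
}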

Signed-off-by: Aaron Ma <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/mouse/trackpoint.c | 3 ++-
drivers/input/mouse/trackpoint.h | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/input/mouse/trackpoint.c
+++ b/drivers/input/mouse/trackpoint.c
@@ -265,7 +265,8 @@ static int trackpoint_start_protocol(str
if (ps2_command(&psmouse->ps2dev, param, MAKE_PS2_CMD(0, 2, TP_READ_ID)))
return -1;

- if (param[0] != TP_MAGIC_IDENT)
+ /* add new TP ID. */
+ if (!(param[0] & TP_MAGIC_IDENT))
return -1;

if (firmware_id)
--- a/drivers/input/mouse/trackpoint.h
+++ b/drivers/input/mouse/trackpoint.h
@@ -21,8 +21,9 @@
#define TP_COMMAND 0xE2 /* Commands start with this */

#define TP_READ_ID 0xE1 /* Sent for device identification */
-#define TP_MAGIC_IDENT 0x01 /* Sent after a TP_READ_ID followed */
+#define TP_MAGIC_IDENT 0x03 /* Sent after a TP_READ_ID followed */
/* by the firmware ID */
+ /* Firmware ID includes 0x1, 0x2, 0x3 */


/*


2017-08-28 08:56:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 29/84] Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Masaki Ota <[email protected]>

commit 4a646580f793d19717f7e034c8d473b509c27d49 upstream.

Fixed the issue that two-finger scroll does not work correctly
with the V8 protocol. The cause is that the V8 protocol X-coordinate decode
is wrong for SS4 PLUS devices. I added an SS4 PLUS X decode definition.

More notes:
the problem manifests itself since commit e7348396c6d5 ("Input: ALPS
- fix V8+ protocol handling (73 03 28)"), where a fix for the V8+
protocol was applied. Although the culprit must have been present
beforehand, the two-finger scroll happened to keep working even with the
wrongly reported values for some reason. It got broken by the commit
above just because it changed the x_max value, and this made libinput
figure out the MT events correctly. Since the X coordinate is reported
falsely doubled, the events on the right half go outside the boundary
and are thus no longer handled. This resulted in a broken
two-finger scroll.

One-finger events are decoded differently and did not suffer from
this problem. The problem was only about MT events. --tiwai

Fixes: e7348396c6d5 ("Input: ALPS - fix V8+ protocol handling (73 03 28)")
Signed-off-by: Masaki Ota <[email protected]>
Tested-by: Takashi Iwai <[email protected]>
Tested-by: Paul Donohue <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/mouse/alps.c | 41 +++++++++++++++++++++++++++++++----------
drivers/input/mouse/alps.h | 8 ++++++++
2 files changed, 39 insertions(+), 10 deletions(-)

--- a/drivers/input/mouse/alps.c
+++ b/drivers/input/mouse/alps.c
@@ -1212,14 +1212,24 @@ static int alps_decode_ss4_v2(struct alp

case SS4_PACKET_ID_TWO:
if (priv->flags & ALPS_BUTTONPAD) {
- f->mt[0].x = SS4_BTL_MF_X_V2(p, 0);
+ if (IS_SS4PLUS_DEV(priv->dev_id)) {
+ f->mt[0].x = SS4_PLUS_BTL_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_PLUS_BTL_MF_X_V2(p, 1);
+ } else {
+ f->mt[0].x = SS4_BTL_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_BTL_MF_X_V2(p, 1);
+ }
f->mt[0].y = SS4_BTL_MF_Y_V2(p, 0);
- f->mt[1].x = SS4_BTL_MF_X_V2(p, 1);
f->mt[1].y = SS4_BTL_MF_Y_V2(p, 1);
} else {
- f->mt[0].x = SS4_STD_MF_X_V2(p, 0);
+ if (IS_SS4PLUS_DEV(priv->dev_id)) {
+ f->mt[0].x = SS4_PLUS_STD_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_PLUS_STD_MF_X_V2(p, 1);
+ } else {
+ f->mt[0].x = SS4_STD_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_STD_MF_X_V2(p, 1);
+ }
f->mt[0].y = SS4_STD_MF_Y_V2(p, 0);
- f->mt[1].x = SS4_STD_MF_X_V2(p, 1);
f->mt[1].y = SS4_STD_MF_Y_V2(p, 1);
}
f->pressure = SS4_MF_Z_V2(p, 0) ? 0x30 : 0;
@@ -1236,16 +1246,27 @@ static int alps_decode_ss4_v2(struct alp

case SS4_PACKET_ID_MULTI:
if (priv->flags & ALPS_BUTTONPAD) {
- f->mt[2].x = SS4_BTL_MF_X_V2(p, 0);
+ if (IS_SS4PLUS_DEV(priv->dev_id)) {
+ f->mt[0].x = SS4_PLUS_BTL_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_PLUS_BTL_MF_X_V2(p, 1);
+ } else {
+ f->mt[2].x = SS4_BTL_MF_X_V2(p, 0);
+ f->mt[3].x = SS4_BTL_MF_X_V2(p, 1);
+ }
+
f->mt[2].y = SS4_BTL_MF_Y_V2(p, 0);
- f->mt[3].x = SS4_BTL_MF_X_V2(p, 1);
f->mt[3].y = SS4_BTL_MF_Y_V2(p, 1);
no_data_x = SS4_MFPACKET_NO_AX_BL;
no_data_y = SS4_MFPACKET_NO_AY_BL;
} else {
- f->mt[2].x = SS4_STD_MF_X_V2(p, 0);
+ if (IS_SS4PLUS_DEV(priv->dev_id)) {
+ f->mt[0].x = SS4_PLUS_STD_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_PLUS_STD_MF_X_V2(p, 1);
+ } else {
+ f->mt[0].x = SS4_STD_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_STD_MF_X_V2(p, 1);
+ }
f->mt[2].y = SS4_STD_MF_Y_V2(p, 0);
- f->mt[3].x = SS4_STD_MF_X_V2(p, 1);
f->mt[3].y = SS4_STD_MF_Y_V2(p, 1);
no_data_x = SS4_MFPACKET_NO_AX;
no_data_y = SS4_MFPACKET_NO_AY;
@@ -2535,8 +2556,8 @@ static int alps_set_defaults_ss4_v2(stru

memset(otp, 0, sizeof(otp));

- if (alps_get_otp_values_ss4_v2(psmouse, 0, &otp[0][0]) ||
- alps_get_otp_values_ss4_v2(psmouse, 1, &otp[1][0]))
+ if (alps_get_otp_values_ss4_v2(psmouse, 1, &otp[1][0]) ||
+ alps_get_otp_values_ss4_v2(psmouse, 0, &otp[0][0]))
return -1;

alps_update_device_area_ss4_v2(otp, priv);
--- a/drivers/input/mouse/alps.h
+++ b/drivers/input/mouse/alps.h
@@ -91,6 +91,10 @@ enum SS4_PACKET_ID {
((_b[1 + _i * 3] << 5) & 0x1F00) \
)

+#define SS4_PLUS_STD_MF_X_V2(_b, _i) (((_b[0 + (_i) * 3] << 4) & 0x0070) | \
+ ((_b[1 + (_i) * 3] << 4) & 0x0F80) \
+ )
+
#define SS4_STD_MF_Y_V2(_b, _i) (((_b[1 + (_i) * 3] << 3) & 0x0010) | \
((_b[2 + (_i) * 3] << 5) & 0x01E0) | \
((_b[2 + (_i) * 3] << 4) & 0x0E00) \
@@ -100,6 +104,10 @@ enum SS4_PACKET_ID {
((_b[0 + (_i) * 3] >> 3) & 0x0010) \
)

+#define SS4_PLUS_BTL_MF_X_V2(_b, _i) (SS4_PLUS_STD_MF_X_V2(_b, _i) | \
+ ((_b[0 + (_i) * 3] >> 4) & 0x0008) \
+ )
+
#define SS4_BTL_MF_Y_V2(_b, _i) (SS4_STD_MF_Y_V2(_b, _i) | \
((_b[0 + (_i) * 3] >> 3) & 0x0008) \
)


2017-08-28 08:56:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 28/84] Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: KT Liao <[email protected]>

commit 1d2226e45040ed4aee95b633cbd64702bf7fc2a1 upstream.

Add ELAN0602 to the list of known ACPI IDs to enable support for ELAN
touchpads found in Lenovo Yoga310.

Signed-off-by: KT Liao <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/mouse/elan_i2c_core.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/input/mouse/elan_i2c_core.c
+++ b/drivers/input/mouse/elan_i2c_core.c
@@ -1234,6 +1234,7 @@ static const struct acpi_device_id elan_
{ "ELAN0000", 0 },
{ "ELAN0100", 0 },
{ "ELAN0600", 0 },
+ { "ELAN0602", 0 },
{ "ELAN0605", 0 },
{ "ELAN0608", 0 },
{ "ELAN0605", 0 },


2017-08-28 08:57:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 24/84] bpf, verifier: fix alu ops against map_value{, _adj} register types

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>


[ Upstream commit fce366a9dd0ddc47e7ce05611c266e8574a45116 ]

While looking into map_value_adj, I noticed that alu operations
directly on the map_value() resp. map_value_adj() register (any
alu operation on a map_value() register will turn it into a
map_value_adj() typed register) are not sufficiently protected
against some of the operations. Two non-exhaustive examples are
provided that the verifier needs to reject:

i) BPF_AND on r0 (map_value_adj):

0: (bf) r2 = r10
1: (07) r2 += -8
2: (7a) *(u64 *)(r2 +0) = 0
3: (18) r1 = 0xbf842a00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+2
R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
7: (57) r0 &= 8
8: (7a) *(u64 *)(r0 +0) = 22
R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=8 R10=fp
9: (95) exit

from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
9: (95) exit
processed 10 insns

ii) BPF_ADD in 32 bit mode on r0 (map_value_adj):

0: (bf) r2 = r10
1: (07) r2 += -8
2: (7a) *(u64 *)(r2 +0) = 0
3: (18) r1 = 0xc24eee00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+2
R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
7: (04) (u32) r0 += (u32) 0
8: (7a) *(u64 *)(r0 +0) = 22
R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
9: (95) exit

from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
9: (95) exit
processed 10 insns

Issue is, while min_value / max_value boundaries for the access
are adjusted appropriately, we change the pointer value in a way
that cannot be sufficiently tracked anymore from its origin.
Operations like BPF_{AND,OR,DIV,MUL,etc} on a destination register
that is PTR_TO_MAP_VALUE{,_ADJ} were probably unintended; in fact,
all the test cases coming with 484611357c19 ("bpf: allow access
into map value arrays") perform BPF_ADD only on the destination
register that is PTR_TO_MAP_VALUE_ADJ.

Only for UNKNOWN_VALUE register types such operations make sense,
f.e. with unknown memory content fetched initially from a constant
offset from the map value memory into a register. That register is
then later tested against lower / upper bounds, so that the verifier
can then do the tracking of min_value / max_value, and properly
check once that UNKNOWN_VALUE register is added to the destination
register with type PTR_TO_MAP_VALUE{,_ADJ}. This is also what the
original use-case is solving. Note, tracking on what is being
added is done through adjust_reg_min_max_vals() and later access
to the map value enforced with these boundaries and the given offset
from the insn through check_map_access_adj().

Tests will fail for non-root environment due to prohibited pointer
arithmetic, in particular in check_alu_op(), we bail out on the
is_pointer_value() check on the dst_reg (which is false in root
case as we allow for pointer arithmetic via env->allow_ptr_leaks).

Similarly to PTR_TO_PACKET, one way to fix it is to restrict the
allowed operations on PTR_TO_MAP_VALUE{,_ADJ} registers to 64 bit
mode BPF_ADD. The test_verifier suite runs fine after the patch
and it also rejects mentioned test cases.

Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/bpf/verifier.c | 1 +
1 file changed, 1 insertion(+)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1870,6 +1870,7 @@ static int check_alu_op(struct bpf_verif
* register as unknown.
*/
if (env->allow_ptr_leaks &&
+ BPF_CLASS(insn->code) == BPF_ALU64 && opcode == BPF_ADD &&
(dst_reg->type == PTR_TO_MAP_VALUE ||
dst_reg->type == PTR_TO_MAP_VALUE_ADJ))
dst_reg->type = PTR_TO_MAP_VALUE_ADJ;


2017-08-28 08:57:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 22/84] bpf, verifier: add additional patterns to evaluate_reg_imm_alu

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: John Fastabend <[email protected]>


[ Upstream commit 43188702b3d98d2792969a3377a30957f05695e6 ]

Currently the verifier does not track imm across alu operations when
the source register is of unknown type. This adds additional pattern
matching to catch this and track imm. We've seen LLVM generating this
pattern while working on cilium.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/bpf/verifier.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1467,6 +1467,65 @@ static int evaluate_reg_alu(struct bpf_v
return 0;
}

+static int evaluate_reg_imm_alu_unknown(struct bpf_verifier_env *env,
+ struct bpf_insn *insn)
+{
+ struct bpf_reg_state *regs = env->cur_state.regs;
+ struct bpf_reg_state *dst_reg = &regs[insn->dst_reg];
+ struct bpf_reg_state *src_reg = &regs[insn->src_reg];
+ u8 opcode = BPF_OP(insn->code);
+ s64 imm_log2 = __ilog2_u64((long long)dst_reg->imm);
+
+ /* BPF_X code with src_reg->type UNKNOWN_VALUE here. */
+ if (src_reg->imm > 0 && dst_reg->imm) {
+ switch (opcode) {
+ case BPF_ADD:
+ /* dreg += sreg
+ * where both have zero upper bits. Adding them
+ * can only result making one more bit non-zero
+ * in the larger value.
+ * Ex. 0xffff (imm=48) + 1 (imm=63) = 0x10000 (imm=47)
+ * 0xffff (imm=48) + 0xffff = 0x1fffe (imm=47)
+ */
+ dst_reg->imm = min(src_reg->imm, 63 - imm_log2);
+ dst_reg->imm--;
+ break;
+ case BPF_AND:
+ /* dreg &= sreg
+ * AND can not extend zero bits only shrink
+ * Ex. 0x00..00ffffff
+ * & 0x0f..ffffffff
+ * ----------------
+ * 0x00..00ffffff
+ */
+ dst_reg->imm = max(src_reg->imm, 63 - imm_log2);
+ break;
+ case BPF_OR:
+ /* dreg |= sreg
+ * OR can only extend zero bits
+ * Ex. 0x00..00ffffff
+ * | 0x0f..ffffffff
+ * ----------------
+ * 0x0f..00ffffff
+ */
+ dst_reg->imm = min(src_reg->imm, 63 - imm_log2);
+ break;
+ case BPF_SUB:
+ case BPF_MUL:
+ case BPF_RSH:
+ case BPF_LSH:
+ /* These may be flushed out later */
+ default:
+ mark_reg_unknown_value(regs, insn->dst_reg);
+ }
+ } else {
+ mark_reg_unknown_value(regs, insn->dst_reg);
+ }
+
+ dst_reg->type = UNKNOWN_VALUE;
+ return 0;
+}
+
static int evaluate_reg_imm_alu(struct bpf_verifier_env *env,
struct bpf_insn *insn)
{
@@ -1475,6 +1534,9 @@ static int evaluate_reg_imm_alu(struct b
struct bpf_reg_state *src_reg = &regs[insn->src_reg];
u8 opcode = BPF_OP(insn->code);

+ if (BPF_SRC(insn->code) == BPF_X && src_reg->type == UNKNOWN_VALUE)
+ return evaluate_reg_imm_alu_unknown(env, insn);
+
/* dst_reg->type == CONST_IMM here, simulate execution of 'add' insn.
* Don't care about overflow or negative values, just add them
*/


2017-08-28 08:58:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 20/84] net: sched: fix NULL pointer dereference when action calls some targets

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <[email protected]>


[ Upstream commit 4f8a881acc9d1adaf1e552349a0b1df28933a04c ]

As we know, some targets' checkentry may dereference par.entryinfo
to check entry fields inside. But when a sched action calls xt_check_target,
par.entryinfo is set to NULL. This causes a kernel panic when calling
some targets.

It can be reproduced with:
# tc qd add dev eth1 ingress handle ffff:
# tc filter add dev eth1 parent ffff: u32 match u32 0 0 action xt \
-j ECN --ecn-tcp-remove

It could also crash the kernel when using the CLUSTERIP or TPROXY target.

For now there is no proper value for par.entryinfo in ipt_init_target,
but it cannot be left as NULL. This patch avoids all these
panics by setting it to an ipt_entry object with all members = 0.

Note that this issue has been there since the very beginning.
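
The shape of the fix, as a hedged stand-alone sketch (struct and function
names below are illustrative only, not the netfilter API): hand the callee a
zero-initialized dummy object instead of a NULL pointer it may dereference.

#include <stdio.h>

struct entry_info {
	unsigned int flags;
};

/* stands in for a target's checkentry() that trusts entryinfo */
static int check_target(const struct entry_info *entryinfo)
{
	return entryinfo->flags == 0;	/* would oops if entryinfo were NULL */
}

int main(void)
{
	struct entry_info e = { 0 };	/* all members zero, like the ipt_entry in the patch */

	printf("check_target: %d\n", check_target(&e));
	return 0;
}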

Signed-off-by: Xin Long <[email protected]>
Acked-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sched/act_ipt.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -41,6 +41,7 @@ static int ipt_init_target(struct xt_ent
{
struct xt_tgchk_param par;
struct xt_target *target;
+ struct ipt_entry e = {};
int ret = 0;

target = xt_request_find_target(AF_INET, t->u.user.name,
@@ -51,6 +52,7 @@ static int ipt_init_target(struct xt_ent
t->u.kernel.target = target;
memset(&par, 0, sizeof(par));
par.table = table;
+ par.entryinfo = &e;
par.target = target;
par.targinfo = t->data;
par.hook_mask = hook;


2017-08-28 08:59:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 07/84] net_sched: remove warning from qdisc_hash_add

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <[email protected]>


[ Upstream commit c90e95147c27b1780e76c6e8fea1b5c78d7d387f ]

It was added in commit e57a784d8cae ("pkt_sched: set root qdisc
before change() in attach_default_qdiscs()") to hide duplicates
from "tc qdisc show" for incative deivices.

After 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
it triggered when a classful qdisc was added to an inactive device, because
default qdiscs are added before the root qdisc is switched.

Anyway, after commit ea3274695353 ("net: sched: avoid duplicates in
qdisc dump") duplicates are filtered right in the dumper.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sched/sch_api.c | 3 ---
1 file changed, 3 deletions(-)

--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -277,9 +277,6 @@ static struct Qdisc *qdisc_match_from_ro
void qdisc_hash_add(struct Qdisc *q)
{
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
- struct Qdisc *root = qdisc_dev(q)->qdisc;
-
- WARN_ON_ONCE(root == &noop_qdisc);
ASSERT_RTNL();
hash_add_rcu(qdisc_dev(q)->qdisc_hash, &q->hash, q->handle);
}


2017-08-28 08:59:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 05/84] ipv4: fix NULL dereference in free_fib_info_rcu()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 187e5b3ac84d3421d2de3aca949b2791fbcad554 ]

If fi->fib_metrics could not be allocated in fib_create_info()
we attempt to dereference a NULL pointer in free_fib_info_rcu() :

m = fi->fib_metrics;
if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
kfree(m);

Before my recent patch, we used to call kfree(NULL) and nothing wrong
happened.

Instead of using RCU to defer freeing while we are under memory stress,
it seems better to take immediate action.

This was reported by the syzkaller team.
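
The error-handling pattern the patch switches to, as a hedged user-space
sketch (create_info(), struct info and its fields are invented, not the fib
code): when a nested allocation fails, undo the outer allocation directly and
return, instead of falling through to a shared failure path that assumes a
fully constructed object.

#include <stdlib.h>

struct info {
	int *metrics;	/* optional sub-allocation */
};

static struct info *create_info(int want_metrics)
{
	struct info *fi = calloc(1, sizeof(*fi));

	if (!fi)
		return NULL;

	if (want_metrics) {
		fi->metrics = calloc(1, sizeof(*fi->metrics));
		if (!fi->metrics) {
			free(fi);	/* immediate cleanup, no deferred free */
			return NULL;
		}
	}
	return fi;
}

int main(void)
{
	struct info *fi = create_info(1);

	if (fi)
		free(fi->metrics);
	free(fi);
	return 0;
}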

Fixes: 3fb07daff8e9 ("ipv4: add reference counting to metrics")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/fib_semantics.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1044,15 +1044,17 @@ struct fib_info *fib_create_info(struct
fi = kzalloc(sizeof(*fi)+nhs*sizeof(struct fib_nh), GFP_KERNEL);
if (!fi)
goto failure;
- fib_info_cnt++;
if (cfg->fc_mx) {
fi->fib_metrics = kzalloc(sizeof(*fi->fib_metrics), GFP_KERNEL);
- if (!fi->fib_metrics)
- goto failure;
+ if (unlikely(!fi->fib_metrics)) {
+ kfree(fi);
+ return ERR_PTR(err);
+ }
atomic_set(&fi->fib_metrics->refcnt, 1);
- } else
+ } else {
fi->fib_metrics = (struct dst_metrics *)&dst_default_metrics;
-
+ }
+ fib_info_cnt++;
fi->fib_net = net;
fi->fib_protocol = cfg->fc_protocol;
fi->fib_scope = cfg->fc_scope;


2017-08-28 09:00:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 02/84] af_key: do not use GFP_KERNEL in atomic contexts

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 36f41f8fc6d8aa9f8c9072d66ff7cf9055f5e69b ]

pfkey_broadcast() might be called from non-process contexts, so
we cannot use GFP_KERNEL in these cases [1].

This patch partially reverts commit ba51b6be38c1 ("net: Fix RCU splat in
af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock()
section.

[1] : syzkaller reported :

in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
3 locks held by syzkaller183439/2932:
#0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649
#1: (&pfk->dump_lock){+.+.+.}, at: [<ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293
#2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline]
#2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028
CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994
__might_sleep+0x95/0x190 kernel/sched/core.c:5947
slab_pre_alloc_hook mm/slab.h:416 [inline]
slab_alloc mm/slab.c:3383 [inline]
kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559
skb_clone+0x1a0/0x400 net/core/skbuff.c:1037
pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207
pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281
dump_sp+0x3d6/0x500 net/key/af_key.c:2685
xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042
pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695
pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722
pfkey_process+0x606/0x710 net/key/af_key.c:2814
pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x755/0x890 net/socket.c:2035
__sys_sendmsg+0xe5/0x210 net/socket.c:2069
SYSC_sendmsg net/socket.c:2080 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2076
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x445d79
RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79
RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008
RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700
R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000
R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000
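
The shape of the fix, in a hedged user-space analog (enum alloc_mode,
clone_buf() and broadcast() are invented stand-ins, not the af_key
functions): the broadcast helper takes the allocation mode from its caller
instead of hard-coding the sleeping variant, so callers running in atomic
context can pass the non-sleeping one.

#include <stdlib.h>
#include <string.h>

/* stand-ins for GFP_KERNEL / GFP_ATOMIC */
enum alloc_mode { MAY_SLEEP, NO_SLEEP };

static void *clone_buf(const void *src, size_t len, enum alloc_mode mode)
{
	void *copy = malloc(len);

	(void)mode;	/* plain malloc has no atomic variant; the point is the plumbing */
	if (copy)
		memcpy(copy, src, len);
	return copy;
}

static int broadcast(const void *msg, size_t len, enum alloc_mode mode)
{
	void *copy = clone_buf(msg, len, mode);

	if (!copy)
		return -1;
	/* ... deliver 'copy' to each listener ... */
	free(copy);
	return 0;
}

int main(void)
{
	const char msg[] = "hello";

	return broadcast(msg, sizeof(msg), NO_SLEEP);	/* e.g. from a context that must not sleep */
}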

Fixes: ba51b6be38c1 ("net: Fix RCU splat in af_key")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Cc: David Ahern <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/key/af_key.c | 48 ++++++++++++++++++++++++++----------------------
1 file changed, 26 insertions(+), 22 deletions(-)

--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -228,7 +228,7 @@ static int pfkey_broadcast_one(struct sk
#define BROADCAST_ONE 1
#define BROADCAST_REGISTERED 2
#define BROADCAST_PROMISC_ONLY 4
-static int pfkey_broadcast(struct sk_buff *skb,
+static int pfkey_broadcast(struct sk_buff *skb, gfp_t allocation,
int broadcast_flags, struct sock *one_sk,
struct net *net)
{
@@ -278,7 +278,7 @@ static int pfkey_broadcast(struct sk_buf
rcu_read_unlock();

if (one_sk != NULL)
- err = pfkey_broadcast_one(skb, &skb2, GFP_KERNEL, one_sk);
+ err = pfkey_broadcast_one(skb, &skb2, allocation, one_sk);

kfree_skb(skb2);
kfree_skb(skb);
@@ -311,7 +311,7 @@ static int pfkey_do_dump(struct pfkey_so
hdr = (struct sadb_msg *) pfk->dump.skb->data;
hdr->sadb_msg_seq = 0;
hdr->sadb_msg_errno = rc;
- pfkey_broadcast(pfk->dump.skb, BROADCAST_ONE,
+ pfkey_broadcast(pfk->dump.skb, GFP_ATOMIC, BROADCAST_ONE,
&pfk->sk, sock_net(&pfk->sk));
pfk->dump.skb = NULL;
}
@@ -355,7 +355,7 @@ static int pfkey_error(const struct sadb
hdr->sadb_msg_len = (sizeof(struct sadb_msg) /
sizeof(uint64_t));

- pfkey_broadcast(skb, BROADCAST_ONE, sk, sock_net(sk));
+ pfkey_broadcast(skb, GFP_KERNEL, BROADCAST_ONE, sk, sock_net(sk));

return 0;
}
@@ -1396,7 +1396,7 @@ static int pfkey_getspi(struct sock *sk,

xfrm_state_put(x);

- pfkey_broadcast(resp_skb, BROADCAST_ONE, sk, net);
+ pfkey_broadcast(resp_skb, GFP_KERNEL, BROADCAST_ONE, sk, net);

return 0;
}
@@ -1483,7 +1483,7 @@ static int key_notify_sa(struct xfrm_sta
hdr->sadb_msg_seq = c->seq;
hdr->sadb_msg_pid = c->portid;

- pfkey_broadcast(skb, BROADCAST_ALL, NULL, xs_net(x));
+ pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL, xs_net(x));

return 0;
}
@@ -1596,7 +1596,7 @@ static int pfkey_get(struct sock *sk, st
out_hdr->sadb_msg_reserved = 0;
out_hdr->sadb_msg_seq = hdr->sadb_msg_seq;
out_hdr->sadb_msg_pid = hdr->sadb_msg_pid;
- pfkey_broadcast(out_skb, BROADCAST_ONE, sk, sock_net(sk));
+ pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ONE, sk, sock_net(sk));

return 0;
}
@@ -1701,8 +1701,8 @@ static int pfkey_register(struct sock *s
return -ENOBUFS;
}

- pfkey_broadcast(supp_skb, BROADCAST_REGISTERED, sk, sock_net(sk));
-
+ pfkey_broadcast(supp_skb, GFP_KERNEL, BROADCAST_REGISTERED, sk,
+ sock_net(sk));
return 0;
}

@@ -1720,7 +1720,8 @@ static int unicast_flush_resp(struct soc
hdr->sadb_msg_errno = (uint8_t) 0;
hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t));

- return pfkey_broadcast(skb, BROADCAST_ONE, sk, sock_net(sk));
+ return pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ONE, sk,
+ sock_net(sk));
}

static int key_notify_sa_flush(const struct km_event *c)
@@ -1741,7 +1742,7 @@ static int key_notify_sa_flush(const str
hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t));
hdr->sadb_msg_reserved = 0;

- pfkey_broadcast(skb, BROADCAST_ALL, NULL, c->net);
+ pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL, c->net);

return 0;
}
@@ -1798,7 +1799,7 @@ static int dump_sa(struct xfrm_state *x,
out_hdr->sadb_msg_pid = pfk->dump.msg_portid;

if (pfk->dump.skb)
- pfkey_broadcast(pfk->dump.skb, BROADCAST_ONE,
+ pfkey_broadcast(pfk->dump.skb, GFP_ATOMIC, BROADCAST_ONE,
&pfk->sk, sock_net(&pfk->sk));
pfk->dump.skb = out_skb;

@@ -1886,7 +1887,7 @@ static int pfkey_promisc(struct sock *sk
new_hdr->sadb_msg_errno = 0;
}

- pfkey_broadcast(skb, BROADCAST_ALL, NULL, sock_net(sk));
+ pfkey_broadcast(skb, GFP_KERNEL, BROADCAST_ALL, NULL, sock_net(sk));
return 0;
}

@@ -2219,7 +2220,7 @@ static int key_notify_policy(struct xfrm
out_hdr->sadb_msg_errno = 0;
out_hdr->sadb_msg_seq = c->seq;
out_hdr->sadb_msg_pid = c->portid;
- pfkey_broadcast(out_skb, BROADCAST_ALL, NULL, xp_net(xp));
+ pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ALL, NULL, xp_net(xp));
return 0;

}
@@ -2439,7 +2440,7 @@ static int key_pol_get_resp(struct sock
out_hdr->sadb_msg_errno = 0;
out_hdr->sadb_msg_seq = hdr->sadb_msg_seq;
out_hdr->sadb_msg_pid = hdr->sadb_msg_pid;
- pfkey_broadcast(out_skb, BROADCAST_ONE, sk, xp_net(xp));
+ pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_ONE, sk, xp_net(xp));
err = 0;

out:
@@ -2695,7 +2696,7 @@ static int dump_sp(struct xfrm_policy *x
out_hdr->sadb_msg_pid = pfk->dump.msg_portid;

if (pfk->dump.skb)
- pfkey_broadcast(pfk->dump.skb, BROADCAST_ONE,
+ pfkey_broadcast(pfk->dump.skb, GFP_ATOMIC, BROADCAST_ONE,
&pfk->sk, sock_net(&pfk->sk));
pfk->dump.skb = out_skb;

@@ -2752,7 +2753,7 @@ static int key_notify_policy_flush(const
hdr->sadb_msg_satype = SADB_SATYPE_UNSPEC;
hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t));
hdr->sadb_msg_reserved = 0;
- pfkey_broadcast(skb_out, BROADCAST_ALL, NULL, c->net);
+ pfkey_broadcast(skb_out, GFP_ATOMIC, BROADCAST_ALL, NULL, c->net);
return 0;

}
@@ -2814,7 +2815,7 @@ static int pfkey_process(struct sock *sk
void *ext_hdrs[SADB_EXT_MAX];
int err;

- pfkey_broadcast(skb_clone(skb, GFP_KERNEL),
+ pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL,
BROADCAST_PROMISC_ONLY, NULL, sock_net(sk));

memset(ext_hdrs, 0, sizeof(ext_hdrs));
@@ -3036,7 +3037,8 @@ static int key_notify_sa_expire(struct x
out_hdr->sadb_msg_seq = 0;
out_hdr->sadb_msg_pid = 0;

- pfkey_broadcast(out_skb, BROADCAST_REGISTERED, NULL, xs_net(x));
+ pfkey_broadcast(out_skb, GFP_ATOMIC, BROADCAST_REGISTERED, NULL,
+ xs_net(x));
return 0;
}

@@ -3226,7 +3228,8 @@ static int pfkey_send_acquire(struct xfr
xfrm_ctx->ctx_len);
}

- return pfkey_broadcast(skb, BROADCAST_REGISTERED, NULL, xs_net(x));
+ return pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_REGISTERED, NULL,
+ xs_net(x));
}

static struct xfrm_policy *pfkey_compile_policy(struct sock *sk, int opt,
@@ -3424,7 +3427,8 @@ static int pfkey_send_new_mapping(struct
n_port->sadb_x_nat_t_port_port = sport;
n_port->sadb_x_nat_t_port_reserved = 0;

- return pfkey_broadcast(skb, BROADCAST_REGISTERED, NULL, xs_net(x));
+ return pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_REGISTERED, NULL,
+ xs_net(x));
}

#ifdef CONFIG_NET_KEY_MIGRATE
@@ -3616,7 +3620,7 @@ static int pfkey_send_migrate(const stru
}

/* broadcast migrate message to sockets */
- pfkey_broadcast(skb, BROADCAST_ALL, NULL, &init_net);
+ pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL, &init_net);

return 0;



2017-08-28 09:00:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 01/84] sparc64: remove unnecessary log message

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

[ Upstream commit 6170a506899aee3dd4934c928426505e47b1b466 ]

There is no need to log a message if the ATU hvapi could not be
registered. Unlike the PCI hvapi, an ATU hvapi registration failure is
not a hard error. Even if ATU hvapi registration fails (on a system with
or without ATU), the system continues with the legacy IOMMU. So only log
a message when the ATU hvapi is successfully registered.

Signed-off-by: Tushar Dave <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/kernel/pci_sun4v.c | 2 --
1 file changed, 2 deletions(-)

--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -1240,8 +1240,6 @@ static int pci_sun4v_probe(struct platfo
* ATU group, but ATU hcalls won't be available.
*/
hv_atu = false;
- pr_err(PFX "Could not register hvapi ATU err=%d\n",
- err);
} else {
pr_info(PFX "Registered hvapi ATU major[%lu] minor[%lu]\n",
vatu_major, vatu_minor);


2017-08-28 19:39:47

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.9 00/84] 4.9.46-stable review

On 08/28/2017 02:04 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.46 release.
> There are 84 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Aug 30 08:05:10 UTC 2017.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.46-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

2017-08-29 00:10:49

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.9 00/84] 4.9.46-stable review

On 08/28/2017 01:04 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.46 release.
> There are 84 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Aug 30 08:05:10 UTC 2017.
> Anything received after that time might be too late.
>

Build results:
total: 145 pass: 145 fail: 0
Qemu test results:
total: 122 pass: 122 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

2017-08-29 15:16:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.9 00/84] 4.9.46-stable review

On Tue, Aug 29, 2017 at 05:32:16PM +0530, Sumit Semwal wrote:
> Hi Greg,
>
> On 28 August 2017 at 13:34, Greg Kroah-Hartman
> <[email protected]> wrote:
> > This is the start of the stable review cycle for the 4.9.46 release.
> > There are 84 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Wed Aug 30 08:05:10 UTC 2017.
> > Anything received after that time might be too late.
>
> For ARM64, built and boot-tested with defconfig on hikey; no regressions noted.

Thanks for testing 2 of these and letting me know.

greg k-h

2017-08-29 12:02:41

by Sumit Semwal

[permalink] [raw]
Subject: Re: [PATCH 4.9 00/84] 4.9.46-stable review

Hi Greg,

On 28 August 2017 at 13:34, Greg Kroah-Hartman
<[email protected]> wrote:
> This is the start of the stable review cycle for the 4.9.46 release.
> There are 84 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Aug 30 08:05:10 UTC 2017.
> Anything received after that time might be too late.

For ARM64, built and boot-tested with defconfig on hikey; no regressions noted.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.46-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
Best,
Sumit.