2018-01-01 14:38:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 000/146] 4.14.11-stable review

This is the start of the stable review cycle for the 4.14.11 release.
There are 146 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Jan 3 14:00:12 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.11-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.14.11-rc1

Johan Hovold <[email protected]>
tty: fix tty_ldisc_receive_buf() documentation

Linus Torvalds <[email protected]>
n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)

Thomas Gleixner <[email protected]>
x86/ldt: Make LDT pgtable free conditional

Thomas Gleixner <[email protected]>
x86/ldt: Plug memory leak in error path

Andy Lutomirski <[email protected]>
x86/espfix/64: Fix espfix double-fault handling on 5-level systems

Linus Torvalds <[email protected]>
x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)

Thomas Gleixner <[email protected]>
x86/mm: Remove preempt_disable/enable() from __native_flush_tlb()

Thomas Gleixner <[email protected]>
x86/smpboot: Remove stale TLB flush invocations

Thomas Gleixner <[email protected]>
nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()

Sushmita Susheelendra <[email protected]>
staging: android: ion: Fix dma direction for dma_sync_sg_for_cpu/device

Sudeep Holla <[email protected]>
drivers: base: cacheinfo: fix cache type for non-architected system cache

Johan Hovold <[email protected]>
phy: tegra: fix device-tree node lookups

Todd Kjos <[email protected]>
binder: fix proc->files use-after-free

Thomas Gleixner <[email protected]>
timers: Reinitialize per cpu bases on hotplug

Thomas Gleixner <[email protected]>
timers: Invoke timer_start_debug() where it makes sense

Anna-Maria Gleixner <[email protected]>
timers: Use deferrable base independent of base::nohz_active

Daniel Thompson <[email protected]>
usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201

Mathias Nyman <[email protected]>
USB: Fix off by one in type-specific length check of BOS SSP capability

Oliver Neukum <[email protected]>
usb: add RESET_RESUME for ELSA MicroLink 56K

Dmitry Fleytman Dmitry Fleytman <[email protected]>
usb: Add device quirk for Logitech HD Pro Webcam C925e

SZ Lin (林上智) <[email protected]>
USB: serial: option: adding support for YUGA CLM920-NC5

Daniele Palmas <[email protected]>
USB: serial: option: add support for Telit ME910 PID 0x1101

Reinhard Speyerer <[email protected]>
USB: serial: qcserial: add Sierra Wireless EM7565

Max Schulze <[email protected]>
USB: serial: ftdi_sio: add id for Airbus DS P8GR

Johan Hovold <[email protected]>
USB: chipidea: msm: fix ulpi-node lookup

Shuah Khan <[email protected]>
usbip: vhci: stop printing kernel pointer addresses in messages

Shuah Khan <[email protected]>
usbip: stub: stop printing kernel pointer addresses in messages

Shuah Khan <[email protected]>
usbip: prevent leaking socket pointer address in messages

Juan Zea <[email protected]>
usbip: fix usbip bind writing random string after command in match_busid

Jan Engelhardt <[email protected]>
sparc64: repair calling incorrect hweight function from stubs

Willem de Bruijn <[email protected]>
skbuff: in skb_copy_ubufs unclone before releasing zerocopy

Willem de Bruijn <[email protected]>
skbuff: skb_copy_ubufs must release uarg even without user frags

Willem de Bruijn <[email protected]>
skbuff: orphan frags before zerocopy clone

Saeed Mahameed <[email protected]>
Revert "mlx5: move affinity hints assignments to generic code"

Nicolas Dichtel <[email protected]>
ipv6: set all.accept_dad to 0 by default

Phil Sutter <[email protected]>
ipv4: fib: Fix metrics match when deleting a route

Russell King <[email protected]>
phylink: ensure AN is enabled

Russell King <[email protected]>
phylink: ensure the PHY interface mode is appropriately set

Calvin Owens <[email protected]>
bnxt_en: Fix sources of spurious netpoll warnings

Jiri Pirko <[email protected]>
net: sched: fix static key imbalance in case of ingress/clsact_init error

Alexey Kodanev <[email protected]>
vxlan: restore dev->mtu setting based on lower device

Kamal Heib <[email protected]>
net/mlx5: FPGA, return -EINVAL if size is zero

Eric Dumazet <[email protected]>
tcp: refresh tcp_mstamp from timers callbacks

Ido Schimmel <[email protected]>
ipv6: Honor specified parameters in fibmatch lookup

Zhao Qiang <[email protected]>
net: phy: marvell: Limit 88m1101 autoneg errata to 88E1145 as well.

Wei Wang <[email protected]>
tcp: fix potential underestimation on rcv_rtt

Yuval Mintz <[email protected]>
mlxsw: spectrum: Disable MAC learning for ovs port

Parthasarathy Bhuvaragan <[email protected]>
tipc: fix hanging poll() for stream sockets

Xin Long <[email protected]>
sctp: make sure stream nums can match optlen in sctp_setsockopt_reset_streams

Julian Wiedmann <[email protected]>
s390/qeth: fix error handling in checksum cmd callback

Florian Fainelli <[email protected]>
net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY

Bert Kenward <[email protected]>
sfc: pass valid pointers from efx_enqueue_unwind

Eric Garver <[email protected]>
openvswitch: Fix pop_vlan action for double tagged frames

Moni Shoua <[email protected]>
net/mlx5: Fix error flow in CREATE_QP command

Gal Pressman <[email protected]>
net/mlx5e: Prevent possible races in VXLAN control flow

Gal Pressman <[email protected]>
net/mlx5e: Add refcount to VXLAN structure

Gal Pressman <[email protected]>
net/mlx5e: Fix features check of IPv6 traffic

Gal Pressman <[email protected]>
net/mlx5e: Fix possible deadlock of VXLAN lock

Eran Ben Elisha <[email protected]>
net/mlx5: Fix rate limit packet pacing naming and struct

Yousuk Seung <[email protected]>
tcp: invalidate rate samples during SACK reneging

Willem de Bruijn <[email protected]>
sock: free skb in skb_complete_tx_timestamp on error

Grygorii Strashko <[email protected]>
net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround

Eric W. Biederman <[email protected]>
net: Fix double free and memory corruption in get_net_ns_by_id()

Nikolay Aleksandrov <[email protected]>
net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks

Ido Schimmel <[email protected]>
ipv4: Fix use-after-free when flushing FIB tables

Alexey Kodanev <[email protected]>
ip6_gre: fix device features for ioctl setup

Nikita V. Shirokov <[email protected]>
adding missing rcu_read_unlock in ipxip6_rcv

Tonghao Zhang <[email protected]>
sctp: Replace use of sockets_allocated with specified macro.

Tobias Jordan <[email protected]>
net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case

Mohamed Ghannam <[email protected]>
net: ipv4: fix for a race condition in raw_sendmsg

Julian Wiedmann <[email protected]>
s390/qeth: update takeover IPs after configuration change

Julian Wiedmann <[email protected]>
s390/qeth: lock IP table while applying takeover changes

Julian Wiedmann <[email protected]>
s390/qeth: don't apply takeover changes to RXIP

Julian Wiedmann <[email protected]>
s390/qeth: apply takeover changes when mode is toggled

Neal Cardwell <[email protected]>
tcp_bbr: reset long-term bandwidth sampling on loss recovery undo

Neal Cardwell <[email protected]>
tcp_bbr: reset full pipe detection on loss recovery undo

Brian King <[email protected]>
tg3: Fix rx hang on MTU change with 5717/5719

Christoph Paasch <[email protected]>
tcp md5sig: Use skb's saddr when replying to an incoming segment

Neal Cardwell <[email protected]>
tcp_bbr: record "full bw reached" decision in new full_bw_reached bit

Avinash Repaka <[email protected]>
RDS: Check cmsg_len before dereferencing CMSG_DATA

Michael S. Tsirkin <[email protected]>
ptr_ring: add barriers

Shaohua Li <[email protected]>
net: reevalulate autoflowlabel setting after sysctl setting

Sebastian Sjoholm <[email protected]>
net: qmi_wwan: add Sierra EM7565 1199:9091

Kevin Cernekee <[email protected]>
netlink: Add netns check on taps

Kevin Cernekee <[email protected]>
net: igmp: Use correct source address on IGMPv3 reports

Fugang Duan <[email protected]>
net: fec: unmap the xmit buffer that are not transferred by DMA

Eric Dumazet <[email protected]>
ipv6: mcast: better catch silly mtu values

Eric Dumazet <[email protected]>
ipv4: igmp: guard against silly MTU values

Linus Torvalds <[email protected]>
kbuild: add '-fno-stack-check' to kernel build options

Ming Lei <[email protected]>
block: don't let passthrough IO go into .make_request_fn()

Jens Axboe <[email protected]>
block: fix blk_rq_append_bio

Joel Fernandes <[email protected]>
cpufreq: schedutil: Use idle_calls counter of the remote CPU

Takashi Iwai <[email protected]>
ALSA: hda - Fix missing COEF init for ALC225/295/299

Hui Wang <[email protected]>
ALSA: hda - fix headset mic detection issue on a Dell machine

Hui Wang <[email protected]>
ALSA: hda - change the location for one mic on a Lenovo machine

Hui Wang <[email protected]>
ALSA: hda - Add MIC_NO_PRESENCE fixup for 2 HP machines

Takashi Iwai <[email protected]>
ALSA: hda: Drop useless WARN_ON()

Moni Shoua <[email protected]>
IB/core: Verify that QP is security enabled in create and destroy

Moni Shoua <[email protected]>
IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()

Majd Dibbiny <[email protected]>
IB/mlx5: Serialize access to the VMA list

Michael J. Ruhl <[email protected]>
IB/hfi: Only read capability registers if the capability exists

Christophe Leroy <[email protected]>
gpio: fix "gpio-line-names" property retrieval

Andrew F. Davis <[email protected]>
ASoC: tlv320aic31xx: Fix GPIO1 register definition

Johan Hovold <[email protected]>
ASoC: twl4030: fix child-node lookup

Maciej S. Szmigiero <[email protected]>
ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure

Johan Hovold <[email protected]>
ASoC: da7218: fix fix child-node lookup

Ben Hutchings <[email protected]>
ASoC: wm_adsp: Fix validation of firmware and coeff lengths

Srinivas Kandagatla <[email protected]>
ASoC: codecs: msm8916-wcd: Fix supported formats

Steve Wise <[email protected]>
iw_cxgb4: Only validate the MSN for successful completions

Steven Rostedt (VMware) <[email protected]>
ring-buffer: Do no reuse reader page if still in use

Steven Rostedt (VMware) <[email protected]>
ring-buffer: Mask out the info bits when returning buffer page length

Thomas Gleixner <[email protected]>
x86/ldt: Make the LDT mapping RO

Thomas Gleixner <[email protected]>
x86/mm/dump_pagetables: Allow dumping current pagetables

Thomas Gleixner <[email protected]>
x86/mm/dump_pagetables: Check user space page table for WX pages

Borislav Petkov <[email protected]>
x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy

Dave Hansen <[email protected]>
x86/mm/pti: Add Kconfig

Vlastimil Babka <[email protected]>
x86/dumpstack: Indicate in Oops whether PTI is configured and enabled

Peter Zijlstra <[email protected]>
x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming

Dave Hansen <[email protected]>
x86/mm: Use INVPCID for __native_flush_tlb_single()

Peter Zijlstra <[email protected]>
x86/mm: Optimize RESTORE_CR3

Peter Zijlstra <[email protected]>
x86/mm: Use/Fix PCID to optimize user/kernel switches

Dave Hansen <[email protected]>
x86/mm: Abstract switching CR3

Dave Hansen <[email protected]>
x86/mm: Allow flushing for future ASID switches

Andy Lutomirski <[email protected]>
x86/pti: Map the vsyscall page if needed

Andy Lutomirski <[email protected]>
x86/pti: Put the LDT in its own PGD if PTI is on

Andy Lutomirski <[email protected]>
x86/mm/64: Make a full PGD-entry size hole in the memory map

Hugh Dickins <[email protected]>
x86/events/intel/ds: Map debug buffers in cpu_entry_area

Thomas Gleixner <[email protected]>
x86/cpu_entry_area: Add debugstore entries to cpu_entry_area

Andy Lutomirski <[email protected]>
x86/mm/pti: Map ESPFIX into user space

Thomas Gleixner <[email protected]>
x86/mm/pti: Share entry text PMD

Thomas Gleixner <[email protected]>
x86/entry: Align entry text section to PMD boundary

Andy Lutomirski <[email protected]>
x86/mm/pti: Share cpu_entry_area with user space page tables

Thomas Gleixner <[email protected]>
x86/mm/pti: Force entry through trampoline when PTI active

Andy Lutomirski <[email protected]>
x86/mm/pti: Add functions to clone kernel PMDs

Dave Hansen <[email protected]>
x86/mm/pti: Populate user PGD

Dave Hansen <[email protected]>
x86/mm/pti: Allocate a separate user PGD

Dave Hansen <[email protected]>
x86/mm/pti: Allow NX poison to be set in p4d/pgd

Dave Hansen <[email protected]>
x86/mm/pti: Add mapping helper functions

Borislav Petkov <[email protected]>
x86/pti: Add the pti= cmdline option and documentation

Thomas Gleixner <[email protected]>
x86/mm/pti: Add infrastructure for page table isolation

Dave Hansen <[email protected]>
x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching

Dave Hansen <[email protected]>
x86/mm/pti: Disable global pages if PAGE_TABLE_ISOLATION=y

Thomas Gleixner <[email protected]>
x86/cpufeatures: Add X86_BUG_CPU_INSECURE

Jing Xia <[email protected]>
tracing: Fix crash when it fails to alloc ring buffer

Steven Rostedt (VMware) <[email protected]>
tracing: Fix possible double free on failure of allocating trace buffer

Steven Rostedt (VMware) <[email protected]>
tracing: Remove extra zeroing out of the ring buffer page


-------------

Diffstat:

Documentation/admin-guide/kernel-parameters.txt | 8 +
Documentation/x86/x86_64/mm.txt | 5 +-
Makefile | 7 +-
arch/sparc/lib/hweight.S | 4 +-
arch/x86/boot/compressed/pagetable.c | 3 +
arch/x86/entry/calling.h | 145 ++++++++
arch/x86/entry/entry_64.S | 48 ++-
arch/x86/entry/entry_64_compat.S | 24 +-
arch/x86/entry/vsyscall/vsyscall_64.c | 6 +-
arch/x86/events/intel/ds.c | 130 ++++---
arch/x86/events/perf_event.h | 23 +-
arch/x86/include/asm/cpu_entry_area.h | 13 +
arch/x86/include/asm/cpufeatures.h | 4 +-
arch/x86/include/asm/desc.h | 2 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/intel_ds.h | 36 ++
arch/x86/include/asm/mmu_context.h | 59 +++-
arch/x86/include/asm/pgalloc.h | 11 +
arch/x86/include/asm/pgtable.h | 30 +-
arch/x86/include/asm/pgtable_64.h | 92 +++++
arch/x86/include/asm/pgtable_64_types.h | 8 +-
arch/x86/include/asm/processor-flags.h | 5 +
arch/x86/include/asm/processor.h | 23 +-
arch/x86/include/asm/pti.h | 14 +
arch/x86/include/asm/tlbflush.h | 208 +++++++++--
arch/x86/include/asm/vsyscall.h | 1 +
arch/x86/include/uapi/asm/processor-flags.h | 7 +-
arch/x86/kernel/asm-offsets.c | 4 +
arch/x86/kernel/cpu/common.c | 9 +-
arch/x86/kernel/dumpstack.c | 6 +-
arch/x86/kernel/head_64.S | 30 +-
arch/x86/kernel/ldt.c | 151 +++++++-
arch/x86/kernel/machine_kexec_32.c | 4 +-
arch/x86/kernel/smpboot.c | 9 -
arch/x86/kernel/tls.c | 11 +-
arch/x86/kernel/traps.c | 2 +-
arch/x86/kernel/vmlinux.lds.S | 8 +
arch/x86/mm/Makefile | 7 +-
arch/x86/mm/cpu_entry_area.c | 27 ++
arch/x86/mm/debug_pagetables.c | 80 ++++-
arch/x86/mm/dump_pagetables.c | 43 ++-
arch/x86/mm/init.c | 80 +++--
arch/x86/mm/pgtable.c | 5 +-
arch/x86/mm/pti.c | 387 +++++++++++++++++++++
arch/x86/mm/tlb.c | 58 ++-
arch/x86/platform/efi/efi_64.c | 5 +-
block/blk-map.c | 38 +-
block/bounce.c | 6 +-
drivers/android/binder.c | 44 ++-
drivers/base/cacheinfo.c | 13 +
drivers/gpio/gpiolib-acpi.c | 2 +-
drivers/gpio/gpiolib-devprop.c | 17 +-
drivers/gpio/gpiolib-of.c | 3 +-
drivers/gpio/gpiolib.h | 3 +-
drivers/infiniband/core/security.c | 3 +
drivers/infiniband/core/uverbs_cmd.c | 4 +-
drivers/infiniband/core/verbs.c | 3 +-
drivers/infiniband/hw/cxgb4/cq.c | 6 +-
drivers/infiniband/hw/hfi1/hfi.h | 1 -
drivers/infiniband/hw/hfi1/pcie.c | 30 +-
drivers/infiniband/hw/mlx5/main.c | 8 +
drivers/infiniband/hw/mlx5/mlx5_ib.h | 4 +
drivers/net/dsa/bcm_sf2.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 +-
drivers/net/ethernet/broadcom/tg3.c | 4 +-
drivers/net/ethernet/freescale/fec_main.c | 6 +
drivers/net/ethernet/marvell/mvmdio.c | 3 +-
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 +-
drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 48 +--
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.c | 6 +
drivers/net/ethernet/mellanox/mlx5/core/main.c | 75 +++-
drivers/net/ethernet/mellanox/mlx5/core/qp.c | 4 +-
drivers/net/ethernet/mellanox/mlx5/core/rl.c | 22 +-
drivers/net/ethernet/mellanox/mlx5/core/vxlan.c | 64 ++--
drivers/net/ethernet/mellanox/mlx5/core/vxlan.h | 1 +
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 18 +
drivers/net/ethernet/sfc/tx.c | 5 +-
drivers/net/phy/marvell.c | 2 +-
drivers/net/phy/micrel.c | 1 +
drivers/net/phy/phylink.c | 2 +
drivers/net/usb/qmi_wwan.c | 1 +
drivers/net/vxlan.c | 5 +
drivers/phy/tegra/xusb.c | 58 +--
drivers/s390/net/qeth_core.h | 6 +-
drivers/s390/net/qeth_core_main.c | 15 +-
drivers/s390/net/qeth_l3.h | 2 +-
drivers/s390/net/qeth_l3_main.c | 36 +-
drivers/s390/net/qeth_l3_sys.c | 75 ++--
drivers/scsi/osd/osd_initiator.c | 4 +-
drivers/staging/android/ion/ion.c | 4 +-
drivers/target/target_core_pscsi.c | 4 +-
drivers/tty/n_tty.c | 4 +-
drivers/tty/tty_buffer.c | 2 +-
drivers/usb/chipidea/ci_hdrc_msm.c | 2 +-
drivers/usb/core/config.c | 2 +-
drivers/usb/core/quirks.c | 6 +-
drivers/usb/host/xhci-pci.c | 3 +
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 6 +
drivers/usb/serial/option.c | 17 +
drivers/usb/serial/qcserial.c | 3 +
drivers/usb/usbip/stub_dev.c | 3 +-
drivers/usb/usbip/stub_main.c | 5 +-
drivers/usb/usbip/stub_rx.c | 7 +-
drivers/usb/usbip/stub_tx.c | 6 +-
drivers/usb/usbip/usbip_common.c | 16 +-
drivers/usb/usbip/vhci_hcd.c | 12 +-
drivers/usb/usbip/vhci_rx.c | 23 +-
drivers/usb/usbip/vhci_tx.c | 3 +-
include/linux/blkdev.h | 23 +-
include/linux/cpuhotplug.h | 2 +-
include/linux/ipv6.h | 3 +-
include/linux/mlx5/driver.h | 1 +
include/linux/mlx5/mlx5_ifc.h | 8 +-
include/linux/pti.h | 11 +
include/linux/ptr_ring.h | 9 +
include/linux/tcp.h | 3 +-
include/linux/tick.h | 1 +
include/linux/timer.h | 4 +-
include/net/ip.h | 1 +
include/net/tcp.h | 2 +-
init/main.c | 3 +
kernel/cpu.c | 4 +-
kernel/sched/cpufreq_schedutil.c | 2 +-
kernel/time/tick-sched.c | 32 +-
kernel/time/timer.c | 35 +-
kernel/trace/ring_buffer.c | 12 +-
kernel/trace/trace.c | 13 +-
net/bridge/br_netlink.c | 11 +-
net/core/net_namespace.c | 2 +-
net/core/skbuff.c | 17 +-
net/ipv4/devinet.c | 2 +-
net/ipv4/fib_frontend.c | 9 +-
net/ipv4/fib_semantics.c | 8 +-
net/ipv4/igmp.c | 44 ++-
net/ipv4/ip_tunnel.c | 4 +-
net/ipv4/raw.c | 15 +-
net/ipv4/tcp.c | 1 +
net/ipv4/tcp_bbr.c | 12 +-
net/ipv4/tcp_input.c | 20 +-
net/ipv4/tcp_ipv4.c | 2 +-
net/ipv4/tcp_rate.c | 10 +-
net/ipv4/tcp_timer.c | 2 +
net/ipv6/addrconf.c | 2 +-
net/ipv6/af_inet6.c | 1 -
net/ipv6/ip6_gre.c | 57 +--
net/ipv6/ip6_output.c | 12 +-
net/ipv6/ip6_tunnel.c | 2 +-
net/ipv6/ipv6_sockglue.c | 1 +
net/ipv6/mcast.c | 25 +-
net/ipv6/route.c | 19 +-
net/ipv6/tcp_ipv6.c | 2 +-
net/netlink/af_netlink.c | 3 +
net/openvswitch/flow.c | 15 +-
net/rds/send.c | 3 +
net/sched/sch_ingress.c | 9 +-
net/sctp/socket.c | 10 +-
net/tipc/socket.c | 2 +-
security/Kconfig | 10 +
sound/hda/hdac_i915.c | 2 +-
sound/pci/hda/patch_conexant.c | 29 ++
sound/pci/hda/patch_realtek.c | 14 +-
sound/soc/codecs/da7218.c | 2 +-
sound/soc/codecs/msm8916-wcd-analog.c | 2 +-
sound/soc/codecs/msm8916-wcd-digital.c | 4 +-
sound/soc/codecs/tlv320aic31xx.h | 2 +-
sound/soc/codecs/twl4030.c | 4 +-
sound/soc/codecs/wm_adsp.c | 12 +-
sound/soc/fsl/fsl_ssi.c | 18 +-
tools/testing/selftests/x86/ldt_gdt.c | 3 +-
tools/usb/usbip/src/utils.c | 9 +-
172 files changed, 2568 insertions(+), 672 deletions(-)



2018-01-01 14:39:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 010/146] x86/mm/pti: Allow NX poison to be set in p4d/pgd

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit 1c4de1ff4fe50453b968579ee86fac3da80dd783 upstream.

With PAGE_TABLE_ISOLATION the user portion of the kernel page tables is
poisoned with the NX bit so if the entry code exits with the kernel page
tables selected in CR3, userspace crashes.

But doing so trips the p4d/pgd_bad() checks. Make sure it does not do
that.

Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/pgtable.h | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -846,7 +846,12 @@ static inline pud_t *pud_offset(p4d_t *p

static inline int p4d_bad(p4d_t p4d)
{
- return (p4d_flags(p4d) & ~(_KERNPG_TABLE | _PAGE_USER)) != 0;
+ unsigned long ignore_flags = _KERNPG_TABLE | _PAGE_USER;
+
+ if (IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
+ ignore_flags |= _PAGE_NX;
+
+ return (p4d_flags(p4d) & ~ignore_flags) != 0;
}
#endif /* CONFIG_PGTABLE_LEVELS > 3 */

@@ -880,7 +885,12 @@ static inline p4d_t *p4d_offset(pgd_t *p

static inline int pgd_bad(pgd_t pgd)
{
- return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE;
+ unsigned long ignore_flags = _PAGE_USER;
+
+ if (IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
+ ignore_flags |= _PAGE_NX;
+
+ return (pgd_flags(pgd) & ~ignore_flags) != _KERNPG_TABLE;
}

static inline int pgd_none(pgd_t pgd)


2018-01-01 14:39:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 012/146] x86/mm/pti: Populate user PGD

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit fc2fbc8512ed08d1de7720936fd7d2e4ce02c3a2 upstream.

In clone_pgd_range() copy the init user PGDs which cover the kernel half of
the address space, so a process has all the required kernel mappings
visible.

[ tglx: Split out from the big kaiser dump and folded Andys simplification ]

Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/pgtable.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1125,7 +1125,14 @@ static inline int pud_write(pud_t pud)
*/
static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
- memcpy(dst, src, count * sizeof(pgd_t));
+ memcpy(dst, src, count * sizeof(pgd_t));
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+ /* Clone the user space pgd as well */
+ memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src),
+ count * sizeof(pgd_t));
+#endif
}

#define PTE_SHIFT ilog2(PTRS_PER_PTE)


2018-01-01 14:39:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 014/146] x86/mm/pti: Force entry through trampoline when PTI active

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 8d4b067895791ab9fdb1aadfc505f64d71239dd2 upstream.

Force the entry through the trampoline only when PTI is active. Otherwise
go through the normal entry code.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/cpu/common.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1339,7 +1339,10 @@ void syscall_init(void)
(entry_SYSCALL_64_trampoline - _entry_trampoline);

wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
- wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline);
+ if (static_cpu_has(X86_FEATURE_PTI))
+ wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline);
+ else
+ wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);

#ifdef CONFIG_IA32_EMULATION
wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);


2018-01-01 14:39:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 015/146] x86/mm/pti: Share cpu_entry_area with user space page tables

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit f7cfbee91559ca7e3e961a00ffac921208a115ad upstream.

Share the cpu entry area so the user space and kernel space page tables
have the same P4D page.

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/pti.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -265,6 +265,29 @@ pti_clone_pmds(unsigned long start, unsi
}

/*
+ * Clone a single p4d (i.e. a top-level entry on 4-level systems and a
+ * next-level entry on 5-level systems.
+ */
+static void __init pti_clone_p4d(unsigned long addr)
+{
+ p4d_t *kernel_p4d, *user_p4d;
+ pgd_t *kernel_pgd;
+
+ user_p4d = pti_user_pagetable_walk_p4d(addr);
+ kernel_pgd = pgd_offset_k(addr);
+ kernel_p4d = p4d_offset(kernel_pgd, addr);
+ *user_p4d = *kernel_p4d;
+}
+
+/*
+ * Clone the CPU_ENTRY_AREA into the user space visible page table.
+ */
+static void __init pti_clone_user_shared(void)
+{
+ pti_clone_p4d(CPU_ENTRY_AREA_BASE);
+}
+
+/*
* Initialize kernel page table isolation
*/
void __init pti_init(void)
@@ -273,4 +296,6 @@ void __init pti_init(void)
return;

pr_info("enabled\n");
+
+ pti_clone_user_shared();
}


2018-01-01 14:39:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 016/146] x86/entry: Align entry text section to PMD boundary

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 2f7412ba9c6af5ab16bdbb4a3fdb1dcd2b4fd3c2 upstream.

The (irq)entry text must be visible in the user space page tables. To allow
simple PMD based sharing, make the entry text PMD aligned.

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/vmlinux.lds.S | 8 ++++++++
1 file changed, 8 insertions(+)

--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -61,11 +61,17 @@ jiffies_64 = jiffies;
. = ALIGN(HPAGE_SIZE); \
__end_rodata_hpage_align = .;

+#define ALIGN_ENTRY_TEXT_BEGIN . = ALIGN(PMD_SIZE);
+#define ALIGN_ENTRY_TEXT_END . = ALIGN(PMD_SIZE);
+
#else

#define X64_ALIGN_RODATA_BEGIN
#define X64_ALIGN_RODATA_END

+#define ALIGN_ENTRY_TEXT_BEGIN
+#define ALIGN_ENTRY_TEXT_END
+
#endif

PHDRS {
@@ -102,8 +108,10 @@ SECTIONS
CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
+ ALIGN_ENTRY_TEXT_BEGIN
ENTRY_TEXT
IRQENTRY_TEXT
+ ALIGN_ENTRY_TEXT_END
SOFTIRQENTRY_TEXT
*(.fixup)
*(.gnu.warning)


2018-01-01 14:39:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 017/146] x86/mm/pti: Share entry text PMD

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 6dc72c3cbca0580642808d677181cad4c6433893 upstream.

Share the entry text PMD of the kernel mapping with the user space
mapping. If large pages are enabled this is a single PMD entry and at the
point where it is copied into the user page table the RW bit has not been
cleared yet. Clear it right away so the user space visible map becomes RX.

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/pti.c | 10 ++++++++++
1 file changed, 10 insertions(+)

--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -288,6 +288,15 @@ static void __init pti_clone_user_shared
}

/*
+ * Clone the populated PMDs of the entry and irqentry text and force it RO.
+ */
+static void __init pti_clone_entry_text(void)
+{
+ pti_clone_pmds((unsigned long) __entry_text_start,
+ (unsigned long) __irqentry_text_end, _PAGE_RW);
+}
+
+/*
* Initialize kernel page table isolation
*/
void __init pti_init(void)
@@ -298,4 +307,5 @@ void __init pti_init(void)
pr_info("enabled\n");

pti_clone_user_shared();
+ pti_clone_entry_text();
}


2018-01-01 14:39:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 018/146] x86/mm/pti: Map ESPFIX into user space

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit 4b6bbe95b87966ba08999574db65c93c5e925a36 upstream.

Map the ESPFIX pages into user space when PTI is enabled.

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/pti.c | 11 +++++++++++
1 file changed, 11 insertions(+)

--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -288,6 +288,16 @@ static void __init pti_clone_user_shared
}

/*
+ * Clone the ESPFIX P4D into the user space visinble page table
+ */
+static void __init pti_setup_espfix64(void)
+{
+#ifdef CONFIG_X86_ESPFIX64
+ pti_clone_p4d(ESPFIX_BASE_ADDR);
+#endif
+}
+
+/*
* Clone the populated PMDs of the entry and irqentry text and force it RO.
*/
static void __init pti_clone_entry_text(void)
@@ -308,4 +318,5 @@ void __init pti_init(void)

pti_clone_user_shared();
pti_clone_entry_text();
+ pti_setup_espfix64();
}


2018-01-01 14:39:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 019/146] x86/cpu_entry_area: Add debugstore entries to cpu_entry_area

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 10043e02db7f8a4161f76434931051e7d797a5f6 upstream.

The Intel PEBS/BTS debug store is a design trainwreck as it expects virtual
addresses which must be visible in any execution context.

So it is required to make these mappings visible to user space when kernel
page table isolation is active.

Provide enough room for the buffer mappings in the cpu_entry_area so the
buffers are available in the user space visible page tables.

At the point where the kernel side entry area is populated there is no
buffer available yet, but the kernel PMD must be populated. To achieve this
set the entries for these buffers to non present.

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/events/intel/ds.c | 5 ++--
arch/x86/events/perf_event.h | 21 +------------------
arch/x86/include/asm/cpu_entry_area.h | 13 ++++++++++++
arch/x86/include/asm/intel_ds.h | 36 ++++++++++++++++++++++++++++++++++
arch/x86/mm/cpu_entry_area.c | 27 +++++++++++++++++++++++++
5 files changed, 81 insertions(+), 21 deletions(-)

--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -8,11 +8,12 @@

#include "../perf_event.h"

+/* Waste a full page so it can be mapped into the cpu_entry_area */
+DEFINE_PER_CPU_PAGE_ALIGNED(struct debug_store, cpu_debug_store);
+
/* The size of a BTS record in bytes: */
#define BTS_RECORD_SIZE 24

-#define BTS_BUFFER_SIZE (PAGE_SIZE << 4)
-#define PEBS_BUFFER_SIZE (PAGE_SIZE << 4)
#define PEBS_FIXUP_SIZE PAGE_SIZE

/*
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -14,6 +14,8 @@

#include <linux/perf_event.h>

+#include <asm/intel_ds.h>
+
/* To enable MSR tracing please use the generic trace points. */

/*
@@ -77,8 +79,6 @@ struct amd_nb {
struct event_constraint event_constraints[X86_PMC_IDX_MAX];
};

-/* The maximal number of PEBS events: */
-#define MAX_PEBS_EVENTS 8
#define PEBS_COUNTER_MASK ((1ULL << MAX_PEBS_EVENTS) - 1)

/*
@@ -95,23 +95,6 @@ struct amd_nb {
PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR | \
PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)

-/*
- * A debug store configuration.
- *
- * We only support architectures that use 64bit fields.
- */
-struct debug_store {
- u64 bts_buffer_base;
- u64 bts_index;
- u64 bts_absolute_maximum;
- u64 bts_interrupt_threshold;
- u64 pebs_buffer_base;
- u64 pebs_index;
- u64 pebs_absolute_maximum;
- u64 pebs_interrupt_threshold;
- u64 pebs_event_reset[MAX_PEBS_EVENTS];
-};
-
#define PEBS_REGS \
(PERF_REG_X86_AX | \
PERF_REG_X86_BX | \
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -5,6 +5,7 @@

#include <linux/percpu-defs.h>
#include <asm/processor.h>
+#include <asm/intel_ds.h>

/*
* cpu_entry_area is a percpu region that contains things needed by the CPU
@@ -40,6 +41,18 @@ struct cpu_entry_area {
*/
char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ];
#endif
+#ifdef CONFIG_CPU_SUP_INTEL
+ /*
+ * Per CPU debug store for Intel performance monitoring. Wastes a
+ * full page at the moment.
+ */
+ struct debug_store cpu_debug_store;
+ /*
+ * The actual PEBS/BTS buffers must be mapped to user space
+ * Reserve enough fixmap PTEs.
+ */
+ struct debug_store_buffers cpu_debug_buffers;
+#endif
};

#define CPU_ENTRY_AREA_SIZE (sizeof(struct cpu_entry_area))
--- /dev/null
+++ b/arch/x86/include/asm/intel_ds.h
@@ -0,0 +1,36 @@
+#ifndef _ASM_INTEL_DS_H
+#define _ASM_INTEL_DS_H
+
+#include <linux/percpu-defs.h>
+
+#define BTS_BUFFER_SIZE (PAGE_SIZE << 4)
+#define PEBS_BUFFER_SIZE (PAGE_SIZE << 4)
+
+/* The maximal number of PEBS events: */
+#define MAX_PEBS_EVENTS 8
+
+/*
+ * A debug store configuration.
+ *
+ * We only support architectures that use 64bit fields.
+ */
+struct debug_store {
+ u64 bts_buffer_base;
+ u64 bts_index;
+ u64 bts_absolute_maximum;
+ u64 bts_interrupt_threshold;
+ u64 pebs_buffer_base;
+ u64 pebs_index;
+ u64 pebs_absolute_maximum;
+ u64 pebs_interrupt_threshold;
+ u64 pebs_event_reset[MAX_PEBS_EVENTS];
+} __aligned(PAGE_SIZE);
+
+DECLARE_PER_CPU_PAGE_ALIGNED(struct debug_store, cpu_debug_store);
+
+struct debug_store_buffers {
+ char bts_buffer[BTS_BUFFER_SIZE];
+ char pebs_buffer[PEBS_BUFFER_SIZE];
+};
+
+#endif
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -38,6 +38,32 @@ cea_map_percpu_pages(void *cea_vaddr, vo
cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
}

+static void percpu_setup_debug_store(int cpu)
+{
+#ifdef CONFIG_CPU_SUP_INTEL
+ int npages;
+ void *cea;
+
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ return;
+
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_store;
+ npages = sizeof(struct debug_store) / PAGE_SIZE;
+ BUILD_BUG_ON(sizeof(struct debug_store) % PAGE_SIZE != 0);
+ cea_map_percpu_pages(cea, &per_cpu(cpu_debug_store, cpu), npages,
+ PAGE_KERNEL);
+
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers;
+ /*
+ * Force the population of PMDs for not yet allocated per cpu
+ * memory like debug store buffers.
+ */
+ npages = sizeof(struct debug_store_buffers) / PAGE_SIZE;
+ for (; npages; npages--, cea += PAGE_SIZE)
+ cea_set_pte(cea, 0, PAGE_NONE);
+#endif
+}
+
/* Setup the fixmap mappings only once per-processor */
static void __init setup_cpu_entry_area(int cpu)
{
@@ -109,6 +135,7 @@ static void __init setup_cpu_entry_area(
cea_set_pte(&get_cpu_entry_area(cpu)->entry_trampoline,
__pa_symbol(_entry_trampoline), PAGE_KERNEL_RX);
#endif
+ percpu_setup_debug_store(cpu);
}

static __init void setup_cpu_entry_area_ptes(void)


2018-01-01 14:39:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 020/146] x86/events/intel/ds: Map debug buffers in cpu_entry_area

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <[email protected]>

commit c1961a4631daef4aeabee8e368b1b13e8f173c91 upstream.

The BTS and PEBS buffers both have their virtual addresses programmed into
the hardware. This means that any access to them is performed via the page
tables. The times that the hardware accesses these are entirely dependent
on how the performance monitoring hardware events are set up. In other
words, there is no way for the kernel to tell when the hardware might
access these buffers.

To avoid perf crashes, place 'debug_store' allocate pages and map them into
the cpu_entry_area.

The PEBS fixup buffer does not need this treatment.

[ tglx: Got rid of the kaiser_add_mapping() complication ]

Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/events/intel/ds.c | 125 +++++++++++++++++++++++++++----------------
arch/x86/events/perf_event.h | 2
2 files changed, 82 insertions(+), 45 deletions(-)

--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -3,6 +3,7 @@
#include <linux/types.h>
#include <linux/slab.h>

+#include <asm/cpu_entry_area.h>
#include <asm/perf_event.h>
#include <asm/insn.h>

@@ -280,17 +281,52 @@ void fini_debug_store_on_cpu(int cpu)

static DEFINE_PER_CPU(void *, insn_buffer);

-static int alloc_pebs_buffer(int cpu)
+static void ds_update_cea(void *cea, void *addr, size_t size, pgprot_t prot)
{
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
+ phys_addr_t pa;
+ size_t msz = 0;
+
+ pa = virt_to_phys(addr);
+ for (; msz < size; msz += PAGE_SIZE, pa += PAGE_SIZE, cea += PAGE_SIZE)
+ cea_set_pte(cea, pa, prot);
+}
+
+static void ds_clear_cea(void *cea, size_t size)
+{
+ size_t msz = 0;
+
+ for (; msz < size; msz += PAGE_SIZE, cea += PAGE_SIZE)
+ cea_set_pte(cea, 0, PAGE_NONE);
+}
+
+static void *dsalloc_pages(size_t size, gfp_t flags, int cpu)
+{
+ unsigned int order = get_order(size);
int node = cpu_to_node(cpu);
- int max;
- void *buffer, *ibuffer;
+ struct page *page;
+
+ page = __alloc_pages_node(node, flags | __GFP_ZERO, order);
+ return page ? page_address(page) : NULL;
+}
+
+static void dsfree_pages(const void *buffer, size_t size)
+{
+ if (buffer)
+ free_pages((unsigned long)buffer, get_order(size));
+}
+
+static int alloc_pebs_buffer(int cpu)
+{
+ struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
+ struct debug_store *ds = hwev->ds;
+ size_t bsiz = x86_pmu.pebs_buffer_size;
+ int max, node = cpu_to_node(cpu);
+ void *buffer, *ibuffer, *cea;

if (!x86_pmu.pebs)
return 0;

- buffer = kzalloc_node(x86_pmu.pebs_buffer_size, GFP_KERNEL, node);
+ buffer = dsalloc_pages(bsiz, GFP_KERNEL, cpu);
if (unlikely(!buffer))
return -ENOMEM;

@@ -301,25 +337,27 @@ static int alloc_pebs_buffer(int cpu)
if (x86_pmu.intel_cap.pebs_format < 2) {
ibuffer = kzalloc_node(PEBS_FIXUP_SIZE, GFP_KERNEL, node);
if (!ibuffer) {
- kfree(buffer);
+ dsfree_pages(buffer, bsiz);
return -ENOMEM;
}
per_cpu(insn_buffer, cpu) = ibuffer;
}
-
- max = x86_pmu.pebs_buffer_size / x86_pmu.pebs_record_size;
-
- ds->pebs_buffer_base = (u64)(unsigned long)buffer;
+ hwev->ds_pebs_vaddr = buffer;
+ /* Update the cpu entry area mapping */
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer;
+ ds->pebs_buffer_base = (unsigned long) cea;
+ ds_update_cea(cea, buffer, bsiz, PAGE_KERNEL);
ds->pebs_index = ds->pebs_buffer_base;
- ds->pebs_absolute_maximum = ds->pebs_buffer_base +
- max * x86_pmu.pebs_record_size;
-
+ max = x86_pmu.pebs_record_size * (bsiz / x86_pmu.pebs_record_size);
+ ds->pebs_absolute_maximum = ds->pebs_buffer_base + max;
return 0;
}

static void release_pebs_buffer(int cpu)
{
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
+ struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
+ struct debug_store *ds = hwev->ds;
+ void *cea;

if (!ds || !x86_pmu.pebs)
return;
@@ -327,73 +365,70 @@ static void release_pebs_buffer(int cpu)
kfree(per_cpu(insn_buffer, cpu));
per_cpu(insn_buffer, cpu) = NULL;

- kfree((void *)(unsigned long)ds->pebs_buffer_base);
+ /* Clear the fixmap */
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer;
+ ds_clear_cea(cea, x86_pmu.pebs_buffer_size);
ds->pebs_buffer_base = 0;
+ dsfree_pages(hwev->ds_pebs_vaddr, x86_pmu.pebs_buffer_size);
+ hwev->ds_pebs_vaddr = NULL;
}

static int alloc_bts_buffer(int cpu)
{
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
- int node = cpu_to_node(cpu);
- int max, thresh;
- void *buffer;
+ struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
+ struct debug_store *ds = hwev->ds;
+ void *buffer, *cea;
+ int max;

if (!x86_pmu.bts)
return 0;

- buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
+ buffer = dsalloc_pages(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, cpu);
if (unlikely(!buffer)) {
WARN_ONCE(1, "%s: BTS buffer allocation failure\n", __func__);
return -ENOMEM;
}
-
- max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE;
- thresh = max / 16;
-
- ds->bts_buffer_base = (u64)(unsigned long)buffer;
+ hwev->ds_bts_vaddr = buffer;
+ /* Update the fixmap */
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.bts_buffer;
+ ds->bts_buffer_base = (unsigned long) cea;
+ ds_update_cea(cea, buffer, BTS_BUFFER_SIZE, PAGE_KERNEL);
ds->bts_index = ds->bts_buffer_base;
- ds->bts_absolute_maximum = ds->bts_buffer_base +
- max * BTS_RECORD_SIZE;
- ds->bts_interrupt_threshold = ds->bts_absolute_maximum -
- thresh * BTS_RECORD_SIZE;
-
+ max = BTS_RECORD_SIZE * (BTS_BUFFER_SIZE / BTS_RECORD_SIZE);
+ ds->bts_absolute_maximum = ds->bts_buffer_base + max;
+ ds->bts_interrupt_threshold = ds->bts_absolute_maximum - (max / 16);
return 0;
}

static void release_bts_buffer(int cpu)
{
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
+ struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
+ struct debug_store *ds = hwev->ds;
+ void *cea;

if (!ds || !x86_pmu.bts)
return;

- kfree((void *)(unsigned long)ds->bts_buffer_base);
+ /* Clear the fixmap */
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.bts_buffer;
+ ds_clear_cea(cea, BTS_BUFFER_SIZE);
ds->bts_buffer_base = 0;
+ dsfree_pages(hwev->ds_bts_vaddr, BTS_BUFFER_SIZE);
+ hwev->ds_bts_vaddr = NULL;
}

static int alloc_ds_buffer(int cpu)
{
- int node = cpu_to_node(cpu);
- struct debug_store *ds;
-
- ds = kzalloc_node(sizeof(*ds), GFP_KERNEL, node);
- if (unlikely(!ds))
- return -ENOMEM;
+ struct debug_store *ds = &get_cpu_entry_area(cpu)->cpu_debug_store;

+ memset(ds, 0, sizeof(*ds));
per_cpu(cpu_hw_events, cpu).ds = ds;
-
return 0;
}

static void release_ds_buffer(int cpu)
{
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
-
- if (!ds)
- return;
-
per_cpu(cpu_hw_events, cpu).ds = NULL;
- kfree(ds);
}

void release_ds_buffers(void)
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -199,6 +199,8 @@ struct cpu_hw_events {
* Intel DebugStore bits
*/
struct debug_store *ds;
+ void *ds_pebs_vaddr;
+ void *ds_bts_vaddr;
u64 pebs_enabled;
int n_pebs;
int n_large_pebs;


2018-01-01 14:40:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 022/146] x86/pti: Put the LDT in its own PGD if PTI is on

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit f55f0501cbf65ec41cca5058513031b711730b1d upstream.

With PTI enabled, the LDT must be mapped in the usermode tables somewhere.
The LDT is per process, i.e. per mm.

An earlier approach mapped the LDT on context switch into a fixmap area,
but that's a big overhead and exhausted the fixmap space when NR_CPUS got
big.

Take advantage of the fact that there is an address space hole which
provides a completely unused pgd. Use this pgd to manage per-mm LDT
mappings.

This has a down side: the LDT isn't (currently) randomized, and an attack
that can write the LDT is instant root due to call gates (thanks, AMD, for
leaving call gates in AMD64 but designing them wrong so they're only useful
for exploits). This can be mitigated by making the LDT read-only or
randomizing the mapping, either of which is strightforward on top of this
patch.

This will significantly slow down LDT users, but that shouldn't matter for
important workloads -- the LDT is only used by DOSEMU(2), Wine, and very
old libc implementations.

[ tglx: Cleaned it up. ]

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
Documentation/x86/x86_64/mm.txt | 3
arch/x86/include/asm/mmu_context.h | 59 ++++++++++++-
arch/x86/include/asm/pgtable_64_types.h | 4
arch/x86/include/asm/processor.h | 23 +++--
arch/x86/kernel/ldt.c | 139 +++++++++++++++++++++++++++++++-
arch/x86/mm/dump_pagetables.c | 9 ++
6 files changed, 220 insertions(+), 17 deletions(-)

--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -12,6 +12,7 @@ ffffea0000000000 - ffffeaffffffffff (=40
... unused hole ...
ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
... unused hole ...
+fffffe0000000000 - fffffe7fffffffff (=39 bits) LDT remap for PTI
fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
@@ -29,7 +30,7 @@ Virtual memory map with 5 level page tab
hole caused by [56:63] sign extension
ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - ff9fffffffffffff (=52 bits) hole
+ff90000000000000 - ff9fffffffffffff (=52 bits) LDT remap for PTI
ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB)
ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -50,10 +50,33 @@ struct ldt_struct {
* call gates. On native, we could merge the ldt_struct and LDT
* allocations, but it's not worth trying to optimize.
*/
- struct desc_struct *entries;
- unsigned int nr_entries;
+ struct desc_struct *entries;
+ unsigned int nr_entries;
+
+ /*
+ * If PTI is in use, then the entries array is not mapped while we're
+ * in user mode. The whole array will be aliased at the addressed
+ * given by ldt_slot_va(slot). We use two slots so that we can allocate
+ * and map, and enable a new LDT without invalidating the mapping
+ * of an older, still-in-use LDT.
+ *
+ * slot will be -1 if this LDT doesn't have an alias mapping.
+ */
+ int slot;
};

+/* This is a multiple of PAGE_SIZE. */
+#define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE)
+
+static inline void *ldt_slot_va(int slot)
+{
+#ifdef CONFIG_X86_64
+ return (void *)(LDT_BASE_ADDR + LDT_SLOT_STRIDE * slot);
+#else
+ BUG();
+#endif
+}
+
/*
* Used for LDT copy/destruction.
*/
@@ -64,6 +87,7 @@ static inline void init_new_context_ldt(
}
int ldt_dup_context(struct mm_struct *oldmm, struct mm_struct *mm);
void destroy_context_ldt(struct mm_struct *mm);
+void ldt_arch_exit_mmap(struct mm_struct *mm);
#else /* CONFIG_MODIFY_LDT_SYSCALL */
static inline void init_new_context_ldt(struct mm_struct *mm) { }
static inline int ldt_dup_context(struct mm_struct *oldmm,
@@ -71,7 +95,8 @@ static inline int ldt_dup_context(struct
{
return 0;
}
-static inline void destroy_context_ldt(struct mm_struct *mm) {}
+static inline void destroy_context_ldt(struct mm_struct *mm) { }
+static inline void ldt_arch_exit_mmap(struct mm_struct *mm) { }
#endif

static inline void load_mm_ldt(struct mm_struct *mm)
@@ -96,10 +121,31 @@ static inline void load_mm_ldt(struct mm
* that we can see.
*/

- if (unlikely(ldt))
- set_ldt(ldt->entries, ldt->nr_entries);
- else
+ if (unlikely(ldt)) {
+ if (static_cpu_has(X86_FEATURE_PTI)) {
+ if (WARN_ON_ONCE((unsigned long)ldt->slot > 1)) {
+ /*
+ * Whoops -- either the new LDT isn't mapped
+ * (if slot == -1) or is mapped into a bogus
+ * slot (if slot > 1).
+ */
+ clear_LDT();
+ return;
+ }
+
+ /*
+ * If page table isolation is enabled, ldt->entries
+ * will not be mapped in the userspace pagetables.
+ * Tell the CPU to access the LDT through the alias
+ * at ldt_slot_va(ldt->slot).
+ */
+ set_ldt(ldt_slot_va(ldt->slot), ldt->nr_entries);
+ } else {
+ set_ldt(ldt->entries, ldt->nr_entries);
+ }
+ } else {
clear_LDT();
+ }
#else
clear_LDT();
#endif
@@ -194,6 +240,7 @@ static inline int arch_dup_mmap(struct m
static inline void arch_exit_mmap(struct mm_struct *mm)
{
paravirt_arch_exit_mmap(mm);
+ ldt_arch_exit_mmap(mm);
}

#ifdef CONFIG_X86_64
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -82,10 +82,14 @@ typedef struct { pteval_t pte; } pte_t;
# define VMALLOC_SIZE_TB _AC(12800, UL)
# define __VMALLOC_BASE _AC(0xffa0000000000000, UL)
# define __VMEMMAP_BASE _AC(0xffd4000000000000, UL)
+# define LDT_PGD_ENTRY _AC(-112, UL)
+# define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
#else
# define VMALLOC_SIZE_TB _AC(32, UL)
# define __VMALLOC_BASE _AC(0xffffc90000000000, UL)
# define __VMEMMAP_BASE _AC(0xffffea0000000000, UL)
+# define LDT_PGD_ENTRY _AC(-4, UL)
+# define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
#endif

#ifdef CONFIG_RANDOMIZE_MEMORY
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -851,13 +851,22 @@ static inline void spin_lock_prefetch(co

#else
/*
- * User space process size. 47bits minus one guard page. The guard
- * page is necessary on Intel CPUs: if a SYSCALL instruction is at
- * the highest possible canonical userspace address, then that
- * syscall will enter the kernel with a non-canonical return
- * address, and SYSRET will explode dangerously. We avoid this
- * particular problem by preventing anything from being mapped
- * at the maximum canonical address.
+ * User space process size. This is the first address outside the user range.
+ * There are a few constraints that determine this:
+ *
+ * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
+ * address, then that syscall will enter the kernel with a
+ * non-canonical return address, and SYSRET will explode dangerously.
+ * We avoid this particular problem by preventing anything executable
+ * from being mapped at the maximum canonical address.
+ *
+ * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
+ * CPUs malfunction if they execute code from the highest canonical page.
+ * They'll speculate right off the end of the canonical space, and
+ * bad things happen. This is worked around in the same way as the
+ * Intel problem.
+ *
+ * With page table isolation enabled, we map the LDT in ... [stay tuned]
*/
#define TASK_SIZE_MAX ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)

--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -24,6 +24,7 @@
#include <linux/uaccess.h>

#include <asm/ldt.h>
+#include <asm/tlb.h>
#include <asm/desc.h>
#include <asm/mmu_context.h>
#include <asm/syscalls.h>
@@ -51,13 +52,11 @@ static void refresh_ldt_segments(void)
static void flush_ldt(void *__mm)
{
struct mm_struct *mm = __mm;
- mm_context_t *pc;

if (this_cpu_read(cpu_tlbstate.loaded_mm) != mm)
return;

- pc = &mm->context;
- set_ldt(pc->ldt->entries, pc->ldt->nr_entries);
+ load_mm_ldt(mm);

refresh_ldt_segments();
}
@@ -94,10 +93,121 @@ static struct ldt_struct *alloc_ldt_stru
return NULL;
}

+ /* The new LDT isn't aliased for PTI yet. */
+ new_ldt->slot = -1;
+
new_ldt->nr_entries = num_entries;
return new_ldt;
}

+/*
+ * If PTI is enabled, this maps the LDT into the kernelmode and
+ * usermode tables for the given mm.
+ *
+ * There is no corresponding unmap function. Even if the LDT is freed, we
+ * leave the PTEs around until the slot is reused or the mm is destroyed.
+ * This is harmless: the LDT is always in ordinary memory, and no one will
+ * access the freed slot.
+ *
+ * If we wanted to unmap freed LDTs, we'd also need to do a flush to make
+ * it useful, and the flush would slow down modify_ldt().
+ */
+static int
+map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ bool is_vmalloc, had_top_level_entry;
+ unsigned long va;
+ spinlock_t *ptl;
+ pgd_t *pgd;
+ int i;
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return 0;
+
+ /*
+ * Any given ldt_struct should have map_ldt_struct() called at most
+ * once.
+ */
+ WARN_ON(ldt->slot != -1);
+
+ /*
+ * Did we already have the top level entry allocated? We can't
+ * use pgd_none() for this because it doens't do anything on
+ * 4-level page table kernels.
+ */
+ pgd = pgd_offset(mm, LDT_BASE_ADDR);
+ had_top_level_entry = (pgd->pgd != 0);
+
+ is_vmalloc = is_vmalloc_addr(ldt->entries);
+
+ for (i = 0; i * PAGE_SIZE < ldt->nr_entries * LDT_ENTRY_SIZE; i++) {
+ unsigned long offset = i << PAGE_SHIFT;
+ const void *src = (char *)ldt->entries + offset;
+ unsigned long pfn;
+ pte_t pte, *ptep;
+
+ va = (unsigned long)ldt_slot_va(slot) + offset;
+ pfn = is_vmalloc ? vmalloc_to_pfn(src) :
+ page_to_pfn(virt_to_page(src));
+ /*
+ * Treat the PTI LDT range as a *userspace* range.
+ * get_locked_pte() will allocate all needed pagetables
+ * and account for them in this mm.
+ */
+ ptep = get_locked_pte(mm, va, &ptl);
+ if (!ptep)
+ return -ENOMEM;
+ pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL & ~_PAGE_GLOBAL));
+ set_pte_at(mm, va, ptep, pte);
+ pte_unmap_unlock(ptep, ptl);
+ }
+
+ if (mm->context.ldt) {
+ /*
+ * We already had an LDT. The top-level entry should already
+ * have been allocated and synchronized with the usermode
+ * tables.
+ */
+ WARN_ON(!had_top_level_entry);
+ if (static_cpu_has(X86_FEATURE_PTI))
+ WARN_ON(!kernel_to_user_pgdp(pgd)->pgd);
+ } else {
+ /*
+ * This is the first time we're mapping an LDT for this process.
+ * Sync the pgd to the usermode tables.
+ */
+ WARN_ON(had_top_level_entry);
+ if (static_cpu_has(X86_FEATURE_PTI)) {
+ WARN_ON(kernel_to_user_pgdp(pgd)->pgd);
+ set_pgd(kernel_to_user_pgdp(pgd), *pgd);
+ }
+ }
+
+ va = (unsigned long)ldt_slot_va(slot);
+ flush_tlb_mm_range(mm, va, va + LDT_SLOT_STRIDE, 0);
+
+ ldt->slot = slot;
+#endif
+ return 0;
+}
+
+static void free_ldt_pgtables(struct mm_struct *mm)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ struct mmu_gather tlb;
+ unsigned long start = LDT_BASE_ADDR;
+ unsigned long end = start + (1UL << PGDIR_SHIFT);
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ tlb_gather_mmu(&tlb, mm, start, end);
+ free_pgd_range(&tlb, start, end, start, end);
+ tlb_finish_mmu(&tlb, start, end);
+#endif
+}
+
/* After calling this, the LDT is immutable. */
static void finalize_ldt_struct(struct ldt_struct *ldt)
{
@@ -156,6 +266,12 @@ int ldt_dup_context(struct mm_struct *ol
new_ldt->nr_entries * LDT_ENTRY_SIZE);
finalize_ldt_struct(new_ldt);

+ retval = map_ldt_struct(mm, new_ldt, 0);
+ if (retval) {
+ free_ldt_pgtables(mm);
+ free_ldt_struct(new_ldt);
+ goto out_unlock;
+ }
mm->context.ldt = new_ldt;

out_unlock:
@@ -174,6 +290,11 @@ void destroy_context_ldt(struct mm_struc
mm->context.ldt = NULL;
}

+void ldt_arch_exit_mmap(struct mm_struct *mm)
+{
+ free_ldt_pgtables(mm);
+}
+
static int read_ldt(void __user *ptr, unsigned long bytecount)
{
struct mm_struct *mm = current->mm;
@@ -287,6 +408,18 @@ static int write_ldt(void __user *ptr, u
new_ldt->entries[ldt_info.entry_number] = ldt;
finalize_ldt_struct(new_ldt);

+ /*
+ * If we are using PTI, map the new LDT into the userspace pagetables.
+ * If there is already an LDT, use the other slot so that other CPUs
+ * will continue to use the old LDT until install_ldt() switches
+ * them over to the new LDT.
+ */
+ error = map_ldt_struct(mm, new_ldt, old_ldt ? !old_ldt->slot : 0);
+ if (error) {
+ free_ldt_struct(old_ldt);
+ goto out_unlock;
+ }
+
install_ldt(mm, new_ldt);
free_ldt_struct(old_ldt);
error = 0;
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -52,12 +52,18 @@ enum address_markers_idx {
USER_SPACE_NR = 0,
KERNEL_SPACE_NR,
LOW_KERNEL_NR,
+#if defined(CONFIG_MODIFY_LDT_SYSCALL) && defined(CONFIG_X86_5LEVEL)
+ LDT_NR,
+#endif
VMALLOC_START_NR,
VMEMMAP_START_NR,
#ifdef CONFIG_KASAN
KASAN_SHADOW_START_NR,
KASAN_SHADOW_END_NR,
#endif
+#if defined(CONFIG_MODIFY_LDT_SYSCALL) && !defined(CONFIG_X86_5LEVEL)
+ LDT_NR,
+#endif
CPU_ENTRY_AREA_NR,
#ifdef CONFIG_X86_ESPFIX64
ESPFIX_START_NR,
@@ -82,6 +88,9 @@ static struct addr_marker address_marker
[KASAN_SHADOW_START_NR] = { KASAN_SHADOW_START, "KASAN shadow" },
[KASAN_SHADOW_END_NR] = { KASAN_SHADOW_END, "KASAN shadow end" },
#endif
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+ [LDT_NR] = { LDT_BASE_ADDR, "LDT remap" },
+#endif
[CPU_ENTRY_AREA_NR] = { CPU_ENTRY_AREA_BASE,"CPU entry Area" },
#ifdef CONFIG_X86_ESPFIX64
[ESPFIX_START_NR] = { ESPFIX_BASE_ADDR, "ESPfix Area", 16 },


2018-01-01 14:40:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 025/146] x86/mm: Abstract switching CR3

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit 48e111982cda033fec832c6b0592c2acedd85d04 upstream.

In preparation to adding additional PCID flushing, abstract the
loading of a new ASID into CR3.

[ PeterZ: Split out from big combo patch ]

Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/tlb.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -100,6 +100,24 @@ static void choose_new_asid(struct mm_st
*need_flush = true;
}

+static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
+{
+ unsigned long new_mm_cr3;
+
+ if (need_flush) {
+ new_mm_cr3 = build_cr3(pgdir, new_asid);
+ } else {
+ new_mm_cr3 = build_cr3_noflush(pgdir, new_asid);
+ }
+
+ /*
+ * Caution: many callers of this function expect
+ * that load_cr3() is serializing and orders TLB
+ * fills with respect to the mm_cpumask writes.
+ */
+ write_cr3(new_mm_cr3);
+}
+
void leave_mm(int cpu)
{
struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
@@ -230,7 +248,7 @@ void switch_mm_irqs_off(struct mm_struct
if (need_flush) {
this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
- write_cr3(build_cr3(next->pgd, new_asid));
+ load_new_mm_cr3(next->pgd, new_asid, true);

/*
* NB: This gets called via leave_mm() in the idle path
@@ -243,7 +261,7 @@ void switch_mm_irqs_off(struct mm_struct
trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
} else {
/* The new ASID is already up to date. */
- write_cr3(build_cr3_noflush(next->pgd, new_asid));
+ load_new_mm_cr3(next->pgd, new_asid, false);

/* See above wrt _rcuidle. */
trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);


2018-01-01 14:40:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 003/146] tracing: Fix crash when it fails to alloc ring buffer

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jing Xia <[email protected]>

commit 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 upstream.

Double free of the ring buffer happens when it fails to alloc new
ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
The root cause is that the pointer is not set to NULL after the buffer
is freed in allocate_trace_buffers(), and the freeing of the ring
buffer is invoked again later if the pointer is not equal to Null,
as:

instance_mkdir()
|-allocate_trace_buffers()
|-allocate_trace_buffer(tr, &tr->trace_buffer...)
|-allocate_trace_buffer(tr, &tr->max_buffer...)

// allocate fail(-ENOMEM),first free
// and the buffer pointer is not set to null
|-ring_buffer_free(tr->trace_buffer.buffer)

// out_free_tr
|-free_trace_buffers()
|-free_trace_buffer(&tr->trace_buffer);

//if trace_buffer is not null, free again
|-ring_buffer_free(buf->buffer)
|-rb_free_cpu_buffer(buffer->buffers[cpu])
// ring_buffer_per_cpu is null, and
// crash in ring_buffer_per_cpu->pages

Link: http://lkml.kernel.org/r/[email protected]

Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Signed-off-by: Jing Xia <[email protected]>
Signed-off-by: Chunyan Zhang <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/trace.c | 2 ++
1 file changed, 2 insertions(+)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7604,7 +7604,9 @@ static int allocate_trace_buffers(struct
allocate_snapshot ? size : 1);
if (WARN_ON(ret)) {
ring_buffer_free(tr->trace_buffer.buffer);
+ tr->trace_buffer.buffer = NULL;
free_percpu(tr->trace_buffer.data);
+ tr->trace_buffer.data = NULL;
return -ENOMEM;
}
tr->allocated_snapshot = allocate_snapshot;


2018-01-01 14:40:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 004/146] x86/cpufeatures: Add X86_BUG_CPU_INSECURE

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit a89f040fa34ec9cd682aed98b8f04e3c47d998bd upstream.

Many x86 CPUs leak information to user space due to missing isolation of
user space and kernel space page tables. There are many well documented
ways to exploit that.

The upcoming software migitation of isolating the user and kernel space
page tables needs a misfeature flag so code can be made runtime
conditional.

Add the BUG bits which indicates that the CPU is affected and add a feature
bit which indicates that the software migitation is enabled.

Assume for now that _ALL_ x86 CPUs are affected by this. Exceptions can be
made later.

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/cpufeatures.h | 3 ++-
arch/x86/include/asm/disabled-features.h | 8 +++++++-
arch/x86/kernel/cpu/common.c | 4 ++++
3 files changed, 13 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -201,7 +201,7 @@
#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */
-
+#define X86_FEATURE_PTI ( 7*32+11) /* Kernel Page Table Isolation enabled */
#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
#define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */
@@ -340,5 +340,6 @@
#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */
#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */
#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */
+#define X86_BUG_CPU_INSECURE X86_BUG(14) /* CPU is insecure and needs kernel page table isolation */

#endif /* _ASM_X86_CPUFEATURES_H */
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -44,6 +44,12 @@
# define DISABLE_LA57 (1<<(X86_FEATURE_LA57 & 31))
#endif

+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define DISABLE_PTI 0
+#else
+# define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -54,7 +60,7 @@
#define DISABLED_MASK4 (DISABLE_PCID)
#define DISABLED_MASK5 0
#define DISABLED_MASK6 0
-#define DISABLED_MASK7 0
+#define DISABLED_MASK7 (DISABLE_PTI)
#define DISABLED_MASK8 0
#define DISABLED_MASK9 (DISABLE_MPX)
#define DISABLED_MASK10 0
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -898,6 +898,10 @@ static void __init early_identify_cpu(st
}

setup_force_cpu_cap(X86_FEATURE_ALWAYS);
+
+ /* Assume for now that ALL x86 CPUs are insecure */
+ setup_force_cpu_bug(X86_BUG_CPU_INSECURE);
+
fpu__init_system(c);

#ifdef CONFIG_X86_32


2018-01-01 14:40:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 006/146] x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit 8a09317b895f073977346779df52f67c1056d81d upstream.

PAGE_TABLE_ISOLATION needs to switch to a different CR3 value when it
enters the kernel and switch back when it exits. This essentially needs to
be done before leaving assembly code.

This is extra challenging because the switching context is tricky: the
registers that can be clobbered can vary. It is also hard to store things
on the stack because there is an established ABI (ptregs) or the stack is
entirely unsafe to use.

Establish a set of macros that allow changing to the user and kernel CR3
values.

Interactions with SWAPGS:

Previous versions of the PAGE_TABLE_ISOLATION code relied on having
per-CPU scratch space to save/restore a register that can be used for the
CR3 MOV. The %GS register is used to index into our per-CPU space, so
SWAPGS *had* to be done before the CR3 switch. That scratch space is gone
now, but the semantic that SWAPGS must be done before the CR3 MOV is
retained. This is good to keep because it is not that hard to do and it
allows to do things like add per-CPU debugging information.

What this does in the NMI code is worth pointing out. NMIs can interrupt
*any* context and they can also be nested with NMIs interrupting other
NMIs. The comments below ".Lnmi_from_kernel" explain the format of the
stack during this situation. Changing the format of this stack is hard.
Instead of storing the old CR3 value on the stack, this depends on the
*regular* register save/restore mechanism and then uses %r14 to keep CR3
during the NMI. It is callee-saved and will not be clobbered by the C NMI
handlers that get called.

[ PeterZ: ESPFIX optimization ]

Based-on-code-from: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/entry/calling.h | 66 +++++++++++++++++++++++++++++++++++++++
arch/x86/entry/entry_64.S | 45 +++++++++++++++++++++++---
arch/x86/entry/entry_64_compat.S | 24 +++++++++++++-
3 files changed, 128 insertions(+), 7 deletions(-)

--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -1,6 +1,8 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/jump_label.h>
#include <asm/unwind_hints.h>
+#include <asm/cpufeatures.h>
+#include <asm/page_types.h>

/*

@@ -187,6 +189,70 @@ For 32-bit we have the following convent
#endif
.endm

+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+
+/* PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two halves: */
+#define PTI_SWITCH_MASK (1<<PAGE_SHIFT)
+
+.macro ADJUST_KERNEL_CR3 reg:req
+ /* Clear "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */
+ andq $(~PTI_SWITCH_MASK), \reg
+.endm
+
+.macro ADJUST_USER_CR3 reg:req
+ /* Move CR3 up a page to the user page tables: */
+ orq $(PTI_SWITCH_MASK), \reg
+.endm
+
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+ mov %cr3, \scratch_reg
+ ADJUST_KERNEL_CR3 \scratch_reg
+ mov \scratch_reg, %cr3
+.endm
+
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+ mov %cr3, \scratch_reg
+ ADJUST_USER_CR3 \scratch_reg
+ mov \scratch_reg, %cr3
+.endm
+
+.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+ movq %cr3, \scratch_reg
+ movq \scratch_reg, \save_reg
+ /*
+ * Is the switch bit zero? This means the address is
+ * up in real PAGE_TABLE_ISOLATION patches in a moment.
+ */
+ testq $(PTI_SWITCH_MASK), \scratch_reg
+ jz .Ldone_\@
+
+ ADJUST_KERNEL_CR3 \scratch_reg
+ movq \scratch_reg, %cr3
+
+.Ldone_\@:
+.endm
+
+.macro RESTORE_CR3 save_reg:req
+ /*
+ * The CR3 write could be avoided when not changing its value,
+ * but would require a CR3 read *and* a scratch register.
+ */
+ movq \save_reg, %cr3
+.endm
+
+#else /* CONFIG_PAGE_TABLE_ISOLATION=n: */
+
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+.endm
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+.endm
+.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+.endm
+.macro RESTORE_CR3 save_reg:req
+.endm
+
+#endif
+
#endif /* CONFIG_X86_64 */

/*
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -164,6 +164,9 @@ ENTRY(entry_SYSCALL_64_trampoline)
/* Stash the user RSP. */
movq %rsp, RSP_SCRATCH

+ /* Note: using %rsp as a scratch reg. */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
+
/* Load the top of the task stack into RSP */
movq CPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp

@@ -203,6 +206,10 @@ ENTRY(entry_SYSCALL_64)
*/

swapgs
+ /*
+ * This path is not taken when PAGE_TABLE_ISOLATION is disabled so it
+ * is not required to switch CR3.
+ */
movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp

@@ -399,6 +406,7 @@ syscall_return_via_sysret:
* We are on the trampoline stack. All regs except RDI are live.
* We can do future final exit work right here.
*/
+ SWITCH_TO_USER_CR3 scratch_reg=%rdi

popq %rdi
popq %rsp
@@ -736,6 +744,8 @@ GLOBAL(swapgs_restore_regs_and_return_to
* We can do future final exit work right here.
*/

+ SWITCH_TO_USER_CR3 scratch_reg=%rdi
+
/* Restore RDI. */
popq %rdi
SWAPGS
@@ -818,7 +828,9 @@ native_irq_return_ldt:
*/

pushq %rdi /* Stash user RDI */
- SWAPGS
+ SWAPGS /* to kernel GS */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi /* to kernel CR3 */
+
movq PER_CPU_VAR(espfix_waddr), %rdi
movq %rax, (0*8)(%rdi) /* user RAX */
movq (1*8)(%rsp), %rax /* user RIP */
@@ -834,7 +846,6 @@ native_irq_return_ldt:
/* Now RAX == RSP. */

andl $0xffff0000, %eax /* RAX = (RSP & 0xffff0000) */
- popq %rdi /* Restore user RDI */

/*
* espfix_stack[31:16] == 0. The page tables are set up such that
@@ -845,7 +856,11 @@ native_irq_return_ldt:
* still points to an RO alias of the ESPFIX stack.
*/
orq PER_CPU_VAR(espfix_stack), %rax
- SWAPGS
+
+ SWITCH_TO_USER_CR3 scratch_reg=%rdi /* to user CR3 */
+ SWAPGS /* to user GS */
+ popq %rdi /* Restore user RDI */
+
movq %rax, %rsp
UNWIND_HINT_IRET_REGS offset=8

@@ -945,6 +960,8 @@ ENTRY(switch_to_thread_stack)
UNWIND_HINT_FUNC

pushq %rdi
+ /* Need to switch before accessing the thread stack. */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI
@@ -1244,7 +1261,11 @@ ENTRY(paranoid_entry)
js 1f /* negative -> in kernel */
SWAPGS
xorl %ebx, %ebx
-1: ret
+
+1:
+ SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
+
+ ret
END(paranoid_entry)

/*
@@ -1266,6 +1287,7 @@ ENTRY(paranoid_exit)
testl %ebx, %ebx /* swapgs needed? */
jnz .Lparanoid_exit_no_swapgs
TRACE_IRQS_IRETQ
+ RESTORE_CR3 save_reg=%r14
SWAPGS_UNSAFE_STACK
jmp .Lparanoid_exit_restore
.Lparanoid_exit_no_swapgs:
@@ -1293,6 +1315,8 @@ ENTRY(error_entry)
* from user mode due to an IRET fault.
*/
SWAPGS
+ /* We have user CR3. Change to kernel CR3. */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rax

.Lerror_entry_from_usermode_after_swapgs:
/* Put us onto the real thread stack. */
@@ -1339,6 +1363,7 @@ ENTRY(error_entry)
* .Lgs_change's error handler with kernel gsbase.
*/
SWAPGS
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
jmp .Lerror_entry_done

.Lbstep_iret:
@@ -1348,10 +1373,11 @@ ENTRY(error_entry)

.Lerror_bad_iret:
/*
- * We came from an IRET to user mode, so we have user gsbase.
- * Switch to kernel gsbase:
+ * We came from an IRET to user mode, so we have user
+ * gsbase and CR3. Switch to kernel gsbase and CR3:
*/
SWAPGS
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rax

/*
* Pretend that the exception came from user mode: set up pt_regs
@@ -1383,6 +1409,10 @@ END(error_exit)
/*
* Runs on exception stack. Xen PV does not go through this path at all,
* so we can use real assembly here.
+ *
+ * Registers:
+ * %r14: Used to save/restore the CR3 of the interrupted context
+ * when PAGE_TABLE_ISOLATION is in use. Do not clobber.
*/
ENTRY(nmi)
UNWIND_HINT_IRET_REGS
@@ -1446,6 +1476,7 @@ ENTRY(nmi)

swapgs
cld
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
UNWIND_HINT_IRET_REGS base=%rdx offset=8
@@ -1698,6 +1729,8 @@ end_repeat_nmi:
movq $-1, %rsi
call do_nmi

+ RESTORE_CR3 save_reg=%r14
+
testl %ebx, %ebx /* swapgs needed? */
jnz nmi_restore
nmi_swapgs:
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -49,6 +49,10 @@
ENTRY(entry_SYSENTER_compat)
/* Interrupts are off on entry. */
SWAPGS
+
+ /* We are about to clobber %rsp anyway, clobbering here is OK */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
+
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp

/*
@@ -216,6 +220,12 @@ GLOBAL(entry_SYSCALL_compat_after_hwfram
pushq $0 /* pt_regs->r15 = 0 */

/*
+ * We just saved %rdi so it is safe to clobber. It is not
+ * preserved during the C calls inside TRACE_IRQS_OFF anyway.
+ */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
+
+ /*
* User mode is traced as though IRQs are on, and SYSENTER
* turned them off.
*/
@@ -256,10 +266,22 @@ sysret32_from_system_call:
* when the system call started, which is already known to user
* code. We zero R8-R10 to avoid info leaks.
*/
+ movq RSP-ORIG_RAX(%rsp), %rsp
+
+ /*
+ * The original userspace %rsp (RSP-ORIG_RAX(%rsp)) is stored
+ * on the process stack which is not mapped to userspace and
+ * not readable after we SWITCH_TO_USER_CR3. Delay the CR3
+ * switch until after after the last reference to the process
+ * stack.
+ *
+ * %r8 is zeroed before the sysret, thus safe to clobber.
+ */
+ SWITCH_TO_USER_CR3 scratch_reg=%r8
+
xorq %r8, %r8
xorq %r9, %r9
xorq %r10, %r10
- movq RSP-ORIG_RAX(%rsp), %rsp
swapgs
sysretl
END(entry_SYSCALL_compat)


2018-01-01 14:40:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 007/146] x86/mm/pti: Add infrastructure for page table isolation

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit aa8c6248f8c75acfd610fe15d8cae23cf70d9d09 upstream.

Add the initial files for kernel page table isolation, with a minimal init
function and the boot time detection for this misfeature.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
Documentation/admin-guide/kernel-parameters.txt | 2
arch/x86/boot/compressed/pagetable.c | 3
arch/x86/entry/calling.h | 7 ++
arch/x86/include/asm/pti.h | 14 ++++
arch/x86/mm/Makefile | 7 +-
arch/x86/mm/init.c | 2
arch/x86/mm/pti.c | 84 ++++++++++++++++++++++++
include/linux/pti.h | 11 +++
init/main.c | 3
9 files changed, 130 insertions(+), 3 deletions(-)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2685,6 +2685,8 @@
steal time is computed, but won't influence scheduler
behaviour

+ nopti [X86-64] Disable kernel page table isolation
+
nolapic [X86-32,APIC] Do not enable or use the local APIC.

nolapic_timer [X86-32,APIC] Do not use the local APIC timer.
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -23,6 +23,9 @@
*/
#undef CONFIG_AMD_MEM_ENCRYPT

+/* No PAGE_TABLE_ISOLATION support needed either: */
+#undef CONFIG_PAGE_TABLE_ISOLATION
+
#include "misc.h"

/* These actually do the work of building the kernel identity maps. */
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -205,18 +205,23 @@ For 32-bit we have the following convent
.endm

.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
mov %cr3, \scratch_reg
ADJUST_KERNEL_CR3 \scratch_reg
mov \scratch_reg, %cr3
+.Lend_\@:
.endm

.macro SWITCH_TO_USER_CR3 scratch_reg:req
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
mov %cr3, \scratch_reg
ADJUST_USER_CR3 \scratch_reg
mov \scratch_reg, %cr3
+.Lend_\@:
.endm

.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+ ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_PTI
movq %cr3, \scratch_reg
movq \scratch_reg, \save_reg
/*
@@ -233,11 +238,13 @@ For 32-bit we have the following convent
.endm

.macro RESTORE_CR3 save_reg:req
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
/*
* The CR3 write could be avoided when not changing its value,
* but would require a CR3 read *and* a scratch register.
*/
movq \save_reg, %cr3
+.Lend_\@:
.endm

#else /* CONFIG_PAGE_TABLE_ISOLATION=n: */
--- /dev/null
+++ b/arch/x86/include/asm/pti.h
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef _ASM_X86_PTI_H
+#define _ASM_X86_PTI_H
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+extern void pti_init(void);
+extern void pti_check_boottime_disable(void);
+#else
+static inline void pti_check_boottime_disable(void) { }
+#endif
+
+#endif /* __ASSEMBLY__ */
+#endif /* _ASM_X86_PTI_H */
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -43,9 +43,10 @@ obj-$(CONFIG_AMD_NUMA) += amdtopology.o
obj-$(CONFIG_ACPI_NUMA) += srat.o
obj-$(CONFIG_NUMA_EMU) += numa_emulation.o

-obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
-obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
-obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
+obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
+obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+obj-$(CONFIG_PAGE_TABLE_ISOLATION) += pti.o

obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -20,6 +20,7 @@
#include <asm/kaslr.h>
#include <asm/hypervisor.h>
#include <asm/cpufeature.h>
+#include <asm/pti.h>

/*
* We need to define the tracepoints somewhere, and tlb.c
@@ -630,6 +631,7 @@ void __init init_mem_mapping(void)
{
unsigned long end;

+ pti_check_boottime_disable();
probe_page_size_mask();
setup_pcid();

--- /dev/null
+++ b/arch/x86/mm/pti.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * This code is based in part on work published here:
+ *
+ * https://github.com/IAIK/KAISER
+ *
+ * The original work was written by and and signed off by for the Linux
+ * kernel by:
+ *
+ * Signed-off-by: Richard Fellner <[email protected]>
+ * Signed-off-by: Moritz Lipp <[email protected]>
+ * Signed-off-by: Daniel Gruss <[email protected]>
+ * Signed-off-by: Michael Schwarz <[email protected]>
+ *
+ * Major changes to the original code by: Dave Hansen <[email protected]>
+ * Mostly rewritten by Thomas Gleixner <[email protected]> and
+ * Andy Lutomirsky <[email protected]>
+ */
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/bug.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+
+#include <asm/cpufeature.h>
+#include <asm/hypervisor.h>
+#include <asm/cmdline.h>
+#include <asm/pti.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
+#include <asm/desc.h>
+
+#undef pr_fmt
+#define pr_fmt(fmt) "Kernel/User page tables isolation: " fmt
+
+static void __init pti_print_if_insecure(const char *reason)
+{
+ if (boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
+ pr_info("%s\n", reason);
+}
+
+void __init pti_check_boottime_disable(void)
+{
+ if (hypervisor_is_type(X86_HYPER_XEN_PV)) {
+ pti_print_if_insecure("disabled on XEN PV.");
+ return;
+ }
+
+ if (cmdline_find_option_bool(boot_command_line, "nopti")) {
+ pti_print_if_insecure("disabled on command line.");
+ return;
+ }
+
+ if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
+ return;
+
+ setup_force_cpu_cap(X86_FEATURE_PTI);
+}
+
+/*
+ * Initialize kernel page table isolation
+ */
+void __init pti_init(void)
+{
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ pr_info("enabled\n");
+}
--- /dev/null
+++ b/include/linux/pti.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef _INCLUDE_PTI_H
+#define _INCLUDE_PTI_H
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#include <asm/pti.h>
+#else
+static inline void pti_init(void) { }
+#endif
+
+#endif
--- a/init/main.c
+++ b/init/main.c
@@ -75,6 +75,7 @@
#include <linux/slab.h>
#include <linux/perf_event.h>
#include <linux/ptrace.h>
+#include <linux/pti.h>
#include <linux/blkdev.h>
#include <linux/elevator.h>
#include <linux/sched_clock.h>
@@ -506,6 +507,8 @@ static void __init mm_init(void)
ioremap_huge_init();
/* Should be run before the first non-init thread is created */
init_espfix_bsp();
+ /* Should be run after espfix64 is set up. */
+ pti_init();
}

asmlinkage __visible void __init start_kernel(void)


2018-01-01 14:40:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 009/146] x86/mm/pti: Add mapping helper functions

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit 61e9b3671007a5da8127955a1a3bda7e0d5f42e8 upstream.

Add the pagetable helper functions do manage the separate user space page
tables.

[ tglx: Split out from the big combo kaiser patch. Folded Andys
simplification and made it out of line as Boris suggested ]

Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/pgtable.h | 6 ++
arch/x86/include/asm/pgtable_64.h | 92 ++++++++++++++++++++++++++++++++++++++
arch/x86/mm/pti.c | 41 ++++++++++++++++
3 files changed, 138 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -909,7 +909,11 @@ static inline int pgd_none(pgd_t pgd)
* pgd_offset() returns a (pgd_t *)
* pgd_index() is used get the offset into the pgd page's array of pgd_t's;
*/
-#define pgd_offset(mm, address) ((mm)->pgd + pgd_index((address)))
+#define pgd_offset_pgd(pgd, address) (pgd + pgd_index((address)))
+/*
+ * a shortcut to get a pgd_t in a given mm
+ */
+#define pgd_offset(mm, address) pgd_offset_pgd((mm)->pgd, (address))
/*
* a shortcut which implies the use of the kernel's pgd, instead
* of a process's
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -131,9 +131,97 @@ static inline pud_t native_pudp_get_and_
#endif
}

+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
+ * (8k-aligned and 8k in size). The kernel one is at the beginning 4k and
+ * the user one is in the last 4k. To switch between them, you
+ * just need to flip the 12th bit in their addresses.
+ */
+#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT
+
+/*
+ * This generates better code than the inline assembly in
+ * __set_bit().
+ */
+static inline void *ptr_set_bit(void *ptr, int bit)
+{
+ unsigned long __ptr = (unsigned long)ptr;
+
+ __ptr |= BIT(bit);
+ return (void *)__ptr;
+}
+static inline void *ptr_clear_bit(void *ptr, int bit)
+{
+ unsigned long __ptr = (unsigned long)ptr;
+
+ __ptr &= ~BIT(bit);
+ return (void *)__ptr;
+}
+
+static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
+{
+ return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
+{
+ return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
+{
+ return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
+{
+ return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
+/*
+ * Page table pages are page-aligned. The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ *
+ * Returns true for parts of the PGD that map userspace and
+ * false for the parts that map the kernel.
+ */
+static inline bool pgdp_maps_userspace(void *__ptr)
+{
+ unsigned long ptr = (unsigned long)__ptr;
+
+ return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2);
+}
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd);
+
+/*
+ * Take a PGD location (pgdp) and a pgd value that needs to be set there.
+ * Populates the user and returns the resulting PGD that must be set in
+ * the kernel copy of the page tables.
+ */
+static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return pgd;
+ return __pti_set_user_pgd(pgdp, pgd);
+}
+#else
+static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ return pgd;
+}
+#endif
+
static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
{
+#if defined(CONFIG_PAGE_TABLE_ISOLATION) && !defined(CONFIG_X86_5LEVEL)
+ p4dp->pgd = pti_set_user_pgd(&p4dp->pgd, p4d.pgd);
+#else
*p4dp = p4d;
+#endif
}

static inline void native_p4d_clear(p4d_t *p4d)
@@ -147,7 +235,11 @@ static inline void native_p4d_clear(p4d_

static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ *pgdp = pti_set_user_pgd(pgdp, pgd);
+#else
*pgdp = pgd;
+#endif
}

static inline void native_pgd_clear(pgd_t *pgd)
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -96,6 +96,47 @@ enable:
setup_force_cpu_cap(X86_FEATURE_PTI);
}

+pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ /*
+ * Changes to the high (kernel) portion of the kernelmode page
+ * tables are not automatically propagated to the usermode tables.
+ *
+ * Users should keep in mind that, unlike the kernelmode tables,
+ * there is no vmalloc_fault equivalent for the usermode tables.
+ * Top-level entries added to init_mm's usermode pgd after boot
+ * will not be automatically propagated to other mms.
+ */
+ if (!pgdp_maps_userspace(pgdp))
+ return pgd;
+
+ /*
+ * The user page tables get the full PGD, accessible from
+ * userspace:
+ */
+ kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd;
+
+ /*
+ * If this is normal user memory, make it NX in the kernel
+ * pagetables so that, if we somehow screw up and return to
+ * usermode with the kernel CR3 loaded, we'll get a page fault
+ * instead of allowing user code to execute with the wrong CR3.
+ *
+ * As exceptions, we don't set NX if:
+ * - _PAGE_USER is not set. This could be an executable
+ * EFI runtime mapping or something similar, and the kernel
+ * may execute from it
+ * - we don't have NX support
+ * - we're clearing the PGD (i.e. the new pgd is not present).
+ */
+ if ((pgd.pgd & (_PAGE_USER|_PAGE_PRESENT)) == (_PAGE_USER|_PAGE_PRESENT) &&
+ (__supported_pte_mask & _PAGE_NX))
+ pgd.pgd |= _PAGE_NX;
+
+ /* return the copy of the PGD we want the kernel to use: */
+ return pgd;
+}
+
/*
* Initialize kernel page table isolation
*/


2018-01-01 14:40:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 028/146] x86/mm: Use INVPCID for __native_flush_tlb_single()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit 6cff64b86aaaa07f89f50498055a20e45754b0c1 upstream.

This uses INVPCID to shoot down individual lines of the user mapping
instead of marking the entire user map as invalid. This
could/might/possibly be faster.

This for sure needs tlb_single_page_flush_ceiling to be redetermined;
esp. since INVPCID is _slow_.

A detailed performance analysis is available here:

https://lkml.kernel.org/r/[email protected]

[ Peterz: Split out from big combo patch ]

Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/cpufeatures.h | 1
arch/x86/include/asm/tlbflush.h | 23 ++++++++++++-
arch/x86/mm/init.c | 64 +++++++++++++++++++++----------------
3 files changed, 60 insertions(+), 28 deletions(-)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -197,6 +197,7 @@
#define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */
#define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */
#define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */
+#define X86_FEATURE_INVPCID_SINGLE ( 7*32+ 7) /* Effectively INVPCID && CR4.PCIDE=1 */

#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -85,6 +85,18 @@ static inline u16 kern_pcid(u16 asid)
return asid + 1;
}

+/*
+ * The user PCID is just the kernel one, plus the "switch bit".
+ */
+static inline u16 user_pcid(u16 asid)
+{
+ u16 ret = kern_pcid(asid);
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ ret |= 1 << X86_CR3_PTI_SWITCH_BIT;
+#endif
+ return ret;
+}
+
struct pgd_t;
static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
{
@@ -335,6 +347,8 @@ static inline void __native_flush_tlb_gl
/*
* Using INVPCID is considerably faster than a pair of writes
* to CR4 sandwiched inside an IRQ flag save/restore.
+ *
+ * Note, this works with CR4.PCIDE=0 or 1.
*/
invpcid_flush_all();
return;
@@ -368,7 +382,14 @@ static inline void __native_flush_tlb_si
if (!static_cpu_has(X86_FEATURE_PTI))
return;

- invalidate_user_asid(loaded_mm_asid);
+ /*
+ * Some platforms #GP if we call invpcid(type=1/2) before CR4.PCIDE=1.
+ * Just use invalidate_user_asid() in case we are called early.
+ */
+ if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE))
+ invalidate_user_asid(loaded_mm_asid);
+ else
+ invpcid_flush_one(user_pcid(loaded_mm_asid), addr);
}

/*
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -203,34 +203,44 @@ static void __init probe_page_size_mask(

static void setup_pcid(void)
{
-#ifdef CONFIG_X86_64
- if (boot_cpu_has(X86_FEATURE_PCID)) {
- if (boot_cpu_has(X86_FEATURE_PGE)) {
- /*
- * This can't be cr4_set_bits_and_update_boot() --
- * the trampoline code can't handle CR4.PCIDE and
- * it wouldn't do any good anyway. Despite the name,
- * cr4_set_bits_and_update_boot() doesn't actually
- * cause the bits in question to remain set all the
- * way through the secondary boot asm.
- *
- * Instead, we brute-force it and set CR4.PCIDE
- * manually in start_secondary().
- */
- cr4_set_bits(X86_CR4_PCIDE);
- } else {
- /*
- * flush_tlb_all(), as currently implemented, won't
- * work if PCID is on but PGE is not. Since that
- * combination doesn't exist on real hardware, there's
- * no reason to try to fully support it, but it's
- * polite to avoid corrupting data if we're on
- * an improperly configured VM.
- */
- setup_clear_cpu_cap(X86_FEATURE_PCID);
- }
+ if (!IS_ENABLED(CONFIG_X86_64))
+ return;
+
+ if (!boot_cpu_has(X86_FEATURE_PCID))
+ return;
+
+ if (boot_cpu_has(X86_FEATURE_PGE)) {
+ /*
+ * This can't be cr4_set_bits_and_update_boot() -- the
+ * trampoline code can't handle CR4.PCIDE and it wouldn't
+ * do any good anyway. Despite the name,
+ * cr4_set_bits_and_update_boot() doesn't actually cause
+ * the bits in question to remain set all the way through
+ * the secondary boot asm.
+ *
+ * Instead, we brute-force it and set CR4.PCIDE manually in
+ * start_secondary().
+ */
+ cr4_set_bits(X86_CR4_PCIDE);
+
+ /*
+ * INVPCID's single-context modes (2/3) only work if we set
+ * X86_CR4_PCIDE, *and* we INVPCID support. It's unusable
+ * on systems that have X86_CR4_PCIDE clear, or that have
+ * no INVPCID support at all.
+ */
+ if (boot_cpu_has(X86_FEATURE_INVPCID))
+ setup_force_cpu_cap(X86_FEATURE_INVPCID_SINGLE);
+ } else {
+ /*
+ * flush_tlb_all(), as currently implemented, won't work if
+ * PCID is on but PGE is not. Since that combination
+ * doesn't exist on real hardware, there's no reason to try
+ * to fully support it, but it's polite to avoid corrupting
+ * data if we're on an improperly configured VM.
+ */
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
}
-#endif
}

#ifdef CONFIG_X86_32


2018-01-01 14:40:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 040/146] ASoC: wm_adsp: Fix validation of firmware and coeff lengths

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <[email protected]>

commit 50dd2ea8ef67a1617e0c0658bcbec4b9fb03b936 upstream.

The checks for whether another region/block header could be present
are subtracting the size from the current offset. Obviously we should
instead subtract the offset from the size.

The checks for whether the region/block data fit in the file are
adding the data size to the current offset and header size, without
checking for integer overflow. Rearrange these so that overflow is
impossible.

Signed-off-by: Ben Hutchings <[email protected]>
Acked-by: Charles Keepax <[email protected]>
Tested-by: Charles Keepax <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/soc/codecs/wm_adsp.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

--- a/sound/soc/codecs/wm_adsp.c
+++ b/sound/soc/codecs/wm_adsp.c
@@ -1733,7 +1733,7 @@ static int wm_adsp_load(struct wm_adsp *
le64_to_cpu(footer->timestamp));

while (pos < firmware->size &&
- pos - firmware->size > sizeof(*region)) {
+ sizeof(*region) < firmware->size - pos) {
region = (void *)&(firmware->data[pos]);
region_name = "Unknown";
reg = 0;
@@ -1782,8 +1782,8 @@ static int wm_adsp_load(struct wm_adsp *
regions, le32_to_cpu(region->len), offset,
region_name);

- if ((pos + le32_to_cpu(region->len) + sizeof(*region)) >
- firmware->size) {
+ if (le32_to_cpu(region->len) >
+ firmware->size - pos - sizeof(*region)) {
adsp_err(dsp,
"%s.%d: %s region len %d bytes exceeds file length %zu\n",
file, regions, region_name,
@@ -2253,7 +2253,7 @@ static int wm_adsp_load_coeff(struct wm_

blocks = 0;
while (pos < firmware->size &&
- pos - firmware->size > sizeof(*blk)) {
+ sizeof(*blk) < firmware->size - pos) {
blk = (void *)(&firmware->data[pos]);

type = le16_to_cpu(blk->type);
@@ -2327,8 +2327,8 @@ static int wm_adsp_load_coeff(struct wm_
}

if (reg) {
- if ((pos + le32_to_cpu(blk->len) + sizeof(*blk)) >
- firmware->size) {
+ if (le32_to_cpu(blk->len) >
+ firmware->size - pos - sizeof(*blk)) {
adsp_err(dsp,
"%s.%d: %s region len %d bytes exceeds file length %zu\n",
file, blocks, region_name,


2018-01-01 14:40:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 041/146] ASoC: da7218: fix fix child-node lookup

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit bc6476d6c1edcb9b97621b5131bd169aa81f27db upstream.

Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.

To make things worse, the parent codec node was also prematurely freed.

Fixes: 4d50934abd22 ("ASoC: da7218: Add da7218 codec driver")
Signed-off-by: Johan Hovold <[email protected]>
Acked-by: Adam Thomson <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/soc/codecs/da7218.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/sound/soc/codecs/da7218.c
+++ b/sound/soc/codecs/da7218.c
@@ -2520,7 +2520,7 @@ static struct da7218_pdata *da7218_of_to
}

if (da7218->dev_id == DA7218_DEV_ID) {
- hpldet_np = of_find_node_by_name(np, "da7218_hpldet");
+ hpldet_np = of_get_child_by_name(np, "da7218_hpldet");
if (!hpldet_np)
return pdata;



2018-01-01 14:41:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 042/146] ASoC: fsl_ssi: AC97 ops need regmap, clock and cleaning up on failure

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Maciej S. Szmigiero <[email protected]>

commit 695b78b548d8a26288f041e907ff17758df9e1d5 upstream.

AC'97 ops (register read / write) need SSI regmap and clock, so they have
to be set after them.

We also need to set these ops back to NULL if we fail the probe.

Signed-off-by: Maciej S. Szmigiero <[email protected]>
Acked-by: Nicolin Chen <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/soc/fsl/fsl_ssi.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)

--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -1452,12 +1452,6 @@ static int fsl_ssi_probe(struct platform
sizeof(fsl_ssi_ac97_dai));

fsl_ac97_data = ssi_private;
-
- ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev);
- if (ret) {
- dev_err(&pdev->dev, "could not set AC'97 ops\n");
- return ret;
- }
} else {
/* Initialize this copy of the CPU DAI driver structure */
memcpy(&ssi_private->cpu_dai_drv, &fsl_ssi_dai_template,
@@ -1568,6 +1562,14 @@ static int fsl_ssi_probe(struct platform
return ret;
}

+ if (fsl_ssi_is_ac97(ssi_private)) {
+ ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev);
+ if (ret) {
+ dev_err(&pdev->dev, "could not set AC'97 ops\n");
+ goto error_ac97_ops;
+ }
+ }
+
ret = devm_snd_soc_register_component(&pdev->dev, &fsl_ssi_component,
&ssi_private->cpu_dai_drv, 1);
if (ret) {
@@ -1651,6 +1653,10 @@ error_sound_card:
fsl_ssi_debugfs_remove(&ssi_private->dbg_stats);

error_asoc_register:
+ if (fsl_ssi_is_ac97(ssi_private))
+ snd_soc_set_ac97_ops(NULL);
+
+error_ac97_ops:
if (ssi_private->soc->imx)
fsl_ssi_imx_clean(pdev, ssi_private);



2018-01-01 14:41:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 045/146] gpio: fix "gpio-line-names" property retrieval

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christophe Leroy <[email protected]>

commit 822703354774ec935169cbbc8d503236bcb54fda upstream.

Following commit 9427ecbed46cc ("gpio: Rework of_gpiochip_set_names()
to use device property accessors"), "gpio-line-names" DT property is
not retrieved anymore when chip->parent is not set by the driver.
This is due to OF based property reads having been replaced by device
based property reads.

This patch fixes that by making use of
fwnode_property_read_string_array() instead of
device_property_read_string_array() and handing over either
of_fwnode_handle(chip->of_node) or dev_fwnode(chip->parent)
to that function.

Fixes: 9427ecbed46cc ("gpio: Rework of_gpiochip_set_names() to use device property accessors")
Signed-off-by: Christophe Leroy <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpio/gpiolib-acpi.c | 2 +-
drivers/gpio/gpiolib-devprop.c | 17 +++++++----------
drivers/gpio/gpiolib-of.c | 3 ++-
drivers/gpio/gpiolib.h | 3 ++-
4 files changed, 12 insertions(+), 13 deletions(-)

--- a/drivers/gpio/gpiolib-acpi.c
+++ b/drivers/gpio/gpiolib-acpi.c
@@ -1074,7 +1074,7 @@ void acpi_gpiochip_add(struct gpio_chip
}

if (!chip->names)
- devprop_gpiochip_set_names(chip);
+ devprop_gpiochip_set_names(chip, dev_fwnode(chip->parent));

acpi_gpiochip_request_regions(acpi_gpio);
acpi_gpiochip_scan_gpios(acpi_gpio);
--- a/drivers/gpio/gpiolib-devprop.c
+++ b/drivers/gpio/gpiolib-devprop.c
@@ -19,30 +19,27 @@
/**
* devprop_gpiochip_set_names - Set GPIO line names using device properties
* @chip: GPIO chip whose lines should be named, if possible
+ * @fwnode: Property Node containing the gpio-line-names property
*
* Looks for device property "gpio-line-names" and if it exists assigns
* GPIO line names for the chip. The memory allocated for the assigned
* names belong to the underlying firmware node and should not be released
* by the caller.
*/
-void devprop_gpiochip_set_names(struct gpio_chip *chip)
+void devprop_gpiochip_set_names(struct gpio_chip *chip,
+ const struct fwnode_handle *fwnode)
{
struct gpio_device *gdev = chip->gpiodev;
const char **names;
int ret, i;

- if (!chip->parent) {
- dev_warn(&gdev->dev, "GPIO chip parent is NULL\n");
- return;
- }
-
- ret = device_property_read_string_array(chip->parent, "gpio-line-names",
+ ret = fwnode_property_read_string_array(fwnode, "gpio-line-names",
NULL, 0);
if (ret < 0)
return;

if (ret != gdev->ngpio) {
- dev_warn(chip->parent,
+ dev_warn(&gdev->dev,
"names %d do not match number of GPIOs %d\n", ret,
gdev->ngpio);
return;
@@ -52,10 +49,10 @@ void devprop_gpiochip_set_names(struct g
if (!names)
return;

- ret = device_property_read_string_array(chip->parent, "gpio-line-names",
+ ret = fwnode_property_read_string_array(fwnode, "gpio-line-names",
names, gdev->ngpio);
if (ret < 0) {
- dev_warn(chip->parent, "failed to read GPIO line names\n");
+ dev_warn(&gdev->dev, "failed to read GPIO line names\n");
kfree(names);
return;
}
--- a/drivers/gpio/gpiolib-of.c
+++ b/drivers/gpio/gpiolib-of.c
@@ -493,7 +493,8 @@ int of_gpiochip_add(struct gpio_chip *ch

/* If the chip defines names itself, these take precedence */
if (!chip->names)
- devprop_gpiochip_set_names(chip);
+ devprop_gpiochip_set_names(chip,
+ of_fwnode_handle(chip->of_node));

of_node_get(chip->of_node);

--- a/drivers/gpio/gpiolib.h
+++ b/drivers/gpio/gpiolib.h
@@ -224,7 +224,8 @@ static inline int gpio_chip_hwgpio(const
return desc - &desc->gdev->descs[0];
}

-void devprop_gpiochip_set_names(struct gpio_chip *chip);
+void devprop_gpiochip_set_names(struct gpio_chip *chip,
+ const struct fwnode_handle *fwnode);

/* With descriptor prefix */



2018-01-01 14:41:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 046/146] IB/hfi: Only read capability registers if the capability exists

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael J. Ruhl <[email protected]>

commit 4c009af473b2026caaa26107e34d7cc68dad7756 upstream.

During driver init, various registers are saved to allow restoration
after an FLR or gen3 bump. Some of these registers are not available
in some circumstances (i.e. Virtual machines).

This bug makes the driver unusable when the PCI device is passed into
a VM, it fails during probe.

Delete unnecessary register read/write, and only access register if
the capability exists.

Fixes: a618b7e40af2 ("IB/hfi1: Move saving PCI values to a separate function")
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Michael J. Ruhl <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/hw/hfi1/hfi.h | 1 -
drivers/infiniband/hw/hfi1/pcie.c | 30 ++++++++++++------------------
2 files changed, 12 insertions(+), 19 deletions(-)

--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1129,7 +1129,6 @@ struct hfi1_devdata {
u16 pcie_lnkctl;
u16 pcie_devctl2;
u32 pci_msix0;
- u32 pci_lnkctl3;
u32 pci_tph2;

/*
--- a/drivers/infiniband/hw/hfi1/pcie.c
+++ b/drivers/infiniband/hw/hfi1/pcie.c
@@ -411,15 +411,12 @@ int restore_pci_variables(struct hfi1_de
if (ret)
goto error;

- ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_SPCIE1,
- dd->pci_lnkctl3);
- if (ret)
- goto error;
-
- ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_TPH2, dd->pci_tph2);
- if (ret)
- goto error;
-
+ if (pci_find_ext_capability(dd->pcidev, PCI_EXT_CAP_ID_TPH)) {
+ ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_TPH2,
+ dd->pci_tph2);
+ if (ret)
+ goto error;
+ }
return 0;

error:
@@ -469,15 +466,12 @@ int save_pci_variables(struct hfi1_devda
if (ret)
goto error;

- ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_SPCIE1,
- &dd->pci_lnkctl3);
- if (ret)
- goto error;
-
- ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_TPH2, &dd->pci_tph2);
- if (ret)
- goto error;
-
+ if (pci_find_ext_capability(dd->pcidev, PCI_EXT_CAP_ID_TPH)) {
+ ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_TPH2,
+ &dd->pci_tph2);
+ if (ret)
+ goto error;
+ }
return 0;

error:


2018-01-01 14:41:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 047/146] IB/mlx5: Serialize access to the VMA list

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Majd Dibbiny <[email protected]>

commit ad9a3668a434faca1339789ed2f043d679199309 upstream.

User-space applications can do mmap and munmap directly at
any time.

Since the VMA list is not protected with a mutex, concurrent
accesses to the VMA list from the mmap and munmap can cause
data corruption. Add a mutex around the list.

Fixes: 7c2344c3bbf9 ("IB/mlx5: Implements disassociate_ucontext API")
Reviewed-by: Yishai Hadas <[email protected]>
Signed-off-by: Majd Dibbiny <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/hw/mlx5/main.c | 8 ++++++++
drivers/infiniband/hw/mlx5/mlx5_ib.h | 4 ++++
2 files changed, 12 insertions(+)

--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1415,6 +1415,7 @@ static struct ib_ucontext *mlx5_ib_alloc
}

INIT_LIST_HEAD(&context->vma_private_list);
+ mutex_init(&context->vma_private_list_mutex);
INIT_LIST_HEAD(&context->db_page_list);
mutex_init(&context->db_page_mutex);

@@ -1576,7 +1577,9 @@ static void mlx5_ib_vma_close(struct vm
* mlx5_ib_disassociate_ucontext().
*/
mlx5_ib_vma_priv_data->vma = NULL;
+ mutex_lock(mlx5_ib_vma_priv_data->vma_private_list_mutex);
list_del(&mlx5_ib_vma_priv_data->list);
+ mutex_unlock(mlx5_ib_vma_priv_data->vma_private_list_mutex);
kfree(mlx5_ib_vma_priv_data);
}

@@ -1596,10 +1599,13 @@ static int mlx5_ib_set_vma_data(struct v
return -ENOMEM;

vma_prv->vma = vma;
+ vma_prv->vma_private_list_mutex = &ctx->vma_private_list_mutex;
vma->vm_private_data = vma_prv;
vma->vm_ops = &mlx5_ib_vm_ops;

+ mutex_lock(&ctx->vma_private_list_mutex);
list_add(&vma_prv->list, vma_head);
+ mutex_unlock(&ctx->vma_private_list_mutex);

return 0;
}
@@ -1642,6 +1648,7 @@ static void mlx5_ib_disassociate_ucontex
* mlx5_ib_vma_close.
*/
down_write(&owning_mm->mmap_sem);
+ mutex_lock(&context->vma_private_list_mutex);
list_for_each_entry_safe(vma_private, n, &context->vma_private_list,
list) {
vma = vma_private->vma;
@@ -1656,6 +1663,7 @@ static void mlx5_ib_disassociate_ucontex
list_del(&vma_private->list);
kfree(vma_private);
}
+ mutex_unlock(&context->vma_private_list_mutex);
up_write(&owning_mm->mmap_sem);
mmput(owning_mm);
put_task_struct(owning_process);
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -115,6 +115,8 @@ enum {
struct mlx5_ib_vma_private_data {
struct list_head list;
struct vm_area_struct *vma;
+ /* protect vma_private_list add/del */
+ struct mutex *vma_private_list_mutex;
};

struct mlx5_ib_ucontext {
@@ -129,6 +131,8 @@ struct mlx5_ib_ucontext {
/* Transport Domain number */
u32 tdn;
struct list_head vma_private_list;
+ /* protect vma_private_list add/del */
+ struct mutex vma_private_list_mutex;

unsigned long upd_xlt_page;
/* protect ODP/KSM */


2018-01-01 14:41:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 049/146] IB/core: Verify that QP is security enabled in create and destroy

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Moni Shoua <[email protected]>

commit 4a50881bbac309e6f0684816a180bc3c14e1485d upstream.

The XRC target QP create flow sets up qp_sec only if there is an IB link with
LSM security enabled. However, several other related uAPI entry points blindly
follow the qp_sec NULL pointer, resulting in a possible oops.

Check for NULL before using qp_sec.

Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Reviewed-by: Daniel Jurgens <[email protected]>
Signed-off-by: Moni Shoua <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/core/security.c | 3 +++
drivers/infiniband/core/verbs.c | 3 ++-
2 files changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -386,6 +386,9 @@ int ib_open_shared_qp_security(struct ib
if (ret)
return ret;

+ if (!qp->qp_sec)
+ return 0;
+
mutex_lock(&real_qp->qp_sec->mutex);
ret = check_qp_port_pkey_settings(real_qp->qp_sec->ports_pkeys,
qp->qp_sec);
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1400,7 +1400,8 @@ int ib_close_qp(struct ib_qp *qp)
spin_unlock_irqrestore(&real_qp->device->event_handler_lock, flags);

atomic_dec(&real_qp->usecnt);
- ib_close_shared_qp_security(qp->qp_sec);
+ if (qp->qp_sec)
+ ib_close_shared_qp_security(qp->qp_sec);
kfree(qp);

return 0;


2018-01-01 14:41:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 050/146] ALSA: hda: Drop useless WARN_ON()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit a36c2638380c0a4676647a1f553b70b20d3ebce1 upstream.

Since the commit 97cc2ed27e5a ("ALSA: hda - Fix yet another i915
pointer leftover in error path") cleared hdac_acomp pointer, the
WARN_ON() non-NULL check in snd_hdac_i915_register_notifier() may give
a false-positive warning, as the function gets called no matter
whether the component is registered or not. For fixing it, let's get
rid of the spurious WARN_ON().

Fixes: 97cc2ed27e5a ("ALSA: hda - Fix yet another i915 pointer leftover in error path")
Reported-by: Kouta Okamoto <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/hda/hdac_i915.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/sound/hda/hdac_i915.c
+++ b/sound/hda/hdac_i915.c
@@ -325,7 +325,7 @@ static int hdac_component_master_match(s
*/
int snd_hdac_i915_register_notifier(const struct i915_audio_component_audio_ops *aops)
{
- if (WARN_ON(!hdac_acomp))
+ if (!hdac_acomp)
return -ENODEV;

hdac_acomp->audio_ops = aops;


2018-01-01 14:41:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 051/146] ALSA: hda - Add MIC_NO_PRESENCE fixup for 2 HP machines

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hui Wang <[email protected]>

commit 322f74ede933b3e2cb78768b6a6fdbfbf478a0c1 upstream.

There is a headset jack on the front panel, when we plug a headset
into it, the headset mic can't trigger unsol events, and
read_pin_sense() can't detect its presence too. So add this fixup
to fix this issue.

Signed-off-by: Hui Wang <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/pci/hda/patch_conexant.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

--- a/sound/pci/hda/patch_conexant.c
+++ b/sound/pci/hda/patch_conexant.c
@@ -271,6 +271,8 @@ enum {
CXT_FIXUP_HP_SPECTRE,
CXT_FIXUP_HP_GATE_MIC,
CXT_FIXUP_MUTE_LED_GPIO,
+ CXT_FIXUP_HEADSET_MIC,
+ CXT_FIXUP_HP_MIC_NO_PRESENCE,
};

/* for hda_fixup_thinkpad_acpi() */
@@ -350,6 +352,18 @@ static void cxt_fixup_headphone_mic(stru
}
}

+static void cxt_fixup_headset_mic(struct hda_codec *codec,
+ const struct hda_fixup *fix, int action)
+{
+ struct conexant_spec *spec = codec->spec;
+
+ switch (action) {
+ case HDA_FIXUP_ACT_PRE_PROBE:
+ spec->parse_flags |= HDA_PINCFG_HEADSET_MIC;
+ break;
+ }
+}
+
/* OPLC XO 1.5 fixup */

/* OLPC XO-1.5 supports DC input mode (e.g. for use with analog sensors)
@@ -880,6 +894,19 @@ static const struct hda_fixup cxt_fixups
.type = HDA_FIXUP_FUNC,
.v.func = cxt_fixup_mute_led_gpio,
},
+ [CXT_FIXUP_HEADSET_MIC] = {
+ .type = HDA_FIXUP_FUNC,
+ .v.func = cxt_fixup_headset_mic,
+ },
+ [CXT_FIXUP_HP_MIC_NO_PRESENCE] = {
+ .type = HDA_FIXUP_PINS,
+ .v.pins = (const struct hda_pintbl[]) {
+ { 0x1a, 0x02a1113c },
+ { }
+ },
+ .chained = true,
+ .chain_id = CXT_FIXUP_HEADSET_MIC,
+ },
};

static const struct snd_pci_quirk cxt5045_fixups[] = {
@@ -934,6 +961,8 @@ static const struct snd_pci_quirk cxt506
SND_PCI_QUIRK(0x103c, 0x8115, "HP Z1 Gen3", CXT_FIXUP_HP_GATE_MIC),
SND_PCI_QUIRK(0x103c, 0x814f, "HP ZBook 15u G3", CXT_FIXUP_MUTE_LED_GPIO),
SND_PCI_QUIRK(0x103c, 0x822e, "HP ProBook 440 G4", CXT_FIXUP_MUTE_LED_GPIO),
+ SND_PCI_QUIRK(0x103c, 0x8299, "HP 800 G3 SFF", CXT_FIXUP_HP_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x103c, 0x829a, "HP 800 G3 DM", CXT_FIXUP_HP_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1043, 0x138d, "Asus", CXT_FIXUP_HEADPHONE_MIC_PIN),
SND_PCI_QUIRK(0x152d, 0x0833, "OLPC XO-1.5", CXT_FIXUP_OLPC_XO),
SND_PCI_QUIRK(0x17aa, 0x20f2, "Lenovo T400", CXT_PINCFG_LENOVO_TP410),


2018-01-01 14:41:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 052/146] ALSA: hda - change the location for one mic on a Lenovo machine

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hui Wang <[email protected]>

commit 8da5bbfc7cbba909f4f32d5e1dda3750baa5d853 upstream.

There are two front mics on this machine, and current driver assign
the same name Mic to both of them, but pulseaudio can't handle them.
As a workaround, we change the location for one of them, then the
driver will assign "Front Mic" and "Mic" for them.

Signed-off-by: Hui Wang <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/pci/hda/patch_realtek.c | 1 +
1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6305,6 +6305,7 @@ static const struct snd_pci_quirk alc269
SND_PCI_QUIRK(0x17aa, 0x30bb, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY),
SND_PCI_QUIRK(0x17aa, 0x30e2, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY),
SND_PCI_QUIRK(0x17aa, 0x310c, "ThinkCentre Station", ALC294_FIXUP_LENOVO_MIC_LOCATION),
+ SND_PCI_QUIRK(0x17aa, 0x313c, "ThinkCentre Station", ALC294_FIXUP_LENOVO_MIC_LOCATION),
SND_PCI_QUIRK(0x17aa, 0x3112, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY),
SND_PCI_QUIRK(0x17aa, 0x3902, "Lenovo E50-80", ALC269_FIXUP_DMIC_THINKPAD_ACPI),
SND_PCI_QUIRK(0x17aa, 0x3977, "IdeaPad S210", ALC283_FIXUP_INT_MIC),


2018-01-01 14:41:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 054/146] ALSA: hda - Fix missing COEF init for ALC225/295/299

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit 44be77c590f381bc629815ac789b8b15ecc4ddcf upstream.

There was a long-standing problem on HP Spectre X360 with Kabylake
where it lacks of the front speaker output in some situations. Also
there are other products showing the similar behavior. The culprit
seems to be the missing COEF setup on ALC codecs, ALC225/295/299,
which are all compatible.

This patch adds the proper COEF setup (to initialize idx 0x67 / bits
0x3000) for addressing the issue.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195457
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/pci/hda/patch_realtek.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -324,8 +324,12 @@ static void alc_fill_eapd_coef(struct hd
case 0x10ec0292:
alc_update_coef_idx(codec, 0x4, 1<<15, 0);
break;
- case 0x10ec0215:
case 0x10ec0225:
+ case 0x10ec0295:
+ case 0x10ec0299:
+ alc_update_coef_idx(codec, 0x67, 0xf000, 0x3000);
+ /* fallthrough */
+ case 0x10ec0215:
case 0x10ec0233:
case 0x10ec0236:
case 0x10ec0255:
@@ -336,10 +340,8 @@ static void alc_fill_eapd_coef(struct hd
case 0x10ec0286:
case 0x10ec0288:
case 0x10ec0285:
- case 0x10ec0295:
case 0x10ec0298:
case 0x10ec0289:
- case 0x10ec0299:
alc_update_coef_idx(codec, 0x10, 1<<9, 0);
break;
case 0x10ec0275:


2018-01-01 14:41:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 056/146] block: fix blk_rq_append_bio

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jens Axboe <[email protected]>

commit 0abc2a10389f0c9070f76ca906c7382788036b93 upstream.

Commit caa4b02476e3(blk-map: call blk_queue_bounce from blk_rq_append_bio)
moves blk_queue_bounce() into blk_rq_append_bio(), but don't consider
the fact that the bounced bio becomes invisible to caller since the
parameter type is 'struct bio *'. Make it a pointer to a pointer to
a bio, so the caller sees the right bio also after a bounce.

Fixes: caa4b02476e3 ("blk-map: call blk_queue_bounce from blk_rq_append_bio")
Cc: Christoph Hellwig <[email protected]>
Reported-by: Michele Ballabio <[email protected]>
(handling failure of blk_rq_append_bio(), only call bio_get() after
blk_rq_append_bio() returns OK)
Tested-by: Michele Ballabio <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
block/blk-map.c | 38 +++++++++++++++++++++----------------
drivers/scsi/osd/osd_initiator.c | 4 ++-
drivers/target/target_core_pscsi.c | 4 +--
include/linux/blkdev.h | 2 -
4 files changed, 28 insertions(+), 20 deletions(-)

--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -12,22 +12,29 @@
#include "blk.h"

/*
- * Append a bio to a passthrough request. Only works can be merged into
- * the request based on the driver constraints.
+ * Append a bio to a passthrough request. Only works if the bio can be merged
+ * into the request based on the driver constraints.
*/
-int blk_rq_append_bio(struct request *rq, struct bio *bio)
+int blk_rq_append_bio(struct request *rq, struct bio **bio)
{
- blk_queue_bounce(rq->q, &bio);
+ struct bio *orig_bio = *bio;
+
+ blk_queue_bounce(rq->q, bio);

if (!rq->bio) {
- blk_rq_bio_prep(rq->q, rq, bio);
+ blk_rq_bio_prep(rq->q, rq, *bio);
} else {
- if (!ll_back_merge_fn(rq->q, rq, bio))
+ if (!ll_back_merge_fn(rq->q, rq, *bio)) {
+ if (orig_bio != *bio) {
+ bio_put(*bio);
+ *bio = orig_bio;
+ }
return -EINVAL;
+ }

- rq->biotail->bi_next = bio;
- rq->biotail = bio;
- rq->__data_len += bio->bi_iter.bi_size;
+ rq->biotail->bi_next = *bio;
+ rq->biotail = *bio;
+ rq->__data_len += (*bio)->bi_iter.bi_size;
}

return 0;
@@ -80,14 +87,12 @@ static int __blk_rq_map_user_iov(struct
* We link the bounce buffer in and could have to traverse it
* later so we have to get a ref to prevent it from being freed
*/
- ret = blk_rq_append_bio(rq, bio);
- bio_get(bio);
+ ret = blk_rq_append_bio(rq, &bio);
if (ret) {
- bio_endio(bio);
__blk_rq_unmap_user(orig_bio);
- bio_put(bio);
return ret;
}
+ bio_get(bio);

return 0;
}
@@ -220,7 +225,7 @@ int blk_rq_map_kern(struct request_queue
int reading = rq_data_dir(rq) == READ;
unsigned long addr = (unsigned long) kbuf;
int do_copy = 0;
- struct bio *bio;
+ struct bio *bio, *orig_bio;
int ret;

if (len > (queue_max_hw_sectors(q) << 9))
@@ -243,10 +248,11 @@ int blk_rq_map_kern(struct request_queue
if (do_copy)
rq->rq_flags |= RQF_COPY_USER;

- ret = blk_rq_append_bio(rq, bio);
+ orig_bio = bio;
+ ret = blk_rq_append_bio(rq, &bio);
if (unlikely(ret)) {
/* request is too big */
- bio_put(bio);
+ bio_put(orig_bio);
return ret;
}

--- a/drivers/scsi/osd/osd_initiator.c
+++ b/drivers/scsi/osd/osd_initiator.c
@@ -1576,7 +1576,9 @@ static struct request *_make_request(str
return req;

for_each_bio(bio) {
- ret = blk_rq_append_bio(req, bio);
+ struct bio *bounce_bio = bio;
+
+ ret = blk_rq_append_bio(req, &bounce_bio);
if (ret)
return ERR_PTR(ret);
}
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -920,7 +920,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct
" %d i: %d bio: %p, allocating another"
" bio\n", bio->bi_vcnt, i, bio);

- rc = blk_rq_append_bio(req, bio);
+ rc = blk_rq_append_bio(req, &bio);
if (rc) {
pr_err("pSCSI: failed to append bio\n");
goto fail;
@@ -938,7 +938,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct
}

if (bio) {
- rc = blk_rq_append_bio(req, bio);
+ rc = blk_rq_append_bio(req, &bio);
if (rc) {
pr_err("pSCSI: failed to append bio\n");
goto fail;
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -952,7 +952,7 @@ extern int blk_rq_prep_clone(struct requ
extern void blk_rq_unprep_clone(struct request *rq);
extern blk_status_t blk_insert_cloned_request(struct request_queue *q,
struct request *rq);
-extern int blk_rq_append_bio(struct request *rq, struct bio *bio);
+extern int blk_rq_append_bio(struct request *rq, struct bio **bio);
extern void blk_delay_queue(struct request_queue *, unsigned long);
extern void blk_queue_split(struct request_queue *, struct bio **);
extern void blk_recount_segments(struct request_queue *, struct bio *);


2018-01-01 14:42:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 057/146] block: dont let passthrough IO go into .make_request_fn()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ming Lei <[email protected]>

commit 14cb0dc6479dc5ebc63b3a459a5d89a2f1b39fed upstream.

Commit a8821f3f3("block: Improvements to bounce-buffer handling") tries
to make sure that the bio to .make_request_fn won't exceed BIO_MAX_PAGES,
but ignores that passthrough I/O can use blk_queue_bounce() too.
Especially, passthrough IO may not be sector-aligned, and the check
of 'sectors < bio_sectors(*bio_orig)' inside __blk_queue_bounce() may
become true even though the max bvec number doesn't exceed BIO_MAX_PAGES,
then cause the bio splitted, and the original passthrough bio is submited
to generic_make_request().

This patch fixes this issue by checking if the bio is passthrough IO,
and use bio_kmalloc() to allocate the cloned passthrough bio.

Cc: NeilBrown <[email protected]>
Fixes: a8821f3f3("block: Improvements to bounce-buffer handling")
Tested-by: Michele Ballabio <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
block/bounce.c | 6 ++++--
include/linux/blkdev.h | 21 +++++++++++++++++++--
2 files changed, 23 insertions(+), 4 deletions(-)

--- a/block/bounce.c
+++ b/block/bounce.c
@@ -200,6 +200,7 @@ static void __blk_queue_bounce(struct re
unsigned i = 0;
bool bounce = false;
int sectors = 0;
+ bool passthrough = bio_is_passthrough(*bio_orig);

bio_for_each_segment(from, *bio_orig, iter) {
if (i++ < BIO_MAX_PAGES)
@@ -210,13 +211,14 @@ static void __blk_queue_bounce(struct re
if (!bounce)
return;

- if (sectors < bio_sectors(*bio_orig)) {
+ if (!passthrough && sectors < bio_sectors(*bio_orig)) {
bio = bio_split(*bio_orig, sectors, GFP_NOIO, bounce_bio_split);
bio_chain(bio, *bio_orig);
generic_make_request(*bio_orig);
*bio_orig = bio;
}
- bio = bio_clone_bioset(*bio_orig, GFP_NOIO, bounce_bio_set);
+ bio = bio_clone_bioset(*bio_orig, GFP_NOIO, passthrough ? NULL :
+ bounce_bio_set);

bio_for_each_segment_all(to, bio, i) {
struct page *page = to->bv_page;
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -241,14 +241,24 @@ struct request {
struct request *next_rq;
};

+static inline bool blk_op_is_scsi(unsigned int op)
+{
+ return op == REQ_OP_SCSI_IN || op == REQ_OP_SCSI_OUT;
+}
+
+static inline bool blk_op_is_private(unsigned int op)
+{
+ return op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT;
+}
+
static inline bool blk_rq_is_scsi(struct request *rq)
{
- return req_op(rq) == REQ_OP_SCSI_IN || req_op(rq) == REQ_OP_SCSI_OUT;
+ return blk_op_is_scsi(req_op(rq));
}

static inline bool blk_rq_is_private(struct request *rq)
{
- return req_op(rq) == REQ_OP_DRV_IN || req_op(rq) == REQ_OP_DRV_OUT;
+ return blk_op_is_private(req_op(rq));
}

static inline bool blk_rq_is_passthrough(struct request *rq)
@@ -256,6 +266,13 @@ static inline bool blk_rq_is_passthrough
return blk_rq_is_scsi(rq) || blk_rq_is_private(rq);
}

+static inline bool bio_is_passthrough(struct bio *bio)
+{
+ unsigned op = bio_op(bio);
+
+ return blk_op_is_scsi(op) || blk_op_is_private(op);
+}
+
static inline unsigned short req_get_ioprio(struct request *req)
{
return req->ioprio;


2018-01-01 14:42:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 060/146] ipv6: mcast: better catch silly mtu values

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit b9b312a7a451e9c098921856e7cfbc201120e1a7 ]

syzkaller reported crashes in IPv6 stack [1]

Xin Long found that lo MTU was set to silly values.

IPv6 stack reacts to changes to small MTU, by disabling itself under
RTNL.

But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in mld code where it is assumed
the mtu is suitable.

Fix this by reading device mtu once and checking IPv6 minimal MTU.

[1]
skbuff: skb_over_panic: text:0000000010b86b8d len:196 put:20
head:000000003b477e60 data:000000000e85441e tail:0xd4 end:0xc0 dev:lo
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc2-mm1+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:skb_panic+0x15c/0x1f0 net/core/skbuff.c:100
RSP: 0018:ffff8801db307508 EFLAGS: 00010286
RAX: 0000000000000082 RBX: ffff8801c517e840 RCX: 0000000000000000
RDX: 0000000000000082 RSI: 1ffff1003b660e61 RDI: ffffed003b660e95
RBP: ffff8801db307570 R08: 1ffff1003b660e23 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85bd4020
R13: ffffffff84754ed2 R14: 0000000000000014 R15: ffff8801c4e26540
FS: 0000000000000000(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000463610 CR3: 00000001c6698000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
skb_over_panic net/core/skbuff.c:109 [inline]
skb_put+0x181/0x1c0 net/core/skbuff.c:1694
add_grhead.isra.24+0x42/0x3b0 net/ipv6/mcast.c:1695
add_grec+0xa55/0x1060 net/ipv6/mcast.c:1817
mld_send_cr net/ipv6/mcast.c:1903 [inline]
mld_ifc_timer_expire+0x4d2/0x770 net/ipv6/mcast.c:2448
call_timer_fn+0x23b/0x840 kernel/time/timer.c:1320
expire_timers kernel/time/timer.c:1357 [inline]
__run_timers+0x7e1/0xb60 kernel/time/timer.c:1660
run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
__do_softirq+0x29d/0xbb2 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1d3/0x210 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:540 [inline]
smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Tested-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/mcast.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1682,16 +1682,16 @@ static int grec_size(struct ifmcaddr6 *p
}

static struct sk_buff *add_grhead(struct sk_buff *skb, struct ifmcaddr6 *pmc,
- int type, struct mld2_grec **ppgr)
+ int type, struct mld2_grec **ppgr, unsigned int mtu)
{
- struct net_device *dev = pmc->idev->dev;
struct mld2_report *pmr;
struct mld2_grec *pgr;

- if (!skb)
- skb = mld_newpack(pmc->idev, dev->mtu);
- if (!skb)
- return NULL;
+ if (!skb) {
+ skb = mld_newpack(pmc->idev, mtu);
+ if (!skb)
+ return NULL;
+ }
pgr = skb_put(skb, sizeof(struct mld2_grec));
pgr->grec_type = type;
pgr->grec_auxwords = 0;
@@ -1714,10 +1714,15 @@ static struct sk_buff *add_grec(struct s
struct mld2_grec *pgr = NULL;
struct ip6_sf_list *psf, *psf_next, *psf_prev, **psf_list;
int scount, stotal, first, isquery, truncate;
+ unsigned int mtu;

if (pmc->mca_flags & MAF_NOREPORT)
return skb;

+ mtu = READ_ONCE(dev->mtu);
+ if (mtu < IPV6_MIN_MTU)
+ return skb;
+
isquery = type == MLD2_MODE_IS_INCLUDE ||
type == MLD2_MODE_IS_EXCLUDE;
truncate = type == MLD2_MODE_IS_EXCLUDE ||
@@ -1738,7 +1743,7 @@ static struct sk_buff *add_grec(struct s
AVAILABLE(skb) < grec_size(pmc, type, gdeleted, sdeleted)) {
if (skb)
mld_sendpack(skb);
- skb = mld_newpack(idev, dev->mtu);
+ skb = mld_newpack(idev, mtu);
}
}
first = 1;
@@ -1774,12 +1779,12 @@ static struct sk_buff *add_grec(struct s
pgr->grec_nsrcs = htons(scount);
if (skb)
mld_sendpack(skb);
- skb = mld_newpack(idev, dev->mtu);
+ skb = mld_newpack(idev, mtu);
first = 1;
scount = 0;
}
if (first) {
- skb = add_grhead(skb, pmc, type, &pgr);
+ skb = add_grhead(skb, pmc, type, &pgr, mtu);
first = 0;
}
if (!skb)
@@ -1814,7 +1819,7 @@ empty_source:
mld_sendpack(skb);
skb = NULL; /* add_grhead will get a new one */
}
- skb = add_grhead(skb, pmc, type, &pgr);
+ skb = add_grhead(skb, pmc, type, &pgr, mtu);
}
}
if (pgr)


2018-01-01 14:42:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 062/146] net: igmp: Use correct source address on IGMPv3 reports

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kevin Cernekee <[email protected]>


[ Upstream commit a46182b00290839fa3fa159d54fd3237bd8669f0 ]

Closing a multicast socket after the final IPv4 address is deleted
from an interface can generate a membership report that uses the
source IP from a different interface. The following test script, run
from an isolated netns, reproduces the issue:

#!/bin/bash

ip link add dummy0 type dummy
ip link add dummy1 type dummy
ip link set dummy0 up
ip link set dummy1 up
ip addr add 10.1.1.1/24 dev dummy0
ip addr add 192.168.99.99/24 dev dummy1

tcpdump -U -i dummy0 &
socat EXEC:"sleep 2" \
UDP4-DATAGRAM:239.101.1.68:8889,ip-add-membership=239.0.1.68:10.1.1.1 &

sleep 1
ip addr del 10.1.1.1/24 dev dummy0
sleep 5
kill %tcpdump

RFC 3376 specifies that the report must be sent with a valid IP source
address from the destination subnet, or from address 0.0.0.0. Add an
extra check to make sure this is the case.

Signed-off-by: Kevin Cernekee <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/igmp.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)

--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -89,6 +89,7 @@
#include <linux/rtnetlink.h>
#include <linux/times.h>
#include <linux/pkt_sched.h>
+#include <linux/byteorder/generic.h>

#include <net/net_namespace.h>
#include <net/arp.h>
@@ -321,6 +322,23 @@ igmp_scount(struct ip_mc_list *pmc, int
return scount;
}

+/* source address selection per RFC 3376 section 4.2.13 */
+static __be32 igmpv3_get_srcaddr(struct net_device *dev,
+ const struct flowi4 *fl4)
+{
+ struct in_device *in_dev = __in_dev_get_rcu(dev);
+
+ if (!in_dev)
+ return htonl(INADDR_ANY);
+
+ for_ifa(in_dev) {
+ if (inet_ifa_match(fl4->saddr, ifa))
+ return fl4->saddr;
+ } endfor_ifa(in_dev);
+
+ return htonl(INADDR_ANY);
+}
+
static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu)
{
struct sk_buff *skb;
@@ -368,7 +386,7 @@ static struct sk_buff *igmpv3_newpack(st
pip->frag_off = htons(IP_DF);
pip->ttl = 1;
pip->daddr = fl4.daddr;
- pip->saddr = fl4.saddr;
+ pip->saddr = igmpv3_get_srcaddr(dev, &fl4);
pip->protocol = IPPROTO_IGMP;
pip->tot_len = 0; /* filled in later */
ip_select_ident(net, skb, NULL);


2018-01-01 14:42:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 063/146] netlink: Add netns check on taps

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kevin Cernekee <[email protected]>


[ Upstream commit 93c647643b48f0131f02e45da3bd367d80443291 ]

Currently, a nlmon link inside a child namespace can observe systemwide
netlink activity. Filter the traffic so that nlmon can only sniff
netlink messages from its own netns.

Test case:

vpnns -- bash -c "ip link add nlmon0 type nlmon; \
ip link set nlmon0 up; \
tcpdump -i nlmon0 -q -w /tmp/nlmon.pcap -U" &
sudo ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto esp \
spi 0x1 mode transport \
auth sha1 0x6162633132330000000000000000000000000000 \
enc aes 0x00000000000000000000000000000000
grep --binary abc123 /tmp/nlmon.pcap

Signed-off-by: Kevin Cernekee <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/netlink/af_netlink.c | 3 +++
1 file changed, 3 insertions(+)

--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -254,6 +254,9 @@ static int __netlink_deliver_tap_skb(str
struct sock *sk = skb->sk;
int ret = -ENOMEM;

+ if (!net_eq(dev_net(dev), sock_net(sk)))
+ return 0;
+
dev_hold(dev);

if (is_vmalloc_addr(skb->head))


2018-01-01 14:42:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 065/146] net: reevalulate autoflowlabel setting after sysctl setting

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Shaohua Li <[email protected]>


[ Upstream commit 513674b5a2c9c7a67501506419da5c3c77ac6f08 ]

sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2.
If sockopt doesn't set autoflowlabel, outcome packets from the hosts are
supposed to not include flowlabel. This is true for normal packet, but
not for reset packet.

The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if
we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
changed, so the sock will keep the old behavior in terms of auto
flowlabel. Reset packet is suffering from this problem, because reset
packet is sent from a special control socket, which is created at boot
time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
socket will always have its ipv6_pinfo.autoflowlabel set, even after
user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always
have flowlabel. Normal sock created before sysctl setting suffers from
the same issue. We can't even turn off autoflowlabel unless we kill all
socks in the hosts.

To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
autoflowlabel setting from user, otherwise we always call
ip6_default_np_autolabel() which has the new settings of sysctl.

Note, this changes behavior a little bit. Before commit 42240901f7c4
(ipv6: Implement different admin modes for automatic flow labels), the
autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
existing connection will change autoflowlabel behavior. After that
commit, autoflowlabel behavior is sticky in the whole life of the sock.
With this patch, the behavior isn't sticky again.

Cc: Martin KaFai Lau <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Tom Herbert <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/ipv6.h | 3 ++-
net/ipv6/af_inet6.c | 1 -
net/ipv6/ip6_output.c | 12 ++++++++++--
net/ipv6/ipv6_sockglue.c | 1 +
4 files changed, 13 insertions(+), 4 deletions(-)

--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -272,7 +272,8 @@ struct ipv6_pinfo {
* 100: prefer care-of address
*/
dontfrag:1,
- autoflowlabel:1;
+ autoflowlabel:1,
+ autoflowlabel_set:1;
__u8 min_hopcount;
__u8 tclass;
__be32 rcv_flowinfo;
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -210,7 +210,6 @@ lookup_protocol:
np->mcast_hops = IPV6_DEFAULT_MCASTHOPS;
np->mc_loop = 1;
np->pmtudisc = IPV6_PMTUDISC_WANT;
- np->autoflowlabel = ip6_default_np_autolabel(net);
np->repflow = net->ipv6.sysctl.flowlabel_reflect;
sk->sk_ipv6only = net->ipv6.sysctl.bindv6only;

--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -166,6 +166,14 @@ int ip6_output(struct net *net, struct s
!(IP6CB(skb)->flags & IP6SKB_REROUTED));
}

+static bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np)
+{
+ if (!np->autoflowlabel_set)
+ return ip6_default_np_autolabel(net);
+ else
+ return np->autoflowlabel;
+}
+
/*
* xmit an sk_buff (used by TCP, SCTP and DCCP)
* Note : socket lock is not held for SYNACK packets, but might be modified
@@ -230,7 +238,7 @@ int ip6_xmit(const struct sock *sk, stru
hlimit = ip6_dst_hoplimit(dst);

ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel,
- np->autoflowlabel, fl6));
+ ip6_autoflowlabel(net, np), fl6));

hdr->payload_len = htons(seg_len);
hdr->nexthdr = proto;
@@ -1626,7 +1634,7 @@ struct sk_buff *__ip6_make_skb(struct so

ip6_flow_hdr(hdr, v6_cork->tclass,
ip6_make_flowlabel(net, skb, fl6->flowlabel,
- np->autoflowlabel, fl6));
+ ip6_autoflowlabel(net, np), fl6));
hdr->hop_limit = v6_cork->hop_limit;
hdr->nexthdr = proto;
hdr->saddr = fl6->saddr;
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -878,6 +878,7 @@ pref_skip_coa:
break;
case IPV6_AUTOFLOWLABEL:
np->autoflowlabel = valbool;
+ np->autoflowlabel_set = 1;
retv = 0;
break;
case IPV6_RECVFRAGSIZE:


2018-01-01 14:42:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 031/146] x86/mm/pti: Add Kconfig

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit 385ce0ea4c078517fa51c261882c4e72fba53005 upstream.

Finally allow CONFIG_PAGE_TABLE_ISOLATION to be enabled.

PARAVIRT generally requires that the kernel not manage its own page tables.
It also means that the hypervisor and kernel must agree wholeheartedly
about what format the page tables are in and what they contain.
PAGE_TABLE_ISOLATION, unfortunately, changes the rules and they
can not be used together.

I've seen conflicting feedback from maintainers lately about whether they
want the Kconfig magic to go first or last in a patch series. It's going
last here because the partially-applied series leads to kernels that can
not boot in a bunch of cases. I did a run through the entire series with
CONFIG_PAGE_TABLE_ISOLATION=y to look for build errors, though.

[ tglx: Removed SMP and !PARAVIRT dependencies as they not longer exist ]

Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
security/Kconfig | 10 ++++++++++
1 file changed, 10 insertions(+)

--- a/security/Kconfig
+++ b/security/Kconfig
@@ -54,6 +54,16 @@ config SECURITY_NETWORK
implement socket and networking access controls.
If you are unsure how to answer this question, answer N.

+config PAGE_TABLE_ISOLATION
+ bool "Remove the kernel mapping in user mode"
+ depends on X86_64 && !UML
+ help
+ This feature reduces the number of hardware side channels by
+ ensuring that the majority of kernel addresses are not mapped
+ into userspace.
+
+ See Documentation/x86/pagetable-isolation.txt for more details.
+
config SECURITY_INFINIBAND
bool "Infiniband Security Hooks"
depends on SECURITY && INFINIBAND


2018-01-01 14:42:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 033/146] x86/mm/dump_pagetables: Check user space page table for WX pages

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit b4bf4f924b1d7bade38fd51b2e401d20d0956e4d upstream.

ptdump_walk_pgd_level_checkwx() checks the kernel page table for WX pages,
but does not check the PAGE_TABLE_ISOLATION user space page table.

Restructure the code so that dmesg output is selected by an explicit
argument and not implicit via checking the pgd argument for !NULL.

Add the check for the user space page table.

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/pgtable.h | 1 +
arch/x86/mm/debug_pagetables.c | 2 +-
arch/x86/mm/dump_pagetables.c | 30 +++++++++++++++++++++++++-----
3 files changed, 27 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -28,6 +28,7 @@ extern pgd_t early_top_pgt[PTRS_PER_PGD]
int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);

void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd);
void ptdump_walk_pgd_level_checkwx(void);

#ifdef CONFIG_DEBUG_WX
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -5,7 +5,7 @@

static int ptdump_show(struct seq_file *m, void *v)
{
- ptdump_walk_pgd_level(m, NULL);
+ ptdump_walk_pgd_level_debugfs(m, NULL);
return 0;
}

--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -476,7 +476,7 @@ static inline bool is_hypervisor_range(i
}

static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
- bool checkwx)
+ bool checkwx, bool dmesg)
{
#ifdef CONFIG_X86_64
pgd_t *start = (pgd_t *) &init_top_pgt;
@@ -489,7 +489,7 @@ static void ptdump_walk_pgd_level_core(s

if (pgd) {
start = pgd;
- st.to_dmesg = true;
+ st.to_dmesg = dmesg;
}

st.check_wx = checkwx;
@@ -527,13 +527,33 @@ static void ptdump_walk_pgd_level_core(s

void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
{
- ptdump_walk_pgd_level_core(m, pgd, false);
+ ptdump_walk_pgd_level_core(m, pgd, false, true);
+}
+
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd)
+{
+ ptdump_walk_pgd_level_core(m, pgd, false, false);
+}
+EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);
+
+static void ptdump_walk_user_pgd_level_checkwx(void)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ pgd_t *pgd = (pgd_t *) &init_top_pgt;
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ pr_info("x86/mm: Checking user space page tables\n");
+ pgd = kernel_to_user_pgdp(pgd);
+ ptdump_walk_pgd_level_core(NULL, pgd, true, false);
+#endif
}
-EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level);

void ptdump_walk_pgd_level_checkwx(void)
{
- ptdump_walk_pgd_level_core(NULL, NULL, true);
+ ptdump_walk_pgd_level_core(NULL, NULL, true, false);
+ ptdump_walk_user_pgd_level_checkwx();
}

static int __init pt_dump_init(void)


2018-01-01 14:42:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 034/146] x86/mm/dump_pagetables: Allow dumping current pagetables

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit a4b51ef6552c704764684cef7e753162dc87c5fa upstream.

Add two debugfs files which allow to dump the pagetable of the current
task.

current_kernel dumps the regular page table. This is the page table which
is normally shared between kernel and user space. If kernel page table
isolation is enabled this is the kernel space mapping.

If kernel page table isolation is enabled the second file, current_user,
dumps the user space page table.

These files allow to verify the resulting page tables for page table
isolation, but even in the normal case its useful to be able to inspect
user space page tables of current for debugging purposes.

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/pgtable.h | 2 -
arch/x86/mm/debug_pagetables.c | 71 ++++++++++++++++++++++++++++++++++++++---
arch/x86/mm/dump_pagetables.c | 6 ++-
3 files changed, 73 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -28,7 +28,7 @@ extern pgd_t early_top_pgt[PTRS_PER_PGD]
int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);

void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
-void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user);
void ptdump_walk_pgd_level_checkwx(void);

#ifdef CONFIG_DEBUG_WX
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -5,7 +5,7 @@

static int ptdump_show(struct seq_file *m, void *v)
{
- ptdump_walk_pgd_level_debugfs(m, NULL);
+ ptdump_walk_pgd_level_debugfs(m, NULL, false);
return 0;
}

@@ -22,7 +22,57 @@ static const struct file_operations ptdu
.release = single_release,
};

-static struct dentry *dir, *pe;
+static int ptdump_show_curknl(struct seq_file *m, void *v)
+{
+ if (current->mm->pgd) {
+ down_read(&current->mm->mmap_sem);
+ ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, false);
+ up_read(&current->mm->mmap_sem);
+ }
+ return 0;
+}
+
+static int ptdump_open_curknl(struct inode *inode, struct file *filp)
+{
+ return single_open(filp, ptdump_show_curknl, NULL);
+}
+
+static const struct file_operations ptdump_curknl_fops = {
+ .owner = THIS_MODULE,
+ .open = ptdump_open_curknl,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+static struct dentry *pe_curusr;
+
+static int ptdump_show_curusr(struct seq_file *m, void *v)
+{
+ if (current->mm->pgd) {
+ down_read(&current->mm->mmap_sem);
+ ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, true);
+ up_read(&current->mm->mmap_sem);
+ }
+ return 0;
+}
+
+static int ptdump_open_curusr(struct inode *inode, struct file *filp)
+{
+ return single_open(filp, ptdump_show_curusr, NULL);
+}
+
+static const struct file_operations ptdump_curusr_fops = {
+ .owner = THIS_MODULE,
+ .open = ptdump_open_curusr,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+#endif
+
+static struct dentry *dir, *pe_knl, *pe_curknl;

static int __init pt_dump_debug_init(void)
{
@@ -30,9 +80,22 @@ static int __init pt_dump_debug_init(voi
if (!dir)
return -ENOMEM;

- pe = debugfs_create_file("kernel", 0400, dir, NULL, &ptdump_fops);
- if (!pe)
+ pe_knl = debugfs_create_file("kernel", 0400, dir, NULL,
+ &ptdump_fops);
+ if (!pe_knl)
+ goto err;
+
+ pe_curknl = debugfs_create_file("current_kernel", 0400,
+ dir, NULL, &ptdump_curknl_fops);
+ if (!pe_curknl)
+ goto err;
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ pe_curusr = debugfs_create_file("current_user", 0400,
+ dir, NULL, &ptdump_curusr_fops);
+ if (!pe_curusr)
goto err;
+#endif
return 0;
err:
debugfs_remove_recursive(dir);
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -530,8 +530,12 @@ void ptdump_walk_pgd_level(struct seq_fi
ptdump_walk_pgd_level_core(m, pgd, false, true);
}

-void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd)
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user)
{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ if (user && static_cpu_has(X86_FEATURE_PTI))
+ pgd = kernel_to_user_pgdp(pgd);
+#endif
ptdump_walk_pgd_level_core(m, pgd, false, false);
}
EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);


2018-01-01 14:42:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 035/146] x86/ldt: Make the LDT mapping RO

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 9f5cb6b32d9e0a3a7453222baaf15664d92adbf2 upstream.

Now that the LDT mapping is in a known area when PAGE_TABLE_ISOLATION is
enabled its a primary target for attacks, if a user space interface fails
to validate a write address correctly. That can never happen, right?

The SDM states:

If the segment descriptors in the GDT or an LDT are placed in ROM, the
processor can enter an indefinite loop if software or the processor
attempts to update (write to) the ROM-based segment descriptors. To
prevent this problem, set the accessed bits for all segment descriptors
placed in a ROM. Also, remove operating-system or executive code that
attempts to modify segment descriptors located in ROM.

So its a valid approach to set the ACCESS bit when setting up the LDT entry
and to map the table RO. Fixup the selftest so it can handle that new mode.

Remove the manual ACCESS bit setter in set_tls_desc() as this is now
pointless. Folded the patch from Peter Ziljstra.

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/desc.h | 2 ++
arch/x86/kernel/ldt.c | 7 ++++++-
arch/x86/kernel/tls.c | 11 ++---------
tools/testing/selftests/x86/ldt_gdt.c | 3 +--
4 files changed, 11 insertions(+), 12 deletions(-)

--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -21,6 +21,8 @@ static inline void fill_ldt(struct desc_

desc->type = (info->read_exec_only ^ 1) << 1;
desc->type |= info->contents << 2;
+ /* Set the ACCESS bit so it can be mapped RO */
+ desc->type |= 1;

desc->s = 1;
desc->dpl = 0x3;
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -158,7 +158,12 @@ map_ldt_struct(struct mm_struct *mm, str
ptep = get_locked_pte(mm, va, &ptl);
if (!ptep)
return -ENOMEM;
- pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL & ~_PAGE_GLOBAL));
+ /*
+ * Map it RO so the easy to find address is not a primary
+ * target via some kernel interface which misses a
+ * permission check.
+ */
+ pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL_RO & ~_PAGE_GLOBAL));
set_pte_at(mm, va, ptep, pte);
pte_unmap_unlock(ptep, ptl);
}
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -93,17 +93,10 @@ static void set_tls_desc(struct task_str
cpu = get_cpu();

while (n-- > 0) {
- if (LDT_empty(info) || LDT_zero(info)) {
+ if (LDT_empty(info) || LDT_zero(info))
memset(desc, 0, sizeof(*desc));
- } else {
+ else
fill_ldt(desc, info);
-
- /*
- * Always set the accessed bit so that the CPU
- * doesn't try to write to the (read-only) GDT.
- */
- desc->type |= 1;
- }
++info;
++desc;
}
--- a/tools/testing/selftests/x86/ldt_gdt.c
+++ b/tools/testing/selftests/x86/ldt_gdt.c
@@ -122,8 +122,7 @@ static void check_valid_segment(uint16_t
* NB: Different Linux versions do different things with the
* accessed bit in set_thread_area().
*/
- if (ar != expected_ar &&
- (ldt || ar != (expected_ar | AR_ACCESSED))) {
+ if (ar != expected_ar && ar != (expected_ar | AR_ACCESSED)) {
printf("[FAIL]\t%s entry %hu has AR 0x%08X but expected 0x%08X\n",
(ldt ? "LDT" : "GDT"), index, ar, expected_ar);
nerrs++;


2018-01-01 14:43:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 027/146] x86/mm: Optimize RESTORE_CR3

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <[email protected]>

commit 21e94459110252d41b45c0c8ba50fd72a664d50c upstream.

Most NMI/paranoid exceptions will not in fact change pagetables and would
thus not require TLB flushing, however RESTORE_CR3 uses flushing CR3
writes.

Restores to kernel PCIDs can be NOFLUSH, because we explicitly flush the
kernel mappings and now that we track which user PCIDs need flushing we can
avoid those too when possible.

This does mean RESTORE_CR3 needs an additional scratch_reg, luckily both
sites have plenty available.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/entry/calling.h | 30 ++++++++++++++++++++++++++++--
arch/x86/entry/entry_64.S | 4 ++--
2 files changed, 30 insertions(+), 4 deletions(-)

--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -281,8 +281,34 @@ For 32-bit we have the following convent
.Ldone_\@:
.endm

-.macro RESTORE_CR3 save_reg:req
+.macro RESTORE_CR3 scratch_reg:req save_reg:req
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+ ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
+
+ /*
+ * KERNEL pages can always resume with NOFLUSH as we do
+ * explicit flushes.
+ */
+ bt $X86_CR3_PTI_SWITCH_BIT, \save_reg
+ jnc .Lnoflush_\@
+
+ /*
+ * Check if there's a pending flush for the user ASID we're
+ * about to set.
+ */
+ movq \save_reg, \scratch_reg
+ andq $(0x7FF), \scratch_reg
+ bt \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ jnc .Lnoflush_\@
+
+ btr \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ jmp .Lwrcr3_\@
+
+.Lnoflush_\@:
+ SET_NOFLUSH_BIT \save_reg
+
+.Lwrcr3_\@:
/*
* The CR3 write could be avoided when not changing its value,
* but would require a CR3 read *and* a scratch register.
@@ -301,7 +327,7 @@ For 32-bit we have the following convent
.endm
.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
.endm
-.macro RESTORE_CR3 save_reg:req
+.macro RESTORE_CR3 scratch_reg:req save_reg:req
.endm

#endif
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1288,7 +1288,7 @@ ENTRY(paranoid_exit)
testl %ebx, %ebx /* swapgs needed? */
jnz .Lparanoid_exit_no_swapgs
TRACE_IRQS_IRETQ
- RESTORE_CR3 save_reg=%r14
+ RESTORE_CR3 scratch_reg=%rbx save_reg=%r14
SWAPGS_UNSAFE_STACK
jmp .Lparanoid_exit_restore
.Lparanoid_exit_no_swapgs:
@@ -1730,7 +1730,7 @@ end_repeat_nmi:
movq $-1, %rsi
call do_nmi

- RESTORE_CR3 save_reg=%r14
+ RESTORE_CR3 scratch_reg=%r15 save_reg=%r14

testl %ebx, %ebx /* swapgs needed? */
jnz nmi_restore


2018-01-01 14:43:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 067/146] RDS: Check cmsg_len before dereferencing CMSG_DATA

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Avinash Repaka <[email protected]>


[ Upstream commit 14e138a86f6347c6199f610576d2e11c03bec5f0 ]

RDS currently doesn't check if the length of the control message is
large enough to hold the required data, before dereferencing the control
message data. This results in following crash:

BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013
[inline]
BUG: KASAN: stack-out-of-bounds in rds_sendmsg+0x1f02/0x1f90
net/rds/send.c:1066
Read of size 8 at addr ffff8801c928fb70 by task syzkaller455006/3157

CPU: 0 PID: 3157 Comm: syzkaller455006 Not tainted 4.15.0-rc3+ #161
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x25b/0x340 mm/kasan/report.c:409
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
rds_rdma_bytes net/rds/send.c:1013 [inline]
rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066
sock_sendmsg_nosec net/socket.c:628 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:638
___sys_sendmsg+0x320/0x8b0 net/socket.c:2018
__sys_sendmmsg+0x1ee/0x620 net/socket.c:2108
SYSC_sendmmsg net/socket.c:2139 [inline]
SyS_sendmmsg+0x35/0x60 net/socket.c:2134
entry_SYSCALL_64_fastpath+0x1f/0x96
RIP: 0033:0x43fe49
RSP: 002b:00007fffbe244ad8 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fe49
RDX: 0000000000000001 RSI: 000000002020c000 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004017b0
R13: 0000000000401840 R14: 0000000000000000 R15: 0000000000000000

To fix this, we verify that the cmsg_len is large enough to hold the
data to be read, before proceeding further.

Reported-by: syzbot <[email protected]>
Signed-off-by: Avinash Repaka <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Reviewed-by: Yuval Shaia <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/rds/send.c | 3 +++
1 file changed, 3 insertions(+)

--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -1009,6 +1009,9 @@ static int rds_rdma_bytes(struct msghdr
continue;

if (cmsg->cmsg_type == RDS_CMSG_RDMA_ARGS) {
+ if (cmsg->cmsg_len <
+ CMSG_LEN(sizeof(struct rds_rdma_args)))
+ return -EINVAL;
args = CMSG_DATA(cmsg);
*rdma_bytes += args->remote_vec.bytes;
}


2018-01-01 14:43:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 076/146] s390/qeth: update takeover IPs after configuration change

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <[email protected]>


[ Upstream commit 02f510f326501470348a5df341e8232c3497bbbb ]

Any modification to the takeover IP-ranges requires that we re-evaluate
which IP addresses are takeover-eligible. Otherwise we might do takeover
for some addresses when we no longer should, or vice-versa.

Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/net/qeth_core.h | 4 +-
drivers/s390/net/qeth_core_main.c | 4 +-
drivers/s390/net/qeth_l3.h | 2 -
drivers/s390/net/qeth_l3_main.c | 31 ++++++++++++++++--
drivers/s390/net/qeth_l3_sys.c | 63 ++++++++++++++++++++------------------
5 files changed, 67 insertions(+), 37 deletions(-)

--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -565,8 +565,8 @@ enum qeth_cq {

struct qeth_ipato {
bool enabled;
- int invert4;
- int invert6;
+ bool invert4;
+ bool invert6;
struct list_head entries;
};

--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1480,8 +1480,8 @@ static int qeth_setup_card(struct qeth_c
/* IP address takeover */
INIT_LIST_HEAD(&card->ipato.entries);
card->ipato.enabled = false;
- card->ipato.invert4 = 0;
- card->ipato.invert6 = 0;
+ card->ipato.invert4 = false;
+ card->ipato.invert6 = false;
/* init QDIO stuff */
qeth_init_qdio_info(card);
INIT_DELAYED_WORK(&card->buffer_reclaim_work, qeth_buffer_reclaim_work);
--- a/drivers/s390/net/qeth_l3.h
+++ b/drivers/s390/net/qeth_l3.h
@@ -82,7 +82,7 @@ void qeth_l3_del_vipa(struct qeth_card *
int qeth_l3_add_rxip(struct qeth_card *, enum qeth_prot_versions, const u8 *);
void qeth_l3_del_rxip(struct qeth_card *card, enum qeth_prot_versions,
const u8 *);
-int qeth_l3_is_addr_covered_by_ipato(struct qeth_card *, struct qeth_ipaddr *);
+void qeth_l3_update_ipato(struct qeth_card *card);
struct qeth_ipaddr *qeth_l3_get_addr_buffer(enum qeth_prot_versions);
int qeth_l3_add_ip(struct qeth_card *, struct qeth_ipaddr *);
int qeth_l3_delete_ip(struct qeth_card *, struct qeth_ipaddr *);
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -163,8 +163,8 @@ static void qeth_l3_convert_addr_to_bits
}
}

-int qeth_l3_is_addr_covered_by_ipato(struct qeth_card *card,
- struct qeth_ipaddr *addr)
+static bool qeth_l3_is_addr_covered_by_ipato(struct qeth_card *card,
+ struct qeth_ipaddr *addr)
{
struct qeth_ipato_entry *ipatoe;
u8 addr_bits[128] = {0, };
@@ -605,6 +605,27 @@ int qeth_l3_setrouting_v6(struct qeth_ca
/*
* IP address takeover related functions
*/
+
+/**
+ * qeth_l3_update_ipato() - Update 'takeover' property, for all NORMAL IPs.
+ *
+ * Caller must hold ip_lock.
+ */
+void qeth_l3_update_ipato(struct qeth_card *card)
+{
+ struct qeth_ipaddr *addr;
+ unsigned int i;
+
+ hash_for_each(card->ip_htable, i, addr, hnode) {
+ if (addr->type != QETH_IP_TYPE_NORMAL)
+ continue;
+ if (qeth_l3_is_addr_covered_by_ipato(card, addr))
+ addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
+ else
+ addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG;
+ }
+}
+
static void qeth_l3_clear_ipato_list(struct qeth_card *card)
{
struct qeth_ipato_entry *ipatoe, *tmp;
@@ -616,6 +637,7 @@ static void qeth_l3_clear_ipato_list(str
kfree(ipatoe);
}

+ qeth_l3_update_ipato(card);
spin_unlock_bh(&card->ip_lock);
}

@@ -640,8 +662,10 @@ int qeth_l3_add_ipato_entry(struct qeth_
}
}

- if (!rc)
+ if (!rc) {
list_add_tail(&new->entry, &card->ipato.entries);
+ qeth_l3_update_ipato(card);
+ }

spin_unlock_bh(&card->ip_lock);

@@ -664,6 +688,7 @@ void qeth_l3_del_ipato_entry(struct qeth
(proto == QETH_PROT_IPV4)? 4:16) &&
(ipatoe->mask_bits == mask_bits)) {
list_del(&ipatoe->entry);
+ qeth_l3_update_ipato(card);
kfree(ipatoe);
}
}
--- a/drivers/s390/net/qeth_l3_sys.c
+++ b/drivers/s390/net/qeth_l3_sys.c
@@ -370,9 +370,8 @@ static ssize_t qeth_l3_dev_ipato_enable_
struct device_attribute *attr, const char *buf, size_t count)
{
struct qeth_card *card = dev_get_drvdata(dev);
- struct qeth_ipaddr *addr;
- int i, rc = 0;
bool enable;
+ int rc = 0;

if (!card)
return -EINVAL;
@@ -391,20 +390,12 @@ static ssize_t qeth_l3_dev_ipato_enable_
goto out;
}

- if (card->ipato.enabled == enable)
- goto out;
- card->ipato.enabled = enable;
-
- spin_lock_bh(&card->ip_lock);
- hash_for_each(card->ip_htable, i, addr, hnode) {
- if (addr->type != QETH_IP_TYPE_NORMAL)
- continue;
- if (!enable)
- addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG;
- else if (qeth_l3_is_addr_covered_by_ipato(card, addr))
- addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
+ if (card->ipato.enabled != enable) {
+ card->ipato.enabled = enable;
+ spin_lock_bh(&card->ip_lock);
+ qeth_l3_update_ipato(card);
+ spin_unlock_bh(&card->ip_lock);
}
- spin_unlock_bh(&card->ip_lock);
out:
mutex_unlock(&card->conf_mutex);
return rc ? rc : count;
@@ -430,20 +421,27 @@ static ssize_t qeth_l3_dev_ipato_invert4
const char *buf, size_t count)
{
struct qeth_card *card = dev_get_drvdata(dev);
+ bool invert;
int rc = 0;

if (!card)
return -EINVAL;

mutex_lock(&card->conf_mutex);
- if (sysfs_streq(buf, "toggle"))
- card->ipato.invert4 = (card->ipato.invert4)? 0 : 1;
- else if (sysfs_streq(buf, "1"))
- card->ipato.invert4 = 1;
- else if (sysfs_streq(buf, "0"))
- card->ipato.invert4 = 0;
- else
+ if (sysfs_streq(buf, "toggle")) {
+ invert = !card->ipato.invert4;
+ } else if (kstrtobool(buf, &invert)) {
rc = -EINVAL;
+ goto out;
+ }
+
+ if (card->ipato.invert4 != invert) {
+ card->ipato.invert4 = invert;
+ spin_lock_bh(&card->ip_lock);
+ qeth_l3_update_ipato(card);
+ spin_unlock_bh(&card->ip_lock);
+ }
+out:
mutex_unlock(&card->conf_mutex);
return rc ? rc : count;
}
@@ -609,20 +607,27 @@ static ssize_t qeth_l3_dev_ipato_invert6
struct device_attribute *attr, const char *buf, size_t count)
{
struct qeth_card *card = dev_get_drvdata(dev);
+ bool invert;
int rc = 0;

if (!card)
return -EINVAL;

mutex_lock(&card->conf_mutex);
- if (sysfs_streq(buf, "toggle"))
- card->ipato.invert6 = (card->ipato.invert6)? 0 : 1;
- else if (sysfs_streq(buf, "1"))
- card->ipato.invert6 = 1;
- else if (sysfs_streq(buf, "0"))
- card->ipato.invert6 = 0;
- else
+ if (sysfs_streq(buf, "toggle")) {
+ invert = !card->ipato.invert6;
+ } else if (kstrtobool(buf, &invert)) {
rc = -EINVAL;
+ goto out;
+ }
+
+ if (card->ipato.invert6 != invert) {
+ card->ipato.invert6 = invert;
+ spin_lock_bh(&card->ip_lock);
+ qeth_l3_update_ipato(card);
+ spin_unlock_bh(&card->ip_lock);
+ }
+out:
mutex_unlock(&card->conf_mutex);
return rc ? rc : count;
}


2018-01-01 14:43:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 077/146] net: ipv4: fix for a race condition in raw_sendmsg

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mohamed Ghannam <[email protected]>


[ Upstream commit 8f659a03a0ba9289b9aeb9b4470e6fb263d6f483 ]

inet->hdrincl is racy, and could lead to uninitialized stack pointer
usage, so its value should be read only once.

Fixes: c008ba5bdc9f ("ipv4: Avoid reading user iov twice after raw_probe_proto_opt")
Signed-off-by: Mohamed Ghannam <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/raw.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -513,11 +513,16 @@ static int raw_sendmsg(struct sock *sk,
int err;
struct ip_options_data opt_copy;
struct raw_frag_vec rfv;
+ int hdrincl;

err = -EMSGSIZE;
if (len > 0xFFFF)
goto out;

+ /* hdrincl should be READ_ONCE(inet->hdrincl)
+ * but READ_ONCE() doesn't work with bit fields
+ */
+ hdrincl = inet->hdrincl;
/*
* Check the flags.
*/
@@ -593,7 +598,7 @@ static int raw_sendmsg(struct sock *sk,
/* Linux does not mangle headers on raw sockets,
* so that IP options + IP_HDRINCL is non-sense.
*/
- if (inet->hdrincl)
+ if (hdrincl)
goto done;
if (ipc.opt->opt.srr) {
if (!daddr)
@@ -615,12 +620,12 @@ static int raw_sendmsg(struct sock *sk,

flowi4_init_output(&fl4, ipc.oif, sk->sk_mark, tos,
RT_SCOPE_UNIVERSE,
- inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol,
+ hdrincl ? IPPROTO_RAW : sk->sk_protocol,
inet_sk_flowi_flags(sk) |
- (inet->hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
+ (hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
daddr, saddr, 0, 0, sk->sk_uid);

- if (!inet->hdrincl) {
+ if (!hdrincl) {
rfv.msg = msg;
rfv.hlen = 0;

@@ -645,7 +650,7 @@ static int raw_sendmsg(struct sock *sk,
goto do_confirm;
back_from_confirm:

- if (inet->hdrincl)
+ if (hdrincl)
err = raw_send_hdrinc(sk, &fl4, msg, len,
&rt, msg->msg_flags, &ipc.sockc);



2018-01-01 14:43:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 078/146] net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tobias Jordan <[email protected]>


[ Upstream commit 589bf32f09852041fbd3b7ce1a9e703f95c230ba ]

add appropriate calls to clk_disable_unprepare() by jumping to out_mdio
in case orion_mdio_probe() returns -EPROBE_DEFER.

Found by Linux Driver Verification project (linuxtesting.org).

Fixes: 3d604da1e954 ("net: mvmdio: get and enable optional clock")
Signed-off-by: Tobias Jordan <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/marvell/mvmdio.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -344,7 +344,8 @@ static int orion_mdio_probe(struct platf
dev->regs + MVMDIO_ERR_INT_MASK);

} else if (dev->err_interrupt == -EPROBE_DEFER) {
- return -EPROBE_DEFER;
+ ret = -EPROBE_DEFER;
+ goto out_mdio;
}

if (pdev->dev.of_node)


2018-01-01 14:43:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 081/146] ip6_gre: fix device features for ioctl setup

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexey Kodanev <[email protected]>


[ Upstream commit e5a9336adb317db55eb3fe8200856096f3c71109 ]

When ip6gre is created using ioctl, its features, such as
scatter-gather, GSO and tx-checksumming will be turned off:

# ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1
# ethtool -k gre6 (truncated output)
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
generic-segmentation-offload: off [requested on]

But when netlink is used, they will be enabled:
# ip link add gre6 type ip6gre remote fd00::1
# ethtool -k gre6 (truncated output)
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
generic-segmentation-offload: on

This results in a loss of performance when gre6 is created via ioctl.
The issue was found with LTP/gre tests.

Fix it by moving the setup of device features to a separate function
and invoke it with ndo_init callback because both netlink and ioctl
will eventually call it via register_netdevice():

register_netdevice()
- ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init()
- ip6gre_tunnel_init_common()
- ip6gre_tnl_init_features()

The moved code also contains two minor style fixes:
* removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line.
* fixed the issue reported by checkpatch: "Unnecessary parentheses around
'nt->encap.type == TUNNEL_ENCAP_NONE'"

Fixes: ac4eb009e477 ("ip6gre: Add support for basic offloads offloads excluding GSO")
Signed-off-by: Alexey Kodanev <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_gre.c | 57 +++++++++++++++++++++++++++++------------------------
1 file changed, 32 insertions(+), 25 deletions(-)

--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1020,6 +1020,36 @@ static void ip6gre_tunnel_setup(struct n
eth_random_addr(dev->perm_addr);
}

+#define GRE6_FEATURES (NETIF_F_SG | \
+ NETIF_F_FRAGLIST | \
+ NETIF_F_HIGHDMA | \
+ NETIF_F_HW_CSUM)
+
+static void ip6gre_tnl_init_features(struct net_device *dev)
+{
+ struct ip6_tnl *nt = netdev_priv(dev);
+
+ dev->features |= GRE6_FEATURES;
+ dev->hw_features |= GRE6_FEATURES;
+
+ if (!(nt->parms.o_flags & TUNNEL_SEQ)) {
+ /* TCP offload with GRE SEQ is not supported, nor
+ * can we support 2 levels of outer headers requiring
+ * an update.
+ */
+ if (!(nt->parms.o_flags & TUNNEL_CSUM) ||
+ nt->encap.type == TUNNEL_ENCAP_NONE) {
+ dev->features |= NETIF_F_GSO_SOFTWARE;
+ dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+ }
+
+ /* Can use a lockless transmit, unless we generate
+ * output sequences
+ */
+ dev->features |= NETIF_F_LLTX;
+ }
+}
+
static int ip6gre_tunnel_init_common(struct net_device *dev)
{
struct ip6_tnl *tunnel;
@@ -1054,6 +1084,8 @@ static int ip6gre_tunnel_init_common(str
if (!(tunnel->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
dev->mtu -= 8;

+ ip6gre_tnl_init_features(dev);
+
return 0;
}

@@ -1302,11 +1334,6 @@ static const struct net_device_ops ip6gr
.ndo_get_iflink = ip6_tnl_get_iflink,
};

-#define GRE6_FEATURES (NETIF_F_SG | \
- NETIF_F_FRAGLIST | \
- NETIF_F_HIGHDMA | \
- NETIF_F_HW_CSUM)
-
static void ip6gre_tap_setup(struct net_device *dev)
{

@@ -1386,26 +1413,6 @@ static int ip6gre_newlink(struct net *sr
nt->net = dev_net(dev);
ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);

- dev->features |= GRE6_FEATURES;
- dev->hw_features |= GRE6_FEATURES;
-
- if (!(nt->parms.o_flags & TUNNEL_SEQ)) {
- /* TCP offload with GRE SEQ is not supported, nor
- * can we support 2 levels of outer headers requiring
- * an update.
- */
- if (!(nt->parms.o_flags & TUNNEL_CSUM) ||
- (nt->encap.type == TUNNEL_ENCAP_NONE)) {
- dev->features |= NETIF_F_GSO_SOFTWARE;
- dev->hw_features |= NETIF_F_GSO_SOFTWARE;
- }
-
- /* Can use a lockless transmit, unless we generate
- * output sequences
- */
- dev->features |= NETIF_F_LLTX;
- }
-
err = register_netdevice(dev);
if (err)
goto out;


2018-01-01 14:43:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 083/146] net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nikolay Aleksandrov <[email protected]>


[ Upstream commit 84aeb437ab98a2bce3d4b2111c79723aedfceb33 ]

The early call to br_stp_change_bridge_id in bridge's newlink can cause
a memory leak if an error occurs during the newlink because the fdb
entries are not cleaned up if a different lladdr was specified, also
another minor issue is that it generates fdb notifications with
ifindex = 0. Another unrelated memory leak is the bridge sysfs entries
which get added on NETDEV_REGISTER event, but are not cleaned up in the
newlink error path. To remove this special case the call to
br_stp_change_bridge_id is done after netdev register and we cleanup the
bridge on changelink error via br_dev_delete to plug all leaks.

This patch makes netlink bridge destruction on newlink error the same as
dellink and ioctl del which is necessary since at that point we have a
fully initialized bridge device.

To reproduce the issue:
$ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1
RTNETLINK answers: Invalid argument

$ rmmod bridge
[ 1822.142525] =============================================================================
[ 1822.143640] BUG bridge_fdb_cache (Tainted: G O ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown()
[ 1822.144821] -----------------------------------------------------------------------------

[ 1822.145990] Disabling lock debugging due to kernel taint
[ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100
[ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G B O 4.15.0-rc2+ #87
[ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1822.150008] Call Trace:
[ 1822.150510] dump_stack+0x78/0xa9
[ 1822.151156] slab_err+0xb1/0xd3
[ 1822.151834] ? __kmalloc+0x1bb/0x1ce
[ 1822.152546] __kmem_cache_shutdown+0x151/0x28b
[ 1822.153395] shutdown_cache+0x13/0x144
[ 1822.154126] kmem_cache_destroy+0x1c0/0x1fb
[ 1822.154669] SyS_delete_module+0x194/0x244
[ 1822.155199] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 1822.155773] entry_SYSCALL_64_fastpath+0x23/0x9a
[ 1822.156343] RIP: 0033:0x7f929bd38b17
[ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17
[ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0
[ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11
[ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80
[ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090
[ 1822.161278] INFO: Object 0x000000007645de29 @offset=0
[ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128

Fixes: 30313a3d5794 ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device")
Fixes: 5b8d5429daa0 ("bridge: netlink: register netdevice before executing changelink")
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/bridge/br_netlink.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1223,19 +1223,20 @@ static int br_dev_newlink(struct net *sr
struct net_bridge *br = netdev_priv(dev);
int err;

+ err = register_netdevice(dev);
+ if (err)
+ return err;
+
if (tb[IFLA_ADDRESS]) {
spin_lock_bh(&br->lock);
br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS]));
spin_unlock_bh(&br->lock);
}

- err = register_netdevice(dev);
- if (err)
- return err;
-
err = br_changelink(dev, tb, data, extack);
if (err)
- unregister_netdevice(dev);
+ br_dev_delete(dev, NULL);
+
return err;
}



2018-01-01 14:43:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 084/146] net: Fix double free and memory corruption in get_net_ns_by_id()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Eric W. Biederman" <[email protected]>


[ Upstream commit 21b5944350052d2583e82dd59b19a9ba94a007f0 ]

(I can trivially verify that that idr_remove in cleanup_net happens
after the network namespace count has dropped to zero --EWB)

Function get_net_ns_by_id() does not check for net::count
after it has found a peer in netns_ids idr.

It may dereference a peer, after its count has already been
finaly decremented. This leads to double free and memory
corruption:

put_net(peer) rtnl_lock()
atomic_dec_and_test(&peer->count) [count=0] ...
__put_net(peer) get_net_ns_by_id(net, id)
spin_lock(&cleanup_list_lock)
list_add(&net->cleanup_list, &cleanup_list)
spin_unlock(&cleanup_list_lock)
queue_work() peer = idr_find(&net->netns_ids, id)
| get_net(peer) [count=1]
| ...
| (use after final put)
v ...
cleanup_net() ...
spin_lock(&cleanup_list_lock) ...
list_replace_init(&cleanup_list, ..) ...
spin_unlock(&cleanup_list_lock) ...
... ...
... put_net(peer)
... atomic_dec_and_test(&peer->count) [count=0]
... spin_lock(&cleanup_list_lock)
... list_add(&net->cleanup_list, &cleanup_list)
... spin_unlock(&cleanup_list_lock)
... queue_work()
... rtnl_unlock()
rtnl_lock() ...
for_each_net(tmp) { ...
id = __peernet2id(tmp, peer) ...
spin_lock_irq(&tmp->nsid_lock) ...
idr_remove(&tmp->netns_ids, id) ...
... ...
net_drop_ns() ...
net_free(peer) ...
} ...
|
v
cleanup_net()
...
(Second free of peer)

Also, put_net() on the right cpu may reorder with left's cpu
list_replace_init(&cleanup_list, ..), and then cleanup_list
will be corrupted.

Since cleanup_net() is executed in worker thread, while
put_net(peer) can happen everywhere, there should be
enough time for concurrent get_net_ns_by_id() to pick
the peer up, and the race does not seem to be unlikely.
The patch fixes the problem in standard way.

(Also, there is possible problem in peernet2id_alloc(), which requires
check for net::count under nsid_lock and maybe_get_net(peer), but
in current stable kernel it's used under rtnl_lock() and it has to be
safe. Openswitch begun to use peernet2id_alloc(), and possibly it should
be fixed too. While this is not in stable kernel yet, so I'll send
a separate message to netdev@ later).

Cc: Nicolas Dichtel <[email protected]>
Signed-off-by: Kirill Tkhai <[email protected]>
Fixes: 0c7aecd4bde4 "netns: add rtnl cmd to add and get peer netns ids"
Reviewed-by: Andrey Ryabinin <[email protected]>
Reviewed-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/net_namespace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -266,7 +266,7 @@ struct net *get_net_ns_by_id(struct net
spin_lock_bh(&net->nsid_lock);
peer = idr_find(&net->netns_ids, id);
if (peer)
- get_net(peer);
+ peer = maybe_get_net(peer);
spin_unlock_bh(&net->nsid_lock);
rcu_read_unlock();



2018-01-01 14:43:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 085/146] net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Grygorii Strashko <[email protected]>


[ Upstream commit c1a8d0a3accf64a014d605e6806ce05d1c17adf1 ]

Under some circumstances driver will perform PHY reset in
ksz9031_read_status() to fix autoneg failure case (idle error count =
0xFF). When this happens ksz9031 will not detect link status change any
more when connecting to Netgear 1G switch (link can be recovered sometimes by
restarting netdevice "ifconfig down up"). Reproduced with TI am572x board
equipped with ksz9031 PHY while connecting to Netgear 1G switch.

Fix the issue by reconfiguring autonegotiation after PHY reset in
ksz9031_read_status().

Fixes: d2fd719bcb0e ("net/phy: micrel: Add workaround for bad autoneg")
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/micrel.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -622,6 +622,7 @@ static int ksz9031_read_status(struct ph
phydev->link = 0;
if (phydev->drv->config_intr && phy_interrupt_is_valid(phydev))
phydev->drv->config_intr(phydev);
+ return genphy_config_aneg(phydev);
}

return 0;


2018-01-01 14:43:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 068/146] tcp_bbr: record "full bw reached" decision in new full_bw_reached bit

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <[email protected]>


[ Upstream commit c589e69b508d29ed8e644dfecda453f71c02ec27 ]

This commit records the "full bw reached" decision in a new
full_bw_reached bit. This is a pure refactor that does not change the
current behavior, but enables subsequent fixes and improvements.

In particular, this enables simple and clean fixes because the full_bw
and full_bw_cnt can be unconditionally zeroed without worrying about
forgetting that we estimated we filled the pipe in Startup. And it
enables future improvements because multiple code paths can be used
for estimating that we filled the pipe in Startup; any new code paths
only need to set this bit when they think the pipe is full.

Note that this fix intentionally reduces the width of the full_bw_cnt
counter, since we have never used the most significant bit.

Signed-off-by: Neal Cardwell <[email protected]>
Reviewed-by: Yuchung Cheng <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_bbr.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -110,7 +110,8 @@ struct bbr {
u32 lt_last_lost; /* LT intvl start: tp->lost */
u32 pacing_gain:10, /* current gain for setting pacing rate */
cwnd_gain:10, /* current gain for setting cwnd */
- full_bw_cnt:3, /* number of rounds without large bw gains */
+ full_bw_reached:1, /* reached full bw in Startup? */
+ full_bw_cnt:2, /* number of rounds without large bw gains */
cycle_idx:3, /* current index in pacing_gain cycle array */
has_seen_rtt:1, /* have we seen an RTT sample yet? */
unused_b:5;
@@ -180,7 +181,7 @@ static bool bbr_full_bw_reached(const st
{
const struct bbr *bbr = inet_csk_ca(sk);

- return bbr->full_bw_cnt >= bbr_full_bw_cnt;
+ return bbr->full_bw_reached;
}

/* Return the windowed max recent bandwidth sample, in pkts/uS << BW_SCALE. */
@@ -717,6 +718,7 @@ static void bbr_check_full_bw_reached(st
return;
}
++bbr->full_bw_cnt;
+ bbr->full_bw_reached = bbr->full_bw_cnt >= bbr_full_bw_cnt;
}

/* If pipe is probably full, drain the queue and then enter steady-state. */
@@ -850,6 +852,7 @@ static void bbr_init(struct sock *sk)
bbr->restore_cwnd = 0;
bbr->round_start = 0;
bbr->idle_restart = 0;
+ bbr->full_bw_reached = 0;
bbr->full_bw = 0;
bbr->full_bw_cnt = 0;
bbr->cycle_mstamp = 0;


2018-01-01 14:43:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 086/146] sock: free skb in skb_complete_tx_timestamp on error

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Willem de Bruijn <[email protected]>


[ Upstream commit 35b99dffc3f710cafceee6c8c6ac6a98eb2cb4bf ]

skb_complete_tx_timestamp must ingest the skb it is passed. Call
kfree_skb if the skb cannot be enqueued.

Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl")
Fixes: 9ac25fc06375 ("net: fix socket refcounting in skb_complete_tx_timestamp()")
Reported-by: Richard Cochran <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/skbuff.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4296,7 +4296,7 @@ void skb_complete_tx_timestamp(struct sk
struct sock *sk = skb->sk;

if (!skb_may_tx_timestamp(sk, false))
- return;
+ goto err;

/* Take a reference to prevent skb_orphan() from freeing the socket,
* but only if the socket refcount is not zero.
@@ -4305,7 +4305,11 @@ void skb_complete_tx_timestamp(struct sk
*skb_hwtstamps(skb) = *hwtstamps;
__skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND, false);
sock_put(sk);
+ return;
}
+
+err:
+ kfree_skb(skb);
}
EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);



2018-01-01 14:44:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 088/146] net/mlx5: Fix rate limit packet pacing naming and struct

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eran Ben Elisha <[email protected]>


[ Upstream commit 37e92a9d4fe38dc3e7308913575983a6a088c8d4 ]

In mlx5_ifc, struct size was not complete, and thus driver was sending
garbage after the last defined field. Fixed it by adding reserved field
to complete the struct size.

In addition, rename all set_rate_limit to set_pp_rate_limit to be
compliant with the Firmware <-> Driver definition.

Fixes: 7486216b3a0b ("{net,IB}/mlx5: mlx5_ifc updates")
Fixes: 1466cc5b23d1 ("net/mlx5: Rate limit tables support")
Signed-off-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 ++--
drivers/net/ethernet/mellanox/mlx5/core/rl.c | 22 +++++++++++-----------
include/linux/mlx5/mlx5_ifc.h | 8 +++++---
3 files changed, 18 insertions(+), 16 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -362,7 +362,7 @@ static int mlx5_internal_err_ret_value(s
case MLX5_CMD_OP_QUERY_VPORT_COUNTER:
case MLX5_CMD_OP_ALLOC_Q_COUNTER:
case MLX5_CMD_OP_QUERY_Q_COUNTER:
- case MLX5_CMD_OP_SET_RATE_LIMIT:
+ case MLX5_CMD_OP_SET_PP_RATE_LIMIT:
case MLX5_CMD_OP_QUERY_RATE_LIMIT:
case MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT:
case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT:
@@ -505,7 +505,7 @@ const char *mlx5_command_str(int command
MLX5_COMMAND_STR_CASE(ALLOC_Q_COUNTER);
MLX5_COMMAND_STR_CASE(DEALLOC_Q_COUNTER);
MLX5_COMMAND_STR_CASE(QUERY_Q_COUNTER);
- MLX5_COMMAND_STR_CASE(SET_RATE_LIMIT);
+ MLX5_COMMAND_STR_CASE(SET_PP_RATE_LIMIT);
MLX5_COMMAND_STR_CASE(QUERY_RATE_LIMIT);
MLX5_COMMAND_STR_CASE(CREATE_SCHEDULING_ELEMENT);
MLX5_COMMAND_STR_CASE(DESTROY_SCHEDULING_ELEMENT);
--- a/drivers/net/ethernet/mellanox/mlx5/core/rl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
@@ -125,16 +125,16 @@ static struct mlx5_rl_entry *find_rl_ent
return ret_entry;
}

-static int mlx5_set_rate_limit_cmd(struct mlx5_core_dev *dev,
+static int mlx5_set_pp_rate_limit_cmd(struct mlx5_core_dev *dev,
u32 rate, u16 index)
{
- u32 in[MLX5_ST_SZ_DW(set_rate_limit_in)] = {0};
- u32 out[MLX5_ST_SZ_DW(set_rate_limit_out)] = {0};
+ u32 in[MLX5_ST_SZ_DW(set_pp_rate_limit_in)] = {0};
+ u32 out[MLX5_ST_SZ_DW(set_pp_rate_limit_out)] = {0};

- MLX5_SET(set_rate_limit_in, in, opcode,
- MLX5_CMD_OP_SET_RATE_LIMIT);
- MLX5_SET(set_rate_limit_in, in, rate_limit_index, index);
- MLX5_SET(set_rate_limit_in, in, rate_limit, rate);
+ MLX5_SET(set_pp_rate_limit_in, in, opcode,
+ MLX5_CMD_OP_SET_PP_RATE_LIMIT);
+ MLX5_SET(set_pp_rate_limit_in, in, rate_limit_index, index);
+ MLX5_SET(set_pp_rate_limit_in, in, rate_limit, rate);
return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
}

@@ -173,7 +173,7 @@ int mlx5_rl_add_rate(struct mlx5_core_de
entry->refcount++;
} else {
/* new rate limit */
- err = mlx5_set_rate_limit_cmd(dev, rate, entry->index);
+ err = mlx5_set_pp_rate_limit_cmd(dev, rate, entry->index);
if (err) {
mlx5_core_err(dev, "Failed configuring rate: %u (%d)\n",
rate, err);
@@ -209,7 +209,7 @@ void mlx5_rl_remove_rate(struct mlx5_cor
entry->refcount--;
if (!entry->refcount) {
/* need to remove rate */
- mlx5_set_rate_limit_cmd(dev, 0, entry->index);
+ mlx5_set_pp_rate_limit_cmd(dev, 0, entry->index);
entry->rate = 0;
}

@@ -262,8 +262,8 @@ void mlx5_cleanup_rl_table(struct mlx5_c
/* Clear all configured rates */
for (i = 0; i < table->max_size; i++)
if (table->rl_entry[i].rate)
- mlx5_set_rate_limit_cmd(dev, 0,
- table->rl_entry[i].index);
+ mlx5_set_pp_rate_limit_cmd(dev, 0,
+ table->rl_entry[i].index);

kfree(dev->priv.rl_table.rl_entry);
}
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -147,7 +147,7 @@ enum {
MLX5_CMD_OP_ALLOC_Q_COUNTER = 0x771,
MLX5_CMD_OP_DEALLOC_Q_COUNTER = 0x772,
MLX5_CMD_OP_QUERY_Q_COUNTER = 0x773,
- MLX5_CMD_OP_SET_RATE_LIMIT = 0x780,
+ MLX5_CMD_OP_SET_PP_RATE_LIMIT = 0x780,
MLX5_CMD_OP_QUERY_RATE_LIMIT = 0x781,
MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT = 0x782,
MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT = 0x783,
@@ -7233,7 +7233,7 @@ struct mlx5_ifc_add_vxlan_udp_dport_in_b
u8 vxlan_udp_port[0x10];
};

-struct mlx5_ifc_set_rate_limit_out_bits {
+struct mlx5_ifc_set_pp_rate_limit_out_bits {
u8 status[0x8];
u8 reserved_at_8[0x18];

@@ -7242,7 +7242,7 @@ struct mlx5_ifc_set_rate_limit_out_bits
u8 reserved_at_40[0x40];
};

-struct mlx5_ifc_set_rate_limit_in_bits {
+struct mlx5_ifc_set_pp_rate_limit_in_bits {
u8 opcode[0x10];
u8 reserved_at_10[0x10];

@@ -7255,6 +7255,8 @@ struct mlx5_ifc_set_rate_limit_in_bits {
u8 reserved_at_60[0x20];

u8 rate_limit[0x20];
+
+ u8 reserved_at_a0[0x160];
};

struct mlx5_ifc_access_register_out_bits {


2018-01-01 14:44:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 090/146] net/mlx5e: Fix features check of IPv6 traffic

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gal Pressman <[email protected]>


[ Upstream commit 2989ad1ec03021ee6d2193c35414f1d970a243de ]

The assumption that the next header field contains the transport
protocol is wrong for IPv6 packets with extension headers.
Instead, we should look the inner-most next header field in the buffer.
This will fix TSO offload for tunnels over IPv6 with extension headers.

Performance testing: 19.25x improvement, cool!
Measuring bandwidth of 16 threads TCP traffic over IPv6 GRE tap.
CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
TSO: Enabled
Before: 4,926.24 Mbps
Now : 94,827.91 Mbps

Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3554,6 +3554,7 @@ static netdev_features_t mlx5e_tunnel_fe
struct sk_buff *skb,
netdev_features_t features)
{
+ unsigned int offset = 0;
struct udphdr *udph;
u8 proto;
u16 port;
@@ -3563,7 +3564,7 @@ static netdev_features_t mlx5e_tunnel_fe
proto = ip_hdr(skb)->protocol;
break;
case htons(ETH_P_IPV6):
- proto = ipv6_hdr(skb)->nexthdr;
+ proto = ipv6_find_hdr(skb, &offset, -1, NULL, NULL);
break;
default:
goto out;


2018-01-01 14:44:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 091/146] net/mlx5e: Add refcount to VXLAN structure

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gal Pressman <[email protected]>


[ Upstream commit 23f4cc2cd9ed92570647220aca60d0197d8c1fa9 ]

A refcount mechanism must be implemented in order to prevent unwanted
scenarios such as:
- Open an IPv4 VXLAN interface
- Open an IPv6 VXLAN interface (different socket)
- Remove one of the interfaces

With current implementation, the UDP port will be removed from our VXLAN
database and turn off the offloads for the other interface, which is
still active.
The reference count mechanism will only allow UDP port removals once all
consumers are gone.

Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/vxlan.c | 50 ++++++++++++------------
drivers/net/ethernet/mellanox/mlx5/core/vxlan.h | 1
2 files changed, 28 insertions(+), 23 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c
@@ -88,8 +88,11 @@ static void mlx5e_vxlan_add_port(struct
struct mlx5e_vxlan *vxlan;
int err;

- if (mlx5e_vxlan_lookup_port(priv, port))
+ vxlan = mlx5e_vxlan_lookup_port(priv, port);
+ if (vxlan) {
+ atomic_inc(&vxlan->refcount);
goto free_work;
+ }

if (mlx5e_vxlan_core_add_port_cmd(priv->mdev, port))
goto free_work;
@@ -99,6 +102,7 @@ static void mlx5e_vxlan_add_port(struct
goto err_delete_port;

vxlan->udp_port = port;
+ atomic_set(&vxlan->refcount, 1);

spin_lock_bh(&vxlan_db->lock);
err = radix_tree_insert(&vxlan_db->tree, vxlan->udp_port, vxlan);
@@ -116,32 +120,33 @@ free_work:
kfree(vxlan_work);
}

-static void __mlx5e_vxlan_core_del_port(struct mlx5e_priv *priv, u16 port)
+static void mlx5e_vxlan_del_port(struct work_struct *work)
{
+ struct mlx5e_vxlan_work *vxlan_work =
+ container_of(work, struct mlx5e_vxlan_work, work);
+ struct mlx5e_priv *priv = vxlan_work->priv;
struct mlx5e_vxlan_db *vxlan_db = &priv->vxlan;
+ u16 port = vxlan_work->port;
struct mlx5e_vxlan *vxlan;
+ bool remove = false;

spin_lock_bh(&vxlan_db->lock);
- vxlan = radix_tree_delete(&vxlan_db->tree, port);
- spin_unlock_bh(&vxlan_db->lock);
-
+ vxlan = radix_tree_lookup(&vxlan_db->tree, port);
if (!vxlan)
- return;
-
- mlx5e_vxlan_core_del_port_cmd(priv->mdev, vxlan->udp_port);
-
- kfree(vxlan);
-}
+ goto out_unlock;

-static void mlx5e_vxlan_del_port(struct work_struct *work)
-{
- struct mlx5e_vxlan_work *vxlan_work =
- container_of(work, struct mlx5e_vxlan_work, work);
- struct mlx5e_priv *priv = vxlan_work->priv;
- u16 port = vxlan_work->port;
+ if (atomic_dec_and_test(&vxlan->refcount)) {
+ radix_tree_delete(&vxlan_db->tree, port);
+ remove = true;
+ }

- __mlx5e_vxlan_core_del_port(priv, port);
+out_unlock:
+ spin_unlock_bh(&vxlan_db->lock);

+ if (remove) {
+ mlx5e_vxlan_core_del_port_cmd(priv->mdev, port);
+ kfree(vxlan);
+ }
kfree(vxlan_work);
}

@@ -171,12 +176,11 @@ void mlx5e_vxlan_cleanup(struct mlx5e_pr
struct mlx5e_vxlan *vxlan;
unsigned int port = 0;

- spin_lock_bh(&vxlan_db->lock);
+ /* Lockless since we are the only radix-tree consumers, wq is disabled */
while (radix_tree_gang_lookup(&vxlan_db->tree, (void **)&vxlan, port, 1)) {
port = vxlan->udp_port;
- spin_unlock_bh(&vxlan_db->lock);
- __mlx5e_vxlan_core_del_port(priv, (u16)port);
- spin_lock_bh(&vxlan_db->lock);
+ radix_tree_delete(&vxlan_db->tree, port);
+ mlx5e_vxlan_core_del_port_cmd(priv->mdev, port);
+ kfree(vxlan);
}
- spin_unlock_bh(&vxlan_db->lock);
}
--- a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.h
@@ -36,6 +36,7 @@
#include "en.h"

struct mlx5e_vxlan {
+ atomic_t refcount;
u16 udp_port;
};



2018-01-01 14:44:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 092/146] net/mlx5e: Prevent possible races in VXLAN control flow

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gal Pressman <[email protected]>


[ Upstream commit 0c1cc8b2215f5122ca614b5adca60346018758c3 ]

When calling add/remove VXLAN port, a lock must be held in order to
prevent race scenarios when more than one add/remove happens at the
same time.
Fix by holding our state_lock (mutex) as done by all other parts of the
driver.
Note that the spinlock protecting the radix-tree is still needed in
order to synchronize radix-tree access from softirq context.

Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/vxlan.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c
@@ -88,6 +88,7 @@ static void mlx5e_vxlan_add_port(struct
struct mlx5e_vxlan *vxlan;
int err;

+ mutex_lock(&priv->state_lock);
vxlan = mlx5e_vxlan_lookup_port(priv, port);
if (vxlan) {
atomic_inc(&vxlan->refcount);
@@ -117,6 +118,7 @@ err_free:
err_delete_port:
mlx5e_vxlan_core_del_port_cmd(priv->mdev, port);
free_work:
+ mutex_unlock(&priv->state_lock);
kfree(vxlan_work);
}

@@ -130,6 +132,7 @@ static void mlx5e_vxlan_del_port(struct
struct mlx5e_vxlan *vxlan;
bool remove = false;

+ mutex_lock(&priv->state_lock);
spin_lock_bh(&vxlan_db->lock);
vxlan = radix_tree_lookup(&vxlan_db->tree, port);
if (!vxlan)
@@ -147,6 +150,7 @@ out_unlock:
mlx5e_vxlan_core_del_port_cmd(priv->mdev, port);
kfree(vxlan);
}
+ mutex_unlock(&priv->state_lock);
kfree(vxlan_work);
}



2018-01-01 14:44:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 093/146] net/mlx5: Fix error flow in CREATE_QP command

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Moni Shoua <[email protected]>


[ Upstream commit dbff26e44dc3ec4de6578733b054a0114652a764 ]

In error flow, when DESTROY_QP command should be executed, the wrong
mailbox was set with data, not the one that is written to hardware,
Fix that.

Fixes: 09a7d9eca1a6 '{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc'
Signed-off-by: Moni Shoua <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/qp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -213,8 +213,8 @@ int mlx5_core_create_qp(struct mlx5_core
err_cmd:
memset(din, 0, sizeof(din));
memset(dout, 0, sizeof(dout));
- MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP);
- MLX5_SET(destroy_qp_in, in, qpn, qp->qpn);
+ MLX5_SET(destroy_qp_in, din, opcode, MLX5_CMD_OP_DESTROY_QP);
+ MLX5_SET(destroy_qp_in, din, qpn, qp->qpn);
mlx5_cmd_exec(dev, din, sizeof(din), dout, sizeof(dout));
return err;
}


2018-01-01 14:44:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 069/146] tcp md5sig: Use skbs saddr when replying to an incoming segment

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christoph Paasch <[email protected]>


[ Upstream commit 30791ac41927ebd3e75486f9504b6d2280463bf0 ]

The MD5-key that belongs to a connection is identified by the peer's
IP-address. When we are in tcp_v4(6)_reqsk_send_ack(), we are replying
to an incoming segment from tcp_check_req() that failed the seq-number
checks.

Thus, to find the correct key, we need to use the skb's saddr and not
the daddr.

This bug seems to have been there since quite a while, but probably got
unnoticed because the consequences are not catastrophic. We will call
tcp_v4_reqsk_send_ack only to send a challenge-ACK back to the peer,
thus the connection doesn't really fail.

Fixes: 9501f9722922 ("tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().")
Signed-off-by: Christoph Paasch <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_ipv4.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -844,7 +844,7 @@ static void tcp_v4_reqsk_send_ack(const
tcp_time_stamp_raw() + tcp_rsk(req)->ts_off,
req->ts_recent,
0,
- tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->daddr,
+ tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->saddr,
AF_INET),
inet_rsk(req)->no_srccheck ? IP_REPLY_ARG_NOSRCCHECK : 0,
ip_hdr(skb)->tos);
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -988,7 +988,7 @@ static void tcp_v6_reqsk_send_ack(const
req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
tcp_time_stamp_raw() + tcp_rsk(req)->ts_off,
req->ts_recent, sk->sk_bound_dev_if,
- tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr),
+ tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->saddr),
0, 0);
}



2018-01-01 14:44:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 096/146] net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Fainelli <[email protected]>


[ Upstream commit 4b52d010113e11006a389f2a8315167ede9e0b10 ]

The PHY on BCM7278 has an additional bit that needs to be cleared:
IDDQ_GLOBAL_PWR, without doing this, the PHY remains stuck in reset out
of suspend/resume cycles.

Fixes: 0fe9933804eb ("net: dsa: bcm_sf2: Add support for BCM7278 integrated switch")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/dsa/bcm_sf2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -167,7 +167,7 @@ static void bcm_sf2_gphy_enable_set(stru
reg = reg_readl(priv, REG_SPHY_CNTRL);
if (enable) {
reg |= PHY_RESET;
- reg &= ~(EXT_PWR_DOWN | IDDQ_BIAS | CK25_DIS);
+ reg &= ~(EXT_PWR_DOWN | IDDQ_BIAS | IDDQ_GLOBAL_PWR | CK25_DIS);
reg_writel(priv, reg, REG_SPHY_CNTRL);
udelay(21);
reg = reg_readl(priv, REG_SPHY_CNTRL);


2018-01-01 14:44:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 097/146] s390/qeth: fix error handling in checksum cmd callback

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <[email protected]>


[ Upstream commit ad3cbf61332914711e5f506972b1dc9af8d62146 ]

Make sure to check both return code fields before processing the
response. Otherwise we risk operating on invalid data.

Fixes: c9475369bd2b ("s390/qeth: rework RX/TX checksum offload")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/net/qeth_core_main.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5445,6 +5445,13 @@ out:
}
EXPORT_SYMBOL_GPL(qeth_poll);

+static int qeth_setassparms_inspect_rc(struct qeth_ipa_cmd *cmd)
+{
+ if (!cmd->hdr.return_code)
+ cmd->hdr.return_code = cmd->data.setassparms.hdr.return_code;
+ return cmd->hdr.return_code;
+}
+
int qeth_setassparms_cb(struct qeth_card *card,
struct qeth_reply *reply, unsigned long data)
{
@@ -6304,7 +6311,7 @@ static int qeth_ipa_checksum_run_cmd_cb(
(struct qeth_checksum_cmd *)reply->param;

QETH_CARD_TEXT(card, 4, "chkdoccb");
- if (cmd->hdr.return_code)
+ if (qeth_setassparms_inspect_rc(cmd))
return 0;

memset(chksum_cb, 0, sizeof(*chksum_cb));


2018-01-01 14:44:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 098/146] sctp: make sure stream nums can match optlen in sctp_setsockopt_reset_streams

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <[email protected]>


[ Upstream commit 2342b8d95bcae5946e1b9b8d58645f37500ef2e7 ]

Now in sctp_setsockopt_reset_streams, it only does the check
optlen < sizeof(*params) for optlen. But it's not enough, as
params->srs_number_streams should also match optlen.

If the streams in params->srs_stream_list are less than stream
nums in params->srs_number_streams, later when dereferencing
the stream list, it could cause a slab-out-of-bounds crash, as
reported by syzbot.

This patch is to fix it by also checking the stream numbers in
sctp_setsockopt_reset_streams to make sure at least it's not
greater than the streams in the list.

Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter")
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sctp/socket.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3874,13 +3874,17 @@ static int sctp_setsockopt_reset_streams
struct sctp_association *asoc;
int retval = -EINVAL;

- if (optlen < sizeof(struct sctp_reset_streams))
+ if (optlen < sizeof(*params))
return -EINVAL;

params = memdup_user(optval, optlen);
if (IS_ERR(params))
return PTR_ERR(params);

+ if (params->srs_number_streams * sizeof(__u16) >
+ optlen - sizeof(*params))
+ goto out;
+
asoc = sctp_id2assoc(sk, params->srs_assoc_id);
if (!asoc)
goto out;


2018-01-01 14:44:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 101/146] tcp: fix potential underestimation on rcv_rtt

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Wei Wang <[email protected]>


[ Upstream commit 9ee11bd03cb1a5c3ca33c2bb70e7ed325f68890f ]

When ms timestamp is used, current logic uses 1us in
tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us.
This could cause rcv_rtt underestimation.
Fix it by always using a min value of 1ms if ms timestamp is used.

Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps")
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -521,9 +521,6 @@ static void tcp_rcv_rtt_update(struct tc
u32 new_sample = tp->rcv_rtt_est.rtt_us;
long m = sample;

- if (m == 0)
- m = 1;
-
if (new_sample != 0) {
/* If we sample in larger samples in the non-timestamp
* case, we could grossly overestimate the RTT especially
@@ -560,6 +557,8 @@ static inline void tcp_rcv_rtt_measure(s
if (before(tp->rcv_nxt, tp->rcv_rtt_est.seq))
return;
delta_us = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcv_rtt_est.time);
+ if (!delta_us)
+ delta_us = 1;
tcp_rcv_rtt_update(tp, delta_us, 1);

new_measure:
@@ -576,8 +575,11 @@ static inline void tcp_rcv_rtt_measure_t
(TCP_SKB_CB(skb)->end_seq -
TCP_SKB_CB(skb)->seq >= inet_csk(sk)->icsk_ack.rcv_mss)) {
u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
- u32 delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
+ u32 delta_us;

+ if (!delta)
+ delta = 1;
+ delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
tcp_rcv_rtt_update(tp, delta_us, 0);
}
}


2018-01-01 14:45:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 102/146] net: phy: marvell: Limit 88m1101 autoneg errata to 88E1145 as well.

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Zhao Qiang <[email protected]>


[ Upstream commit c505873eaece2b4aefd07d339dc7e1400e0235ac ]

88E1145 also need this autoneg errata.

Fixes: f2899788353c ("net: phy: marvell: Limit errata to 88m1101")
Signed-off-by: Zhao Qiang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/marvell.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -2069,7 +2069,7 @@ static struct phy_driver marvell_drivers
.flags = PHY_HAS_INTERRUPT,
.probe = marvell_probe,
.config_init = &m88e1145_config_init,
- .config_aneg = &marvell_config_aneg,
+ .config_aneg = &m88e1101_config_aneg,
.read_status = &genphy_read_status,
.ack_interrupt = &marvell_ack_interrupt,
.config_intr = &marvell_config_intr,


2018-01-01 14:45:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 103/146] ipv6: Honor specified parameters in fibmatch lookup

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ido Schimmel <[email protected]>


[ Upstream commit 58acfd714e6b02e8617448b431c2b64a2f1f0792 ]

Currently, parameters such as oif and source address are not taken into
account during fibmatch lookup. Example (IPv4 for reference) before
patch:

$ ip -4 route show
192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1

$ ip -6 route show
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium
fe80::/64 dev dummy0 proto kernel metric 256 pref medium
fe80::/64 dev dummy1 proto kernel metric 256 pref medium

$ ip -4 route get fibmatch 192.0.2.2 oif dummy0
192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
$ ip -4 route get fibmatch 192.0.2.2 oif dummy1
RTNETLINK answers: No route to host

$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium

After:

$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
RTNETLINK answers: Network is unreachable

The problem stems from the fact that the necessary route lookup flags
are not set based on these parameters.

Instead of duplicating the same logic for fibmatch, we can simply
resolve the original route from its copy and dump it instead.

Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/route.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)

--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3700,19 +3700,13 @@ static int inet6_rtm_getroute(struct sk_
if (!ipv6_addr_any(&fl6.saddr))
flags |= RT6_LOOKUP_F_HAS_SADDR;

- if (!fibmatch)
- dst = ip6_route_input_lookup(net, dev, &fl6, flags);
- else
- dst = ip6_route_lookup(net, &fl6, 0);
+ dst = ip6_route_input_lookup(net, dev, &fl6, flags);

rcu_read_unlock();
} else {
fl6.flowi6_oif = oif;

- if (!fibmatch)
- dst = ip6_route_output(net, NULL, &fl6);
- else
- dst = ip6_route_lookup(net, &fl6, 0);
+ dst = ip6_route_output(net, NULL, &fl6);
}


@@ -3729,6 +3723,15 @@ static int inet6_rtm_getroute(struct sk_
goto errout;
}

+ if (fibmatch && rt->dst.from) {
+ struct rt6_info *ort = container_of(rt->dst.from,
+ struct rt6_info, dst);
+
+ dst_hold(&ort->dst);
+ ip6_rt_put(rt);
+ rt = ort;
+ }
+
skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
if (!skb) {
ip6_rt_put(rt);


2018-01-01 14:45:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 104/146] tcp: refresh tcp_mstamp from timers callbacks

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 4688eb7cf3ae2c2721d1dacff5c1384cba47d176 ]

Only the retransmit timer currently refreshes tcp_mstamp

We should do the same for delayed acks and keepalives.

Even if RFC 7323 does not request it, this is consistent to what linux
did in the past, when TS values were based on jiffies.

Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Soheil Hassas Yeganeh <[email protected]>
Cc: Mike Maloney <[email protected]>
Cc: Neal Cardwell <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Acked-by: Mike Maloney <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_timer.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -264,6 +264,7 @@ void tcp_delack_timer_handler(struct soc
icsk->icsk_ack.pingpong = 0;
icsk->icsk_ack.ato = TCP_ATO_MIN;
}
+ tcp_mstamp_refresh(tcp_sk(sk));
tcp_send_ack(sk);
__NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKS);
}
@@ -627,6 +628,7 @@ static void tcp_keepalive_timer (unsigne
goto out;
}

+ tcp_mstamp_refresh(tp);
if (sk->sk_state == TCP_FIN_WAIT2 && sock_flag(sk, SOCK_DEAD)) {
if (tp->linger2 >= 0) {
const int tmo = tcp_fin_time(sk) - TCP_TIMEWAIT_LEN;


2018-01-01 14:45:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 070/146] tg3: Fix rx hang on MTU change with 5717/5719

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian King <[email protected]>


[ Upstream commit 748a240c589824e9121befb1cba5341c319885bc ]

This fixes a hang issue seen when changing the MTU size from 1500 MTU
to 9000 MTU on both 5717 and 5719 chips. In discussion with Broadcom,
they've indicated that these chipsets have the same phy as the 57766
chipset, so the same workarounds apply. This has been tested by IBM
on both Power 8 and Power 9 systems as well as by Broadcom on x86
hardware and has been confirmed to resolve the hang issue.

Signed-off-by: Brian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/tg3.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -14227,7 +14227,9 @@ static int tg3_change_mtu(struct net_dev
/* Reset PHY, otherwise the read DMA engine will be in a mode that
* breaks all requests to 256 bytes.
*/
- if (tg3_asic_rev(tp) == ASIC_REV_57766)
+ if (tg3_asic_rev(tp) == ASIC_REV_57766 ||
+ tg3_asic_rev(tp) == ASIC_REV_5717 ||
+ tg3_asic_rev(tp) == ASIC_REV_5719)
reset_phy = true;

err = tg3_restart_hw(tp, reset_phy);


2018-01-01 14:45:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 071/146] tcp_bbr: reset full pipe detection on loss recovery undo

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <[email protected]>


[ Upstream commit 2f6c498e4f15d27852c04ed46d804a39137ba364 ]

Fix BBR so that upon notification of a loss recovery undo BBR resets
the full pipe detection (STARTUP exit) state machine.

Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
could previously cause BBR to spuriously estimate that the pipe is
full.

Since spurious loss recovery means that our overall sending will have
slowed down spuriously, this commit gives a flow more time to probe
robustly for bandwidth and decide the pipe is really full.

Signed-off-by: Neal Cardwell <[email protected]>
Reviewed-by: Yuchung Cheng <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_bbr.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -874,6 +874,10 @@ static u32 bbr_sndbuf_expand(struct sock
*/
static u32 bbr_undo_cwnd(struct sock *sk)
{
+ struct bbr *bbr = inet_csk_ca(sk);
+
+ bbr->full_bw = 0; /* spurious slow-down; reset full pipe detection */
+ bbr->full_bw_cnt = 0;
return tcp_sk(sk)->snd_cwnd;
}



2018-01-01 14:45:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 072/146] tcp_bbr: reset long-term bandwidth sampling on loss recovery undo

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <[email protected]>


[ Upstream commit 600647d467c6d04b3954b41a6ee1795b5ae00550 ]

Fix BBR so that upon notification of a loss recovery undo BBR resets
long-term bandwidth sampling.

Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
can cause BBR to spuriously estimate that we are seeing loss rates
high enough to trigger long-term bandwidth estimation. To avoid that
problem, this commit resets long-term bandwidth sampling on loss
recovery undo events.

Signed-off-by: Neal Cardwell <[email protected]>
Reviewed-by: Yuchung Cheng <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_bbr.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -878,6 +878,7 @@ static u32 bbr_undo_cwnd(struct sock *sk

bbr->full_bw = 0; /* spurious slow-down; reset full pipe detection */
bbr->full_bw_cnt = 0;
+ bbr_reset_lt_bw_sampling(sk);
return tcp_sk(sk)->snd_cwnd;
}



2018-01-01 14:45:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 074/146] s390/qeth: dont apply takeover changes to RXIP

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <[email protected]>


[ Upstream commit b22d73d6689fd902a66c08ebe71ab2f3b351e22f ]

When takeover is switched off, current code clears the 'TAKEOVER' flag on
all IPs. But the flag is also used for RXIP addresses, and those should
not be affected by the takeover mode.
Fix the behaviour by consistenly applying takover logic to NORMAL
addresses only.

Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/net/qeth_l3_main.c | 5 +++--
drivers/s390/net/qeth_l3_sys.c | 5 +++--
2 files changed, 6 insertions(+), 4 deletions(-)

--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -173,6 +173,8 @@ int qeth_l3_is_addr_covered_by_ipato(str

if (!card->ipato.enabled)
return 0;
+ if (addr->type != QETH_IP_TYPE_NORMAL)
+ return 0;

qeth_l3_convert_addr_to_bits((u8 *) &addr->u, addr_bits,
(addr->proto == QETH_PROT_IPV4)? 4:16);
@@ -289,8 +291,7 @@ int qeth_l3_add_ip(struct qeth_card *car
memcpy(addr, tmp_addr, sizeof(struct qeth_ipaddr));
addr->ref_counter = 1;

- if (addr->type == QETH_IP_TYPE_NORMAL &&
- qeth_l3_is_addr_covered_by_ipato(card, addr)) {
+ if (qeth_l3_is_addr_covered_by_ipato(card, addr)) {
QETH_CARD_TEXT(card, 2, "tkovaddr");
addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
}
--- a/drivers/s390/net/qeth_l3_sys.c
+++ b/drivers/s390/net/qeth_l3_sys.c
@@ -396,10 +396,11 @@ static ssize_t qeth_l3_dev_ipato_enable_
card->ipato.enabled = enable;

hash_for_each(card->ip_htable, i, addr, hnode) {
+ if (addr->type != QETH_IP_TYPE_NORMAL)
+ continue;
if (!enable)
addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG;
- else if (addr->type == QETH_IP_TYPE_NORMAL &&
- qeth_l3_is_addr_covered_by_ipato(card, addr))
+ else if (qeth_l3_is_addr_covered_by_ipato(card, addr))
addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
}
out:


2018-01-01 14:45:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 075/146] s390/qeth: lock IP table while applying takeover changes

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <[email protected]>


[ Upstream commit 8a03a3692b100d84785ee7a834e9215e304c9e00 ]

Modifying the flags of an IP addr object needs to be protected against
eg. concurrent removal of the same object from the IP table.

Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/net/qeth_l3_sys.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/s390/net/qeth_l3_sys.c
+++ b/drivers/s390/net/qeth_l3_sys.c
@@ -395,6 +395,7 @@ static ssize_t qeth_l3_dev_ipato_enable_
goto out;
card->ipato.enabled = enable;

+ spin_lock_bh(&card->ip_lock);
hash_for_each(card->ip_htable, i, addr, hnode) {
if (addr->type != QETH_IP_TYPE_NORMAL)
continue;
@@ -403,6 +404,7 @@ static ssize_t qeth_l3_dev_ipato_enable_
else if (qeth_l3_is_addr_covered_by_ipato(card, addr))
addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
}
+ spin_unlock_bh(&card->ip_lock);
out:
mutex_unlock(&card->conf_mutex);
return rc ? rc : count;


2018-01-01 14:45:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 066/146] ptr_ring: add barriers

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Michael S. Tsirkin" <[email protected]>


[ Upstream commit a8ceb5dbfde1092b466936bca0ff3be127ecf38e ]

Users of ptr_ring expect that it's safe to give the
data structure a pointer and have it be available
to consumers, but that actually requires an smb_wmb
or a stronger barrier.

In absence of such barriers and on architectures that reorder writes,
consumer might read an un=initialized value from an skb pointer stored
in the skb array. This was observed causing crashes.

To fix, add memory barriers. The barrier we use is a wmb, the
assumption being that producers do not need to read the value so we do
not need to order these reads.

Reported-by: George Cherian <[email protected]>
Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/ptr_ring.h | 9 +++++++++
1 file changed, 9 insertions(+)

--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(stru

/* Note: callers invoking this in a loop must use a compiler barrier,
* for example cpu_relax(). Callers must hold producer_lock.
+ * Callers are responsible for making sure pointer that is being queued
+ * points to a valid data.
*/
static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
{
if (unlikely(!r->size) || r->queue[r->producer])
return -ENOSPC;

+ /* Make sure the pointer we are storing points to a valid data. */
+ /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
+ smp_wmb();
+
r->queue[r->producer++] = ptr;
if (unlikely(r->producer >= r->size))
r->producer = 0;
@@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(s
if (ptr)
__ptr_ring_discard_one(r);

+ /* Make sure anyone accessing data through the pointer is up to date. */
+ /* Pairs with smp_wmb in __ptr_ring_produce. */
+ smp_read_barrier_depends();
return ptr;
}



2018-01-01 14:45:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 107/146] net: sched: fix static key imbalance in case of ingress/clsact_init error

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jiri Pirko <[email protected]>


[ Upstream commit b59e6979a86384e68b0ab6ffeab11f0034fba82d ]

Move static key increments to the beginning of the init function
so they pair 1:1 with decrements in ingress/clsact_destroy,
which is called in case ingress/clsact_init fails.

Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sched/sch_ingress.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -59,11 +59,12 @@ static int ingress_init(struct Qdisc *sc
struct net_device *dev = qdisc_dev(sch);
int err;

+ net_inc_ingress_queue();
+
err = tcf_block_get(&q->block, &dev->ingress_cl_list);
if (err)
return err;

- net_inc_ingress_queue();
sch->flags |= TCQ_F_CPUSTATS;

return 0;
@@ -153,6 +154,9 @@ static int clsact_init(struct Qdisc *sch
struct net_device *dev = qdisc_dev(sch);
int err;

+ net_inc_ingress_queue();
+ net_inc_egress_queue();
+
err = tcf_block_get(&q->ingress_block, &dev->ingress_cl_list);
if (err)
return err;
@@ -161,9 +165,6 @@ static int clsact_init(struct Qdisc *sch
if (err)
return err;

- net_inc_ingress_queue();
- net_inc_egress_queue();
-
sch->flags |= TCQ_F_CPUSTATS;

return 0;


2018-01-01 14:45:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 116/146] skbuff: in skb_copy_ubufs unclone before releasing zerocopy

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Willem de Bruijn <[email protected]>


skb_copy_ubufs must unclone before it is safe to modify its
skb_shared_info with skb_zcopy_clear.

Commit b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even
without user frags") ensures that all skbs release their zerocopy
state, even those without frags.

But I forgot an edge case where such an skb arrives that is cloned.

The stack does not build such packets. Vhost/tun skbs have their
frags orphaned before cloning. TCP skbs only attach zerocopy state
when a frag is added.

But if TCP packets can be trimmed or linearized, this might occur.
Tracing the code I found no instance so far (e.g., skb_linearize
ends up calling skb_zcopy_clear if !skb->data_len).

Still, it is non-obvious that no path exists. And it is fragile to
rely on this.

Fixes: b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even without user frags")
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/skbuff.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1181,12 +1181,12 @@ int skb_copy_ubufs(struct sk_buff *skb,
int i, new_frags;
u32 d_off;

- if (!num_frags)
- goto release;
-
if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
return -EINVAL;

+ if (!num_frags)
+ goto release;
+
new_frags = (__skb_pagelen(skb) + PAGE_SIZE - 1) >> PAGE_SHIFT;
for (i = 0; i < new_frags; i++) {
page = alloc_page(gfp_mask);


2018-01-01 14:45:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 118/146] usbip: fix usbip bind writing random string after command in match_busid

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Juan Zea <[email protected]>

commit 544c4605acc5ae4afe7dd5914147947db182f2fb upstream.

usbip bind writes commands followed by random string when writing to
match_busid attribute in sysfs, caused by using full variable size
instead of string length.

Signed-off-by: Juan Zea <[email protected]>
Acked-by: Shuah Khan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
tools/usb/usbip/src/utils.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

--- a/tools/usb/usbip/src/utils.c
+++ b/tools/usb/usbip/src/utils.c
@@ -30,6 +30,7 @@ int modify_match_busid(char *busid, int
char command[SYSFS_BUS_ID_SIZE + 4];
char match_busid_attr_path[SYSFS_PATH_MAX];
int rc;
+ int cmd_size;

snprintf(match_busid_attr_path, sizeof(match_busid_attr_path),
"%s/%s/%s/%s/%s/%s", SYSFS_MNT_PATH, SYSFS_BUS_NAME,
@@ -37,12 +38,14 @@ int modify_match_busid(char *busid, int
attr_name);

if (add)
- snprintf(command, SYSFS_BUS_ID_SIZE + 4, "add %s", busid);
+ cmd_size = snprintf(command, SYSFS_BUS_ID_SIZE + 4, "add %s",
+ busid);
else
- snprintf(command, SYSFS_BUS_ID_SIZE + 4, "del %s", busid);
+ cmd_size = snprintf(command, SYSFS_BUS_ID_SIZE + 4, "del %s",
+ busid);

rc = write_sysfs_attribute(match_busid_attr_path, command,
- sizeof(command));
+ cmd_size);
if (rc < 0) {
dbg("failed to write match_busid: %s", strerror(errno));
return -1;


2018-01-01 14:46:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 120/146] usbip: stub: stop printing kernel pointer addresses in messages

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Shuah Khan <[email protected]>

commit 248a22044366f588d46754c54dfe29ffe4f8b4df upstream.

Remove and/or change debug, info. and error messages to not print
kernel pointer addresses.

Signed-off-by: Shuah Khan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/usbip/stub_main.c | 5 +++--
drivers/usb/usbip/stub_rx.c | 7 ++-----
drivers/usb/usbip/stub_tx.c | 6 +++---
3 files changed, 8 insertions(+), 10 deletions(-)

--- a/drivers/usb/usbip/stub_main.c
+++ b/drivers/usb/usbip/stub_main.c
@@ -251,11 +251,12 @@ void stub_device_cleanup_urbs(struct stu
struct stub_priv *priv;
struct urb *urb;

- dev_dbg(&sdev->udev->dev, "free sdev %p\n", sdev);
+ dev_dbg(&sdev->udev->dev, "Stub device cleaning up urbs\n");

while ((priv = stub_priv_pop(sdev))) {
urb = priv->urb;
- dev_dbg(&sdev->udev->dev, "free urb %p\n", urb);
+ dev_dbg(&sdev->udev->dev, "free urb seqnum %lu\n",
+ priv->seqnum);
usb_kill_urb(urb);

kmem_cache_free(stub_priv_cache, priv);
--- a/drivers/usb/usbip/stub_rx.c
+++ b/drivers/usb/usbip/stub_rx.c
@@ -225,9 +225,6 @@ static int stub_recv_cmd_unlink(struct s
if (priv->seqnum != pdu->u.cmd_unlink.seqnum)
continue;

- dev_info(&priv->urb->dev->dev, "unlink urb %p\n",
- priv->urb);
-
/*
* This matched urb is not completed yet (i.e., be in
* flight in usb hcd hardware/driver). Now we are
@@ -266,8 +263,8 @@ static int stub_recv_cmd_unlink(struct s
ret = usb_unlink_urb(priv->urb);
if (ret != -EINPROGRESS)
dev_err(&priv->urb->dev->dev,
- "failed to unlink a urb %p, ret %d\n",
- priv->urb, ret);
+ "failed to unlink a urb # %lu, ret %d\n",
+ priv->seqnum, ret);

return 0;
}
--- a/drivers/usb/usbip/stub_tx.c
+++ b/drivers/usb/usbip/stub_tx.c
@@ -102,7 +102,7 @@ void stub_complete(struct urb *urb)
/* link a urb to the queue of tx. */
spin_lock_irqsave(&sdev->priv_lock, flags);
if (sdev->ud.tcp_socket == NULL) {
- usbip_dbg_stub_tx("ignore urb for closed connection %p", urb);
+ usbip_dbg_stub_tx("ignore urb for closed connection\n");
/* It will be freed in stub_device_cleanup_urbs(). */
} else if (priv->unlinking) {
stub_enqueue_ret_unlink(sdev, priv->seqnum, urb->status);
@@ -204,8 +204,8 @@ static int stub_send_ret_submit(struct s

/* 1. setup usbip_header */
setup_ret_submit_pdu(&pdu_header, urb);
- usbip_dbg_stub_tx("setup txdata seqnum: %d urb: %p\n",
- pdu_header.base.seqnum, urb);
+ usbip_dbg_stub_tx("setup txdata seqnum: %d\n",
+ pdu_header.base.seqnum);
usbip_header_correct_endian(&pdu_header, 1);

iov[iovnum].iov_base = &pdu_header;


2018-01-01 14:46:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 121/146] usbip: vhci: stop printing kernel pointer addresses in messages

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Shuah Khan <[email protected]>

commit 8272d099d05f7ab2776cf56a2ab9f9443be18907 upstream.

Remove and/or change debug, info. and error messages to not print
kernel pointer addresses.

Signed-off-by: Shuah Khan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/usbip/vhci_hcd.c | 10 ----------
drivers/usb/usbip/vhci_rx.c | 23 +++++++++++------------
drivers/usb/usbip/vhci_tx.c | 3 ++-
3 files changed, 13 insertions(+), 23 deletions(-)

--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -670,9 +670,6 @@ static int vhci_urb_enqueue(struct usb_h
struct vhci_device *vdev;
unsigned long flags;

- usbip_dbg_vhci_hc("enter, usb_hcd %p urb %p mem_flags %d\n",
- hcd, urb, mem_flags);
-
if (portnum > VHCI_HC_PORTS) {
pr_err("invalid port number %d\n", portnum);
return -ENODEV;
@@ -836,8 +833,6 @@ static int vhci_urb_dequeue(struct usb_h
struct vhci_device *vdev;
unsigned long flags;

- pr_info("dequeue a urb %p\n", urb);
-
spin_lock_irqsave(&vhci->lock, flags);

priv = urb->hcpriv;
@@ -865,7 +860,6 @@ static int vhci_urb_dequeue(struct usb_h
/* tcp connection is closed */
spin_lock(&vdev->priv_lock);

- pr_info("device %p seems to be disconnected\n", vdev);
list_del(&priv->list);
kfree(priv);
urb->hcpriv = NULL;
@@ -877,8 +871,6 @@ static int vhci_urb_dequeue(struct usb_h
* vhci_rx will receive RET_UNLINK and give back the URB.
* Otherwise, we give back it here.
*/
- pr_info("gives back urb %p\n", urb);
-
usb_hcd_unlink_urb_from_ep(hcd, urb);

spin_unlock_irqrestore(&vhci->lock, flags);
@@ -906,8 +898,6 @@ static int vhci_urb_dequeue(struct usb_h

unlink->unlink_seqnum = priv->seqnum;

- pr_info("device %p seems to be still connected\n", vdev);
-
/* send cmd_unlink and try to cancel the pending URB in the
* peer */
list_add_tail(&unlink->list, &vdev->unlink_tx);
--- a/drivers/usb/usbip/vhci_rx.c
+++ b/drivers/usb/usbip/vhci_rx.c
@@ -37,24 +37,23 @@ struct urb *pickup_urb_and_free_priv(str
urb = priv->urb;
status = urb->status;

- usbip_dbg_vhci_rx("find urb %p vurb %p seqnum %u\n",
- urb, priv, seqnum);
+ usbip_dbg_vhci_rx("find urb seqnum %u\n", seqnum);

switch (status) {
case -ENOENT:
/* fall through */
case -ECONNRESET:
- dev_info(&urb->dev->dev,
- "urb %p was unlinked %ssynchronuously.\n", urb,
- status == -ENOENT ? "" : "a");
+ dev_dbg(&urb->dev->dev,
+ "urb seq# %u was unlinked %ssynchronuously\n",
+ seqnum, status == -ENOENT ? "" : "a");
break;
case -EINPROGRESS:
/* no info output */
break;
default:
- dev_info(&urb->dev->dev,
- "urb %p may be in a error, status %d\n", urb,
- status);
+ dev_dbg(&urb->dev->dev,
+ "urb seq# %u may be in a error, status %d\n",
+ seqnum, status);
}

list_del(&priv->list);
@@ -81,8 +80,8 @@ static void vhci_recv_ret_submit(struct
spin_unlock_irqrestore(&vdev->priv_lock, flags);

if (!urb) {
- pr_err("cannot find a urb of seqnum %u\n", pdu->base.seqnum);
- pr_info("max seqnum %d\n",
+ pr_err("cannot find a urb of seqnum %u max seqnum %d\n",
+ pdu->base.seqnum,
atomic_read(&vhci_hcd->seqnum));
usbip_event_add(ud, VDEV_EVENT_ERROR_TCP);
return;
@@ -105,7 +104,7 @@ static void vhci_recv_ret_submit(struct
if (usbip_dbg_flag_vhci_rx)
usbip_dump_urb(urb);

- usbip_dbg_vhci_rx("now giveback urb %p\n", urb);
+ usbip_dbg_vhci_rx("now giveback urb %u\n", pdu->base.seqnum);

spin_lock_irqsave(&vhci->lock, flags);
usb_hcd_unlink_urb_from_ep(vhci_hcd_to_hcd(vhci_hcd), urb);
@@ -172,7 +171,7 @@ static void vhci_recv_ret_unlink(struct
pr_info("the urb (seqnum %d) was already given back\n",
pdu->base.seqnum);
} else {
- usbip_dbg_vhci_rx("now giveback urb %p\n", urb);
+ usbip_dbg_vhci_rx("now giveback urb %d\n", pdu->base.seqnum);

/* If unlink is successful, status is -ECONNRESET */
urb->status = pdu->u.ret_unlink.status;
--- a/drivers/usb/usbip/vhci_tx.c
+++ b/drivers/usb/usbip/vhci_tx.c
@@ -83,7 +83,8 @@ static int vhci_send_cmd_submit(struct v
memset(&msg, 0, sizeof(msg));
memset(&iov, 0, sizeof(iov));

- usbip_dbg_vhci_tx("setup txdata urb %p\n", urb);
+ usbip_dbg_vhci_tx("setup txdata urb seqnum %lu\n",
+ priv->seqnum);

/* 1. setup usbip_header */
setup_cmd_submit_pdu(&pdu_header, urb);


2018-01-01 14:46:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 122/146] USB: chipidea: msm: fix ulpi-node lookup

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit 964728f9f407eca0b417fdf8e784b7a76979490c upstream.

Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.

Note that the original premature free of the parent node has already
been fixed separately, but that fix was apparently never backported to
stable.

Fixes: 47654a162081 ("usb: chipidea: msm: Restore wrapper settings after reset")
Fixes: b74c43156c0c ("usb: chipidea: msm: ci_hdrc_msm_probe() missing of_node_get()")
Cc: Stephen Boyd <[email protected]>
Cc: Frank Rowand <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Peter Chen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/chipidea/ci_hdrc_msm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/usb/chipidea/ci_hdrc_msm.c
+++ b/drivers/usb/chipidea/ci_hdrc_msm.c
@@ -251,7 +251,7 @@ static int ci_hdrc_msm_probe(struct plat
if (ret)
goto err_mux;

- ulpi_node = of_find_node_by_name(of_node_get(pdev->dev.of_node), "ulpi");
+ ulpi_node = of_get_child_by_name(pdev->dev.of_node, "ulpi");
if (ulpi_node) {
phy_node = of_get_next_available_child(ulpi_node, NULL);
ci->hsic = of_device_is_compatible(phy_node, "qcom,usb-hsic-phy");


2018-01-01 14:46:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 125/146] USB: serial: option: add support for Telit ME910 PID 0x1101

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniele Palmas <[email protected]>

commit 08933099e6404f588f81c2050bfec7313e06eeaf upstream.

This patch adds support for PID 0x1101 of Telit ME910.

Signed-off-by: Daniele Palmas <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/serial/option.c | 8 ++++++++
1 file changed, 8 insertions(+)

--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -283,6 +283,7 @@ static void option_instat_callback(struc
#define TELIT_PRODUCT_LE922_USBCFG3 0x1043
#define TELIT_PRODUCT_LE922_USBCFG5 0x1045
#define TELIT_PRODUCT_ME910 0x1100
+#define TELIT_PRODUCT_ME910_DUAL_MODEM 0x1101
#define TELIT_PRODUCT_LE920 0x1200
#define TELIT_PRODUCT_LE910 0x1201
#define TELIT_PRODUCT_LE910_USBCFG4 0x1206
@@ -648,6 +649,11 @@ static const struct option_blacklist_inf
.reserved = BIT(1) | BIT(3),
};

+static const struct option_blacklist_info telit_me910_dual_modem_blacklist = {
+ .sendsetup = BIT(0),
+ .reserved = BIT(3),
+};
+
static const struct option_blacklist_info telit_le910_blacklist = {
.sendsetup = BIT(0),
.reserved = BIT(1) | BIT(2),
@@ -1247,6 +1253,8 @@ static const struct usb_device_id option
.driver_info = (kernel_ulong_t)&telit_le922_blacklist_usbcfg0 },
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_ME910),
.driver_info = (kernel_ulong_t)&telit_me910_blacklist },
+ { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_ME910_DUAL_MODEM),
+ .driver_info = (kernel_ulong_t)&telit_me910_dual_modem_blacklist },
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE910),
.driver_info = (kernel_ulong_t)&telit_le910_blacklist },
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE910_USBCFG4),


2018-01-01 14:46:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 108/146] bnxt_en: Fix sources of spurious netpoll warnings

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Calvin Owens <[email protected]>


[ Upstream commit 2edbdb3159d6f6bd3a9b6e7f789f2b879699a519 ]

After applying 2270bc5da3497945 ("bnxt_en: Fix netpoll handling") and
903649e718f80da2 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."),
we still see the following WARN fire:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160
bnxt_poll+0x0/0xd0 exceeded budget in poll
<snip>
Call Trace:
[<ffffffff814be5cd>] dump_stack+0x4d/0x70
[<ffffffff8107e013>] __warn+0xd3/0xf0
[<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60
[<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160
[<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250
[<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440
[<ffffffff815fa9be>] write_ext_msg+0x20e/0x250
[<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110
[<ffffffff810c9549>] console_unlock+0x339/0x5b0
[<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450
[<ffffffff810c9d5f>] vprintk_default+0x1f/0x30
[<ffffffff81173df5>] printk+0x48/0x50
[<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core]
[<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core]
[<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac]
[<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac]
[<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core]
[<ffffffff81095f8b>] process_one_work+0x19b/0x480
[<ffffffff810967ca>] worker_thread+0x6a/0x520
[<ffffffff8109c7c4>] kthread+0xe4/0x100
[<ffffffff81884c52>] ret_from_fork+0x22/0x40

This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting
in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually
given a non-zero budget.

Signed-off-by: Calvin Owens <[email protected]>
Acked-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1875,7 +1875,7 @@ static int bnxt_poll_work(struct bnxt *b
* here forever if we consistently cannot allocate
* buffers.
*/
- else if (rc == -ENOMEM)
+ else if (rc == -ENOMEM && budget)
rx_pkts++;
else if (rc == -EBUSY) /* partial completion */
break;
@@ -1961,7 +1961,7 @@ static int bnxt_poll_nitroa0(struct napi
cpu_to_le32(RX_CMPL_ERRORS_CRC_ERROR);

rc = bnxt_rx_pkt(bp, bnapi, &raw_cons, &event);
- if (likely(rc == -EIO))
+ if (likely(rc == -EIO) && budget)
rx_pkts++;
else if (rc == -EBUSY) /* partial completion */
break;


2018-01-01 14:46:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 127/146] usb: Add device quirk for Logitech HD Pro Webcam C925e

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dmitry Fleytman Dmitry Fleytman <[email protected]>

commit 7f038d256c723dd390d2fca942919573995f4cfd upstream.

Commit e0429362ab15
("usb: Add device quirk for Logitech HD Pro Webcams C920 and C930e")
introduced quirk to workaround an issue with some Logitech webcams.

There is one more model that has the same issue - C925e, so applying
the same quirk as well.

See aforementioned commit message for detailed explanation of the problem.

Signed-off-by: Dmitry Fleytman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/core/quirks.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -57,10 +57,11 @@ static const struct usb_device_id usb_qu
/* Microsoft LifeCam-VX700 v2.0 */
{ USB_DEVICE(0x045e, 0x0770), .driver_info = USB_QUIRK_RESET_RESUME },

- /* Logitech HD Pro Webcams C920, C920-C and C930e */
+ /* Logitech HD Pro Webcams C920, C920-C, C925e and C930e */
{ USB_DEVICE(0x046d, 0x082d), .driver_info = USB_QUIRK_DELAY_INIT },
{ USB_DEVICE(0x046d, 0x0841), .driver_info = USB_QUIRK_DELAY_INIT },
{ USB_DEVICE(0x046d, 0x0843), .driver_info = USB_QUIRK_DELAY_INIT },
+ { USB_DEVICE(0x046d, 0x085b), .driver_info = USB_QUIRK_DELAY_INIT },

/* Logitech ConferenceCam CC3000e */
{ USB_DEVICE(0x046d, 0x0847), .driver_info = USB_QUIRK_DELAY_INIT },


2018-01-01 14:46:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 128/146] usb: add RESET_RESUME for ELSA MicroLink 56K

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Oliver Neukum <[email protected]>

commit b9096d9f15c142574ebebe8fbb137012bb9d99c2 upstream.

This modem needs this quirk to operate. It produces timeouts when
resumed without reset.

Signed-off-by: Oliver Neukum <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/core/quirks.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -155,6 +155,9 @@ static const struct usb_device_id usb_qu
/* Genesys Logic hub, internally used by KY-688 USB 3.1 Type-C Hub */
{ USB_DEVICE(0x05e3, 0x0612), .driver_info = USB_QUIRK_NO_LPM },

+ /* ELSA MicroLink 56K */
+ { USB_DEVICE(0x05cc, 0x2267), .driver_info = USB_QUIRK_RESET_RESUME },
+
/* Genesys Logic hub, internally used by Moshi USB to Ethernet Adapter */
{ USB_DEVICE(0x05e3, 0x0616), .driver_info = USB_QUIRK_NO_LPM },



2018-01-01 14:46:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 130/146] usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Thompson <[email protected]>

commit da99706689481717998d1d48edd389f339eea979 upstream.

When plugging in a USB webcam I see the following message:
xhci_hcd 0000:04:00.0: WARN Successful completion on short TX: needs
XHCI_TRUST_TX_LENGTH quirk?
handle_tx_event: 913 callbacks suppressed

All is quiet again with this patch (and I've done a fair but of soak
testing with the camera since).

Signed-off-by: Daniel Thompson <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/host/xhci-pci.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -190,6 +190,9 @@ static void xhci_pci_quirks(struct devic
xhci->quirks |= XHCI_BROKEN_STREAMS;
}
if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
+ pdev->device == 0x0014)
+ xhci->quirks |= XHCI_TRUST_TX_LENGTH;
+ if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
pdev->device == 0x0015)
xhci->quirks |= XHCI_RESET_ON_RESUME;
if (pdev->vendor == PCI_VENDOR_ID_VIA)


2018-01-01 14:46:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 131/146] timers: Use deferrable base independent of base::nohz_active

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anna-Maria Gleixner <[email protected]>

commit ced6d5c11d3e7b342f1a80f908e6756ebd4b8ddd upstream.

During boot and before base::nohz_active is set in the timer bases, deferrable
timers are enqueued into the standard timer base. This works correctly as
long as base::nohz_active is false.

Once it base::nohz_active is set and a timer which was enqueued before that
is accessed the lock selector code choses the lock of the deferred
base. This causes unlocked access to the standard base and in case the
timer is removed it does not clear the pending flag in the standard base
bitmap which causes get_next_timer_interrupt() to return bogus values.

To prevent that, the deferrable timers must be enqueued in the deferrable
base, even when base::nohz_active is not set. Those deferrable timers also
need to be expired unconditional.

Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Signed-off-by: Anna-Maria Gleixner <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: [email protected]
Cc: Paul McKenney <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/time/timer.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -814,11 +814,10 @@ static inline struct timer_base *get_tim
struct timer_base *base = per_cpu_ptr(&timer_bases[BASE_STD], cpu);

/*
- * If the timer is deferrable and nohz is active then we need to use
- * the deferrable base.
+ * If the timer is deferrable and NO_HZ_COMMON is set then we need
+ * to use the deferrable base.
*/
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active &&
- (tflags & TIMER_DEFERRABLE))
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu);
return base;
}
@@ -828,11 +827,10 @@ static inline struct timer_base *get_tim
struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);

/*
- * If the timer is deferrable and nohz is active then we need to use
- * the deferrable base.
+ * If the timer is deferrable and NO_HZ_COMMON is set then we need
+ * to use the deferrable base.
*/
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active &&
- (tflags & TIMER_DEFERRABLE))
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
base = this_cpu_ptr(&timer_bases[BASE_DEF]);
return base;
}
@@ -1644,7 +1642,7 @@ static __latent_entropy void run_timer_s
base->must_forward_clk = false;

__run_timers(base);
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
}



2018-01-01 14:46:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 132/146] timers: Invoke timer_start_debug() where it makes sense

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit fd45bb77ad682be728d1002431d77b8c73342836 upstream.

The timer start debug function is called before the proper timer base is
set. As a consequence the trace data contains the stale CPU and flags
values.

Call the debug function after setting the new base and flags.

Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: [email protected]
Cc: Paul McKenney <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/time/timer.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -982,8 +982,6 @@ __mod_timer(struct timer_list *timer, un
if (!ret && pending_only)
goto out_unlock;

- debug_activate(timer, expires);
-
new_base = get_target_base(base, timer->flags);

if (base != new_base) {
@@ -1007,6 +1005,8 @@ __mod_timer(struct timer_list *timer, un
}
}

+ debug_activate(timer, expires);
+
timer->expires = expires;
/*
* If 'idx' was calculated above and the base time did not advance


2018-01-01 14:47:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 134/146] binder: fix proc->files use-after-free

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Todd Kjos <[email protected]>

commit 7f3dc0088b98533f17128058fac73cd8b2752ef1 upstream.

proc->files cleanup is initiated by binder_vma_close. Therefore
a reference on the binder_proc is not enough to prevent the
files_struct from being released while the binder_proc still has
a reference. This can lead to an attempt to dereference the
stale pointer obtained from proc->files prior to proc->files
cleanup. This has been seen once in task_get_unused_fd_flags()
when __alloc_fd() is called with a stale "files".

The fix is to protect proc->files with a mutex to prevent cleanup
while in use.

Signed-off-by: Todd Kjos <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/android/binder.c | 44 +++++++++++++++++++++++++++++++-------------
1 file changed, 31 insertions(+), 13 deletions(-)

--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -482,7 +482,8 @@ enum binder_deferred_state {
* @tsk task_struct for group_leader of process
* (invariant after initialized)
* @files files_struct for process
- * (invariant after initialized)
+ * (protected by @files_lock)
+ * @files_lock mutex to protect @files
* @deferred_work_node: element for binder_deferred_list
* (protected by binder_deferred_lock)
* @deferred_work: bitmap of deferred work to perform
@@ -530,6 +531,7 @@ struct binder_proc {
int pid;
struct task_struct *tsk;
struct files_struct *files;
+ struct mutex files_lock;
struct hlist_node deferred_work_node;
int deferred_work;
bool is_dead;
@@ -877,20 +879,26 @@ static void binder_inc_node_tmpref_ilock

static int task_get_unused_fd_flags(struct binder_proc *proc, int flags)
{
- struct files_struct *files = proc->files;
unsigned long rlim_cur;
unsigned long irqs;
+ int ret;

- if (files == NULL)
- return -ESRCH;
-
- if (!lock_task_sighand(proc->tsk, &irqs))
- return -EMFILE;
-
+ mutex_lock(&proc->files_lock);
+ if (proc->files == NULL) {
+ ret = -ESRCH;
+ goto err;
+ }
+ if (!lock_task_sighand(proc->tsk, &irqs)) {
+ ret = -EMFILE;
+ goto err;
+ }
rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE);
unlock_task_sighand(proc->tsk, &irqs);

- return __alloc_fd(files, 0, rlim_cur, flags);
+ ret = __alloc_fd(proc->files, 0, rlim_cur, flags);
+err:
+ mutex_unlock(&proc->files_lock);
+ return ret;
}

/*
@@ -899,8 +907,10 @@ static int task_get_unused_fd_flags(stru
static void task_fd_install(
struct binder_proc *proc, unsigned int fd, struct file *file)
{
+ mutex_lock(&proc->files_lock);
if (proc->files)
__fd_install(proc->files, fd, file);
+ mutex_unlock(&proc->files_lock);
}

/*
@@ -910,9 +920,11 @@ static long task_close_fd(struct binder_
{
int retval;

- if (proc->files == NULL)
- return -ESRCH;
-
+ mutex_lock(&proc->files_lock);
+ if (proc->files == NULL) {
+ retval = -ESRCH;
+ goto err;
+ }
retval = __close_fd(proc->files, fd);
/* can't restart close syscall because file table entry was cleared */
if (unlikely(retval == -ERESTARTSYS ||
@@ -920,7 +932,8 @@ static long task_close_fd(struct binder_
retval == -ERESTARTNOHAND ||
retval == -ERESTART_RESTARTBLOCK))
retval = -EINTR;
-
+err:
+ mutex_unlock(&proc->files_lock);
return retval;
}

@@ -4627,7 +4640,9 @@ static int binder_mmap(struct file *filp
ret = binder_alloc_mmap_handler(&proc->alloc, vma);
if (ret)
return ret;
+ mutex_lock(&proc->files_lock);
proc->files = get_files_struct(current);
+ mutex_unlock(&proc->files_lock);
return 0;

err_bad_arg:
@@ -4651,6 +4666,7 @@ static int binder_open(struct inode *nod
spin_lock_init(&proc->outer_lock);
get_task_struct(current->group_leader);
proc->tsk = current->group_leader;
+ mutex_init(&proc->files_lock);
INIT_LIST_HEAD(&proc->todo);
proc->default_priority = task_nice(current);
binder_dev = container_of(filp->private_data, struct binder_device,
@@ -4903,9 +4919,11 @@ static void binder_deferred_func(struct

files = NULL;
if (defer & BINDER_DEFERRED_PUT_FILES) {
+ mutex_lock(&proc->files_lock);
files = proc->files;
if (files)
proc->files = NULL;
+ mutex_unlock(&proc->files_lock);
}

if (defer & BINDER_DEFERRED_FLUSH)


2018-01-01 14:47:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 135/146] phy: tegra: fix device-tree node lookups

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit 046046737bd35bed047460f080ea47e186be731e upstream.

Fix child-node lookups during probe, which ended up searching the whole
device tree depth-first starting at the parents rather than just
matching on their children.

To make things worse, some parent nodes could end up being being
prematurely freed (by tegra_xusb_pad_register()) as
of_find_node_by_name() drops a reference to its first argument.

Fixes: 53d2a715c240 ("phy: Add Tegra XUSB pad controller support")
Cc: Thierry Reding <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Kishon Vijay Abraham I <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/phy/tegra/xusb.c | 60 +++++++++++++++++++++++------------------------
1 file changed, 30 insertions(+), 30 deletions(-)

--- a/drivers/phy/tegra/xusb.c
+++ b/drivers/phy/tegra/xusb.c
@@ -75,14 +75,14 @@ MODULE_DEVICE_TABLE(of, tegra_xusb_padct
static struct device_node *
tegra_xusb_find_pad_node(struct tegra_xusb_padctl *padctl, const char *name)
{
- /*
- * of_find_node_by_name() drops a reference, so make sure to grab one.
- */
- struct device_node *np = of_node_get(padctl->dev->of_node);
+ struct device_node *pads, *np;

- np = of_find_node_by_name(np, "pads");
- if (np)
- np = of_find_node_by_name(np, name);
+ pads = of_get_child_by_name(padctl->dev->of_node, "pads");
+ if (!pads)
+ return NULL;
+
+ np = of_get_child_by_name(pads, name);
+ of_node_put(pads);

return np;
}
@@ -90,16 +90,16 @@ tegra_xusb_find_pad_node(struct tegra_xu
static struct device_node *
tegra_xusb_pad_find_phy_node(struct tegra_xusb_pad *pad, unsigned int index)
{
- /*
- * of_find_node_by_name() drops a reference, so make sure to grab one.
- */
- struct device_node *np = of_node_get(pad->dev.of_node);
+ struct device_node *np, *lanes;

- np = of_find_node_by_name(np, "lanes");
- if (!np)
+ lanes = of_get_child_by_name(pad->dev.of_node, "lanes");
+ if (!lanes)
return NULL;

- return of_find_node_by_name(np, pad->soc->lanes[index].name);
+ np = of_get_child_by_name(lanes, pad->soc->lanes[index].name);
+ of_node_put(lanes);
+
+ return np;
}

static int
@@ -195,7 +195,7 @@ int tegra_xusb_pad_register(struct tegra
unsigned int i;
int err;

- children = of_find_node_by_name(pad->dev.of_node, "lanes");
+ children = of_get_child_by_name(pad->dev.of_node, "lanes");
if (!children)
return -ENODEV;

@@ -444,21 +444,21 @@ static struct device_node *
tegra_xusb_find_port_node(struct tegra_xusb_padctl *padctl, const char *type,
unsigned int index)
{
- /*
- * of_find_node_by_name() drops a reference, so make sure to grab one.
- */
- struct device_node *np = of_node_get(padctl->dev->of_node);
+ struct device_node *ports, *np;
+ char *name;
+
+ ports = of_get_child_by_name(padctl->dev->of_node, "ports");
+ if (!ports)
+ return NULL;

- np = of_find_node_by_name(np, "ports");
- if (np) {
- char *name;
-
- name = kasprintf(GFP_KERNEL, "%s-%u", type, index);
- if (!name)
- return ERR_PTR(-ENOMEM);
- np = of_find_node_by_name(np, name);
- kfree(name);
+ name = kasprintf(GFP_KERNEL, "%s-%u", type, index);
+ if (!name) {
+ of_node_put(ports);
+ return ERR_PTR(-ENOMEM);
}
+ np = of_get_child_by_name(ports, name);
+ kfree(name);
+ of_node_put(ports);

return np;
}
@@ -847,7 +847,7 @@ static void tegra_xusb_remove_ports(stru

static int tegra_xusb_padctl_probe(struct platform_device *pdev)
{
- struct device_node *np = of_node_get(pdev->dev.of_node);
+ struct device_node *np = pdev->dev.of_node;
const struct tegra_xusb_padctl_soc *soc;
struct tegra_xusb_padctl *padctl;
const struct of_device_id *match;
@@ -855,7 +855,7 @@ static int tegra_xusb_padctl_probe(struc
int err;

/* for backwards compatibility with old device trees */
- np = of_find_node_by_name(np, "pads");
+ np = of_get_child_by_name(np, "pads");
if (!np) {
dev_warn(&pdev->dev, "deprecated DT, using legacy driver\n");
return tegra_xusb_padctl_legacy_probe(pdev);


2018-01-01 14:47:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 109/146] phylink: ensure the PHY interface mode is appropriately set

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Russell King <[email protected]>


[ Upstream commit 182088aa3c6c7f7c20a2c1dcc9ded4a3fc631f38 ]

When setting the ethtool settings, ensure that the validated PHY
interface mode is propagated to the current link settings, so that
2500BaseX can be selected.

Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/phylink.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -948,6 +948,7 @@ int phylink_ethtool_ksettings_set(struct
mutex_lock(&pl->state_mutex);
/* Configure the MAC to match the new settings */
linkmode_copy(pl->link_config.advertising, our_kset.link_modes.advertising);
+ pl->link_config.interface = config.interface;
pl->link_config.speed = our_kset.base.speed;
pl->link_config.duplex = our_kset.base.duplex;
pl->link_config.an_enabled = our_kset.base.autoneg != AUTONEG_DISABLE;


2018-01-01 14:47:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 136/146] drivers: base: cacheinfo: fix cache type for non-architected system cache

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sudeep Holla <[email protected]>

commit f57ab9a01a36ef3454333251cc57e3a9948b17bf upstream.

Commit dfea747d2aba ("drivers: base: cacheinfo: support DT overrides for
cache properties") doesn't initialise the cache type if it's present
only in DT and the architecture is not aware of it. They are unified
system level cache which are generally transparent.

This patch check if the cache type is set to NOCACHE but the DT node
indicates that it's unified cache and sets the cache type accordingly.

Fixes: dfea747d2aba ("drivers: base: cacheinfo: support DT overrides for cache properties")
Reported-and-tested-by: Tan Xiaojun <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sudeep Holla <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/base/cacheinfo.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -186,6 +186,11 @@ static void cache_associativity(struct c
this_leaf->ways_of_associativity = (size / nr_sets) / line_size;
}

+static bool cache_node_is_unified(struct cacheinfo *this_leaf)
+{
+ return of_property_read_bool(this_leaf->of_node, "cache-unified");
+}
+
static void cache_of_override_properties(unsigned int cpu)
{
int index;
@@ -194,6 +199,14 @@ static void cache_of_override_properties

for (index = 0; index < cache_leaves(cpu); index++) {
this_leaf = this_cpu_ci->info_list + index;
+ /*
+ * init_cache_level must setup the cache level correctly
+ * overriding the architecturally specified levels, so
+ * if type is NONE at this stage, it should be unified
+ */
+ if (this_leaf->type == CACHE_TYPE_NOCACHE &&
+ cache_node_is_unified(this_leaf))
+ this_leaf->type = CACHE_TYPE_UNIFIED;
cache_size(this_leaf);
cache_get_line_size(this_leaf);
cache_nr_sets(this_leaf);


2018-01-01 14:47:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 137/146] staging: android: ion: Fix dma direction for dma_sync_sg_for_cpu/device

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sushmita Susheelendra <[email protected]>

commit d6b246bb7a29703f53aa4c050b8b3205d749caee upstream.

Use the direction argument passed into begin_cpu_access
and end_cpu_access when calling the dma_sync_sg_for_cpu/device.
The actual cache primitive called depends on the direction
passed in.

Signed-off-by: Sushmita Susheelendra <[email protected]>
Acked-by: Laura Abbott <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/staging/android/ion/ion.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -348,7 +348,7 @@ static int ion_dma_buf_begin_cpu_access(
mutex_lock(&buffer->lock);
list_for_each_entry(a, &buffer->attachments, list) {
dma_sync_sg_for_cpu(a->dev, a->table->sgl, a->table->nents,
- DMA_BIDIRECTIONAL);
+ direction);
}
mutex_unlock(&buffer->lock);

@@ -370,7 +370,7 @@ static int ion_dma_buf_end_cpu_access(st
mutex_lock(&buffer->lock);
list_for_each_entry(a, &buffer->attachments, list) {
dma_sync_sg_for_device(a->dev, a->table->sgl, a->table->nents,
- DMA_BIDIRECTIONAL);
+ direction);
}
mutex_unlock(&buffer->lock);



2018-01-01 14:47:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 138/146] nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 5d62c183f9e9df1deeea0906d099a94e8a43047a upstream.

The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:

if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))

If need_resched() is not set, but a timer softirq is pending then this is
an indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted task
takes precedence over softirqd.

Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat....

Prevent that by adding a check for a pending timer soft interrupt to the
conditions in tick_nohz_stop_sched_tick() which avoid calling
get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
prevents a repetitive programming of an already expired timer.

Reported-by: Sebastian Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/time/tick-sched.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -674,6 +674,11 @@ static void tick_nohz_restart(struct tic
ts->next_tick = 0;
}

+static inline bool local_timer_softirq_pending(void)
+{
+ return local_softirq_pending() & TIMER_SOFTIRQ;
+}
+
static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
ktime_t now, int cpu)
{
@@ -690,8 +695,18 @@ static ktime_t tick_nohz_stop_sched_tick
} while (read_seqretry(&jiffies_lock, seq));
ts->last_jiffies = basejiff;

- if (rcu_needs_cpu(basemono, &next_rcu) ||
- arch_needs_cpu() || irq_work_needs_cpu()) {
+ /*
+ * Keep the periodic tick, when RCU, architecture or irq_work
+ * requests it.
+ * Aside of that check whether the local timer softirq is
+ * pending. If so its a bad idea to call get_next_timer_interrupt()
+ * because there is an already expired timer, so it will request
+ * immeditate expiry, which rearms the hardware timer with a
+ * minimal delta which brings us back to this place
+ * immediately. Lather, rinse and repeat...
+ */
+ if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() ||
+ irq_work_needs_cpu() || local_timer_softirq_pending()) {
next_tick = basemono + TICK_NSEC;
} else {
/*


2018-01-01 14:47:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 139/146] x86/smpboot: Remove stale TLB flush invocations

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 322f8b8b340c824aef891342b0f5795d15e11562 upstream.

smpboot_setup_warm_reset_vector() and smpboot_restore_warm_reset_vector()
invoke local_flush_tlb() for no obvious reason.

Digging in history revealed that the original code in the 2.1 era added
those because the code manipulated a swapper_pg_dir pagetable entry. The
pagetable manipulation was removed long ago in the 2.3 timeframe, but the
TLB flush invocations stayed around forever.

Remove them along with the pointless pr_debug()s which come from the same 2.1
change.

Reported-by: Dominik Brodowski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/smpboot.c | 9 ---------
1 file changed, 9 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -128,14 +128,10 @@ static inline void smpboot_setup_warm_re
spin_lock_irqsave(&rtc_lock, flags);
CMOS_WRITE(0xa, 0xf);
spin_unlock_irqrestore(&rtc_lock, flags);
- local_flush_tlb();
- pr_debug("1.\n");
*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) =
start_eip >> 4;
- pr_debug("2.\n");
*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) =
start_eip & 0xf;
- pr_debug("3.\n");
}

static inline void smpboot_restore_warm_reset_vector(void)
@@ -143,11 +139,6 @@ static inline void smpboot_restore_warm_
unsigned long flags;

/*
- * Install writable page 0 entry to set BIOS data area.
- */
- local_flush_tlb();
-
- /*
* Paranoid: Set warm reset code and vector here back
* to default values.
*/


2018-01-01 14:47:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 140/146] x86/mm: Remove preempt_disable/enable() from __native_flush_tlb()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit decab0888e6e14e11d53cefa85f8b3d3b45ce73c upstream.

The preempt_disable/enable() pair in __native_flush_tlb() was added in
commit:

5cf0791da5c1 ("x86/mm: Disable preemption during CR3 read+write")

... to protect the UP variant of flush_tlb_mm_range().

That preempt_disable/enable() pair should have been added to the UP variant
of flush_tlb_mm_range() instead.

The UP variant was removed with commit:

ce4a4e565f52 ("x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code")

... but the preempt_disable/enable() pair stayed around.

The latest change to __native_flush_tlb() in commit:

6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches")

... added an access to a per CPU variable outside the preempt disabled
regions, which makes no sense at all. __native_flush_tlb() must always
be called with at least preemption disabled.

Remove the preempt_disable/enable() pair and add a WARN_ON_ONCE() to catch
bad callers independent of the smp_processor_id() debugging.

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/tlbflush.h | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -345,15 +345,17 @@ static inline void invalidate_user_asid(
*/
static inline void __native_flush_tlb(void)
{
- invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
/*
- * If current->mm == NULL then we borrow a mm which may change
- * during a task switch and therefore we must not be preempted
- * while we write CR3 back:
+ * Preemption or interrupts must be disabled to protect the access
+ * to the per CPU variable and to prevent being preempted between
+ * read_cr3() and write_cr3().
*/
- preempt_disable();
+ WARN_ON_ONCE(preemptible());
+
+ invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+
+ /* If current->mm == NULL then the read_cr3() "borrows" an mm */
native_write_cr3(__native_read_cr3());
- preempt_enable();
}

/*


2018-01-01 14:47:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 142/146] x86/espfix/64: Fix espfix double-fault handling on 5-level systems

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit c739f930be1dd5fd949030e3475a884fe06dae9b upstream.

Using PGDIR_SHIFT to identify espfix64 addresses on 5-level systems
was wrong, and it resulted in panics due to unhandled double faults.
Use P4D_SHIFT instead, which is correct on 4-level and 5-level
machines.

This fixes a panic when running x86 selftests on 5-level machines.

Signed-off-by: Andy Lutomirski <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 1d33b219563f ("x86/espfix: Add support for 5-level paging")
Link: http://lkml.kernel.org/r/24c898b4f44fdf8c22d93703850fb384ef87cfdc.1513035461.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/traps.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -361,7 +361,7 @@ dotraplinkage void do_double_fault(struc
*
* No need for ist_enter here because we don't use RCU.
*/
- if (((long)regs->sp >> PGDIR_SHIFT) == ESPFIX_PGD_ENTRY &&
+ if (((long)regs->sp >> P4D_SHIFT) == ESPFIX_PGD_ENTRY &&
regs->cs == __KERNEL_CS &&
regs->ip == (unsigned long)native_irq_return_iret)
{


2018-01-01 14:47:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 143/146] x86/ldt: Plug memory leak in error path

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit a62d69857aab4caa43049e72fe0ed5c4a60518dd upstream.

The error path in write_ldt() tries to free 'old_ldt' instead of the newly
allocated 'new_ldt', resulting in a memory leak. It also misses to clean up a
half populated LDT pagetable, which is not a leak as it gets cleaned up
when the process exits.

Free both the potentially half populated LDT pagetable and the newly
allocated LDT struct. This can be done unconditionally because once an LDT
is mapped subsequent maps will succeed, because the PTE page is already
populated and the two LDTs fit into that single page.

Reported-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on")
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1712311121340.1899@nanos
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/ldt.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -421,7 +421,13 @@ static int write_ldt(void __user *ptr, u
*/
error = map_ldt_struct(mm, new_ldt, old_ldt ? !old_ldt->slot : 0);
if (error) {
- free_ldt_struct(old_ldt);
+ /*
+ * This only can fail for the first LDT setup. If an LDT is
+ * already installed then the PTE page is already
+ * populated. Mop up a half populated page table.
+ */
+ free_ldt_pgtables(mm);
+ free_ldt_struct(new_ldt);
goto out_unlock;
}



2018-01-01 14:47:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 110/146] phylink: ensure AN is enabled

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Russell King <[email protected]>


[ Upstream commit 74ee0e8c1bf9925c59cc8f1c65c29adf6e4cf603 ]

Ensure that we mark AN as enabled at boot time, rather than leaving
it disabled. This is noticable if your SFP module is fiber, and
it supports faster speeds than 1G with 2.5G support in place.

Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/phylink.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -525,6 +525,7 @@ struct phylink *phylink_create(struct ne
pl->link_config.pause = MLO_PAUSE_AN;
pl->link_config.speed = SPEED_UNKNOWN;
pl->link_config.duplex = DUPLEX_UNKNOWN;
+ pl->link_config.an_enabled = true;
pl->ops = ops;
__set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);



2018-01-01 14:48:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 111/146] ipv4: fib: Fix metrics match when deleting a route

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Phil Sutter <[email protected]>


[ Upstream commit d03a45572efa068fa64db211d6d45222660e76c5 ]

The recently added fib_metrics_match() causes a regression for routes
with both RTAX_FEATURES and RTAX_CC_ALGO if the latter has
TCP_CONG_NEEDS_ECN flag set:

| # ip link add d0 type dummy
| # ip link set d0 up
| # ip route add 172.29.29.0/24 dev d0 features ecn congctl dctcp
| # ip route del 172.29.29.0/24 dev d0 features ecn congctl dctcp
| RTNETLINK answers: No such process

During route insertion, fib_convert_metrics() detects that the given CC
algo requires ECN and hence sets DST_FEATURE_ECN_CA bit in
RTAX_FEATURES.

During route deletion though, fib_metrics_match() compares stored
RTAX_FEATURES value with that from userspace (which obviously has no
knowledge about DST_FEATURE_ECN_CA) and fails.

Fixes: 5f9ae3d9e7e4a ("ipv4: do metrics match when looking up and deleting a route")
Signed-off-by: Phil Sutter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/fib_semantics.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -706,7 +706,7 @@ bool fib_metrics_match(struct fib_config

nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
int type = nla_type(nla);
- u32 val;
+ u32 fi_val, val;

if (!type)
continue;
@@ -723,7 +723,11 @@ bool fib_metrics_match(struct fib_config
val = nla_get_u32(nla);
}

- if (fi->fib_metrics->metrics[type - 1] != val)
+ fi_val = fi->fib_metrics->metrics[type - 1];
+ if (type == RTAX_FEATURES)
+ fi_val &= ~DST_FEATURE_ECN_CA;
+
+ if (fi_val != val)
return false;
}



2018-01-01 14:48:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 114/146] skbuff: orphan frags before zerocopy clone

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Willem de Bruijn <[email protected]>


[ Upstream commit 268b790679422a89e9ab0685d9f291edae780c98 ]

Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate
calls to skb_uarg(skb)->callback for the same data.

skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb
with each segment. This is only safe for uargs that do refcounting,
which is those that pass skb_orphan_frags without dropping their
shared frags. For others, skb_orphan_frags drops the user frags and
sets the uarg to NULL, after which sock_zerocopy_clone has no effect.

Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback
calls for the same data causing the vhost_net_ubuf_ref_>refcount to
drop below zero.

Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com>
Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY")
Reported-by: Andreas Hartmann <[email protected]>
Reported-by: David Hill <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/skbuff.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3657,8 +3657,6 @@ normal:

skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
SKBTX_SHARED_FRAG;
- if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
- goto err;

while (pos < offset + len) {
if (i >= nfrags) {
@@ -3684,6 +3682,8 @@ normal:

if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
goto err;
+ if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+ goto err;

*nskb_frag = *frag;
__skb_frag_ref(nskb_frag);


2018-01-01 14:48:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 106/146] vxlan: restore dev->mtu setting based on lower device

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexey Kodanev <[email protected]>


[ Upstream commit f870c1ff65a6d1f3a083f277280802ee09a5b44d ]

Stefano Brivio says:
Commit a985343ba906 ("vxlan: refactor verification and
application of configuration") introduced a change in the
behaviour of initial MTU setting: earlier, the MTU for a link
created on top of a given lower device, without an initial MTU
specification, was set to the MTU of the lower device minus
headroom as a result of this path in vxlan_dev_configure():

if (!conf->mtu)
dev->mtu = lowerdev->mtu -
(use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);

which is now gone. Now, the initial MTU, in absence of a
configured value, is simply set by ether_setup() to ETH_DATA_LEN
(1500 bytes).

This breaks userspace expectations in case the MTU of
the lower device is higher than 1500 bytes minus headroom.

This patch restores the previous behaviour on newlink operation. Since
max_mtu can be negative and we update dev->mtu directly, also check it
for valid minimum.

Reported-by: Junhan Yan <[email protected]>
Fixes: a985343ba906 ("vxlan: refactor verification and application of configuration")
Signed-off-by: Alexey Kodanev <[email protected]>
Acked-by: Stefano Brivio <[email protected]>
Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/vxlan.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3105,6 +3105,11 @@ static void vxlan_config_apply(struct ne

max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
VXLAN_HEADROOM);
+ if (max_mtu < ETH_MIN_MTU)
+ max_mtu = ETH_MIN_MTU;
+
+ if (!changelink && !conf->mtu)
+ dev->mtu = max_mtu;
}

if (dev->mtu > max_mtu)


2018-01-01 14:48:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 145/146] n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <[email protected]>

commit 966031f340185eddd05affcf72b740549f056348 upstream.

We added support for EXTPROC back in 2010 in commit 26df6d13406d ("tty:
Add EXTPROC support for LINEMODE") and the intent was to allow it to
override some (all?) ICANON behavior. Quoting from that original commit
message:

There is a new bit in the termios local flag word, EXTPROC.
When this bit is set, several aspects of the terminal driver
are disabled. Input line editing, character echo, and mapping
of signals are all disabled. This allows the telnetd to turn
off these functions when in linemode, but still keep track of
what state the user wants the terminal to be in.

but the problem turns out that "several aspects of the terminal driver
are disabled" is a bit ambiguous, and you can really confuse the n_tty
layer by setting EXTPROC and then causing some of the ICANON invariants
to no longer be maintained.

This fixes at least one such case (TIOCINQ) becoming unhappy because of
the confusion over whether ICANON really means ICANON when EXTPROC is set.

This basically makes TIOCINQ match the case of read: if EXTPROC is set,
we ignore ICANON. Also, make sure to reset the ICANON state ie EXTPROC
changes, not just if ICANON changes.

Fixes: 26df6d13406d ("tty: Add EXTPROC support for LINEMODE")
Reported-by: Tetsuo Handa <[email protected]>
Reported-by: syzkaller <[email protected]>
Cc: Jiri Slaby <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/tty/n_tty.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1764,7 +1764,7 @@ static void n_tty_set_termios(struct tty
{
struct n_tty_data *ldata = tty->disc_data;

- if (!old || (old->c_lflag ^ tty->termios.c_lflag) & ICANON) {
+ if (!old || (old->c_lflag ^ tty->termios.c_lflag) & (ICANON | EXTPROC)) {
bitmap_zero(ldata->read_flags, N_TTY_BUF_SIZE);
ldata->line_start = ldata->read_tail;
if (!L_ICANON(tty) || !read_cnt(ldata)) {
@@ -2427,7 +2427,7 @@ static int n_tty_ioctl(struct tty_struct
return put_user(tty_chars_in_buffer(tty), (int __user *) arg);
case TIOCINQ:
down_write(&tty->termios_rwsem);
- if (L_ICANON(tty))
+ if (L_ICANON(tty) && !L_EXTPROC(tty))
retval = inq_canon(ldata);
else
retval = read_cnt(ldata);


2018-01-01 14:48:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 146/146] tty: fix tty_ldisc_receive_buf() documentation

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit e7e51dcf3b8a5f65c5653a054ad57eb2492a90d0 upstream.

The tty_ldisc_receive_buf() helper returns the number of bytes
processed so drop the bogus "not" from the kernel doc comment.

Fixes: 8d082cd300ab ("tty: Unify receive_buf() code paths")
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/tty/tty_buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -446,7 +446,7 @@ EXPORT_SYMBOL_GPL(tty_prepare_flip_strin
* Callers other than flush_to_ldisc() need to exclude the kworker
* from concurrent use of the line discipline, see paste_selection().
*
- * Returns the number of bytes not processed
+ * Returns the number of bytes processed
*/
int tty_ldisc_receive_buf(struct tty_ldisc *ld, const unsigned char *p,
char *f, int count)


2018-01-01 14:48:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 144/146] x86/ldt: Make LDT pgtable free conditional

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 7f414195b0c3612acd12b4611a5fe75995cf10c7 upstream.

Andy prefers to be paranoid about the pagetable free in the error path of
write_ldt(). Make it conditional and warn whenever the installment of a
secondary LDT fails.

Requested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/ldt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -426,7 +426,8 @@ static int write_ldt(void __user *ptr, u
* already installed then the PTE page is already
* populated. Mop up a half populated page table.
*/
- free_ldt_pgtables(mm);
+ if (!WARN_ON_ONCE(old_ldt))
+ free_ldt_pgtables(mm);
free_ldt_struct(new_ldt);
goto out_unlock;
}


2018-01-01 14:48:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 115/146] skbuff: skb_copy_ubufs must release uarg even without user frags

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Willem de Bruijn <[email protected]>


[ Upstream commit b90ddd568792bcb0054eaf0f61785c8f80c3bd1c ]

skb_copy_ubufs creates a private copy of frags[] to release its hold
on user frags, then calls uarg->callback to notify the owner.

Call uarg->callback even when no frags exist. This edge case can
happen when zerocopy_sg_from_iter finds enough room in skb_headlen
to copy all the data.

Fixes: 3ece782693c4 ("sock: skb_copy_ubufs support for compound pages")
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/skbuff.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1182,7 +1182,7 @@ int skb_copy_ubufs(struct sk_buff *skb,
u32 d_off;

if (!num_frags)
- return 0;
+ goto release;

if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
return -EINVAL;
@@ -1242,6 +1242,7 @@ int skb_copy_ubufs(struct sk_buff *skb,
__skb_fill_page_desc(skb, new_frags - 1, head, 0, d_off);
skb_shinfo(skb)->nr_frags = new_frags;

+release:
skb_zcopy_clear(skb, false);
return 0;
}


2018-01-01 14:49:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 113/146] Revert "mlx5: move affinity hints assignments to generic code"

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Saeed Mahameed <[email protected]>


[ Upstream commit 231243c82793428467524227ae02ca451e6a98e7 ]

Before the offending commit, mlx5 core did the IRQ affinity itself,
and it seems that the new generic code have some drawbacks and one
of them is the lack for user ability to modify irq affinity after
the initial affinity values got assigned.

The issue is still being discussed and a solution in the new generic code
is required, until then we need to revert this patch.

This fixes the following issue:
echo <new affinity> > /proc/irq/<x>/smp_affinity
fails with -EIO

This reverts commit a435393acafbf0ecff4deb3e3cb554b34f0d0664.
Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since
it is used in mlx5_ib driver.

Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code")
Cc: Sagi Grimberg <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jes Sorensen <[email protected]>
Reported-by: Jes Sorensen <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 1
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 45 ++++++-------
drivers/net/ethernet/mellanox/mlx5/core/main.c | 75 ++++++++++++++++++++--
include/linux/mlx5/driver.h | 1
4 files changed, 93 insertions(+), 29 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -590,6 +590,7 @@ struct mlx5e_channel {
struct mlx5_core_dev *mdev;
struct mlx5e_tstamp *tstamp;
int ix;
+ int cpu;
};

struct mlx5e_channels {
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -71,11 +71,6 @@ struct mlx5e_channel_param {
struct mlx5e_cq_param icosq_cq;
};

-static int mlx5e_get_node(struct mlx5e_priv *priv, int ix)
-{
- return pci_irq_get_node(priv->mdev->pdev, MLX5_EQ_VEC_COMP_BASE + ix);
-}
-
static bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev)
{
return MLX5_CAP_GEN(mdev, striding_rq) &&
@@ -452,17 +447,16 @@ static int mlx5e_rq_alloc_mpwqe_info(str
int wq_sz = mlx5_wq_ll_get_size(&rq->wq);
int mtt_sz = mlx5e_get_wqe_mtt_sz();
int mtt_alloc = mtt_sz + MLX5_UMR_ALIGN - 1;
- int node = mlx5e_get_node(c->priv, c->ix);
int i;

rq->mpwqe.info = kzalloc_node(wq_sz * sizeof(*rq->mpwqe.info),
- GFP_KERNEL, node);
+ GFP_KERNEL, cpu_to_node(c->cpu));
if (!rq->mpwqe.info)
goto err_out;

/* We allocate more than mtt_sz as we will align the pointer */
- rq->mpwqe.mtt_no_align = kzalloc_node(mtt_alloc * wq_sz,
- GFP_KERNEL, node);
+ rq->mpwqe.mtt_no_align = kzalloc_node(mtt_alloc * wq_sz, GFP_KERNEL,
+ cpu_to_node(c->cpu));
if (unlikely(!rq->mpwqe.mtt_no_align))
goto err_free_wqe_info;

@@ -570,7 +564,7 @@ static int mlx5e_alloc_rq(struct mlx5e_c
int err;
int i;

- rqp->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix);
+ rqp->wq.db_numa_node = cpu_to_node(c->cpu);

err = mlx5_wq_ll_create(mdev, &rqp->wq, rqc_wq, &rq->wq,
&rq->wq_ctrl);
@@ -636,8 +630,7 @@ static int mlx5e_alloc_rq(struct mlx5e_c
default: /* MLX5_WQ_TYPE_LINKED_LIST */
rq->wqe.frag_info =
kzalloc_node(wq_sz * sizeof(*rq->wqe.frag_info),
- GFP_KERNEL,
- mlx5e_get_node(c->priv, c->ix));
+ GFP_KERNEL, cpu_to_node(c->cpu));
if (!rq->wqe.frag_info) {
err = -ENOMEM;
goto err_rq_wq_destroy;
@@ -1007,13 +1000,13 @@ static int mlx5e_alloc_xdpsq(struct mlx5
sq->uar_map = mdev->mlx5e_res.bfreg.map;
sq->min_inline_mode = params->tx_min_inline_mode;

- param->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix);
+ param->wq.db_numa_node = cpu_to_node(c->cpu);
err = mlx5_wq_cyc_create(mdev, &param->wq, sqc_wq, &sq->wq, &sq->wq_ctrl);
if (err)
return err;
sq->wq.db = &sq->wq.db[MLX5_SND_DBR];

- err = mlx5e_alloc_xdpsq_db(sq, mlx5e_get_node(c->priv, c->ix));
+ err = mlx5e_alloc_xdpsq_db(sq, cpu_to_node(c->cpu));
if (err)
goto err_sq_wq_destroy;

@@ -1060,13 +1053,13 @@ static int mlx5e_alloc_icosq(struct mlx5
sq->channel = c;
sq->uar_map = mdev->mlx5e_res.bfreg.map;

- param->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix);
+ param->wq.db_numa_node = cpu_to_node(c->cpu);
err = mlx5_wq_cyc_create(mdev, &param->wq, sqc_wq, &sq->wq, &sq->wq_ctrl);
if (err)
return err;
sq->wq.db = &sq->wq.db[MLX5_SND_DBR];

- err = mlx5e_alloc_icosq_db(sq, mlx5e_get_node(c->priv, c->ix));
+ err = mlx5e_alloc_icosq_db(sq, cpu_to_node(c->cpu));
if (err)
goto err_sq_wq_destroy;

@@ -1132,13 +1125,13 @@ static int mlx5e_alloc_txqsq(struct mlx5
if (MLX5_IPSEC_DEV(c->priv->mdev))
set_bit(MLX5E_SQ_STATE_IPSEC, &sq->state);

- param->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix);
+ param->wq.db_numa_node = cpu_to_node(c->cpu);
err = mlx5_wq_cyc_create(mdev, &param->wq, sqc_wq, &sq->wq, &sq->wq_ctrl);
if (err)
return err;
sq->wq.db = &sq->wq.db[MLX5_SND_DBR];

- err = mlx5e_alloc_txqsq_db(sq, mlx5e_get_node(c->priv, c->ix));
+ err = mlx5e_alloc_txqsq_db(sq, cpu_to_node(c->cpu));
if (err)
goto err_sq_wq_destroy;

@@ -1510,8 +1503,8 @@ static int mlx5e_alloc_cq(struct mlx5e_c
struct mlx5_core_dev *mdev = c->priv->mdev;
int err;

- param->wq.buf_numa_node = mlx5e_get_node(c->priv, c->ix);
- param->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix);
+ param->wq.buf_numa_node = cpu_to_node(c->cpu);
+ param->wq.db_numa_node = cpu_to_node(c->cpu);
param->eq_ix = c->ix;

err = mlx5e_alloc_cq_common(mdev, param, cq);
@@ -1610,6 +1603,11 @@ static void mlx5e_close_cq(struct mlx5e_
mlx5e_free_cq(cq);
}

+static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
+{
+ return cpumask_first(priv->mdev->priv.irq_info[ix].mask);
+}
+
static int mlx5e_open_tx_cqs(struct mlx5e_channel *c,
struct mlx5e_params *params,
struct mlx5e_channel_param *cparam)
@@ -1758,12 +1756,13 @@ static int mlx5e_open_channel(struct mlx
{
struct mlx5e_cq_moder icocq_moder = {0, 0};
struct net_device *netdev = priv->netdev;
+ int cpu = mlx5e_get_cpu(priv, ix);
struct mlx5e_channel *c;
unsigned int irq;
int err;
int eqn;

- c = kzalloc_node(sizeof(*c), GFP_KERNEL, mlx5e_get_node(priv, ix));
+ c = kzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));
if (!c)
return -ENOMEM;

@@ -1771,6 +1770,7 @@ static int mlx5e_open_channel(struct mlx
c->mdev = priv->mdev;
c->tstamp = &priv->tstamp;
c->ix = ix;
+ c->cpu = cpu;
c->pdev = &priv->mdev->pdev->dev;
c->netdev = priv->netdev;
c->mkey_be = cpu_to_be32(priv->mdev->mlx5e_res.mkey.key);
@@ -1859,8 +1859,7 @@ static void mlx5e_activate_channel(struc
for (tc = 0; tc < c->num_tc; tc++)
mlx5e_activate_txqsq(&c->sq[tc]);
mlx5e_activate_rq(&c->rq);
- netif_set_xps_queue(c->netdev,
- mlx5_get_vector_affinity(c->priv->mdev, c->ix), c->ix);
+ netif_set_xps_queue(c->netdev, get_cpu_mask(c->cpu), c->ix);
}

static void mlx5e_deactivate_channel(struct mlx5e_channel *c)
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -316,9 +316,6 @@ static int mlx5_alloc_irq_vectors(struct
{
struct mlx5_priv *priv = &dev->priv;
struct mlx5_eq_table *table = &priv->eq_table;
- struct irq_affinity irqdesc = {
- .pre_vectors = MLX5_EQ_VEC_COMP_BASE,
- };
int num_eqs = 1 << MLX5_CAP_GEN(dev, log_max_eq);
int nvec;

@@ -332,10 +329,9 @@ static int mlx5_alloc_irq_vectors(struct
if (!priv->irq_info)
goto err_free_msix;

- nvec = pci_alloc_irq_vectors_affinity(dev->pdev,
+ nvec = pci_alloc_irq_vectors(dev->pdev,
MLX5_EQ_VEC_COMP_BASE + 1, nvec,
- PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
- &irqdesc);
+ PCI_IRQ_MSIX);
if (nvec < 0)
return nvec;

@@ -621,6 +617,63 @@ u64 mlx5_read_internal_timer(struct mlx5
return (u64)timer_l | (u64)timer_h1 << 32;
}

+static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
+{
+ struct mlx5_priv *priv = &mdev->priv;
+ int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
+
+ if (!zalloc_cpumask_var(&priv->irq_info[i].mask, GFP_KERNEL)) {
+ mlx5_core_warn(mdev, "zalloc_cpumask_var failed");
+ return -ENOMEM;
+ }
+
+ cpumask_set_cpu(cpumask_local_spread(i, priv->numa_node),
+ priv->irq_info[i].mask);
+
+ if (IS_ENABLED(CONFIG_SMP) &&
+ irq_set_affinity_hint(irq, priv->irq_info[i].mask))
+ mlx5_core_warn(mdev, "irq_set_affinity_hint failed, irq 0x%.4x", irq);
+
+ return 0;
+}
+
+static void mlx5_irq_clear_affinity_hint(struct mlx5_core_dev *mdev, int i)
+{
+ struct mlx5_priv *priv = &mdev->priv;
+ int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
+
+ irq_set_affinity_hint(irq, NULL);
+ free_cpumask_var(priv->irq_info[i].mask);
+}
+
+static int mlx5_irq_set_affinity_hints(struct mlx5_core_dev *mdev)
+{
+ int err;
+ int i;
+
+ for (i = 0; i < mdev->priv.eq_table.num_comp_vectors; i++) {
+ err = mlx5_irq_set_affinity_hint(mdev, i);
+ if (err)
+ goto err_out;
+ }
+
+ return 0;
+
+err_out:
+ for (i--; i >= 0; i--)
+ mlx5_irq_clear_affinity_hint(mdev, i);
+
+ return err;
+}
+
+static void mlx5_irq_clear_affinity_hints(struct mlx5_core_dev *mdev)
+{
+ int i;
+
+ for (i = 0; i < mdev->priv.eq_table.num_comp_vectors; i++)
+ mlx5_irq_clear_affinity_hint(mdev, i);
+}
+
int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn,
unsigned int *irqn)
{
@@ -1093,6 +1146,12 @@ static int mlx5_load_one(struct mlx5_cor
goto err_stop_eqs;
}

+ err = mlx5_irq_set_affinity_hints(dev);
+ if (err) {
+ dev_err(&pdev->dev, "Failed to alloc affinity hint cpumask\n");
+ goto err_affinity_hints;
+ }
+
err = mlx5_init_fs(dev);
if (err) {
dev_err(&pdev->dev, "Failed to init flow steering\n");
@@ -1150,6 +1209,9 @@ err_sriov:
mlx5_cleanup_fs(dev);

err_fs:
+ mlx5_irq_clear_affinity_hints(dev);
+
+err_affinity_hints:
free_comp_eqs(dev);

err_stop_eqs:
@@ -1218,6 +1280,7 @@ static int mlx5_unload_one(struct mlx5_c

mlx5_sriov_detach(dev);
mlx5_cleanup_fs(dev);
+ mlx5_irq_clear_affinity_hints(dev);
free_comp_eqs(dev);
mlx5_stop_eqs(dev);
mlx5_put_uars_page(dev, priv->uar);
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -546,6 +546,7 @@ struct mlx5_core_sriov {
};

struct mlx5_irq_info {
+ cpumask_var_t mask;
char name[MLX5_MAX_IRQ_NAME];
};



2018-01-01 14:50:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 112/146] ipv6: set all.accept_dad to 0 by default

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicolas Dichtel <[email protected]>


[ Upstream commit 094009531612246d9e13f9e0c3ae2205d7f63a0a ]

With commits 35e015e1f577 and a2d3f3e33853, the global 'accept_dad' flag
is also taken into account (default value is 1). If either global or
per-interface flag is non-zero, DAD will be enabled on a given interface.

This is not backward compatible: before those patches, the user could
disable DAD just by setting the per-interface flag to 0. Now, the
user instead needs to set both flags to 0 to actually disable DAD.

Restore the previous behaviour by setting the default for the global
'accept_dad' flag to 0. This way, DAD is still enabled by default,
as per-interface flags are set to 1 on device creation, but setting
them to 0 is enough to disable DAD on a given interface.

- Before 35e015e1f57a7 and a2d3f3e33853:
global per-interface DAD enabled
[default] 1 1 yes
X 0 no
X 1 yes

- After 35e015e1f577 and a2d3f3e33853:
global per-interface DAD enabled
[default] 1 1 yes
0 0 no
0 1 yes
1 0 yes

- After this fix:
global per-interface DAD enabled
1 1 yes
0 0 no
[default] 0 1 yes
1 0 yes

Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
Fixes: a2d3f3e33853 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real")
CC: Stefano Brivio <[email protected]>
CC: Matteo Croce <[email protected]>
CC: Erik Kline <[email protected]>
Signed-off-by: Nicolas Dichtel <[email protected]>
Acked-by: Stefano Brivio <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/addrconf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -231,7 +231,7 @@ static struct ipv6_devconf ipv6_devconf
.proxy_ndp = 0,
.accept_source_route = 0, /* we do not accept RH0 by default. */
.disable_ipv6 = 0,
- .accept_dad = 1,
+ .accept_dad = 0,
.suppress_frag_ndisc = 1,
.accept_ra_mtu = 1,
.stable_secret = {


2018-01-01 14:50:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 141/146] x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <[email protected]>

commit ac461122c88a10b7d775de2f56467f097c9e627a upstream.

Commit e802a51ede91 ("x86/idt: Consolidate IDT invalidation") cleaned up
and unified the IDT invalidation that existed in a couple of places. It
changed no actual real code.

Despite not changing any actual real code, it _did_ change code generation:
by implementing the common idt_invalidate() function in
archx86/kernel/idt.c, it made the use of the function in
arch/x86/kernel/machine_kexec_32.c be a real function call rather than an
(accidental) inlining of the function.

That, in turn, exposed two issues:

- in load_segments(), we had incorrectly reset all the segment
registers, which then made the stack canary load (which gcc does
using offset of %gs) cause a trap. Instead of %gs pointing to the
stack canary, it will be the normal zero-based kernel segment, and
the stack canary load will take a page fault at address 0x14.

- to make this even harder to debug, we had invalidated the GDT just
before calling idt_invalidate(), which meant that the fault happened
with an invalid GDT, which in turn causes a triple fault and
immediate reboot.

Fix this by

(a) not reloading the special segments in load_segments(). We currently
don't do any percpu accesses (which would require %fs on x86-32) in
this area, but there's no reason to think that we might not want to
do them, and like %gs, it's pointless to break it.

(b) doing idt_invalidate() before invalidating the GDT, to keep things
at least _slightly_ more debuggable for a bit longer. Without a
IDT, traps will not work. Without a GDT, traps also will not work,
but neither will any segment loads etc. So in a very real sense,
the GDT is even more core than the IDT.

Fixes: e802a51ede91 ("x86/idt: Consolidate IDT invalidation")
Reported-and-tested-by: Alexandru Chirvasitu <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/machine_kexec_32.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -48,8 +48,6 @@ static void load_segments(void)
"\tmovl $"STR(__KERNEL_DS)",%%eax\n"
"\tmovl %%eax,%%ds\n"
"\tmovl %%eax,%%es\n"
- "\tmovl %%eax,%%fs\n"
- "\tmovl %%eax,%%gs\n"
"\tmovl %%eax,%%ss\n"
: : : "eax", "memory");
#undef STR
@@ -232,8 +230,8 @@ void machine_kexec(struct kimage *image)
* The gdt & idt are now invalid.
* If you want to load them you must set up your own idt & gdt.
*/
- set_gdt(phys_to_virt(0), 0);
idt_invalidate(phys_to_virt(0));
+ set_gdt(phys_to_virt(0), 0);

/* now call it */
image->start = relocate_kernel_ptr((unsigned long)image->head,


2018-01-01 14:47:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 133/146] timers: Reinitialize per cpu bases on hotplug

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 26456f87aca7157c057de65c9414b37f1ab881d1 upstream.

The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
them with a potentially stale clk and next_expiry valuem, which can cause
trouble then the CPU is plugged.

Add a prepare callback which forwards the clock, sets next_expiry to far in
the future and reset the control flags to a known state.

Set base->must_forward_clk so the first timer which is queued will try to
forward the clock to current jiffies.

Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Reported-by: Paul E. McKenney <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/cpuhotplug.h | 2 +-
include/linux/timer.h | 4 +++-
kernel/cpu.c | 4 ++--
kernel/time/timer.c | 15 +++++++++++++++
4 files changed, 21 insertions(+), 4 deletions(-)

--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -86,7 +86,7 @@ enum cpuhp_state {
CPUHP_MM_ZSWP_POOL_PREPARE,
CPUHP_KVM_PPC_BOOK3S_PREPARE,
CPUHP_ZCOMP_PREPARE,
- CPUHP_TIMERS_DEAD,
+ CPUHP_TIMERS_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
CPUHP_BP_PREPARE_DYN,
CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20,
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -246,9 +246,11 @@ unsigned long round_jiffies_up(unsigned
unsigned long round_jiffies_up_relative(unsigned long j);

#ifdef CONFIG_HOTPLUG_CPU
+int timers_prepare_cpu(unsigned int cpu);
int timers_dead_cpu(unsigned int cpu);
#else
-#define timers_dead_cpu NULL
+#define timers_prepare_cpu NULL
+#define timers_dead_cpu NULL
#endif

#endif
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1277,9 +1277,9 @@ static struct cpuhp_step cpuhp_bp_states
* before blk_mq_queue_reinit_notify() from notify_dead(),
* otherwise a RCU stall occurs.
*/
- [CPUHP_TIMERS_DEAD] = {
+ [CPUHP_TIMERS_PREPARE] = {
.name = "timers:dead",
- .startup.single = NULL,
+ .startup.single = timers_prepare_cpu,
.teardown.single = timers_dead_cpu,
},
/* Kicks the plugged cpu into life */
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1801,6 +1801,21 @@ static void migrate_timer_list(struct ti
}
}

+int timers_prepare_cpu(unsigned int cpu)
+{
+ struct timer_base *base;
+ int b;
+
+ for (b = 0; b < NR_BASES; b++) {
+ base = per_cpu_ptr(&timer_bases[b], cpu);
+ base->clk = jiffies;
+ base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
+ base->is_idle = false;
+ base->must_forward_clk = true;
+ }
+ return 0;
+}
+
int timers_dead_cpu(unsigned int cpu)
{
struct timer_base *old_base;


2018-01-01 14:52:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 129/146] USB: Fix off by one in type-specific length check of BOS SSP capability

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Nyman <[email protected]>

commit 07b9f12864d16c3a861aef4817eb1efccbc5d0e6 upstream.

USB 3.1 devices are not detected as 3.1 capable since 4.15-rc3 due to a
off by one in commit 81cf4a45360f ("USB: core: Add type-specific length
check of BOS descriptors")

It uses USB_DT_USB_SSP_CAP_SIZE() to get SSP capability size which takes
the zero based SSAC as argument, not the actual count of sublink speed
attributes.

USB3 spec 9.6.2.5 says "The number of Sublink Speed Attributes = SSAC + 1."

The type-specific length check patch was added to stable and needs to be
fixed there as well

Fixes: 81cf4a45360f ("USB: core: Add type-specific length check of BOS descriptors")
CC: Masakazu Mokuno <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/core/config.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/usb/core/config.c
+++ b/drivers/usb/core/config.c
@@ -1007,7 +1007,7 @@ int usb_get_bos_descriptor(struct usb_de
case USB_SSP_CAP_TYPE:
ssp_cap = (struct usb_ssp_cap_descriptor *)buffer;
ssac = (le32_to_cpu(ssp_cap->bmAttributes) &
- USB_SSP_SUBLINK_SPEED_ATTRIBS) + 1;
+ USB_SSP_SUBLINK_SPEED_ATTRIBS);
if (length >= USB_DT_USB_SSP_CAP_SIZE(ssac))
dev->bos->ssp_cap = ssp_cap;
break;


2018-01-01 14:46:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 123/146] USB: serial: ftdi_sio: add id for Airbus DS P8GR

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Max Schulze <[email protected]>

commit c6a36ad383559a60a249aa6016cebf3cb8b6c485 upstream.

Add AIRBUS_DS_P8GR device IDs to ftdi_sio driver.

Signed-off-by: Max Schulze <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 6 ++++++
2 files changed, 7 insertions(+)

--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -1017,6 +1017,7 @@ static const struct usb_device_id id_tab
.driver_info = (kernel_ulong_t)&ftdi_jtag_quirk },
{ USB_DEVICE(CYPRESS_VID, CYPRESS_WICED_BT_USB_PID) },
{ USB_DEVICE(CYPRESS_VID, CYPRESS_WICED_WL_USB_PID) },
+ { USB_DEVICE(AIRBUS_DS_VID, AIRBUS_DS_P8GR) },
{ } /* Terminating entry */
};

--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -915,6 +915,12 @@
#define ICPDAS_I7563U_PID 0x0105

/*
+ * Airbus Defence and Space
+ */
+#define AIRBUS_DS_VID 0x1e8e /* Vendor ID */
+#define AIRBUS_DS_P8GR 0x6001 /* Tetra P8GR */
+
+/*
* RT Systems programming cables for various ham radios
*/
#define RTSYSTEMS_VID 0x2100 /* Vendor ID */


2018-01-01 14:46:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 119/146] usbip: prevent leaking socket pointer address in messages

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Shuah Khan <[email protected]>

commit 90120d15f4c397272aaf41077960a157fc4212bf upstream.

usbip driver is leaking socket pointer address in messages. Remove
the messages that aren't useful and print sockfd in the ones that
are useful for debugging.

Signed-off-by: Shuah Khan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/usbip/stub_dev.c | 3 +--
drivers/usb/usbip/usbip_common.c | 16 +++++-----------
drivers/usb/usbip/vhci_hcd.c | 2 +-
3 files changed, 7 insertions(+), 14 deletions(-)

--- a/drivers/usb/usbip/stub_dev.c
+++ b/drivers/usb/usbip/stub_dev.c
@@ -163,8 +163,7 @@ static void stub_shutdown_connection(str
* step 1?
*/
if (ud->tcp_socket) {
- dev_dbg(&sdev->udev->dev, "shutdown tcp_socket %p\n",
- ud->tcp_socket);
+ dev_dbg(&sdev->udev->dev, "shutdown sockfd %d\n", ud->sockfd);
kernel_sock_shutdown(ud->tcp_socket, SHUT_RDWR);
}

--- a/drivers/usb/usbip/usbip_common.c
+++ b/drivers/usb/usbip/usbip_common.c
@@ -331,26 +331,20 @@ int usbip_recv(struct socket *sock, void
struct msghdr msg = {.msg_flags = MSG_NOSIGNAL};
int total = 0;

+ if (!sock || !buf || !size)
+ return -EINVAL;
+
iov_iter_kvec(&msg.msg_iter, READ|ITER_KVEC, &iov, 1, size);

usbip_dbg_xmit("enter\n");

- if (!sock || !buf || !size) {
- pr_err("invalid arg, sock %p buff %p size %d\n", sock, buf,
- size);
- return -EINVAL;
- }
-
do {
- int sz = msg_data_left(&msg);
+ msg_data_left(&msg);
sock->sk->sk_allocation = GFP_NOIO;

result = sock_recvmsg(sock, &msg, MSG_WAITALL);
- if (result <= 0) {
- pr_debug("receive sock %p buf %p size %u ret %d total %d\n",
- sock, buf + total, sz, result, total);
+ if (result <= 0)
goto err;
- }

total += result;
} while (msg_data_left(&msg));
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -989,7 +989,7 @@ static void vhci_shutdown_connection(str

/* need this? see stub_dev.c */
if (ud->tcp_socket) {
- pr_debug("shutdown tcp_socket %p\n", ud->tcp_socket);
+ pr_debug("shutdown tcp_socket %d\n", ud->sockfd);
kernel_sock_shutdown(ud->tcp_socket, SHUT_RDWR);
}



2018-01-01 14:54:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 117/146] [PATCH] sparc64: repair calling incorrect hweight function from stubs

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Engelhardt <[email protected]>


[ Upstream commit 59585b4be9ae4dc6506551709bdcd6f5210b8a01 ]

Commit v4.12-rc4-1-g9289ea7f952b introduced a mistake that made the
64-bit hweight stub call the 16-bit hweight function.

Fixes: 9289ea7f952b ("sparc64: Use indirect calls in hamming weight stubs")
Signed-off-by: Jan Engelhardt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/lib/hweight.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/sparc/lib/hweight.S
+++ b/arch/sparc/lib/hweight.S
@@ -44,8 +44,8 @@ EXPORT_SYMBOL(__arch_hweight32)
.previous

ENTRY(__arch_hweight64)
- sethi %hi(__sw_hweight16), %g1
- jmpl %g1 + %lo(__sw_hweight16), %g0
+ sethi %hi(__sw_hweight64), %g1
+ jmpl %g1 + %lo(__sw_hweight64), %g0
nop
ENDPROC(__arch_hweight64)
EXPORT_SYMBOL(__arch_hweight64)


2018-01-01 14:55:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 073/146] s390/qeth: apply takeover changes when mode is toggled

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <[email protected]>


[ Upstream commit 7fbd9493f0eeae8cef58300505a9ef5c8fce6313 ]

Just as for an explicit enable/disable, toggling the takeover mode also
requires that the IP addresses get updated. Otherwise all IPs that were
added to the table before the mode-toggle, get registered with the old
settings.

Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/net/qeth_core.h | 2 +-
drivers/s390/net/qeth_core_main.c | 2 +-
drivers/s390/net/qeth_l3_sys.c | 35 +++++++++++++++++------------------
3 files changed, 19 insertions(+), 20 deletions(-)

--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -564,7 +564,7 @@ enum qeth_cq {
};

struct qeth_ipato {
- int enabled;
+ bool enabled;
int invert4;
int invert6;
struct list_head entries;
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1479,7 +1479,7 @@ static int qeth_setup_card(struct qeth_c
qeth_set_intial_options(card);
/* IP address takeover */
INIT_LIST_HEAD(&card->ipato.entries);
- card->ipato.enabled = 0;
+ card->ipato.enabled = false;
card->ipato.invert4 = 0;
card->ipato.invert6 = 0;
/* init QDIO stuff */
--- a/drivers/s390/net/qeth_l3_sys.c
+++ b/drivers/s390/net/qeth_l3_sys.c
@@ -372,6 +372,7 @@ static ssize_t qeth_l3_dev_ipato_enable_
struct qeth_card *card = dev_get_drvdata(dev);
struct qeth_ipaddr *addr;
int i, rc = 0;
+ bool enable;

if (!card)
return -EINVAL;
@@ -384,25 +385,23 @@ static ssize_t qeth_l3_dev_ipato_enable_
}

if (sysfs_streq(buf, "toggle")) {
- card->ipato.enabled = (card->ipato.enabled)? 0 : 1;
- } else if (sysfs_streq(buf, "1")) {
- card->ipato.enabled = 1;
- hash_for_each(card->ip_htable, i, addr, hnode) {
- if ((addr->type == QETH_IP_TYPE_NORMAL) &&
- qeth_l3_is_addr_covered_by_ipato(card, addr))
- addr->set_flags |=
- QETH_IPA_SETIP_TAKEOVER_FLAG;
- }
- } else if (sysfs_streq(buf, "0")) {
- card->ipato.enabled = 0;
- hash_for_each(card->ip_htable, i, addr, hnode) {
- if (addr->set_flags &
- QETH_IPA_SETIP_TAKEOVER_FLAG)
- addr->set_flags &=
- ~QETH_IPA_SETIP_TAKEOVER_FLAG;
- }
- } else
+ enable = !card->ipato.enabled;
+ } else if (kstrtobool(buf, &enable)) {
rc = -EINVAL;
+ goto out;
+ }
+
+ if (card->ipato.enabled == enable)
+ goto out;
+ card->ipato.enabled = enable;
+
+ hash_for_each(card->ip_htable, i, addr, hnode) {
+ if (!enable)
+ addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG;
+ else if (addr->type == QETH_IP_TYPE_NORMAL &&
+ qeth_l3_is_addr_covered_by_ipato(card, addr))
+ addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
+ }
out:
mutex_unlock(&card->conf_mutex);
return rc ? rc : count;


2018-01-01 14:45:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 105/146] net/mlx5: FPGA, return -EINVAL if size is zero

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kamal Heib <[email protected]>


[ Upstream commit bae115a2bb479142605726e6aa130f43f50e801a ]

Currently, if a size of zero is passed to
mlx5_fpga_mem_{read|write}_i2c()
the "err" return value will not be initialized, which triggers gcc
warnings:

[..]/mlx5/core/fpga/sdk.c:87 mlx5_fpga_mem_read_i2c() error:
uninitialized symbol 'err'.
[..]/mlx5/core/fpga/sdk.c:115 mlx5_fpga_mem_write_i2c() error:
uninitialized symbol 'err'.

fix that.

Fixes: a9956d35d199 ('net/mlx5: FPGA, Add SBU infrastructure')
Signed-off-by: Kamal Heib <[email protected]>
Reviewed-by: Yevgeny Kliteynik <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.c
@@ -66,6 +66,9 @@ static int mlx5_fpga_mem_read_i2c(struct
u8 actual_size;
int err;

+ if (!size)
+ return -EINVAL;
+
if (!fdev->mdev)
return -ENOTCONN;

@@ -95,6 +98,9 @@ static int mlx5_fpga_mem_write_i2c(struc
u8 actual_size;
int err;

+ if (!size)
+ return -EINVAL;
+
if (!fdev->mdev)
return -ENOTCONN;



2018-01-01 14:44:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 100/146] mlxsw: spectrum: Disable MAC learning for ovs port

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yuval Mintz <[email protected]>


[ Upstream commit fccff0862838908d21eaf956d57e09c6c189f7c5 ]

Learning is currently enabled for ports which are OVS slaves -
even though OVS doesn't need this indication.
Since we're not associating a fid with the port, HW would continuously
notify driver of learned [& aged] MACs which would be logged as errors.

Fixes: 2b94e58df58c ("mlxsw: spectrum: Allow ports to work under OVS master")
Signed-off-by: Yuval Mintz <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4164,6 +4164,7 @@ static int mlxsw_sp_port_stp_set(struct

static int mlxsw_sp_port_ovs_join(struct mlxsw_sp_port *mlxsw_sp_port)
{
+ u16 vid = 1;
int err;

err = mlxsw_sp_port_vp_mode_set(mlxsw_sp_port, true);
@@ -4176,8 +4177,19 @@ static int mlxsw_sp_port_ovs_join(struct
true, false);
if (err)
goto err_port_vlan_set;
+
+ for (; vid <= VLAN_N_VID - 1; vid++) {
+ err = mlxsw_sp_port_vid_learning_set(mlxsw_sp_port,
+ vid, false);
+ if (err)
+ goto err_vid_learning_set;
+ }
+
return 0;

+err_vid_learning_set:
+ for (vid--; vid >= 1; vid--)
+ mlxsw_sp_port_vid_learning_set(mlxsw_sp_port, vid, true);
err_port_vlan_set:
mlxsw_sp_port_stp_set(mlxsw_sp_port, false);
err_port_stp_set:
@@ -4187,6 +4199,12 @@ err_port_stp_set:

static void mlxsw_sp_port_ovs_leave(struct mlxsw_sp_port *mlxsw_sp_port)
{
+ u16 vid;
+
+ for (vid = VLAN_N_VID - 1; vid >= 1; vid--)
+ mlxsw_sp_port_vid_learning_set(mlxsw_sp_port,
+ vid, true);
+
mlxsw_sp_port_vlan_set(mlxsw_sp_port, 2, VLAN_N_VID - 1,
false, false);
mlxsw_sp_port_stp_set(mlxsw_sp_port, false);


2018-01-01 14:56:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 099/146] tipc: fix hanging poll() for stream sockets

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Parthasarathy Bhuvaragan <[email protected]>


[ Upstream commit 517d7c79bdb39864e617960504bdc1aa560c75c6 ]

In commit 42b531de17d2f6 ("tipc: Fix missing connection request
handling"), we replaced unconditional wakeup() with condtional
wakeup for clients with flags POLLIN | POLLRDNORM | POLLRDBAND.

This breaks the applications which do a connect followed by poll
with POLLOUT flag. These applications are not woken when the
connection is ESTABLISHED and hence sleep forever.

In this commit, we fix it by including the POLLOUT event for
sockets in TIPC_CONNECTING state.

Fixes: 42b531de17d2f6 ("tipc: Fix missing connection request handling")
Acked-by: Jon Maloy <[email protected]>
Signed-off-by: Parthasarathy Bhuvaragan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/tipc/socket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -709,11 +709,11 @@ static unsigned int tipc_poll(struct fil

switch (sk->sk_state) {
case TIPC_ESTABLISHED:
+ case TIPC_CONNECTING:
if (!tsk->cong_link_cnt && !tsk_conn_cong(tsk))
mask |= POLLOUT;
/* fall thru' */
case TIPC_LISTEN:
- case TIPC_CONNECTING:
if (!skb_queue_empty(&sk->sk_receive_queue))
mask |= (POLLIN | POLLRDNORM);
break;


2018-01-01 14:57:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 095/146] sfc: pass valid pointers from efx_enqueue_unwind

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bert Kenward <[email protected]>


[ Upstream commit d4a7a8893d4cdbc89d79ac4aa704bf8d4b67b368 ]

The bytes_compl and pkts_compl pointers passed to efx_dequeue_buffers
cannot be NULL. Add a paranoid warning to check this condition and fix
the one case where they were NULL.

efx_enqueue_unwind() is called very rarely, during error handling.
Without this fix it would fail with a NULL pointer dereference in
efx_dequeue_buffer, with efx_enqueue_skb in the call stack.

Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2")
Reported-by: Jarod Wilson <[email protected]>
Signed-off-by: Bert Kenward <[email protected]>
Tested-by: Jarod Wilson <[email protected]>
Acked-by: Jarod Wilson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/sfc/tx.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -77,6 +77,7 @@ static void efx_dequeue_buffer(struct ef
}

if (buffer->flags & EFX_TX_BUF_SKB) {
+ EFX_WARN_ON_PARANOID(!pkts_compl || !bytes_compl);
(*pkts_compl)++;
(*bytes_compl) += buffer->skb->len;
dev_consume_skb_any((struct sk_buff *)buffer->skb);
@@ -426,12 +427,14 @@ static int efx_tx_map_data(struct efx_tx
static void efx_enqueue_unwind(struct efx_tx_queue *tx_queue)
{
struct efx_tx_buffer *buffer;
+ unsigned int bytes_compl = 0;
+ unsigned int pkts_compl = 0;

/* Work backwards until we hit the original insert pointer value */
while (tx_queue->insert_count != tx_queue->write_count) {
--tx_queue->insert_count;
buffer = __efx_tx_queue_get_insert_buffer(tx_queue);
- efx_dequeue_buffer(tx_queue, buffer, NULL, NULL);
+ efx_dequeue_buffer(tx_queue, buffer, &pkts_compl, &bytes_compl);
}
}



2018-01-01 14:58:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 094/146] openvswitch: Fix pop_vlan action for double tagged frames

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Garver <[email protected]>


[ Upstream commit c48e74736fccf25fb32bb015426359e1c2016e3b ]

skb_vlan_pop() expects skb->protocol to be a valid TPID for double
tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
shift the true ethertype into position for us.

Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
Signed-off-by: Eric Garver <[email protected]>
Reviewed-by: Jiri Benc <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/openvswitch/flow.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -532,6 +532,7 @@ static int key_extract(struct sk_buff *s
return -EINVAL;

skb_reset_network_header(skb);
+ key->eth.type = skb->protocol;
} else {
eth = eth_hdr(skb);
ether_addr_copy(key->eth.src, eth->h_source);
@@ -545,15 +546,23 @@ static int key_extract(struct sk_buff *s
if (unlikely(parse_vlan(skb, key)))
return -ENOMEM;

- skb->protocol = parse_ethertype(skb);
- if (unlikely(skb->protocol == htons(0)))
+ key->eth.type = parse_ethertype(skb);
+ if (unlikely(key->eth.type == htons(0)))
return -ENOMEM;

+ /* Multiple tagged packets need to retain TPID to satisfy
+ * skb_vlan_pop(), which will later shift the ethertype into
+ * skb->protocol.
+ */
+ if (key->eth.cvlan.tci & htons(VLAN_TAG_PRESENT))
+ skb->protocol = key->eth.cvlan.tpid;
+ else
+ skb->protocol = key->eth.type;
+
skb_reset_network_header(skb);
__skb_push(skb, skb->data - skb_mac_header(skb));
}
skb_reset_mac_len(skb);
- key->eth.type = skb->protocol;

/* Network layer. */
if (key->eth.type == htons(ETH_P_IP)) {


2018-01-01 14:44:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 089/146] net/mlx5e: Fix possible deadlock of VXLAN lock

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gal Pressman <[email protected]>


[ Upstream commit 6323514116404cc651df1b7fffa1311ddf8ce647 ]

mlx5e_vxlan_lookup_port is called both from mlx5e_add_vxlan_port (user
context) and mlx5e_features_check (softirq), but the lock acquired does
not disable bottom half and might result in deadlock. Fix it by simply
replacing spin_lock() with spin_lock_bh().
While at it, replace all unnecessary spin_lock_irq() to spin_lock_bh().

lockdep's WARNING: inconsistent lock state
[ 654.028136] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 654.028229] swapper/5/0 [HC0[0]:SC1[9]:HE1:SE0] takes:
[ 654.028321] (&(&vxlan_db->lock)->rlock){+.?.}, at: [<ffffffffa06e7f0e>] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
[ 654.028528] {SOFTIRQ-ON-W} state was registered at:
[ 654.028607] _raw_spin_lock+0x3c/0x70
[ 654.028689] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
[ 654.028794] mlx5e_vxlan_add_port+0x2e/0x120 [mlx5_core]
[ 654.028878] process_one_work+0x1e9/0x640
[ 654.028942] worker_thread+0x4a/0x3f0
[ 654.029002] kthread+0x141/0x180
[ 654.029056] ret_from_fork+0x24/0x30
[ 654.029114] irq event stamp: 579088
[ 654.029174] hardirqs last enabled at (579088): [<ffffffff818f475a>] ip6_finish_output2+0x49a/0x8c0
[ 654.029309] hardirqs last disabled at (579087): [<ffffffff818f470e>] ip6_finish_output2+0x44e/0x8c0
[ 654.029446] softirqs last enabled at (579030): [<ffffffff810b3b3d>] irq_enter+0x6d/0x80
[ 654.029567] softirqs last disabled at (579031): [<ffffffff810b3c05>] irq_exit+0xb5/0xc0
[ 654.029684] other info that might help us debug this:
[ 654.029781] Possible unsafe locking scenario:

[ 654.029868] CPU0
[ 654.029908] ----
[ 654.029947] lock(&(&vxlan_db->lock)->rlock);
[ 654.030045] <Interrupt>
[ 654.030090] lock(&(&vxlan_db->lock)->rlock);
[ 654.030162]
*** DEADLOCK ***

Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/vxlan.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c
@@ -71,9 +71,9 @@ struct mlx5e_vxlan *mlx5e_vxlan_lookup_p
struct mlx5e_vxlan_db *vxlan_db = &priv->vxlan;
struct mlx5e_vxlan *vxlan;

- spin_lock(&vxlan_db->lock);
+ spin_lock_bh(&vxlan_db->lock);
vxlan = radix_tree_lookup(&vxlan_db->tree, port);
- spin_unlock(&vxlan_db->lock);
+ spin_unlock_bh(&vxlan_db->lock);

return vxlan;
}
@@ -100,9 +100,9 @@ static void mlx5e_vxlan_add_port(struct

vxlan->udp_port = port;

- spin_lock_irq(&vxlan_db->lock);
+ spin_lock_bh(&vxlan_db->lock);
err = radix_tree_insert(&vxlan_db->tree, vxlan->udp_port, vxlan);
- spin_unlock_irq(&vxlan_db->lock);
+ spin_unlock_bh(&vxlan_db->lock);
if (err)
goto err_free;

@@ -121,9 +121,9 @@ static void __mlx5e_vxlan_core_del_port(
struct mlx5e_vxlan_db *vxlan_db = &priv->vxlan;
struct mlx5e_vxlan *vxlan;

- spin_lock_irq(&vxlan_db->lock);
+ spin_lock_bh(&vxlan_db->lock);
vxlan = radix_tree_delete(&vxlan_db->tree, port);
- spin_unlock_irq(&vxlan_db->lock);
+ spin_unlock_bh(&vxlan_db->lock);

if (!vxlan)
return;
@@ -171,12 +171,12 @@ void mlx5e_vxlan_cleanup(struct mlx5e_pr
struct mlx5e_vxlan *vxlan;
unsigned int port = 0;

- spin_lock_irq(&vxlan_db->lock);
+ spin_lock_bh(&vxlan_db->lock);
while (radix_tree_gang_lookup(&vxlan_db->tree, (void **)&vxlan, port, 1)) {
port = vxlan->udp_port;
- spin_unlock_irq(&vxlan_db->lock);
+ spin_unlock_bh(&vxlan_db->lock);
__mlx5e_vxlan_core_del_port(priv, (u16)port);
- spin_lock_irq(&vxlan_db->lock);
+ spin_lock_bh(&vxlan_db->lock);
}
- spin_unlock_irq(&vxlan_db->lock);
+ spin_unlock_bh(&vxlan_db->lock);
}


2018-01-01 14:59:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 087/146] tcp: invalidate rate samples during SACK reneging

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yousuk Seung <[email protected]>


[ Upstream commit d4761754b4fb2ef8d9a1e9d121c4bec84e1fe292 ]

Mark tcp_sock during a SACK reneging event and invalidate rate samples
while marked. Such rate samples may overestimate bw by including packets
that were SACKed before reneging.

< ack 6001 win 10000 sack 7001:38001
< ack 7001 win 0 sack 8001:38001 // Reneg detected
> seq 7001:8001 // RTO, SACK cleared.
< ack 38001 win 10000

In above example the rate sample taken after the last ack will count
7001-38001 as delivered while the actual delivery rate likely could
be much lower i.e. 7001-8001.

This patch adds a new field tcp_sock.sack_reneg and marks it when we
declare SACK reneging and entering TCP_CA_Loss, and unmarks it after
the last rate sample was taken before moving back to TCP_CA_Open. This
patch also invalidates rate samples taken while tcp_sock.is_sack_reneg
is set.

Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection")
Signed-off-by: Yousuk Seung <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Acked-by: Priyaranjan Jha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/tcp.h | 3 ++-
include/net/tcp.h | 2 +-
net/ipv4/tcp.c | 1 +
net/ipv4/tcp_input.c | 10 ++++++++--
net/ipv4/tcp_rate.c | 10 +++++++---
5 files changed, 19 insertions(+), 7 deletions(-)

--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -214,7 +214,8 @@ struct tcp_sock {
u8 chrono_type:2, /* current chronograph type */
rate_app_limited:1, /* rate_{delivered,interval_us} limited? */
fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
- unused:4;
+ is_sack_reneg:1, /* in recovery from loss with SACK reneg? */
+ unused:3;
u8 nonagle : 4,/* Disable Nagle algorithm? */
thin_lto : 1,/* Use linear timeouts for thin streams */
unused1 : 1,
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1085,7 +1085,7 @@ void tcp_rate_skb_sent(struct sock *sk,
void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb,
struct rate_sample *rs);
void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost,
- struct rate_sample *rs);
+ bool is_sack_reneg, struct rate_sample *rs);
void tcp_rate_check_app_limited(struct sock *sk);

/* These functions determine how the current flow behaves in respect of SACK
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2356,6 +2356,7 @@ int tcp_disconnect(struct sock *sk, int
tp->snd_cwnd_cnt = 0;
tp->window_clamp = 0;
tcp_set_ca_state(sk, TCP_CA_Open);
+ tp->is_sack_reneg = 0;
tcp_clear_retrans(tp);
inet_csk_delack_init(sk);
/* Initialize rcv_mss to TCP_MIN_MSS to avoid division by 0
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1975,6 +1975,8 @@ void tcp_enter_loss(struct sock *sk)
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING);
tp->sacked_out = 0;
tp->fackets_out = 0;
+ /* Mark SACK reneging until we recover from this loss event. */
+ tp->is_sack_reneg = 1;
}
tcp_clear_all_retrans_hints(tp);

@@ -2428,6 +2430,7 @@ static bool tcp_try_undo_recovery(struct
return true;
}
tcp_set_ca_state(sk, TCP_CA_Open);
+ tp->is_sack_reneg = 0;
return false;
}

@@ -2459,8 +2462,10 @@ static bool tcp_try_undo_loss(struct soc
NET_INC_STATS(sock_net(sk),
LINUX_MIB_TCPSPURIOUSRTOS);
inet_csk(sk)->icsk_retransmits = 0;
- if (frto_undo || tcp_is_sack(tp))
+ if (frto_undo || tcp_is_sack(tp)) {
tcp_set_ca_state(sk, TCP_CA_Open);
+ tp->is_sack_reneg = 0;
+ }
return true;
}
return false;
@@ -3551,6 +3556,7 @@ static int tcp_ack(struct sock *sk, cons
struct tcp_sacktag_state sack_state;
struct rate_sample rs = { .prior_delivered = 0 };
u32 prior_snd_una = tp->snd_una;
+ bool is_sack_reneg = tp->is_sack_reneg;
u32 ack_seq = TCP_SKB_CB(skb)->seq;
u32 ack = TCP_SKB_CB(skb)->ack_seq;
bool is_dupack = false;
@@ -3666,7 +3672,7 @@ static int tcp_ack(struct sock *sk, cons

delivered = tp->delivered - delivered; /* freshly ACKed or SACKed */
lost = tp->lost - lost; /* freshly marked lost */
- tcp_rate_gen(sk, delivered, lost, sack_state.rate);
+ tcp_rate_gen(sk, delivered, lost, is_sack_reneg, sack_state.rate);
tcp_cong_control(sk, ack, delivered, flag, sack_state.rate);
tcp_xmit_recovery(sk, rexmit);
return 1;
--- a/net/ipv4/tcp_rate.c
+++ b/net/ipv4/tcp_rate.c
@@ -106,7 +106,7 @@ void tcp_rate_skb_delivered(struct sock

/* Update the connection delivery information and generate a rate sample. */
void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost,
- struct rate_sample *rs)
+ bool is_sack_reneg, struct rate_sample *rs)
{
struct tcp_sock *tp = tcp_sk(sk);
u32 snd_us, ack_us;
@@ -124,8 +124,12 @@ void tcp_rate_gen(struct sock *sk, u32 d

rs->acked_sacked = delivered; /* freshly ACKed or SACKed */
rs->losses = lost; /* freshly marked lost */
- /* Return an invalid sample if no timing information is available. */
- if (!rs->prior_mstamp) {
+ /* Return an invalid sample if no timing information is available or
+ * in recovery from loss with SACK reneging. Rate samples taken during
+ * a SACK reneging event may overestimate bw by including packets that
+ * were SACKed before the reneg.
+ */
+ if (!rs->prior_mstamp || is_sack_reneg) {
rs->delivered = -1;
rs->interval_us = -1;
return;


2018-01-01 14:43:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 082/146] ipv4: Fix use-after-free when flushing FIB tables

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ido Schimmel <[email protected]>


[ Upstream commit b4681c2829e24943aadd1a7bb3a30d41d0a20050 ]

Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the
local table uses the same trie allocated for the main table when custom
rules are not in use.

When a net namespace is dismantled, the main table is flushed and freed
(via an RCU callback) before the local table. In case the callback is
invoked before the local table is iterated, a use-after-free can occur.

Fix this by iterating over the FIB tables in reverse order, so that the
main table is always freed after the local table.

v3: Reworded comment according to Alex's suggestion.
v2: Add a comment to make the fix more explicit per Dave's and Alex's
feedback.

Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-by: Fengguang Wu <[email protected]>
Acked-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/fib_frontend.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1274,14 +1274,19 @@ err_table_hash_alloc:

static void ip_fib_net_exit(struct net *net)
{
- unsigned int i;
+ int i;

rtnl_lock();
#ifdef CONFIG_IP_MULTIPLE_TABLES
RCU_INIT_POINTER(net->ipv4.fib_main, NULL);
RCU_INIT_POINTER(net->ipv4.fib_default, NULL);
#endif
- for (i = 0; i < FIB_TABLE_HASHSZ; i++) {
+ /* Destroy the tables in reverse order to guarantee that the
+ * local table, ID 255, is destroyed before the main table, ID
+ * 254. This is necessary as the local table may contain
+ * references to data contained in the main table.
+ */
+ for (i = FIB_TABLE_HASHSZ - 1; i >= 0; i--) {
struct hlist_head *head = &net->ipv4.fib_table_hash[i];
struct hlist_node *tmp;
struct fib_table *tb;


2018-01-01 14:43:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 080/146] adding missing rcu_read_unlock in ipxip6_rcv

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Nikita V. Shirokov" <[email protected]>


[ Upstream commit 74c4b656c3d92ec4c824ea1a4afd726b7b6568c8 ]

commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
introduced new exit point in ipxip6_rcv. however rcu_read_unlock is
missing there. this diff is fixing this

v1->v2:
instead of doing rcu_read_unlock in place, we are going to "drop"
section (to prevent skb leakage)

Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Signed-off-by: Nikita V. Shirokov <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_tunnel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -912,7 +912,7 @@ static int ipxip6_rcv(struct sk_buff *sk
if (t->parms.collect_md) {
tun_dst = ipv6_tun_rx_dst(skb, 0, 0, 0);
if (!tun_dst)
- return 0;
+ goto drop;
}
ret = __ip6_tnl_rcv(t, skb, tpi, tun_dst, dscp_ecn_decapsulate,
log_ecn_error);


2018-01-01 14:43:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 079/146] sctp: Replace use of sockets_allocated with specified macro.

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tonghao Zhang <[email protected]>


[ Upstream commit 8cb38a602478e9f806571f6920b0a3298aabf042 ]

The patch(180d8cd942ce) replaces all uses of struct sock fields'
memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem
to accessor macros. But the sockets_allocated field of sctp sock is
not replaced at all. Then replace it now for unifying the code.

Fixes: 180d8cd942ce ("foundations of per-cgroup memory pressure controlling.")
Cc: Glauber Costa <[email protected]>
Signed-off-by: Tonghao Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sctp/socket.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4413,7 +4413,7 @@ static int sctp_init_sock(struct sock *s
SCTP_DBG_OBJCNT_INC(sock);

local_bh_disable();
- percpu_counter_inc(&sctp_sockets_allocated);
+ sk_sockets_allocated_inc(sk);
sock_prot_inuse_add(net, sk->sk_prot, 1);

/* Nothing can fail after this block, otherwise
@@ -4457,7 +4457,7 @@ static void sctp_destroy_sock(struct soc
}
sctp_endpoint_free(sp->ep);
local_bh_disable();
- percpu_counter_dec(&sctp_sockets_allocated);
+ sk_sockets_allocated_dec(sk);
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
local_bh_enable();
}


2018-01-01 15:01:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 036/146] ring-buffer: Mask out the info bits when returning buffer page length

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (VMware) <[email protected]>

commit 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 upstream.

Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.

What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.

Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/ring_buffer.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -281,6 +281,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data
/* Missed count stored at end */
#define RB_MISSED_STORED (1 << 30)

+#define RB_MISSED_FLAGS (RB_MISSED_EVENTS|RB_MISSED_STORED)
+
struct buffer_data_page {
u64 time_stamp; /* page time stamp */
local_t commit; /* write committed index */
@@ -332,7 +334,9 @@ static void rb_init_page(struct buffer_d
*/
size_t ring_buffer_page_len(void *page)
{
- return local_read(&((struct buffer_data_page *)page)->commit)
+ struct buffer_data_page *bpage = page;
+
+ return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS)
+ BUF_PAGE_HDR_SIZE;
}



2018-01-01 15:02:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 032/146] x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <[email protected]>

commit 75298aa179d56cd64f54e58a19fffc8ab922b4c0 upstream.

The upcoming support for dumping the kernel and the user space page tables
of the current process would create more random files in the top level
debugfs directory.

Add a page table directory and move the existing file to it.

Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/debug_pagetables.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -22,21 +22,26 @@ static const struct file_operations ptdu
.release = single_release,
};

-static struct dentry *pe;
+static struct dentry *dir, *pe;

static int __init pt_dump_debug_init(void)
{
- pe = debugfs_create_file("kernel_page_tables", S_IRUSR, NULL, NULL,
- &ptdump_fops);
- if (!pe)
+ dir = debugfs_create_dir("page_tables", NULL);
+ if (!dir)
return -ENOMEM;

+ pe = debugfs_create_file("kernel", 0400, dir, NULL, &ptdump_fops);
+ if (!pe)
+ goto err;
return 0;
+err:
+ debugfs_remove_recursive(dir);
+ return -ENOMEM;
}

static void __exit pt_dump_debug_exit(void)
{
- debugfs_remove_recursive(pe);
+ debugfs_remove_recursive(dir);
}

module_init(pt_dump_debug_init);


2018-01-01 14:42:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 061/146] net: fec: unmap the xmit buffer that are not transferred by DMA

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Fugang Duan <[email protected]>


[ Upstream commit 178e5f57a8d8f8fc5799a624b96fc31ef9a29ffa ]

The enet IP only support 32 bit, it will use swiotlb buffer to do dma
mapping when xmit buffer DMA memory address is bigger than 4G in i.MX
platform. After stress suspend/resume test, it will print out:

log:
[12826.352864] fec 5b040000.ethernet: swiotlb buffer is full (sz: 191 bytes)
[12826.359676] DMA: Out of SW-IOMMU space for 191 bytes at device 5b040000.ethernet
[12826.367110] fec 5b040000.ethernet eth0: Tx DMA memory map failed

The issue is that the ready xmit buffers that are dma mapped but DMA still
don't copy them into fifo, once MAC restart, these DMA buffers are not unmapped.
So it should check the dma mapping buffer and unmap them.

Signed-off-by: Fugang Duan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/freescale/fec_main.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -818,6 +818,12 @@ static void fec_enet_bd_init(struct net_
for (i = 0; i < txq->bd.ring_size; i++) {
/* Initialize the BD for every fragment in the page. */
bdp->cbd_sc = cpu_to_fec16(0);
+ if (bdp->cbd_bufaddr &&
+ !IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr)))
+ dma_unmap_single(&fep->pdev->dev,
+ fec32_to_cpu(bdp->cbd_bufaddr),
+ fec16_to_cpu(bdp->cbd_datlen),
+ DMA_TO_DEVICE);
if (txq->tx_skbuff[i]) {
dev_kfree_skb_any(txq->tx_skbuff[i]);
txq->tx_skbuff[i] = NULL;


2018-01-01 15:03:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 030/146] x86/dumpstack: Indicate in Oops whether PTI is configured and enabled

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <[email protected]>

commit 5f26d76c3fd67c48806415ef8b1116c97beff8ba upstream.

CONFIG_PAGE_TABLE_ISOLATION is relatively new and intrusive feature that may
still have some corner cases which could take some time to manifest and be
fixed. It would be useful to have Oops messages indicate whether it was
enabled for building the kernel, and whether it was disabled during boot.

Example of fully enabled:

Oops: 0001 [#1] SMP PTI

Example of enabled during build, but disabled during boot:

Oops: 0001 [#1] SMP NOPTI

We can decide to remove this after the feature has been tested in the field
long enough.

[ tglx: Made it use boot_cpu_has() as requested by Borislav ]

Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Eduardo Valentin <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirsky <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/dumpstack.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -297,11 +297,13 @@ int __die(const char *str, struct pt_reg
unsigned long sp;
#endif
printk(KERN_DEFAULT
- "%s: %04lx [#%d]%s%s%s%s\n", str, err & 0xffff, ++die_counter,
+ "%s: %04lx [#%d]%s%s%s%s%s\n", str, err & 0xffff, ++die_counter,
IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "",
IS_ENABLED(CONFIG_SMP) ? " SMP" : "",
debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "",
- IS_ENABLED(CONFIG_KASAN) ? " KASAN" : "");
+ IS_ENABLED(CONFIG_KASAN) ? " KASAN" : "",
+ IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION) ?
+ (boot_cpu_has(X86_FEATURE_PTI) ? " PTI" : " NOPTI") : "");

if (notify_die(DIE_OOPS, str, regs, err,
current->thread.trap_nr, SIGSEGV) == NOTIFY_STOP)


2018-01-01 15:04:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 055/146] cpufreq: schedutil: Use idle_calls counter of the remote CPU

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Joel Fernandes <[email protected]>

commit 466a2b42d67644447a1765276259a3ea5531ddff upstream.

Since the recent remote cpufreq callback work, its possible that a cpufreq
update is triggered from a remote CPU. For single policies however, the current
code uses the local CPU when trying to determine if the remote sg_cpu entered
idle or is busy. This is incorrect. To remedy this, compare with the nohz tick
idle_calls counter of the remote CPU.

Fixes: 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks)
Acked-by: Viresh Kumar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/tick.h | 1 +
kernel/sched/cpufreq_schedutil.c | 2 +-
kernel/time/tick-sched.c | 13 +++++++++++++
3 files changed, 15 insertions(+), 1 deletion(-)

--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -119,6 +119,7 @@ extern void tick_nohz_idle_exit(void);
extern void tick_nohz_irq_exit(void);
extern ktime_t tick_nohz_get_sleep_length(void);
extern unsigned long tick_nohz_get_idle_calls(void);
+extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
#else /* !CONFIG_NO_HZ_COMMON */
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -244,7 +244,7 @@ static void sugov_iowait_boost(struct su
#ifdef CONFIG_NO_HZ_COMMON
static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
{
- unsigned long idle_calls = tick_nohz_get_idle_calls();
+ unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
bool ret = idle_calls == sg_cpu->saved_idle_calls;

sg_cpu->saved_idle_calls = idle_calls;
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1010,6 +1010,19 @@ ktime_t tick_nohz_get_sleep_length(void)
}

/**
+ * tick_nohz_get_idle_calls_cpu - return the current idle calls counter value
+ * for a particular CPU.
+ *
+ * Called from the schedutil frequency scaling governor in scheduler context.
+ */
+unsigned long tick_nohz_get_idle_calls_cpu(int cpu)
+{
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
+
+ return ts->idle_calls;
+}
+
+/**
* tick_nohz_get_idle_calls - return the current idle calls counter value
*
* Called from the schedutil frequency scaling governor in scheduler context.


2018-01-01 15:04:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 053/146] ALSA: hda - fix headset mic detection issue on a Dell machine

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hui Wang <[email protected]>

commit 285d5ddcffafa5d5e68c586f4c9eaa8b24a2897d upstream.

It has the codec alc256, and add its pin definition to pin quirk
table to let it apply ALC255_FIXUP_DELL1_MIC_NO_PRESENCE.

Signed-off-by: Hui Wang <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/pci/hda/patch_realtek.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6559,6 +6559,11 @@ static const struct snd_hda_pin_quirk al
{0x1b, 0x01011020},
{0x21, 0x02211010}),
SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
+ {0x12, 0x90a60130},
+ {0x14, 0x90170110},
+ {0x1b, 0x01011020},
+ {0x21, 0x0221101f}),
+ SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
{0x12, 0x90a60160},
{0x14, 0x90170120},
{0x21, 0x02211030}),


2018-01-01 15:05:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 048/146] IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Moni Shoua <[email protected]>

commit 05d14e7b0c138cb07ba30e464f47b39434f3fdef upstream.

If the input command length is larger than the kernel supports an error should
be returned in case the unsupported bytes are not cleared, instead of the
other way aroudn. This matches what all other callers of ib_is_udata_cleared
do and will avoid user ABI problems in the future.

Fixes: 189aba99e700 ("IB/uverbs: Extend modify_qp and support packet pacing")
Reviewed-by: Yishai Hadas <[email protected]>
Signed-off-by: Moni Shoua <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/core/uverbs_cmd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2085,8 +2085,8 @@ int ib_uverbs_ex_modify_qp(struct ib_uve
return -EOPNOTSUPP;

if (ucore->inlen > sizeof(cmd)) {
- if (ib_is_udata_cleared(ucore, sizeof(cmd),
- ucore->inlen - sizeof(cmd)))
+ if (!ib_is_udata_cleared(ucore, sizeof(cmd),
+ ucore->inlen - sizeof(cmd)))
return -EOPNOTSUPP;
}



2018-01-01 15:05:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 029/146] x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <[email protected]>

commit 0a126abd576ebc6403f063dbe20cf7416c9d9393 upstream.

Ideally we'd also use sparse to enforce this separation so it becomes much
more difficult to mess up.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/tlbflush.h | 55 +++++++++++++++++++++++++++++++---------
1 file changed, 43 insertions(+), 12 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -13,16 +13,33 @@
#include <asm/pti.h>
#include <asm/processor-flags.h>

-static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
-{
- /*
- * Bump the generation count. This also serves as a full barrier
- * that synchronizes with switch_mm(): callers are required to order
- * their read of mm_cpumask after their writes to the paging
- * structures.
- */
- return atomic64_inc_return(&mm->context.tlb_gen);
-}
+/*
+ * The x86 feature is called PCID (Process Context IDentifier). It is similar
+ * to what is traditionally called ASID on the RISC processors.
+ *
+ * We don't use the traditional ASID implementation, where each process/mm gets
+ * its own ASID and flush/restart when we run out of ASID space.
+ *
+ * Instead we have a small per-cpu array of ASIDs and cache the last few mm's
+ * that came by on this CPU, allowing cheaper switch_mm between processes on
+ * this CPU.
+ *
+ * We end up with different spaces for different things. To avoid confusion we
+ * use different names for each of them:
+ *
+ * ASID - [0, TLB_NR_DYN_ASIDS-1]
+ * the canonical identifier for an mm
+ *
+ * kPCID - [1, TLB_NR_DYN_ASIDS]
+ * the value we write into the PCID part of CR3; corresponds to the
+ * ASID+1, because PCID 0 is special.
+ *
+ * uPCID - [2048 + 1, 2048 + TLB_NR_DYN_ASIDS]
+ * for KPTI each mm has two address spaces and thus needs two
+ * PCID values, but we can still do with a single ASID denomination
+ * for each mm. Corresponds to kPCID + 2048.
+ *
+ */

/* There are 12 bits of space for ASIDS in CR3 */
#define CR3_HW_ASID_BITS 12
@@ -41,7 +58,7 @@ static inline u64 inc_mm_tlb_gen(struct

/*
* ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. -1 below to account
- * for them being zero-based. Another -1 is because ASID 0 is reserved for
+ * for them being zero-based. Another -1 is because PCID 0 is reserved for
* use by non-PCID-aware users.
*/
#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_PCID_BITS) - 2)
@@ -52,6 +69,9 @@ static inline u64 inc_mm_tlb_gen(struct
*/
#define TLB_NR_DYN_ASIDS 6

+/*
+ * Given @asid, compute kPCID
+ */
static inline u16 kern_pcid(u16 asid)
{
VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
@@ -86,7 +106,7 @@ static inline u16 kern_pcid(u16 asid)
}

/*
- * The user PCID is just the kernel one, plus the "switch bit".
+ * Given @asid, compute uPCID
*/
static inline u16 user_pcid(u16 asid)
{
@@ -484,6 +504,17 @@ static inline void flush_tlb_page(struct
void native_flush_tlb_others(const struct cpumask *cpumask,
const struct flush_tlb_info *info);

+static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
+{
+ /*
+ * Bump the generation count. This also serves as a full barrier
+ * that synchronizes with switch_mm(): callers are required to order
+ * their read of mm_cpumask after their writes to the paging
+ * structures.
+ */
+ return atomic64_inc_return(&mm->context.tlb_gen);
+}
+
static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
struct mm_struct *mm)
{


2018-01-01 15:06:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 044/146] ASoC: tlv320aic31xx: Fix GPIO1 register definition

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andrew F. Davis <[email protected]>

commit 737e0b7b67bdfe24090fab2852044bb283282fc5 upstream.

GPIO1 control register is number 51, fix this here.

Fixes: bafcbfe429eb ("ASoC: tlv320aic31xx: Make the register values human readable")
Signed-off-by: Andrew F. Davis <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/soc/codecs/tlv320aic31xx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/sound/soc/codecs/tlv320aic31xx.h
+++ b/sound/soc/codecs/tlv320aic31xx.h
@@ -116,7 +116,7 @@ struct aic31xx_pdata {
/* INT2 interrupt control */
#define AIC31XX_INT2CTRL AIC31XX_REG(0, 49)
/* GPIO1 control */
-#define AIC31XX_GPIO1 AIC31XX_REG(0, 50)
+#define AIC31XX_GPIO1 AIC31XX_REG(0, 51)

#define AIC31XX_DACPRB AIC31XX_REG(0, 60)
/* ADC Instruction Set Register */


2018-01-01 15:06:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 043/146] ASoC: twl4030: fix child-node lookup

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 upstream.

Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.

To make things worse, the parent codec node was also prematurely freed,
while the child node was leaked.

Fixes: 2d6d649a2e0f ("ASoC: twl4030: Support for DT booted kernel")
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/soc/codecs/twl4030.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/sound/soc/codecs/twl4030.c
+++ b/sound/soc/codecs/twl4030.c
@@ -232,7 +232,7 @@ static struct twl4030_codec_data *twl403
struct twl4030_codec_data *pdata = dev_get_platdata(codec->dev);
struct device_node *twl4030_codec_node = NULL;

- twl4030_codec_node = of_find_node_by_name(codec->dev->parent->of_node,
+ twl4030_codec_node = of_get_child_by_name(codec->dev->parent->of_node,
"codec");

if (!pdata && twl4030_codec_node) {
@@ -241,9 +241,11 @@ static struct twl4030_codec_data *twl403
GFP_KERNEL);
if (!pdata) {
dev_err(codec->dev, "Can not allocate memory\n");
+ of_node_put(twl4030_codec_node);
return NULL;
}
twl4030_setup_pdata_of(pdata, twl4030_codec_node);
+ of_node_put(twl4030_codec_node);
}

return pdata;


2018-01-01 14:40:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 039/146] ASoC: codecs: msm8916-wcd: Fix supported formats

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Srinivas Kandagatla <[email protected]>

commit 51f493ae71adc2c49a317a13c38e54e1cdf46005 upstream.

This codec is configurable for only 16 bit and 32 bit samples, so reflect
this in the supported formats also remove 24bit sample from supported list.

Signed-off-by: Srinivas Kandagatla <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/soc/codecs/msm8916-wcd-analog.c | 2 +-
sound/soc/codecs/msm8916-wcd-digital.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

--- a/sound/soc/codecs/msm8916-wcd-analog.c
+++ b/sound/soc/codecs/msm8916-wcd-analog.c
@@ -267,7 +267,7 @@
#define MSM8916_WCD_ANALOG_RATES (SNDRV_PCM_RATE_8000 | SNDRV_PCM_RATE_16000 |\
SNDRV_PCM_RATE_32000 | SNDRV_PCM_RATE_48000)
#define MSM8916_WCD_ANALOG_FORMATS (SNDRV_PCM_FMTBIT_S16_LE |\
- SNDRV_PCM_FMTBIT_S24_LE)
+ SNDRV_PCM_FMTBIT_S32_LE)

static int btn_mask = SND_JACK_BTN_0 | SND_JACK_BTN_1 |
SND_JACK_BTN_2 | SND_JACK_BTN_3 | SND_JACK_BTN_4;
--- a/sound/soc/codecs/msm8916-wcd-digital.c
+++ b/sound/soc/codecs/msm8916-wcd-digital.c
@@ -194,7 +194,7 @@
SNDRV_PCM_RATE_32000 | \
SNDRV_PCM_RATE_48000)
#define MSM8916_WCD_DIGITAL_FORMATS (SNDRV_PCM_FMTBIT_S16_LE |\
- SNDRV_PCM_FMTBIT_S24_LE)
+ SNDRV_PCM_FMTBIT_S32_LE)

struct msm8916_wcd_digital_priv {
struct clk *ahbclk, *mclk;
@@ -645,7 +645,7 @@ static int msm8916_wcd_digital_hw_params
RX_I2S_CTL_RX_I2S_MODE_MASK,
RX_I2S_CTL_RX_I2S_MODE_16);
break;
- case SNDRV_PCM_FORMAT_S24_LE:
+ case SNDRV_PCM_FORMAT_S32_LE:
snd_soc_update_bits(dai->codec, LPASS_CDC_CLK_TX_I2S_CTL,
TX_I2S_CTL_TX_I2S_MODE_MASK,
TX_I2S_CTL_TX_I2S_MODE_32);


2018-01-01 14:40:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 037/146] ring-buffer: Do no reuse reader page if still in use

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (VMware) <[email protected]>

commit ae415fa4c5248a8cf4faabd5a3c20576cb1ad607 upstream.

To free the reader page that is allocated with ring_buffer_alloc_read_page(),
ring_buffer_free_read_page() must be called. For faster performance, this
page can be reused by the ring buffer to avoid having to free and allocate
new pages.

The issue arises when the page is used with a splice pipe into the
networking code. The networking code may up the page counter for the page,
and keep it active while sending it is queued to go to the network. The
incrementing of the page ref does not prevent it from being reused in the
ring buffer, and this can cause the page that is being sent out to the
network to be modified before it is sent by reading new data.

Add a check to the page ref counter, and only reuse the page if it is not
being used anywhere else.

Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/ring_buffer.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4443,8 +4443,13 @@ void ring_buffer_free_read_page(struct r
{
struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
struct buffer_data_page *bpage = data;
+ struct page *page = virt_to_page(bpage);
unsigned long flags;

+ /* If the page is still in use someplace else, we can't reuse it */
+ if (page_ref_count(page) > 1)
+ goto out;
+
local_irq_save(flags);
arch_spin_lock(&cpu_buffer->lock);

@@ -4456,6 +4461,7 @@ void ring_buffer_free_read_page(struct r
arch_spin_unlock(&cpu_buffer->lock);
local_irq_restore(flags);

+ out:
free_page((unsigned long)bpage);
}
EXPORT_SYMBOL_GPL(ring_buffer_free_read_page);


2018-01-01 15:07:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 038/146] iw_cxgb4: Only validate the MSN for successful completions

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steve Wise <[email protected]>

commit f55688c45442bc863f40ad678c638785b26cdce6 upstream.

If the RECV CQE is in error, ignore the MSN check. This was causing
recvs that were flushed into the sw cq to be completed with the wrong
status (BAD_MSN instead of FLUSHED).

Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/hw/cxgb4/cq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -586,10 +586,10 @@ static int poll_cq(struct t4_wq *wq, str
ret = -EAGAIN;
goto skip_cqe;
}
- if (unlikely((CQE_WRID_MSN(hw_cqe) != (wq->rq.msn)))) {
+ if (unlikely(!CQE_STATUS(hw_cqe) &&
+ CQE_WRID_MSN(hw_cqe) != wq->rq.msn)) {
t4_set_wq_in_error(wq);
- hw_cqe->header |= htonl(CQE_STATUS_V(T4_ERR_MSN));
- goto proc_cqe;
+ hw_cqe->header |= cpu_to_be32(CQE_STATUS_V(T4_ERR_MSN));
}
goto proc_cqe;
}


2018-01-01 15:08:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 008/146] x86/pti: Add the pti= cmdline option and documentation

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <[email protected]>

commit 41f4c20b57a4890ea7f56ff8717cc83fefb8d537 upstream.

Keep the "nopti" optional for traditional reasons.

[ tglx: Don't allow force on when running on XEN PV and made 'on'
printout conditional ]

Requested-by: Linus Torvalds <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirsky <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
Documentation/admin-guide/kernel-parameters.txt | 6 +++++
arch/x86/mm/pti.c | 26 +++++++++++++++++++++++-
2 files changed, 31 insertions(+), 1 deletion(-)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3255,6 +3255,12 @@
pt. [PARIDE]
See Documentation/blockdev/paride.txt.

+ pti= [X86_64]
+ Control user/kernel address space isolation:
+ on - enable
+ off - disable
+ auto - default setting
+
pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -54,21 +54,45 @@ static void __init pti_print_if_insecure
pr_info("%s\n", reason);
}

+static void __init pti_print_if_secure(const char *reason)
+{
+ if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
+ pr_info("%s\n", reason);
+}
+
void __init pti_check_boottime_disable(void)
{
+ char arg[5];
+ int ret;
+
if (hypervisor_is_type(X86_HYPER_XEN_PV)) {
pti_print_if_insecure("disabled on XEN PV.");
return;
}

+ ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
+ if (ret > 0) {
+ if (ret == 3 && !strncmp(arg, "off", 3)) {
+ pti_print_if_insecure("disabled on command line.");
+ return;
+ }
+ if (ret == 2 && !strncmp(arg, "on", 2)) {
+ pti_print_if_secure("force enabled on command line.");
+ goto enable;
+ }
+ if (ret == 4 && !strncmp(arg, "auto", 4))
+ goto autosel;
+ }
+
if (cmdline_find_option_bool(boot_command_line, "nopti")) {
pti_print_if_insecure("disabled on command line.");
return;
}

+autosel:
if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
return;
-
+enable:
setup_force_cpu_cap(X86_FEATURE_PTI);
}



2018-01-01 15:08:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 005/146] x86/mm/pti: Disable global pages if PAGE_TABLE_ISOLATION=y

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit c313ec66317d421fb5768d78c56abed2dc862264 upstream.

Global pages stay in the TLB across context switches. Since all contexts
share the same kernel mapping, these mappings are marked as global pages
so kernel entries in the TLB are not flushed out on a context switch.

But, even having these entries in the TLB opens up something that an
attacker can use, such as the double-page-fault attack:

http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf

That means that even when PAGE_TABLE_ISOLATION switches page tables
on return to user space the global pages would stay in the TLB cache.

Disable global pages so that kernel TLB entries can be flushed before
returning to user space. This way, all accesses to kernel addresses from
userspace result in a TLB miss independent of the existence of a kernel
mapping.

Suppress global pages via the __supported_pte_mask. The user space
mappings set PAGE_GLOBAL for the minimal kernel mappings which are
required for entry/exit. These mappings are set up manually so the
filtering does not take place.

[ The __supported_pte_mask simplification was written by Thomas Gleixner. ]
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/init.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -161,6 +161,12 @@ struct map_range {

static int page_size_mask;

+static void enable_global_pages(void)
+{
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ __supported_pte_mask |= _PAGE_GLOBAL;
+}
+
static void __init probe_page_size_mask(void)
{
/*
@@ -179,11 +185,11 @@ static void __init probe_page_size_mask(
cr4_set_bits_and_update_boot(X86_CR4_PSE);

/* Enable PGE if available */
+ __supported_pte_mask &= ~_PAGE_GLOBAL;
if (boot_cpu_has(X86_FEATURE_PGE)) {
cr4_set_bits_and_update_boot(X86_CR4_PGE);
- __supported_pte_mask |= _PAGE_GLOBAL;
- } else
- __supported_pte_mask &= ~_PAGE_GLOBAL;
+ enable_global_pages();
+ }

/* Enable 1 GB linear kernel mappings if available: */
if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {


2018-01-01 15:09:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 026/146] x86/mm: Use/Fix PCID to optimize user/kernel switches

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <[email protected]>

commit 6fd166aae78c0ab738d49bda653cbd9e3b1491cf upstream.

We can use PCID to retain the TLBs across CR3 switches; including those now
part of the user/kernel switch. This increases performance of kernel
entry/exit at the cost of more expensive/complicated TLB flushing.

Now that we have two address spaces, one for kernel and one for user space,
we need two PCIDs per mm. We use the top PCID bit to indicate a user PCID
(just like we use the PFN LSB for the PGD). Since we do TLB invalidation
from kernel space, the existing code will only invalidate the kernel PCID,
we augment that by marking the corresponding user PCID invalid, and upon
switching back to userspace, use a flushing CR3 write for the switch.

In order to access the user_pcid_flush_mask we use PER_CPU storage, which
means the previously established SWAPGS vs CR3 ordering is now mandatory
and required.

Having to do this memory access does require additional registers, most
sites have a functioning stack and we can spill one (RAX), sites without
functional stack need to otherwise provide the second scratch register.

Note: PCID is generally available on Intel Sandybridge and later CPUs.
Note: Up until this point TLB flushing was broken in this series.

Based-on-code-from: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/entry/calling.h | 72 ++++++++++++++++++----
arch/x86/entry/entry_64.S | 9 +-
arch/x86/entry/entry_64_compat.S | 4 -
arch/x86/include/asm/processor-flags.h | 5 +
arch/x86/include/asm/tlbflush.h | 91 ++++++++++++++++++++++++----
arch/x86/include/uapi/asm/processor-flags.h | 7 +-
arch/x86/kernel/asm-offsets.c | 4 +
arch/x86/mm/init.c | 2
arch/x86/mm/tlb.c | 1
9 files changed, 162 insertions(+), 33 deletions(-)

--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -3,6 +3,9 @@
#include <asm/unwind_hints.h>
#include <asm/cpufeatures.h>
#include <asm/page_types.h>
+#include <asm/percpu.h>
+#include <asm/asm-offsets.h>
+#include <asm/processor-flags.h>

/*

@@ -191,17 +194,21 @@ For 32-bit we have the following convent

#ifdef CONFIG_PAGE_TABLE_ISOLATION

-/* PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two halves: */
-#define PTI_SWITCH_MASK (1<<PAGE_SHIFT)
+/*
+ * PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two
+ * halves:
+ */
+#define PTI_SWITCH_PGTABLES_MASK (1<<PAGE_SHIFT)
+#define PTI_SWITCH_MASK (PTI_SWITCH_PGTABLES_MASK|(1<<X86_CR3_PTI_SWITCH_BIT))

-.macro ADJUST_KERNEL_CR3 reg:req
- /* Clear "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */
- andq $(~PTI_SWITCH_MASK), \reg
+.macro SET_NOFLUSH_BIT reg:req
+ bts $X86_CR3_PCID_NOFLUSH_BIT, \reg
.endm

-.macro ADJUST_USER_CR3 reg:req
- /* Move CR3 up a page to the user page tables: */
- orq $(PTI_SWITCH_MASK), \reg
+.macro ADJUST_KERNEL_CR3 reg:req
+ ALTERNATIVE "", "SET_NOFLUSH_BIT \reg", X86_FEATURE_PCID
+ /* Clear PCID and "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */
+ andq $(~PTI_SWITCH_MASK), \reg
.endm

.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
@@ -212,21 +219,58 @@ For 32-bit we have the following convent
.Lend_\@:
.endm

-.macro SWITCH_TO_USER_CR3 scratch_reg:req
+#define THIS_CPU_user_pcid_flush_mask \
+ PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask
+
+.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
mov %cr3, \scratch_reg
- ADJUST_USER_CR3 \scratch_reg
+
+ ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
+
+ /*
+ * Test if the ASID needs a flush.
+ */
+ movq \scratch_reg, \scratch_reg2
+ andq $(0x7FF), \scratch_reg /* mask ASID */
+ bt \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ jnc .Lnoflush_\@
+
+ /* Flush needed, clear the bit */
+ btr \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ movq \scratch_reg2, \scratch_reg
+ jmp .Lwrcr3_\@
+
+.Lnoflush_\@:
+ movq \scratch_reg2, \scratch_reg
+ SET_NOFLUSH_BIT \scratch_reg
+
+.Lwrcr3_\@:
+ /* Flip the PGD and ASID to the user version */
+ orq $(PTI_SWITCH_MASK), \scratch_reg
mov \scratch_reg, %cr3
.Lend_\@:
.endm

+.macro SWITCH_TO_USER_CR3_STACK scratch_reg:req
+ pushq %rax
+ SWITCH_TO_USER_CR3_NOSTACK scratch_reg=\scratch_reg scratch_reg2=%rax
+ popq %rax
+.endm
+
.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_PTI
movq %cr3, \scratch_reg
movq \scratch_reg, \save_reg
/*
- * Is the switch bit zero? This means the address is
- * up in real PAGE_TABLE_ISOLATION patches in a moment.
+ * Is the "switch mask" all zero? That means that both of
+ * these are zero:
+ *
+ * 1. The user/kernel PCID bit, and
+ * 2. The user/kernel "bit" that points CR3 to the
+ * bottom half of the 8k PGD
+ *
+ * That indicates a kernel CR3 value, not a user CR3.
*/
testq $(PTI_SWITCH_MASK), \scratch_reg
jz .Ldone_\@
@@ -251,7 +295,9 @@ For 32-bit we have the following convent

.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
.endm
-.macro SWITCH_TO_USER_CR3 scratch_reg:req
+.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
+.endm
+.macro SWITCH_TO_USER_CR3_STACK scratch_reg:req
.endm
.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
.endm
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -23,7 +23,6 @@
#include <asm/segment.h>
#include <asm/cache.h>
#include <asm/errno.h>
-#include "calling.h"
#include <asm/asm-offsets.h>
#include <asm/msr.h>
#include <asm/unistd.h>
@@ -40,6 +39,8 @@
#include <asm/frame.h>
#include <linux/err.h>

+#include "calling.h"
+
.code64
.section .entry.text, "ax"

@@ -406,7 +407,7 @@ syscall_return_via_sysret:
* We are on the trampoline stack. All regs except RDI are live.
* We can do future final exit work right here.
*/
- SWITCH_TO_USER_CR3 scratch_reg=%rdi
+ SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi

popq %rdi
popq %rsp
@@ -744,7 +745,7 @@ GLOBAL(swapgs_restore_regs_and_return_to
* We can do future final exit work right here.
*/

- SWITCH_TO_USER_CR3 scratch_reg=%rdi
+ SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi

/* Restore RDI. */
popq %rdi
@@ -857,7 +858,7 @@ native_irq_return_ldt:
*/
orq PER_CPU_VAR(espfix_stack), %rax

- SWITCH_TO_USER_CR3 scratch_reg=%rdi /* to user CR3 */
+ SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
SWAPGS /* to user GS */
popq %rdi /* Restore user RDI */

--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -275,9 +275,9 @@ sysret32_from_system_call:
* switch until after after the last reference to the process
* stack.
*
- * %r8 is zeroed before the sysret, thus safe to clobber.
+ * %r8/%r9 are zeroed before the sysret, thus safe to clobber.
*/
- SWITCH_TO_USER_CR3 scratch_reg=%r8
+ SWITCH_TO_USER_CR3_NOSTACK scratch_reg=%r8 scratch_reg2=%r9

xorq %r8, %r8
xorq %r9, %r9
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -38,6 +38,11 @@
#define CR3_ADDR_MASK __sme_clr(0x7FFFFFFFFFFFF000ull)
#define CR3_PCID_MASK 0xFFFull
#define CR3_NOFLUSH BIT_ULL(63)
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define X86_CR3_PTI_SWITCH_BIT 11
+#endif
+
#else
/*
* CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -10,6 +10,8 @@
#include <asm/special_insns.h>
#include <asm/smp.h>
#include <asm/invpcid.h>
+#include <asm/pti.h>
+#include <asm/processor-flags.h>

static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
{
@@ -24,24 +26,54 @@ static inline u64 inc_mm_tlb_gen(struct

/* There are 12 bits of space for ASIDS in CR3 */
#define CR3_HW_ASID_BITS 12
+
/*
* When enabled, PAGE_TABLE_ISOLATION consumes a single bit for
* user/kernel switches
*/
-#define PTI_CONSUMED_ASID_BITS 0
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define PTI_CONSUMED_PCID_BITS 1
+#else
+# define PTI_CONSUMED_PCID_BITS 0
+#endif
+
+#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS)

-#define CR3_AVAIL_ASID_BITS (CR3_HW_ASID_BITS - PTI_CONSUMED_ASID_BITS)
/*
* ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. -1 below to account
* for them being zero-based. Another -1 is because ASID 0 is reserved for
* use by non-PCID-aware users.
*/
-#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_ASID_BITS) - 2)
+#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_PCID_BITS) - 2)
+
+/*
+ * 6 because 6 should be plenty and struct tlb_state will fit in two cache
+ * lines.
+ */
+#define TLB_NR_DYN_ASIDS 6

static inline u16 kern_pcid(u16 asid)
{
VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
/*
+ * Make sure that the dynamic ASID space does not confict with the
+ * bit we are using to switch between user and kernel ASIDs.
+ */
+ BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_SWITCH_BIT));
+
+ /*
+ * The ASID being passed in here should have respected the
+ * MAX_ASID_AVAILABLE and thus never have the switch bit set.
+ */
+ VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_SWITCH_BIT));
+#endif
+ /*
+ * The dynamically-assigned ASIDs that get passed in are small
+ * (<TLB_NR_DYN_ASIDS). They never have the high switch bit set,
+ * so do not bother to clear it.
+ *
* If PCID is on, ASID-aware code paths put the ASID+1 into the
* PCID bits. This serves two purposes. It prevents a nasty
* situation in which PCID-unaware code saves CR3, loads some other
@@ -95,12 +127,6 @@ static inline bool tlb_defer_switch_to_i
return !static_cpu_has(X86_FEATURE_PCID);
}

-/*
- * 6 because 6 should be plenty and struct tlb_state will fit in
- * two cache lines.
- */
-#define TLB_NR_DYN_ASIDS 6
-
struct tlb_context {
u64 ctx_id;
u64 tlb_gen;
@@ -146,6 +172,13 @@ struct tlb_state {
bool invalidate_other;

/*
+ * Mask that contains TLB_NR_DYN_ASIDS+1 bits to indicate
+ * the corresponding user PCID needs a flush next time we
+ * switch to it; see SWITCH_TO_USER_CR3.
+ */
+ unsigned short user_pcid_flush_mask;
+
+ /*
* Access to this CR4 shadow and to H/W CR4 is protected by
* disabling interrupts when modifying either one.
*/
@@ -250,14 +283,41 @@ static inline void cr4_set_bits_and_upda
extern void initialize_tlbstate_and_flush(void);

/*
+ * Given an ASID, flush the corresponding user ASID. We can delay this
+ * until the next time we switch to it.
+ *
+ * See SWITCH_TO_USER_CR3.
+ */
+static inline void invalidate_user_asid(u16 asid)
+{
+ /* There is no user ASID if address space separation is off */
+ if (!IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
+ return;
+
+ /*
+ * We only have a single ASID if PCID is off and the CR3
+ * write will have flushed it.
+ */
+ if (!cpu_feature_enabled(X86_FEATURE_PCID))
+ return;
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ __set_bit(kern_pcid(asid),
+ (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
+}
+
+/*
* flush the entire current user mapping
*/
static inline void __native_flush_tlb(void)
{
+ invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
/*
- * If current->mm == NULL then we borrow a mm which may change during a
- * task switch and therefore we must not be preempted while we write CR3
- * back:
+ * If current->mm == NULL then we borrow a mm which may change
+ * during a task switch and therefore we must not be preempted
+ * while we write CR3 back:
*/
preempt_disable();
native_write_cr3(__native_read_cr3());
@@ -301,7 +361,14 @@ static inline void __native_flush_tlb_gl
*/
static inline void __native_flush_tlb_single(unsigned long addr)
{
+ u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ invalidate_user_asid(loaded_mm_asid);
}

/*
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -78,7 +78,12 @@
#define X86_CR3_PWT _BITUL(X86_CR3_PWT_BIT)
#define X86_CR3_PCD_BIT 4 /* Page Cache Disable */
#define X86_CR3_PCD _BITUL(X86_CR3_PCD_BIT)
-#define X86_CR3_PCID_MASK _AC(0x00000fff,UL) /* PCID Mask */
+
+#define X86_CR3_PCID_BITS 12
+#define X86_CR3_PCID_MASK (_AC((1UL << X86_CR3_PCID_BITS) - 1, UL))
+
+#define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */
+#define X86_CR3_PCID_NOFLUSH _BITULL(X86_CR3_PCID_NOFLUSH_BIT)

/*
* Intel CPU features in CR4
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -17,6 +17,7 @@
#include <asm/sigframe.h>
#include <asm/bootparam.h>
#include <asm/suspend.h>
+#include <asm/tlbflush.h>

#ifdef CONFIG_XEN
#include <xen/interface/xen.h>
@@ -94,6 +95,9 @@ void common(void) {
BLANK();
DEFINE(PTREGS_SIZE, sizeof(struct pt_regs));

+ /* TLB state for the entry code */
+ OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask);
+
/* Layout info for cpu_entry_area */
OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss);
OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline);
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -855,7 +855,7 @@ void __init zone_sizes_init(void)
free_area_init_nodes(max_zone_pfns);
}

-DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
+__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
.loaded_mm = &init_mm,
.next_asid = 1,
.cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -105,6 +105,7 @@ static void load_new_mm_cr3(pgd_t *pgdir
unsigned long new_mm_cr3;

if (need_flush) {
+ invalidate_user_asid(new_asid);
new_mm_cr3 = build_cr3(pgdir, new_asid);
} else {
new_mm_cr3 = build_cr3_noflush(pgdir, new_asid);


2018-01-01 15:09:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 024/146] x86/mm: Allow flushing for future ASID switches

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit 2ea907c4fe7b78e5840c1dc07800eae93248cad1 upstream.

If changing the page tables in such a way that an invalidation of all
contexts (aka. PCIDs / ASIDs) is required, they can be actively invalidated
by:

1. INVPCID for each PCID (works for single pages too).

2. Load CR3 with each PCID without the NOFLUSH bit set

3. Load CR3 with the NOFLUSH bit set for each and do INVLPG for each address.

But, none of these are really feasible since there are ~6 ASIDs (12 with
PAGE_TABLE_ISOLATION) at the time that invalidation is required.
Instead of actively invalidating them, invalidate the *current* context and
also mark the cpu_tlbstate _quickly_ to indicate future invalidation to be
required.

At the next context-switch, look for this indicator
('invalidate_other' being set) invalidate all of the
cpu_tlbstate.ctxs[] entries.

This ensures that any future context switches will do a full flush
of the TLB, picking up the previous changes.

[ tglx: Folded more fixups from Peter ]

Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/tlbflush.h | 37 +++++++++++++++++++++++++++++--------
arch/x86/mm/tlb.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 64 insertions(+), 8 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -135,6 +135,17 @@ struct tlb_state {
bool is_lazy;

/*
+ * If set we changed the page tables in such a way that we
+ * needed an invalidation of all contexts (aka. PCIDs / ASIDs).
+ * This tells us to go invalidate all the non-loaded ctxs[]
+ * on the next context switch.
+ *
+ * The current ctx was kept up-to-date as it ran and does not
+ * need to be invalidated.
+ */
+ bool invalidate_other;
+
+ /*
* Access to this CR4 shadow and to H/W CR4 is protected by
* disabling interrupts when modifying either one.
*/
@@ -212,6 +223,14 @@ static inline unsigned long cr4_read_sha
}

/*
+ * Mark all other ASIDs as invalid, preserves the current.
+ */
+static inline void invalidate_other_asid(void)
+{
+ this_cpu_write(cpu_tlbstate.invalidate_other, true);
+}
+
+/*
* Save some of cr4 feature set we're using (e.g. Pentium 4MB
* enable and PPro Global page enable), so that any CPU's that boot
* up after us can get the correct flags. This should only be used
@@ -298,14 +317,6 @@ static inline void __flush_tlb_all(void)
*/
__flush_tlb();
}
-
- /*
- * Note: if we somehow had PCID but not PGE, then this wouldn't work --
- * we'd end up flushing kernel translations for the current ASID but
- * we might fail to flush kernel translations for other cached ASIDs.
- *
- * To avoid this issue, we force PCID off if PGE is off.
- */
}

/*
@@ -315,6 +326,16 @@ static inline void __flush_tlb_one(unsig
{
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ONE);
__flush_tlb_single(addr);
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ /*
+ * __flush_tlb_single() will have cleared the TLB entry for this ASID,
+ * but since kernel space is replicated across all, we must also
+ * invalidate all others.
+ */
+ invalidate_other_asid();
}

#define TLB_FLUSH_ALL -1UL
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -28,6 +28,38 @@
* Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi
*/

+/*
+ * We get here when we do something requiring a TLB invalidation
+ * but could not go invalidate all of the contexts. We do the
+ * necessary invalidation by clearing out the 'ctx_id' which
+ * forces a TLB flush when the context is loaded.
+ */
+void clear_asid_other(void)
+{
+ u16 asid;
+
+ /*
+ * This is only expected to be set if we have disabled
+ * kernel _PAGE_GLOBAL pages.
+ */
+ if (!static_cpu_has(X86_FEATURE_PTI)) {
+ WARN_ON_ONCE(1);
+ return;
+ }
+
+ for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
+ /* Do not need to flush the current asid */
+ if (asid == this_cpu_read(cpu_tlbstate.loaded_mm_asid))
+ continue;
+ /*
+ * Make sure the next time we go to switch to
+ * this asid, we do a flush:
+ */
+ this_cpu_write(cpu_tlbstate.ctxs[asid].ctx_id, 0);
+ }
+ this_cpu_write(cpu_tlbstate.invalidate_other, false);
+}
+
atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1);


@@ -42,6 +74,9 @@ static void choose_new_asid(struct mm_st
return;
}

+ if (this_cpu_read(cpu_tlbstate.invalidate_other))
+ clear_asid_other();
+
for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
if (this_cpu_read(cpu_tlbstate.ctxs[asid].ctx_id) !=
next->context.ctx_id)


2018-01-01 14:39:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 021/146] x86/mm/64: Make a full PGD-entry size hole in the memory map

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit 9f449772a3106bcdd4eb8fdeb281147b0e99fb30 upstream.

Shrink vmalloc space from 16384TiB to 12800TiB to enlarge the hole starting
at 0xff90000000000000 to be a full PGD entry.

A subsequent patch will use this hole for the pagetable isolation LDT
alias.

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
Documentation/x86/x86_64/mm.txt | 4 ++--
arch/x86/include/asm/pgtable_64_types.h | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -29,8 +29,8 @@ Virtual memory map with 5 level page tab
hole caused by [56:63] sign extension
ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - ff91ffffffffffff (=49 bits) hole
-ff92000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space
+ff90000000000000 - ff9fffffffffffff (=52 bits) hole
+ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB)
ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
... unused hole ...
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -79,8 +79,8 @@ typedef struct { pteval_t pte; } pte_t;
#define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)

#ifdef CONFIG_X86_5LEVEL
-# define VMALLOC_SIZE_TB _AC(16384, UL)
-# define __VMALLOC_BASE _AC(0xff92000000000000, UL)
+# define VMALLOC_SIZE_TB _AC(12800, UL)
+# define __VMALLOC_BASE _AC(0xffa0000000000000, UL)
# define __VMEMMAP_BASE _AC(0xffd4000000000000, UL)
#else
# define VMALLOC_SIZE_TB _AC(32, UL)


2018-01-01 15:10:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 023/146] x86/pti: Map the vsyscall page if needed

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit 85900ea51577e31b186e523c8f4e068c79ecc7d3 upstream.

Make VSYSCALLs work fully in PTI mode by mapping them properly to the user
space visible page tables.

[ tglx: Hide unused functions (Patch by Arnd Bergmann) ]

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/entry/vsyscall/vsyscall_64.c | 6 +--
arch/x86/include/asm/vsyscall.h | 1
arch/x86/mm/pti.c | 65 ++++++++++++++++++++++++++++++++++
3 files changed, 69 insertions(+), 3 deletions(-)

--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -344,14 +344,14 @@ int in_gate_area_no_mm(unsigned long add
* vsyscalls but leave the page not present. If so, we skip calling
* this.
*/
-static void __init set_vsyscall_pgtable_user_bits(void)
+void __init set_vsyscall_pgtable_user_bits(pgd_t *root)
{
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;

- pgd = pgd_offset_k(VSYSCALL_ADDR);
+ pgd = pgd_offset_pgd(root, VSYSCALL_ADDR);
set_pgd(pgd, __pgd(pgd_val(*pgd) | _PAGE_USER));
p4d = p4d_offset(pgd, VSYSCALL_ADDR);
#if CONFIG_PGTABLE_LEVELS >= 5
@@ -373,7 +373,7 @@ void __init map_vsyscall(void)
vsyscall_mode == NATIVE
? PAGE_KERNEL_VSYSCALL
: PAGE_KERNEL_VVAR);
- set_vsyscall_pgtable_user_bits();
+ set_vsyscall_pgtable_user_bits(swapper_pg_dir);
}

BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -7,6 +7,7 @@

#ifdef CONFIG_X86_VSYSCALL_EMULATION
extern void map_vsyscall(void);
+extern void set_vsyscall_pgtable_user_bits(pgd_t *root);

/*
* Called on instruction fetch fault in vsyscall page.
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -38,6 +38,7 @@

#include <asm/cpufeature.h>
#include <asm/hypervisor.h>
+#include <asm/vsyscall.h>
#include <asm/cmdline.h>
#include <asm/pti.h>
#include <asm/pgtable.h>
@@ -223,6 +224,69 @@ static pmd_t *pti_user_pagetable_walk_pm
return pmd_offset(pud, address);
}

+#ifdef CONFIG_X86_VSYSCALL_EMULATION
+/*
+ * Walk the shadow copy of the page tables (optionally) trying to allocate
+ * page table pages on the way down. Does not support large pages.
+ *
+ * Note: this is only used when mapping *new* kernel data into the
+ * user/shadow page tables. It is never used for userspace data.
+ *
+ * Returns a pointer to a PTE on success, or NULL on failure.
+ */
+static __init pte_t *pti_user_pagetable_walk_pte(unsigned long address)
+{
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+ pmd_t *pmd = pti_user_pagetable_walk_pmd(address);
+ pte_t *pte;
+
+ /* We can't do anything sensible if we hit a large mapping. */
+ if (pmd_large(*pmd)) {
+ WARN_ON(1);
+ return NULL;
+ }
+
+ if (pmd_none(*pmd)) {
+ unsigned long new_pte_page = __get_free_page(gfp);
+ if (!new_pte_page)
+ return NULL;
+
+ if (pmd_none(*pmd)) {
+ set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
+ new_pte_page = 0;
+ }
+ if (new_pte_page)
+ free_page(new_pte_page);
+ }
+
+ pte = pte_offset_kernel(pmd, address);
+ if (pte_flags(*pte) & _PAGE_USER) {
+ WARN_ONCE(1, "attempt to walk to user pte\n");
+ return NULL;
+ }
+ return pte;
+}
+
+static void __init pti_setup_vsyscall(void)
+{
+ pte_t *pte, *target_pte;
+ unsigned int level;
+
+ pte = lookup_address(VSYSCALL_ADDR, &level);
+ if (!pte || WARN_ON(level != PG_LEVEL_4K) || pte_none(*pte))
+ return;
+
+ target_pte = pti_user_pagetable_walk_pte(VSYSCALL_ADDR);
+ if (WARN_ON(!target_pte))
+ return;
+
+ *target_pte = *pte;
+ set_vsyscall_pgtable_user_bits(kernel_to_user_pgdp(swapper_pg_dir));
+}
+#else
+static void __init pti_setup_vsyscall(void) { }
+#endif
+
static void __init
pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
{
@@ -319,4 +383,5 @@ void __init pti_init(void)
pti_clone_user_shared();
pti_clone_entry_text();
pti_setup_espfix64();
+ pti_setup_vsyscall();
}


2018-01-01 15:10:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 002/146] tracing: Fix possible double free on failure of allocating trace buffer

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (VMware) <[email protected]>

commit 4397f04575c44e1440ec2e49b6302785c95fd2f8 upstream.

Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
tracing buffer, memory is freed, but the pointers that point to them are not
initialized back to NULL, and later paths may try to free the freed memory
again. Jing and Chunyan fixed one of the locations that does this, but
missed a spot.

Link: http://lkml.kernel.org/r/[email protected]

Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Reported-by: Jing Xia <[email protected]>
Reported-by: Chunyan Zhang <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/trace.c | 1 +
1 file changed, 1 insertion(+)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7580,6 +7580,7 @@ allocate_trace_buffer(struct trace_array
buf->data = alloc_percpu(struct trace_array_cpu);
if (!buf->data) {
ring_buffer_free(buf->buffer);
+ buf->buffer = NULL;
return -ENOMEM;
}



2018-01-01 15:11:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 013/146] x86/mm/pti: Add functions to clone kernel PMDs

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit 03f4424f348e8be95eb1bbeba09461cd7b867828 upstream.

Provide infrastructure to:

- find a kernel PMD for a mapping which must be visible to user space for
the entry/exit code to work.

- walk an address range and share the kernel PMD with it.

This reuses a small part of the original KAISER patches to populate the
user space page table.

[ tglx: Made it universally usable so it can be used for any kind of shared
mapping. Add a mechanism to clear specific bits in the user space
visible PMD entry. Folded Andys simplifactions ]

Originally-by: Dave Hansen <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/pti.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 127 insertions(+)

--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -48,6 +48,11 @@
#undef pr_fmt
#define pr_fmt(fmt) "Kernel/User page tables isolation: " fmt

+/* Backporting helper */
+#ifndef __GFP_NOTRACK
+#define __GFP_NOTRACK 0
+#endif
+
static void __init pti_print_if_insecure(const char *reason)
{
if (boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
@@ -138,6 +143,128 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pg
}

/*
+ * Walk the user copy of the page tables (optionally) trying to allocate
+ * page table pages on the way down.
+ *
+ * Returns a pointer to a P4D on success, or NULL on failure.
+ */
+static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
+{
+ pgd_t *pgd = kernel_to_user_pgdp(pgd_offset_k(address));
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+
+ if (address < PAGE_OFFSET) {
+ WARN_ONCE(1, "attempt to walk user address\n");
+ return NULL;
+ }
+
+ if (pgd_none(*pgd)) {
+ unsigned long new_p4d_page = __get_free_page(gfp);
+ if (!new_p4d_page)
+ return NULL;
+
+ if (pgd_none(*pgd)) {
+ set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
+ new_p4d_page = 0;
+ }
+ if (new_p4d_page)
+ free_page(new_p4d_page);
+ }
+ BUILD_BUG_ON(pgd_large(*pgd) != 0);
+
+ return p4d_offset(pgd, address);
+}
+
+/*
+ * Walk the user copy of the page tables (optionally) trying to allocate
+ * page table pages on the way down.
+ *
+ * Returns a pointer to a PMD on success, or NULL on failure.
+ */
+static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
+{
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+ p4d_t *p4d = pti_user_pagetable_walk_p4d(address);
+ pud_t *pud;
+
+ BUILD_BUG_ON(p4d_large(*p4d) != 0);
+ if (p4d_none(*p4d)) {
+ unsigned long new_pud_page = __get_free_page(gfp);
+ if (!new_pud_page)
+ return NULL;
+
+ if (p4d_none(*p4d)) {
+ set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
+ new_pud_page = 0;
+ }
+ if (new_pud_page)
+ free_page(new_pud_page);
+ }
+
+ pud = pud_offset(p4d, address);
+ /* The user page tables do not use large mappings: */
+ if (pud_large(*pud)) {
+ WARN_ON(1);
+ return NULL;
+ }
+ if (pud_none(*pud)) {
+ unsigned long new_pmd_page = __get_free_page(gfp);
+ if (!new_pmd_page)
+ return NULL;
+
+ if (pud_none(*pud)) {
+ set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
+ new_pmd_page = 0;
+ }
+ if (new_pmd_page)
+ free_page(new_pmd_page);
+ }
+
+ return pmd_offset(pud, address);
+}
+
+static void __init
+pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
+{
+ unsigned long addr;
+
+ /*
+ * Clone the populated PMDs which cover start to end. These PMD areas
+ * can have holes.
+ */
+ for (addr = start; addr < end; addr += PMD_SIZE) {
+ pmd_t *pmd, *target_pmd;
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+
+ pgd = pgd_offset_k(addr);
+ if (WARN_ON(pgd_none(*pgd)))
+ return;
+ p4d = p4d_offset(pgd, addr);
+ if (WARN_ON(p4d_none(*p4d)))
+ return;
+ pud = pud_offset(p4d, addr);
+ if (pud_none(*pud))
+ continue;
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd))
+ continue;
+
+ target_pmd = pti_user_pagetable_walk_pmd(addr);
+ if (WARN_ON(!target_pmd))
+ return;
+
+ /*
+ * Copy the PMD. That is, the kernelmode and usermode
+ * tables will share the last-level page tables of this
+ * address range
+ */
+ *target_pmd = pmd_clear_flags(*pmd, clear);
+ }
+}
+
+/*
* Initialize kernel page table isolation
*/
void __init pti_init(void)


2018-01-01 15:12:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 011/146] x86/mm/pti: Allocate a separate user PGD

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit d9e9a6418065bb376e5de8d93ce346939b9a37a6 upstream.

Kernel page table isolation requires to have two PGDs. One for the kernel,
which contains the full kernel mapping plus the user space mapping and one
for user space which contains the user space mappings and the minimal set
of kernel mappings which are required by the architecture to be able to
transition from and to user space.

Add the necessary preliminaries.

[ tglx: Split out from the big kaiser dump. EFI fixup from Kirill ]

Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/pgalloc.h | 11 +++++++++++
arch/x86/kernel/head_64.S | 30 +++++++++++++++++++++++++++---
arch/x86/mm/pgtable.c | 5 +++--
arch/x86/platform/efi/efi_64.c | 5 ++++-
4 files changed, 45 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -30,6 +30,17 @@ static inline void paravirt_release_p4d(
*/
extern gfp_t __userpte_alloc_gfp;

+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * Instead of one PGD, we acquire two PGDs. Being order-1, it is
+ * both 8k in size and 8k-aligned. That lets us just flip bit 12
+ * in a pointer to swap between the two 4k halves.
+ */
+#define PGD_ALLOCATION_ORDER 1
+#else
+#define PGD_ALLOCATION_ORDER 0
+#endif
+
/*
* Allocate and free page tables.
*/
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -341,6 +341,27 @@ GLOBAL(early_recursion_flag)
.balign PAGE_SIZE; \
GLOBAL(name)

+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * Each PGD needs to be 8k long and 8k aligned. We do not
+ * ever go out to userspace with these, so we do not
+ * strictly *need* the second page, but this allows us to
+ * have a single set_pgd() implementation that does not
+ * need to worry about whether it has 4k or 8k to work
+ * with.
+ *
+ * This ensures PGDs are 8k long:
+ */
+#define PTI_USER_PGD_FILL 512
+/* This ensures they are 8k-aligned: */
+#define NEXT_PGD_PAGE(name) \
+ .balign 2 * PAGE_SIZE; \
+GLOBAL(name)
+#else
+#define NEXT_PGD_PAGE(name) NEXT_PAGE(name)
+#define PTI_USER_PGD_FILL 0
+#endif
+
/* Automate the creation of 1 to 1 mapping pmd entries */
#define PMDS(START, PERM, COUNT) \
i = 0 ; \
@@ -350,13 +371,14 @@ GLOBAL(name)
.endr

__INITDATA
-NEXT_PAGE(early_top_pgt)
+NEXT_PGD_PAGE(early_top_pgt)
.fill 511,8,0
#ifdef CONFIG_X86_5LEVEL
.quad level4_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
#else
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
#endif
+ .fill PTI_USER_PGD_FILL,8,0

NEXT_PAGE(early_dynamic_pgts)
.fill 512*EARLY_DYNAMIC_PAGE_TABLES,8,0
@@ -364,13 +386,14 @@ NEXT_PAGE(early_dynamic_pgts)
.data

#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH)
-NEXT_PAGE(init_top_pgt)
+NEXT_PGD_PAGE(init_top_pgt)
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
.org init_top_pgt + PGD_PAGE_OFFSET*8, 0
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
.org init_top_pgt + PGD_START_KERNEL*8, 0
/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
+ .fill PTI_USER_PGD_FILL,8,0

NEXT_PAGE(level3_ident_pgt)
.quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
@@ -381,8 +404,9 @@ NEXT_PAGE(level2_ident_pgt)
*/
PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
#else
-NEXT_PAGE(init_top_pgt)
+NEXT_PGD_PAGE(init_top_pgt)
.fill 512,8,0
+ .fill PTI_USER_PGD_FILL,8,0
#endif

#ifdef CONFIG_X86_5LEVEL
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -355,14 +355,15 @@ static inline void _pgd_free(pgd_t *pgd)
kmem_cache_free(pgd_cache, pgd);
}
#else
+
static inline pgd_t *_pgd_alloc(void)
{
- return (pgd_t *)__get_free_page(PGALLOC_GFP);
+ return (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ALLOCATION_ORDER);
}

static inline void _pgd_free(pgd_t *pgd)
{
- free_page((unsigned long)pgd);
+ free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
}
#endif /* CONFIG_X86_PAE */

--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -195,6 +195,9 @@ static pgd_t *efi_pgd;
* because we want to avoid inserting EFI region mappings (EFI_VA_END
* to EFI_VA_START) into the standard kernel page tables. Everything
* else can be shared, see efi_sync_low_kernel_mappings().
+ *
+ * We don't want the pgd on the pgd_list and cannot use pgd_alloc() for the
+ * allocation.
*/
int __init efi_alloc_page_tables(void)
{
@@ -207,7 +210,7 @@ int __init efi_alloc_page_tables(void)
return 0;

gfp_mask = GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO;
- efi_pgd = (pgd_t *)__get_free_page(gfp_mask);
+ efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER);
if (!efi_pgd)
return -ENOMEM;



2018-01-01 14:39:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 001/146] tracing: Remove extra zeroing out of the ring buffer page

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (VMware) <[email protected]>

commit 6b7e633fe9c24682df550e5311f47fb524701586 upstream.

The ring_buffer_read_page() takes care of zeroing out any extra data in the
page that it returns. There's no need to zero it out again from the
consumer. It was removed from one consumer of this function, but
read_buffers_splice_read() did not remove it, and worse, it contained a
nasty bug because of it.

Fixes: 2711ca237a084 ("ring-buffer: Move zeroing out excess in page to ring buffer code")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/trace.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6769,7 +6769,7 @@ tracing_buffers_splice_read(struct file
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
- int entries, size, i;
+ int entries, i;
ssize_t ret = 0;

#ifdef CONFIG_TRACER_MAX_TRACE
@@ -6823,14 +6823,6 @@ tracing_buffers_splice_read(struct file
break;
}

- /*
- * zero out any left over data, this is going to
- * user land.
- */
- size = ring_buffer_page_len(ref->page);
- if (size < PAGE_SIZE)
- memset(ref->page + size, 0, PAGE_SIZE - size);
-
page = virt_to_page(ref->page);

spd.pages[i] = page;


2018-01-01 20:28:35

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [PATCH 4.14 000/146] 4.14.11-stable review

On 1 January 2018 at 20:06, Greg Kroah-Hartman
<[email protected]> wrote:
> This is the start of the stable review cycle for the 4.14.11 release.
> There are 146 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Jan 3 14:00:12 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.11-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

NOTE:
kselftests: ldt_gdt_64 fails on x86
This failure is due to old test code in our build testing.
https://bugs.linaro.org/show_bug.cgi?id=3564

Summary
------------------------------------------------------------------------

kernel: 4.14.11-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: 0d5392650ad3462d318fb160670cb1453cb367a2
git describe: v4.14.10-147-g0d5392650ad3
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.10-147-g0d5392650ad3


No regressions (compared to build v4.14.10-117-ge63742198102)

Boards, architectures and test suites:
-------------------------------------

hi6220-hikey - arm64
* boot - pass: 20,
* kselftest - pass: 46, skip: 16
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 983, skip: 121
* ltp-timers-tests - pass: 12,

juno-r2 - arm64
* boot - pass: 20,
* kselftest - pass: 45, skip: 17
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-timers-tests - pass: 12,

x15 - arm
* boot - pass: 20,
* kselftest - pass: 41, skip: 20
* libhugetlbfs - pass: 87, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 20, skip: 2
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 13, skip: 1
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1037, skip: 66
* ltp-timers-tests - pass: 12,

x86_64
* boot - pass: 20,
* kselftest - fail: 1, pass: 57, skip: 20
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 61, skip: 1
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 9, skip: 1
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1005, skip: 116
* ltp-timers-tests - pass: 12,

Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports
Tested-by: Naresh Kamboju <[email protected]>

2018-01-02 08:59:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.14 000/146] 4.14.11-stable review

On Tue, Jan 02, 2018 at 01:58:30AM +0530, Naresh Kamboju wrote:
> On 1 January 2018 at 20:06, Greg Kroah-Hartman
> <[email protected]> wrote:
> > This is the start of the stable review cycle for the 4.14.11 release.
> > There are 146 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Wed Jan 3 14:00:12 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.11-rc1.gz
> > or in the git tree and branch at:
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
>
> Results from Linaro’s test farm.
> No regressions on arm64, arm and x86_64.
>
> NOTE:
> kselftests: ldt_gdt_64 fails on x86
> This failure is due to old test code in our build testing.
> https://bugs.linaro.org/show_bug.cgi?id=3564

Thanks for testing 3 of these releases and letting me know.

greg k-h

2018-01-02 17:40:24

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.14 000/146] 4.14.11-stable review

On Mon, Jan 01, 2018 at 03:36:31PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.11 release.
> There are 146 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Jan 3 14:00:12 UTC 2018.
> Anything received after that time might be too late.
>

Build results:
total: 145 pass: 145 fail: 0
Qemu test results:
total: 126 pass: 126 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

2018-01-02 22:34:48

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.14 000/146] 4.14.11-stable review

On 01/01/2018 07:36 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.11 release.
> There are 146 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Jan 3 14:00:12 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.11-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

2018-01-03 10:00:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.14 000/146] 4.14.11-stable review

On Tue, Jan 02, 2018 at 03:34:38PM -0700, Shuah Khan wrote:
> On 01/01/2018 07:36 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.14.11 release.
> > There are 146 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Wed Jan 3 14:00:12 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.11-rc1.gz
> > or in the git tree and branch at:
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
>
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h