This is the start of the stable review cycle for the 4.4.133 release.
There are 92 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.133-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <[email protected]>
Linux 4.4.133-rc1
Tetsuo Handa <[email protected]>
x86/kexec: Avoid double free_page() upon do_kexec_load() failure
Tetsuo Handa <[email protected]>
hfsplus: stop workqueue when fill_super() failed
Johannes Berg <[email protected]>
cfg80211: limit wiphy names to 128 bytes
Geert Uytterhoeven <[email protected]>
gpio: rcar: Add Runtime PM handling for interrupts
John Stultz <[email protected]>
time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
Vinod Koul <[email protected]>
dmaengine: ensure dmaengine helpers check valid callback
Jens Remus <[email protected]>
scsi: zfcp: fix infinite iteration on ERP ready list
Alexander Potapenko <[email protected]>
scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()
Jason Yan <[email protected]>
scsi: libsas: defer ata device eh commands to libata
Martin Schwidefsky <[email protected]>
s390: use expoline thunks in the BPF JIT
Martin Schwidefsky <[email protected]>
s390: extend expoline to BC instructions
Martin Schwidefsky <[email protected]>
s390: move spectre sysfs attribute code
Martin Schwidefsky <[email protected]>
s390/kernel: use expoline for indirect branches
Martin Schwidefsky <[email protected]>
s390/ftrace: use expoline for indirect branches
Martin Schwidefsky <[email protected]>
s390/lib: use expoline for indirect branches
Martin Schwidefsky <[email protected]>
s390: move expoline assembler macros to a header
Martin Schwidefsky <[email protected]>
s390: add assembler macros for CPU alternatives
Al Viro <[email protected]>
ext2: fix a block leak
Eric Dumazet <[email protected]>
tcp: purge write queue in tcp_connect_init()
Eric Dumazet <[email protected]>
sock_diag: fix use-after-free read in __sk_free
Willem de Bruijn <[email protected]>
packet: in packet_snd start writing at link layer allocation
Willem de Bruijn <[email protected]>
net: test tailroom before appending to linear skb
Liu Bo <[email protected]>
btrfs: fix reading stale metadata blocks after degraded raid1 mounts
Anand Jain <[email protected]>
btrfs: fix crash when trying to resume balance without the resume flag
Filipe Manana <[email protected]>
Btrfs: fix xattr loss after power failure
Masami Hiramatsu <[email protected]>
ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions
Masami Hiramatsu <[email protected]>
ARM: 8770/1: kprobes: Prohibit probing on optimized_callback
Masami Hiramatsu <[email protected]>
ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed
Dexuan Cui <[email protected]>
tick/broadcast: Use for_each_cpu() specially on UP kernels
Masami Hiramatsu <[email protected]>
ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr
Ard Biesheuvel <[email protected]>
efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode
Martin Schwidefsky <[email protected]>
s390: remove indirect branch from do_softirq_own_stack
Julian Wiedmann <[email protected]>
s390/qdio: don't release memory in qdio_setup_irq()
Hendrik Brueckner <[email protected]>
s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero
Julian Wiedmann <[email protected]>
s390/qdio: fix access to uninitialized qdio_q fields
Pavel Tatashin <[email protected]>
mm: don't allow deferred pages with NEED_PER_CPU_KM
Nicholas Piggin <[email protected]>
powerpc/powernv: Fix NVRAM sleep in invalid context when crashing
Janis Danisevskis <[email protected]>
procfs: fix pthread cross-thread naming if !PR_DUMPABLE
Mateusz Guzik <[email protected]>
proc read mm's {arg,env}_{start,end} with mmap semaphore taken.
Steven Rostedt (VMware) <[email protected]>
tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
Srinivas Pandruvada <[email protected]>
cpufreq: intel_pstate: Enable HWP by default
Waiman Long <[email protected]>
signals: avoid unnecessary taking of sighand->siglock
Mel Gorman <[email protected]>
mm: filemap: avoid unnecessary calls to lock_page when waiting for IO to complete during a read
Mel Gorman <[email protected]>
mm: filemap: remove redundant code in do_read_cache_page
Johannes Weiner <[email protected]>
proc: meminfo: estimate available memory more conservatively
Vladimir Davydov <[email protected]>
vmscan: do not force-scan file lru if its absolute size is small
Benjamin Herrenschmidt <[email protected]>
powerpc: Don't preempt_disable() in show_cpuinfo()
Anders Roxell <[email protected]>
cpuidle: coupled: remove unused define cpuidle_coupled_lock
Stewart Smith <[email protected]>
powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL
Stewart Smith <[email protected]>
powerpc/powernv: Remove OPALv2 firmware define and references
Stewart Smith <[email protected]>
powerpc/powernv: panic() on OPAL < V3
Andy Shevchenko <[email protected]>
spi: pxa2xx: Allow 64-bit DMA
Wenwen Wang <[email protected]>
ALSA: control: fix a redundant-copy issue
Hans de Goede <[email protected]>
ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist
Federico Cuello <[email protected]>
ALSA: usb: mixer: volume quirk for CM102-A+/102S+
Shuah Khan (Samsung OSG) <[email protected]>
usbip: usbip_host: fix bad unlock balance during stub_probe()
Shuah Khan (Samsung OSG) <[email protected]>
usbip: usbip_host: fix NULL-ptr deref and use-after-free errors
Shuah Khan (Samsung OSG) <[email protected]>
usbip: usbip_host: run rebind from exit when module is removed
Shuah Khan (Samsung OSG) <[email protected]>
usbip: usbip_host: delete device from busid_table after rebind
Shuah Khan <[email protected]>
usbip: usbip_host: refine probe and disconnect debug msgs to be useful
zhongjiang <[email protected]>
kernel/exit.c: avoid undefined behaviour when calling wait4()
Jiri Slaby <[email protected]>
futex: futex_wake_op, fix sign_extend32 sign bits
Michael Kerrisk (man-pages) <[email protected]>
pipe: cap initial pipe capacity according to pipe-max-size limit
James Chapman <[email protected]>
l2tp: revert "l2tp: fix missing print session offset info"
Greg Kroah-Hartman <[email protected]>
Revert "ARM: dts: imx6qdl-wandboard: Fix audio channel swap"
Vasily Averin <[email protected]>
lockd: lost rollback of set_grace_period() in lockd_down_net()
Antony Antony <[email protected]>
xfrm: fix xfrm_do_migrate() with AEAD e.g(AES-GCM)
Jiri Slaby <[email protected]>
futex: Remove duplicated code and fix undefined behaviour
Mel Gorman <[email protected]>
futex: Remove unnecessary warning from get_futex_key
Suzuki K Poulose <[email protected]>
arm64: Add work around for Arm Cortex-A55 Erratum 1024718
Ard Biesheuvel <[email protected]>
arm64: introduce mov_q macro to move a constant into a 64-bit register
Richard Guy Briggs <[email protected]>
audit: move calcs after alloc and check when logging set loginuid
Takashi Iwai <[email protected]>
ALSA: timer: Call notifier in the same spinlock
Xin Long <[email protected]>
sctp: delay the authentication for the duplicated cookie-echo chunk
Xin Long <[email protected]>
sctp: fix the issue that the cookie-ack with auth can't get processed
Yuchung Cheng <[email protected]>
tcp: ignore Fast Open on repair mode
Debabrata Banerjee <[email protected]>
bonding: do not allow rlb updates to invalid mac
Michael Chan <[email protected]>
tg3: Fix vunmap() BUG_ON() triggered from tg3_free_consistent().
Xin Long <[email protected]>
sctp: use the old asoc when making the cookie-ack chunk in dupcook_d
Xin Long <[email protected]>
sctp: handle two v4 addrs comparison in sctp_inet6_cmp_addr
Heiner Kallweit <[email protected]>
r8169: fix powering up RTL8168h
Bjørn Mork <[email protected]>
qmi_wwan: do not steal interfaces from class drivers
Stefano Brivio <[email protected]>
openvswitch: Don't swap table in nlattr_set() after OVS_ATTR_NESTED is found
Lance Richardson <[email protected]>
net: support compat 64-bit time in {s,g}etsockopt
Eric Dumazet <[email protected]>
net_sched: fq: take care of throttled flows before reuse
Moshe Shemesh <[email protected]>
net/mlx4_en: Verify coalescing parameters are in range
Rob Taglang <[email protected]>
net: ethernet: sun: niu set correct packet size in skb
Eric Dumazet <[email protected]>
llc: better deal with too small mtu
Andrey Ignatov <[email protected]>
ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
Eric Dumazet <[email protected]>
dccp: fix tasklet usage
Hangbin Liu <[email protected]>
bridge: check iface upper dev when setting master via ioctl
Ingo Molnar <[email protected]>
8139too: Use disable_irq_nosync() in rtl8139_poll_controller()
-------------
Diffstat:
Makefile | 4 +-
arch/alpha/include/asm/futex.h | 26 +--
arch/arc/include/asm/futex.h | 40 +----
arch/arm/boot/dts/imx6qdl-wandboard.dtsi | 1 -
arch/arm/include/asm/assembler.h | 10 ++
arch/arm/include/asm/futex.h | 26 +--
arch/arm/kernel/traps.c | 5 +-
arch/arm/lib/getuser.S | 10 ++
arch/arm/probes/kprobes/opt-arm.c | 4 +-
arch/arm64/Kconfig | 14 ++
arch/arm64/include/asm/assembler.h | 60 +++++++
arch/arm64/include/asm/cputype.h | 11 ++
arch/arm64/include/asm/futex.h | 26 +--
arch/arm64/mm/proc.S | 5 +
arch/frv/include/asm/futex.h | 3 +-
arch/frv/kernel/futex.c | 27 +--
arch/hexagon/include/asm/futex.h | 38 +---
arch/ia64/include/asm/futex.h | 25 +--
arch/microblaze/include/asm/futex.h | 38 +---
arch/mips/include/asm/futex.h | 25 +--
arch/parisc/include/asm/futex.h | 25 +--
arch/powerpc/include/asm/firmware.h | 5 +-
arch/powerpc/include/asm/futex.h | 26 +--
arch/powerpc/kernel/setup-common.c | 11 --
arch/powerpc/platforms/powernv/eeh-powernv.c | 4 +-
arch/powerpc/platforms/powernv/idle.c | 2 +-
arch/powerpc/platforms/powernv/opal-nvram.c | 14 +-
arch/powerpc/platforms/powernv/opal-xscom.c | 2 +-
arch/powerpc/platforms/powernv/opal.c | 36 ++--
arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
arch/powerpc/platforms/powernv/setup.c | 12 +-
arch/powerpc/platforms/powernv/smp.c | 74 ++++----
arch/s390/include/asm/alternative-asm.h | 108 ++++++++++++
arch/s390/include/asm/futex.h | 23 +--
arch/s390/include/asm/nospec-insn.h | 193 +++++++++++++++++++++
arch/s390/kernel/Makefile | 1 +
arch/s390/kernel/asm-offsets.c | 1 +
arch/s390/kernel/base.S | 24 +--
arch/s390/kernel/entry.S | 105 +++--------
arch/s390/kernel/irq.c | 5 +-
arch/s390/kernel/mcount.S | 14 +-
arch/s390/kernel/nospec-branch.c | 43 +++--
arch/s390/kernel/nospec-sysfs.c | 21 +++
arch/s390/kernel/perf_cpum_sf.c | 4 +
arch/s390/kernel/reipl.S | 5 +-
arch/s390/kernel/swsusp.S | 10 +-
arch/s390/lib/mem.S | 9 +-
arch/s390/net/bpf_jit.S | 16 +-
arch/s390/net/bpf_jit_comp.c | 63 ++++++-
arch/sh/include/asm/futex.h | 26 +--
arch/sparc/include/asm/futex_64.h | 26 +--
arch/tile/include/asm/futex.h | 40 +----
arch/x86/boot/compressed/eboot.c | 6 +-
arch/x86/include/asm/futex.h | 40 +----
arch/x86/kernel/machine_kexec_32.c | 6 +-
arch/x86/kernel/machine_kexec_64.c | 4 +-
arch/x86/xen/mmu.c | 4 -
arch/xtensa/include/asm/futex.h | 27 +--
drivers/cpufreq/intel_pstate.c | 34 ++--
drivers/cpufreq/powernv-cpufreq.c | 2 +-
drivers/cpuidle/coupled.c | 1 -
drivers/cpuidle/cpuidle-powernv.c | 2 +-
drivers/gpio/gpio-rcar.c | 46 +++++
drivers/net/bonding/bond_alb.c | 2 +-
drivers/net/ethernet/broadcom/tg3.c | 9 +-
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 16 ++
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 7 +-
drivers/net/ethernet/realtek/8139too.c | 2 +-
drivers/net/ethernet/realtek/r8169.c | 3 +
drivers/net/ethernet/sun/niu.c | 5 +-
drivers/net/usb/qmi_wwan.c | 12 ++
drivers/s390/cio/qdio_setup.c | 12 +-
drivers/s390/scsi/zfcp_dbf.c | 23 ++-
drivers/s390/scsi/zfcp_ext.h | 5 +-
drivers/s390/scsi/zfcp_scsi.c | 14 +-
drivers/scsi/libsas/sas_scsi_host.c | 33 ++--
drivers/scsi/sg.c | 2 +-
drivers/spi/spi-pxa2xx.h | 2 +-
drivers/usb/usbip/stub.h | 2 +
drivers/usb/usbip/stub_dev.c | 43 +++--
drivers/usb/usbip/stub_main.c | 105 +++++++++--
fs/btrfs/ctree.c | 6 +-
fs/btrfs/tree-log.c | 7 +
fs/btrfs/volumes.c | 9 +
fs/ext2/inode.c | 10 --
fs/hfsplus/super.c | 1 +
fs/lockd/svc.c | 2 +
fs/pipe.c | 3 +
fs/proc/base.c | 55 +++++-
fs/proc/meminfo.c | 5 +-
include/asm-generic/futex.h | 50 +-----
include/linux/dmaengine.h | 20 ++-
include/linux/efi.h | 8 +-
include/linux/signal.h | 17 ++
include/linux/timekeeper_internal.h | 4 +-
include/trace/events/xen.h | 16 --
include/uapi/linux/nl80211.h | 2 +
kernel/auditsc.c | 7 +-
kernel/exit.c | 4 +
kernel/futex.c | 44 ++++-
kernel/signal.c | 7 +
kernel/time/tick-broadcast.c | 8 +
kernel/time/timekeeping.c | 20 +--
mm/Kconfig | 1 +
mm/filemap.c | 90 ++++++----
mm/util.c | 16 +-
mm/vmscan.c | 12 +-
net/bridge/br_if.c | 4 +-
net/compat.c | 6 +-
net/core/sock.c | 2 +-
net/dccp/ccids/ccid2.c | 14 +-
net/dccp/timer.c | 2 +-
net/ipv4/ip_output.c | 3 +-
net/ipv4/ping.c | 7 +-
net/ipv4/tcp.c | 2 +-
net/ipv4/tcp_output.c | 7 +-
net/ipv4/udp.c | 7 +-
net/ipv6/ip6_output.c | 3 +-
net/l2tp/l2tp_netlink.c | 2 -
net/llc/af_llc.c | 3 +
net/openvswitch/flow_netlink.c | 9 +-
net/packet/af_packet.c | 4 +-
net/sched/sch_fq.c | 37 ++--
net/sctp/associola.c | 30 +++-
net/sctp/inqueue.c | 2 +-
net/sctp/ipv6.c | 3 +
net/sctp/sm_statefuns.c | 89 +++++-----
net/wireless/core.c | 3 +
net/xfrm/xfrm_state.c | 1 +
sound/core/control_compat.c | 3 +-
sound/core/timer.c | 220 +++++++++++-------------
sound/pci/hda/hda_intel.c | 2 +
sound/usb/mixer.c | 8 +
133 files changed, 1605 insertions(+), 1109 deletions(-)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu <[email protected]>
[ Upstream commit e8238fc2bd7b4c3c7554fa2df067e796610212fc ]
When we set a bond slave's master to bridge via ioctl, we only check
the IFF_BRIDGE_PORT flag. Although we will find the slave's real master
at netdev_master_upper_dev_link() later, it already does some settings
and allocates some resources. It would be better to return as early
as possible.
v1 -> v2:
use netdev_master_upper_dev_get() instead of netdev_has_any_upper_dev()
to check if we have a master, because not all upper devs are masters,
e.g. vlan device.
Reported-by: [email protected]
Signed-off-by: Hangbin Liu <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/bridge/br_if.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -456,8 +456,8 @@ int br_add_if(struct net_bridge *br, str
if (dev->netdev_ops->ndo_start_xmit == br_dev_xmit)
return -ELOOP;
- /* Device is already being bridged */
- if (br_port_exists(dev))
+ /* Device has master upper dev */
+ if (netdev_master_upper_dev_get(dev))
return -EBUSY;
/* No bridging devices that dislike that (e.g. wireless) */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin <[email protected]>
commit 3a2b19d1ee5633f76ae8a88da7bc039a5d1732aa upstream.
Commit efda760fe95ea ("lockd: fix lockd shutdown race") is incorrect,
it removes lockd_manager and disarm grace_period_end for init_net only.
If nfsd was started from another net namespace lockd_up_net() calls
set_grace_period() that adds lockd_manager into per-netns list
and queues grace_period_end delayed work.
These action should be reverted in lockd_down_net().
Otherwise it can lead to double list_add on after restart nfsd in netns,
and to use-after-free if non-disarmed delayed work will be executed after netns destroy.
Fixes: efda760fe95e ("lockd: fix lockd shutdown race")
Cc: [email protected]
Signed-off-by: Vasily Averin <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Cc: Ben Hutchings <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/lockd/svc.c | 2 ++
1 file changed, 2 insertions(+)
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -271,6 +271,8 @@ static void lockd_down_net(struct svc_se
if (ln->nlmsvc_users) {
if (--ln->nlmsvc_users == 0) {
nlm_shutdown_hosts_net(net);
+ cancel_delayed_work_sync(&ln->grace_period_end);
+ locks_end_grace(&ln->lockd_manager);
svc_shutdown_net(serv, net);
dprintk("lockd_down_net: per-net data destroyed; net=%p\n", net);
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antony Antony <[email protected]>
commit 75bf50f4aaa1c78d769d854ab3d975884909e4fb upstream.
copy geniv when cloning the xfrm state.
x->geniv was not copied to the new state and migration would fail.
xfrm_do_migrate
..
xfrm_state_clone()
..
..
esp_init_aead()
crypto_alloc_aead()
crypto_alloc_tfm()
crypto_find_alg() return EAGAIN and failed
Signed-off-by: Antony Antony <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Cc: Ben Hutchings <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/xfrm/xfrm_state.c | 1 +
1 file changed, 1 insertion(+)
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1159,6 +1159,7 @@ static struct xfrm_state *xfrm_state_clo
if (orig->aead) {
x->aead = xfrm_algo_aead_clone(orig->aead);
+ x->geniv = orig->geniv;
if (!x->aead)
goto error;
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Guy Briggs <[email protected]>
commit 76a658c20efd541a62838d9ff68ce94170d7a549 upstream.
Move the calculations of values after the allocation in case the
allocation fails. This avoids wasting effort in the rare case that it
fails, but more importantly saves us extra logic to release the tty
ref.
Signed-off-by: Richard Guy Briggs <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
Cc: Ben Hutchings <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/auditsc.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1981,14 +1981,15 @@ static void audit_log_set_loginuid(kuid_
if (!audit_enabled)
return;
+ ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_LOGIN);
+ if (!ab)
+ return;
+
uid = from_kuid(&init_user_ns, task_uid(current));
oldloginuid = from_kuid(&init_user_ns, koldloginuid);
loginuid = from_kuid(&init_user_ns, kloginuid),
tty = audit_get_tty(current);
- ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_LOGIN);
- if (!ab)
- return;
audit_log_format(ab, "pid=%d uid=%u", task_pid_nr(current), uid);
audit_log_task_context(ab);
audit_log_format(ab, " old-auid=%u auid=%u tty=%s old-ses=%u ses=%u res=%d",
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Ignatov <[email protected]>
[ Upstream commit 1b97013bfb11d66f041de691de6f0fec748ce016 ]
Fix more memory leaks in ip_cmsg_send() callers. Part of them were fixed
earlier in 919483096bfe.
* udp_sendmsg one was there since the beginning when linux sources were
first added to git;
* ping_v4_sendmsg one was copy/pasted in c319b4d76b9e.
Whenever return happens in udp_sendmsg() or ping_v4_sendmsg() IP options
have to be freed if they were allocated previously.
Add label so that future callers (if any) can use it instead of kfree()
before return that is easy to forget.
Fixes: c319b4d76b9e (net: ipv4: add IPPROTO_ICMP socket kind)
Signed-off-by: Andrey Ignatov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ping.c | 7 +++++--
net/ipv4/udp.c | 7 +++++--
2 files changed, 10 insertions(+), 4 deletions(-)
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -777,8 +777,10 @@ static int ping_v4_sendmsg(struct sock *
ipc.addr = faddr = daddr;
if (ipc.opt && ipc.opt->opt.srr) {
- if (!daddr)
- return -EINVAL;
+ if (!daddr) {
+ err = -EINVAL;
+ goto out_free;
+ }
faddr = ipc.opt->opt.faddr;
}
tos = get_rttos(&ipc, inet);
@@ -843,6 +845,7 @@ back_from_confirm:
out:
ip_rt_put(rt);
+out_free:
if (free)
kfree(ipc.opt);
if (!err) {
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -991,8 +991,10 @@ int udp_sendmsg(struct sock *sk, struct
ipc.addr = faddr = daddr;
if (ipc.opt && ipc.opt->opt.srr) {
- if (!daddr)
- return -EINVAL;
+ if (!daddr) {
+ err = -EINVAL;
+ goto out_free;
+ }
faddr = ipc.opt->opt.faddr;
connected = 0;
}
@@ -1105,6 +1107,7 @@ do_append_data:
out:
ip_rt_put(rt);
+out_free:
if (free)
kfree(ipc.opt);
if (!err)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <[email protected]>
[ Upstream commit 7df40c2673a1307c3260aab6f9d4b9bf97ca8fd7 ]
Normally, a socket can not be freed/reused unless all its TX packets
left qdisc and were TX-completed. However connect(AF_UNSPEC) allows
this to happen.
With commit fc59d5bdf1e3 ("pkt_sched: fq: clear time_next_packet for
reused flows") we cleared f->time_next_packet but took no special
action if the flow was still in the throttled rb-tree.
Since f->time_next_packet is the key used in the rb-tree searches,
blindly clearing it might break rb-tree integrity. We need to make
sure the flow is no longer in the rb-tree to avoid this problem.
Fixes: fc59d5bdf1e3 ("pkt_sched: fq: clear time_next_packet for reused flows")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sched/sch_fq.c | 37 +++++++++++++++++++++++++------------
1 file changed, 25 insertions(+), 12 deletions(-)
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -126,6 +126,28 @@ static bool fq_flow_is_detached(const st
return f->next == &detached;
}
+static bool fq_flow_is_throttled(const struct fq_flow *f)
+{
+ return f->next == &throttled;
+}
+
+static void fq_flow_add_tail(struct fq_flow_head *head, struct fq_flow *flow)
+{
+ if (head->first)
+ head->last->next = flow;
+ else
+ head->first = flow;
+ head->last = flow;
+ flow->next = NULL;
+}
+
+static void fq_flow_unset_throttled(struct fq_sched_data *q, struct fq_flow *f)
+{
+ rb_erase(&f->rate_node, &q->delayed);
+ q->throttled_flows--;
+ fq_flow_add_tail(&q->old_flows, f);
+}
+
static void fq_flow_set_throttled(struct fq_sched_data *q, struct fq_flow *f)
{
struct rb_node **p = &q->delayed.rb_node, *parent = NULL;
@@ -153,15 +175,6 @@ static void fq_flow_set_throttled(struct
static struct kmem_cache *fq_flow_cachep __read_mostly;
-static void fq_flow_add_tail(struct fq_flow_head *head, struct fq_flow *flow)
-{
- if (head->first)
- head->last->next = flow;
- else
- head->first = flow;
- head->last = flow;
- flow->next = NULL;
-}
/* limit number of collected flows per round */
#define FQ_GC_MAX 8
@@ -265,6 +278,8 @@ static struct fq_flow *fq_classify(struc
f->socket_hash != sk->sk_hash)) {
f->credit = q->initial_quantum;
f->socket_hash = sk->sk_hash;
+ if (fq_flow_is_throttled(f))
+ fq_flow_unset_throttled(q, f);
f->time_next_packet = 0ULL;
}
return f;
@@ -419,9 +434,7 @@ static void fq_check_throttled(struct fq
q->time_next_delayed_flow = f->time_next_packet;
break;
}
- rb_erase(p, &q->delayed);
- q->throttled_flows--;
- fq_flow_add_tail(&q->old_flows, f);
+ fq_flow_unset_throttled(q, f);
}
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suzuki K Poulose <[email protected]>
commit ece1397cbc89c51914fae1aec729539cfd8bd62b upstream.
Some variants of the Arm Cortex-55 cores (r0p0, r0p1, r1p0) suffer
from an erratum 1024718, which causes incorrect updates when DBM/AP
bits in a page table entry is modified without a break-before-make
sequence. The work around is to skip enabling the hardware DBM feature
on the affected cores. The hardware Access Flag management features
is not affected. There are some other cores suffering from this
errata, which could be added to the midr_list to trigger the work
around.
Cc: Catalin Marinas <[email protected]>
Cc: [email protected]
Reviewed-by: Dave Martin <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm64/Kconfig | 14 ++++++++++++
arch/arm64/include/asm/assembler.h | 40 +++++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/cputype.h | 11 ++++++++++
arch/arm64/mm/proc.S | 5 ++++
4 files changed, 70 insertions(+)
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -375,6 +375,20 @@ config ARM64_ERRATUM_843419
If unsure, say Y.
+config ARM64_ERRATUM_1024718
+ bool "Cortex-A55: 1024718: Update of DBM/AP bits without break before make might result in incorrect update"
+ default y
+ help
+ This option adds work around for Arm Cortex-A55 Erratum 1024718.
+
+ Affected Cortex-A55 cores (r0p0, r0p1, r1p0) could cause incorrect
+ update of the hardware dirty bit when the DBM/AP bits are updated
+ without a break-before-make. The work around is to disable the usage
+ of hardware DBM locally on the affected cores. CPUs not affected by
+ erratum will continue to use the feature.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -23,6 +23,7 @@
#ifndef __ASM_ASSEMBLER_H
#define __ASM_ASSEMBLER_H
+#include <asm/cputype.h>
#include <asm/ptrace.h>
#include <asm/thread_info.h>
@@ -224,4 +225,43 @@ lr .req x30 // link register
movk \reg, :abs_g0_nc:\val
.endm
+/*
+ * Check the MIDR_EL1 of the current CPU for a given model and a range of
+ * variant/revision. See asm/cputype.h for the macros used below.
+ *
+ * model: MIDR_CPU_PART of CPU
+ * rv_min: Minimum of MIDR_CPU_VAR_REV()
+ * rv_max: Maximum of MIDR_CPU_VAR_REV()
+ * res: Result register.
+ * tmp1, tmp2, tmp3: Temporary registers
+ *
+ * Corrupts: res, tmp1, tmp2, tmp3
+ * Returns: 0, if the CPU id doesn't match. Non-zero otherwise
+ */
+ .macro cpu_midr_match model, rv_min, rv_max, res, tmp1, tmp2, tmp3
+ mrs \res, midr_el1
+ mov_q \tmp1, (MIDR_REVISION_MASK | MIDR_VARIANT_MASK)
+ mov_q \tmp2, MIDR_CPU_PART_MASK
+ and \tmp3, \res, \tmp2 // Extract model
+ and \tmp1, \res, \tmp1 // rev & variant
+ mov_q \tmp2, \model
+ cmp \tmp3, \tmp2
+ cset \res, eq
+ cbz \res, .Ldone\@ // Model matches ?
+
+ .if (\rv_min != 0) // Skip min check if rv_min == 0
+ mov_q \tmp3, \rv_min
+ cmp \tmp1, \tmp3
+ cset \res, ge
+ .endif // \rv_min != 0
+ /* Skip rv_max check if rv_min == rv_max && rv_min != 0 */
+ .if ((\rv_min != \rv_max) || \rv_min == 0)
+ mov_q \tmp2, \rv_max
+ cmp \tmp1, \tmp2
+ cset \tmp2, le
+ and \res, \res, \tmp2
+ .endif
+.Ldone\@:
+ .endm
+
#endif /* __ASM_ASSEMBLER_H */
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -57,6 +57,14 @@
#define MIDR_IMPLEMENTOR(midr) \
(((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_CPU_VAR_REV(var, rev) \
+ (((var) << MIDR_VARIANT_SHIFT) | (rev))
+
+#define MIDR_CPU_PART_MASK \
+ (MIDR_IMPLEMENTOR_MASK | \
+ MIDR_ARCHITECTURE_MASK | \
+ MIDR_PARTNUM_MASK)
+
#define MIDR_CPU_PART(imp, partnum) \
(((imp) << MIDR_IMPLEMENTOR_SHIFT) | \
(0xf << MIDR_ARCHITECTURE_SHIFT) | \
@@ -70,11 +78,14 @@
#define ARM_CPU_PART_FOUNDATION 0xD00
#define ARM_CPU_PART_CORTEX_A57 0xD07
#define ARM_CPU_PART_CORTEX_A53 0xD03
+#define ARM_CPU_PART_CORTEX_A55 0xD05
#define APM_CPU_PART_POTENZA 0x000
#define CAVIUM_CPU_PART_THUNDERX 0x0A1
+#define MIDR_CORTEX_A55 MIDR_CPU_PART(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A55)
+
#ifndef __ASSEMBLY__
/*
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -221,6 +221,11 @@ ENTRY(__cpu_setup)
cbz x9, 2f
cmp x9, #2
b.lt 1f
+#ifdef CONFIG_ARM64_ERRATUM_1024718
+ /* Disable hardware DBM on Cortex-A55 r0p0, r0p1 & r1p0 */
+ cpu_midr_match MIDR_CORTEX_A55, MIDR_CPU_VAR_REV(0, 0), MIDR_CPU_VAR_REV(1, 0), x1, x2, x3, x4
+ cbnz x1, 1f
+#endif
orr x10, x10, #TCR_HD // hardware Dirty flag update
1: orr x10, x10, #TCR_HA // hardware Access flag update
2:
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long <[email protected]>
[ Upstream commit 46e16d4b956867013e0bbd7f2bad206f4aa55752 ]
When processing a duplicate cookie-echo chunk, for case 'D', sctp will
not process the param from this chunk. It means old asoc has nothing
to be updated, and the new temp asoc doesn't have the complete info.
So there's no reason to use the new asoc when creating the cookie-ack
chunk. Otherwise, like when auth is enabled for cookie-ack, the chunk
can not be set with auth, and it will definitely be dropped by peer.
This issue is there since very beginning, and we fix it by using the
old asoc instead.
Signed-off-by: Xin Long <[email protected]>
Acked-by: Neil Horman <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sctp/sm_statefuns.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -1959,7 +1959,7 @@ static sctp_disposition_t sctp_sf_do_dup
}
}
- repl = sctp_make_cookie_ack(new_asoc, chunk);
+ repl = sctp_make_cookie_ack(asoc, chunk);
if (!repl)
goto nomem;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lance Richardson <[email protected]>
[ Upstream commit 988bf7243e03ef69238381594e0334a79cef74a6 ]
For the x32 ABI, struct timeval has two 64-bit fields. However
the kernel currently interprets the user-space values used for
the SO_RCVTIMEO and SO_SNDTIMEO socket options as having a pair
of 32-bit fields.
When the seconds portion of the requested timeout is less than 2**32,
the seconds portion of the effective timeout is correct but the
microseconds portion is zero. When the seconds portion of the
requested timeout is zero and the microseconds portion is non-zero,
the kernel interprets the timeout as zero (never timeout).
Fix by using 64-bit time for SO_RCVTIMEO/SO_SNDTIMEO as required
for the ABI.
The code included below demonstrates the problem.
Results before patch:
$ gcc -m64 -Wall -O2 -o socktmo socktmo.c && ./socktmo
recv time: 2.008181 seconds
send time: 2.015985 seconds
$ gcc -m32 -Wall -O2 -o socktmo socktmo.c && ./socktmo
recv time: 2.016763 seconds
send time: 2.016062 seconds
$ gcc -mx32 -Wall -O2 -o socktmo socktmo.c && ./socktmo
recv time: 1.007239 seconds
send time: 1.023890 seconds
Results after patch:
$ gcc -m64 -O2 -Wall -o socktmo socktmo.c && ./socktmo
recv time: 2.010062 seconds
send time: 2.015836 seconds
$ gcc -m32 -O2 -Wall -o socktmo socktmo.c && ./socktmo
recv time: 2.013974 seconds
send time: 2.015981 seconds
$ gcc -mx32 -O2 -Wall -o socktmo socktmo.c && ./socktmo
recv time: 2.030257 seconds
send time: 2.013383 seconds
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/time.h>
void checkrc(char *str, int rc)
{
if (rc >= 0)
return;
perror(str);
exit(1);
}
static char buf[1024];
int main(int argc, char **argv)
{
int rc;
int socks[2];
struct timeval tv;
struct timeval start, end, delta;
rc = socketpair(AF_UNIX, SOCK_STREAM, 0, socks);
checkrc("socketpair", rc);
/* set timeout to 1.999999 seconds */
tv.tv_sec = 1;
tv.tv_usec = 999999;
rc = setsockopt(socks[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
rc = setsockopt(socks[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
checkrc("setsockopt", rc);
/* measure actual receive timeout */
gettimeofday(&start, NULL);
rc = recv(socks[0], buf, sizeof buf, 0);
gettimeofday(&end, NULL);
timersub(&end, &start, &delta);
printf("recv time: %ld.%06ld seconds\n",
(long)delta.tv_sec, (long)delta.tv_usec);
/* fill send buffer */
do {
rc = send(socks[0], buf, sizeof buf, 0);
} while (rc > 0);
/* measure actual send timeout */
gettimeofday(&start, NULL);
rc = send(socks[0], buf, sizeof buf, 0);
gettimeofday(&end, NULL);
timersub(&end, &start, &delta);
printf("send time: %ld.%06ld seconds\n",
(long)delta.tv_sec, (long)delta.tv_usec);
exit(0);
}
Fixes: 515c7af85ed9 ("x32: Use compat shims for {g,s}etsockopt")
Reported-by: Gopal RajagopalSai <[email protected]>
Signed-off-by: Lance Richardson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/compat.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/compat.c
+++ b/net/compat.c
@@ -358,7 +358,8 @@ static int compat_sock_setsockopt(struct
if (optname == SO_ATTACH_FILTER)
return do_set_attach_filter(sock, level, optname,
optval, optlen);
- if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)
+ if (!COMPAT_USE_64BIT_TIME &&
+ (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
return do_set_sock_timeout(sock, level, optname, optval, optlen);
return sock_setsockopt(sock, level, optname, optval, optlen);
@@ -423,7 +424,8 @@ static int do_get_sock_timeout(struct so
static int compat_sock_getsockopt(struct socket *sock, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)
+ if (!COMPAT_USE_64BIT_TIME &&
+ (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
return do_get_sock_timeout(sock, level, optname, optval, optlen);
return sock_getsockopt(sock, level, optname, optval, optlen);
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan (Samsung OSG) <[email protected]>
commit c171654caa875919be3c533d3518da8be5be966e upstream.
stub_probe() calls put_busid_priv() in an error path when device isn't
found in the busid_table. Fix it by making put_busid_priv() safe to be
called with null struct bus_id_priv pointer.
This problem happens when "usbip bind" is run without loading usbip_host
driver and then running modprobe. The first failed bind attempt unbinds
the device from the original driver and when usbip_host is modprobed,
stub_probe() runs and doesn't find the device in its busid table and calls
put_busid_priv(0 with null bus_id_priv pointer.
usbip-host 3-10.2: 3-10.2 is not in match_busid table... skip!
[ 367.359679] =====================================
[ 367.359681] WARNING: bad unlock balance detected!
[ 367.359683] 4.17.0-rc4+ #5 Not tainted
[ 367.359685] -------------------------------------
[ 367.359688] modprobe/2768 is trying to release lock (
[ 367.359689]
==================================================================
[ 367.359696] BUG: KASAN: null-ptr-deref in print_unlock_imbalance_bug+0x99/0x110
[ 367.359699] Read of size 8 at addr 0000000000000058 by task modprobe/2768
[ 367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ #5
Fixes: 22076557b07c ("usbip: usbip_host: fix NULL-ptr deref and use-after-free errors") in usb-linus
Signed-off-by: Shuah Khan (Samsung OSG) <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/usb/usbip/stub_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/usb/usbip/stub_main.c
+++ b/drivers/usb/usbip/stub_main.c
@@ -96,7 +96,8 @@ struct bus_id_priv *get_busid_priv(const
void put_busid_priv(struct bus_id_priv *bid)
{
- spin_unlock(&bid->busid_lock);
+ if (bid)
+ spin_unlock(&bid->busid_lock);
}
static int add_match_busid(char *busid)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann <[email protected]>
commit 2e68adcd2fb21b7188ba449f0fab3bee2910e500 upstream.
Calling qdio_release_memory() on error is just plain wrong. It frees
the main qdio_irq struct, when following code still uses it.
Also, no other error path in qdio_establish() does this. So trust
callers to clean up via qdio_free() if some step of the QDIO
initialization fails.
Fixes: 779e6e1c724d ("[S390] qdio: new qdio driver.")
Cc: <[email protected]> #v2.6.27+
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/cio/qdio_setup.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
--- a/drivers/s390/cio/qdio_setup.c
+++ b/drivers/s390/cio/qdio_setup.c
@@ -456,7 +456,6 @@ int qdio_setup_irq(struct qdio_initializ
{
struct ciw *ciw;
struct qdio_irq *irq_ptr = init_data->cdev->private->qdio_data;
- int rc;
memset(&irq_ptr->qib, 0, sizeof(irq_ptr->qib));
memset(&irq_ptr->siga_flag, 0, sizeof(irq_ptr->siga_flag));
@@ -493,16 +492,14 @@ int qdio_setup_irq(struct qdio_initializ
ciw = ccw_device_get_ciw(init_data->cdev, CIW_TYPE_EQUEUE);
if (!ciw) {
DBF_ERROR("%4x NO EQ", irq_ptr->schid.sch_no);
- rc = -EINVAL;
- goto out_err;
+ return -EINVAL;
}
irq_ptr->equeue = *ciw;
ciw = ccw_device_get_ciw(init_data->cdev, CIW_TYPE_AQUEUE);
if (!ciw) {
DBF_ERROR("%4x NO AQ", irq_ptr->schid.sch_no);
- rc = -EINVAL;
- goto out_err;
+ return -EINVAL;
}
irq_ptr->aqueue = *ciw;
@@ -510,9 +507,6 @@ int qdio_setup_irq(struct qdio_initializ
irq_ptr->orig_handler = init_data->cdev->handler;
init_data->cdev->handler = qdio_int_handler;
return 0;
-out_err:
- qdio_release_memory(irq_ptr);
- return rc;
}
void qdio_print_subchannel_info(struct qdio_irq *irq_ptr,
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu <[email protected]>
commit 69af7e23a6870df2ea6fa79ca16493d59b3eebeb upstream.
Since get_kprobe_ctlblk() uses smp_processor_id() to access
per-cpu variable, it hits smp_processor_id sanity check as below.
[ 7.006928] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
[ 7.007859] caller is debug_smp_processor_id+0x20/0x24
[ 7.008438] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00192-g4eb17253e4b5 #1
[ 7.008890] Hardware name: Generic DT based system
[ 7.009917] [<c0313f0c>] (unwind_backtrace) from [<c030e6d8>] (show_stack+0x20/0x24)
[ 7.010473] [<c030e6d8>] (show_stack) from [<c0c64694>] (dump_stack+0x84/0x98)
[ 7.010990] [<c0c64694>] (dump_stack) from [<c071ca5c>] (check_preemption_disabled+0x138/0x13c)
[ 7.011592] [<c071ca5c>] (check_preemption_disabled) from [<c071ca80>] (debug_smp_processor_id+0x20/0x24)
[ 7.012214] [<c071ca80>] (debug_smp_processor_id) from [<c03335e0>] (optimized_callback+0x2c/0xe4)
[ 7.013077] [<c03335e0>] (optimized_callback) from [<bf0021b0>] (0xbf0021b0)
To fix this issue, call get_kprobe_ctlblk() right after
irq-disabled since that disables preemption.
Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: [email protected]
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/probes/kprobes/opt-arm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm/probes/kprobes/opt-arm.c
+++ b/arch/arm/probes/kprobes/opt-arm.c
@@ -165,13 +165,14 @@ optimized_callback(struct optimized_kpro
{
unsigned long flags;
struct kprobe *p = &op->kp;
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ struct kprobe_ctlblk *kcb;
/* Save skipped registers */
regs->ARM_pc = (unsigned long)op->kp.addr;
regs->ARM_ORIG_r0 = ~0UL;
local_irq_save(flags);
+ kcb = get_kprobe_ctlblk();
if (kprobe_running()) {
kprobes_inc_nmissed_count(&op->kp);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Tatashin <[email protected]>
commit ab1e8d8960b68f54af42b6484b5950bd13a4054b upstream.
It is unsafe to do virtual to physical translations before mm_init() is
called if struct page is needed in order to determine the memory section
number (see SECTION_IN_PAGE_FLAGS). This is because only in mm_init()
we initialize struct pages for all the allocated memory when deferred
struct pages are used.
My recent fix in commit c9e97a1997 ("mm: initialize pages on demand
during boot") exposed this problem, because it greatly reduced number of
pages that are initialized before mm_init(), but the problem existed
even before my fix, as Fengguang Wu found.
Below is a more detailed explanation of the problem.
We initialize struct pages in four places:
1. Early in boot a small set of struct pages is initialized to fill the
first section, and lower zones.
2. During mm_init() we initialize "struct pages" for all the memory that
is allocated, i.e reserved in memblock.
3. Using on-demand logic when pages are allocated after mm_init call
(when memblock is finished)
4. After smp_init() when the rest free deferred pages are initialized.
The problem occurs if we try to do va to phys translation of a memory
between steps 1 and 2. Because we have not yet initialized struct pages
for all the reserved pages, it is inherently unsafe to do va to phys if
the translation itself requires access of "struct page" as in case of
this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP
The following path exposes the problem:
start_kernel()
trap_init()
setup_cpu_entry_areas()
setup_cpu_entry_area(cpu)
get_cpu_gdt_paddr(cpu)
per_cpu_ptr_to_phys(addr)
pcpu_addr_to_page(addr)
virt_to_page(addr)
pfn_to_page(__pa(addr) >> PAGE_SHIFT)
We disable this path by not allowing NEED_PER_CPU_KM with deferred
struct pages feature.
The problems are discussed in these threads:
http://lkml.kernel.org/r/[email protected]
http://lkml.kernel.org/r/[email protected]
http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Pavel Tatashin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/Kconfig | 1 +
1 file changed, 1 insertion(+)
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -628,6 +628,7 @@ config DEFERRED_STRUCT_PAGE_INIT
default n
depends on ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
depends on MEMORY_HOTPLUG
+ depends on !NEED_PER_CPU_KM
help
Ordinarily all struct pages are initialised during early boot in a
single thread. On very large machines this can take a considerable
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anders Roxell <[email protected]>
commit 75274b33e779ae40a750bcb4bd0b07c4dfef4746 upstream.
This was found with the -RT patch enabled, but the fix should apply to
non-RT also.
Used multi_v7_defconfig+PREEMPT_RT_FULL=y and this caused a compilation
warning without this fix:
../drivers/cpuidle/coupled.c:122:21: warning: 'cpuidle_coupled_lock'
defined but not used [-Wunused-variable]
Signed-off-by: Anders Roxell <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/cpuidle/coupled.c | 1 -
1 file changed, 1 deletion(-)
--- a/drivers/cpuidle/coupled.c
+++ b/drivers/cpuidle/coupled.c
@@ -119,7 +119,6 @@ struct cpuidle_coupled {
#define CPUIDLE_COUPLED_NOT_IDLE (-1)
-static DEFINE_MUTEX(cpuidle_coupled_lock);
static DEFINE_PER_CPU(struct call_single_data, cpuidle_coupled_poke_cb);
/*
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Potapenko <[email protected]>
commit a45b599ad808c3c982fdcdc12b0b8611c2f92824 upstream.
This shall help avoid copying uninitialized memory to the userspace when
calling ioctl(fd, SG_IO) with an empty command.
Reported-by: [email protected]
Cc: [email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Douglas Gilbert <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/sg.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1903,7 +1903,7 @@ retry:
num = (rem_sz > scatter_elem_sz_prev) ?
scatter_elem_sz_prev : rem_sz;
- schp->pages[k] = alloc_pages(gfp_mask, order);
+ schp->pages[k] = alloc_pages(gfp_mask | __GFP_ZERO, order);
if (!schp->pages[k])
goto out;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky <[email protected]>
[ Upstream commit 97489e0663fa700d6e7febddc43b58df98d7bcda ]
The return from the memmove, memset, memcpy, __memset16, __memset32 and
__memset64 functions are done with "br %r14". These are indirect branches
as well and need to use execute trampolines for CONFIG_EXPOLINE=y.
Cc: [email protected] # 4.16
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Reviewed-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/lib/mem.S | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/s390/lib/mem.S
+++ b/arch/s390/lib/mem.S
@@ -5,6 +5,9 @@
*/
#include <linux/linkage.h>
+#include <asm/nospec-insn.h>
+
+ GEN_BR_THUNK %r14
/*
* memset implementation
@@ -38,7 +41,7 @@ ENTRY(memset)
.Lmemset_clear_rest:
larl %r3,.Lmemset_xc
ex %r4,0(%r3)
- br %r14
+ BR_EX %r14
.Lmemset_fill:
stc %r3,0(%r2)
cghi %r4,1
@@ -55,7 +58,7 @@ ENTRY(memset)
.Lmemset_fill_rest:
larl %r3,.Lmemset_mvc
ex %r4,0(%r3)
- br %r14
+ BR_EX %r14
.Lmemset_xc:
xc 0(1,%r1),0(%r1)
.Lmemset_mvc:
@@ -77,7 +80,7 @@ ENTRY(memcpy)
.Lmemcpy_rest:
larl %r5,.Lmemcpy_mvc
ex %r4,0(%r5)
- br %r14
+ BR_EX %r14
.Lmemcpy_loop:
mvc 0(256,%r1),0(%r3)
la %r1,256(%r1)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky <[email protected]>
[ Upstream commit fba9eb7946251d6e420df3bdf7bc45195be7be9a ]
Add a header with macros usable in assembler files to emit alternative
code sequences. It works analog to the alternatives for inline assmeblies
in C files, with the same restrictions and capabilities.
The syntax is
ALTERNATIVE "<default instructions sequence>", \
"<alternative instructions sequence>", \
"<features-bit>"
and
ALTERNATIVE_2 "<default instructions sequence>", \
"<alternative instructions sqeuence #1>", \
"<feature-bit #1>",
"<alternative instructions sqeuence #2>", \
"<feature-bit #2>"
Reviewed-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/include/asm/alternative-asm.h | 108 ++++++++++++++++++++++++++++++++
1 file changed, 108 insertions(+)
create mode 100644 arch/s390/include/asm/alternative-asm.h
--- /dev/null
+++ b/arch/s390/include/asm/alternative-asm.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_S390_ALTERNATIVE_ASM_H
+#define _ASM_S390_ALTERNATIVE_ASM_H
+
+#ifdef __ASSEMBLY__
+
+/*
+ * Check the length of an instruction sequence. The length may not be larger
+ * than 254 bytes and it has to be divisible by 2.
+ */
+.macro alt_len_check start,end
+ .if ( \end - \start ) > 254
+ .error "cpu alternatives does not support instructions blocks > 254 bytes\n"
+ .endif
+ .if ( \end - \start ) % 2
+ .error "cpu alternatives instructions length is odd\n"
+ .endif
+.endm
+
+/*
+ * Issue one struct alt_instr descriptor entry (need to put it into
+ * the section .altinstructions, see below). This entry contains
+ * enough information for the alternatives patching code to patch an
+ * instruction. See apply_alternatives().
+ */
+.macro alt_entry orig_start, orig_end, alt_start, alt_end, feature
+ .long \orig_start - .
+ .long \alt_start - .
+ .word \feature
+ .byte \orig_end - \orig_start
+ .byte \alt_end - \alt_start
+.endm
+
+/*
+ * Fill up @bytes with nops. The macro emits 6-byte nop instructions
+ * for the bulk of the area, possibly followed by a 4-byte and/or
+ * a 2-byte nop if the size of the area is not divisible by 6.
+ */
+.macro alt_pad_fill bytes
+ .fill ( \bytes ) / 6, 6, 0xc0040000
+ .fill ( \bytes ) % 6 / 4, 4, 0x47000000
+ .fill ( \bytes ) % 6 % 4 / 2, 2, 0x0700
+.endm
+
+/*
+ * Fill up @bytes with nops. If the number of bytes is larger
+ * than 6, emit a jg instruction to branch over all nops, then
+ * fill an area of size (@bytes - 6) with nop instructions.
+ */
+.macro alt_pad bytes
+ .if ( \bytes > 0 )
+ .if ( \bytes > 6 )
+ jg . + \bytes
+ alt_pad_fill \bytes - 6
+ .else
+ alt_pad_fill \bytes
+ .endif
+ .endif
+.endm
+
+/*
+ * Define an alternative between two instructions. If @feature is
+ * present, early code in apply_alternatives() replaces @oldinstr with
+ * @newinstr. ".skip" directive takes care of proper instruction padding
+ * in case @newinstr is longer than @oldinstr.
+ */
+.macro ALTERNATIVE oldinstr, newinstr, feature
+ .pushsection .altinstr_replacement,"ax"
+770: \newinstr
+771: .popsection
+772: \oldinstr
+773: alt_len_check 770b, 771b
+ alt_len_check 772b, 773b
+ alt_pad ( ( 771b - 770b ) - ( 773b - 772b ) )
+774: .pushsection .altinstructions,"a"
+ alt_entry 772b, 774b, 770b, 771b, \feature
+ .popsection
+.endm
+
+/*
+ * Define an alternative between two instructions. If @feature is
+ * present, early code in apply_alternatives() replaces @oldinstr with
+ * @newinstr. ".skip" directive takes care of proper instruction padding
+ * in case @newinstr is longer than @oldinstr.
+ */
+.macro ALTERNATIVE_2 oldinstr, newinstr1, feature1, newinstr2, feature2
+ .pushsection .altinstr_replacement,"ax"
+770: \newinstr1
+771: \newinstr2
+772: .popsection
+773: \oldinstr
+774: alt_len_check 770b, 771b
+ alt_len_check 771b, 772b
+ alt_len_check 773b, 774b
+ .if ( 771b - 770b > 772b - 771b )
+ alt_pad ( ( 771b - 770b ) - ( 774b - 773b ) )
+ .else
+ alt_pad ( ( 772b - 771b ) - ( 774b - 773b ) )
+ .endif
+775: .pushsection .altinstructions,"a"
+ alt_entry 773b, 775b, 770b, 771b,\feature1
+ alt_entry 773b, 775b, 771b, 772b,\feature2
+ .popsection
+.endm
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_S390_ALTERNATIVE_ASM_H */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Stultz <[email protected]>
commit 3d88d56c5873f6eebe23e05c3da701960146b801 upstream.
Due to how the MONOTONIC_RAW accumulation logic was handled,
there is the potential for a 1ns discontinuity when we do
accumulations. This small discontinuity has for the most part
gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
in their vDSO clock_gettime implementation, we've seen failures
with the inconsistency-check test in kselftest.
This patch addresses the issue by using the same sub-ns
accumulation handling that CLOCK_MONOTONIC uses, which avoids
the issue for in-kernel users.
Since the ARM64 vDSO implementation has its own clock_gettime
calculation logic, this patch reduces the frequency of errors,
but failures are still seen. The ARM64 vDSO will need to be
updated to include the sub-nanosecond xtime_nsec values in its
calculation for this issue to be completely fixed.
Signed-off-by: John Stultz <[email protected]>
Tested-by: Daniel Mentz <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Kevin Brodsky <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "stable #4 . 8+" <[email protected]>
Cc: Miroslav Lichvar <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
[fabrizio: cherry-pick to 4.4. Kept cycle_t type for function
logarithmic_accumulation local variable "interval". Dropped
casting of "interval" variable]
Signed-off-by: Fabrizio Castro <[email protected]>
Signed-off-by: Biju Das <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/timekeeper_internal.h | 4 ++--
kernel/time/timekeeping.c | 20 ++++++++++----------
2 files changed, 12 insertions(+), 12 deletions(-)
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -56,7 +56,7 @@ struct tk_read_base {
* interval.
* @xtime_remainder: Shifted nano seconds left over when rounding
* @cycle_interval
- * @raw_interval: Raw nano seconds accumulated per NTP interval.
+ * @raw_interval: Shifted raw nano seconds accumulated per NTP interval.
* @ntp_error: Difference between accumulated time and NTP time in ntp
* shifted nano seconds.
* @ntp_error_shift: Shift conversion between clock shifted nano seconds and
@@ -97,7 +97,7 @@ struct timekeeper {
cycle_t cycle_interval;
u64 xtime_interval;
s64 xtime_remainder;
- u32 raw_interval;
+ u64 raw_interval;
/* The ntp_tick_length() value currently being used.
* This cached copy ensures we consistently apply the tick
* length for an entire tick, as ntp_tick_length may change
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -277,8 +277,7 @@ static void tk_setup_internals(struct ti
/* Go back from cycles -> shifted ns */
tk->xtime_interval = (u64) interval * clock->mult;
tk->xtime_remainder = ntpinterval - tk->xtime_interval;
- tk->raw_interval =
- ((u64) interval * clock->mult) >> clock->shift;
+ tk->raw_interval = interval * clock->mult;
/* if changing clocks, convert xtime_nsec shift units */
if (old_clock) {
@@ -1767,7 +1766,7 @@ static cycle_t logarithmic_accumulation(
unsigned int *clock_set)
{
cycle_t interval = tk->cycle_interval << shift;
- u64 raw_nsecs;
+ u64 snsec_per_sec;
/* If the offset is smaller than a shifted interval, do nothing */
if (offset < interval)
@@ -1782,14 +1781,15 @@ static cycle_t logarithmic_accumulation(
*clock_set |= accumulate_nsecs_to_secs(tk);
/* Accumulate raw time */
- raw_nsecs = (u64)tk->raw_interval << shift;
- raw_nsecs += tk->raw_time.tv_nsec;
- if (raw_nsecs >= NSEC_PER_SEC) {
- u64 raw_secs = raw_nsecs;
- raw_nsecs = do_div(raw_secs, NSEC_PER_SEC);
- tk->raw_time.tv_sec += raw_secs;
+ tk->tkr_raw.xtime_nsec += (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift;
+ tk->tkr_raw.xtime_nsec += tk->raw_interval << shift;
+ snsec_per_sec = (u64)NSEC_PER_SEC << tk->tkr_raw.shift;
+ while (tk->tkr_raw.xtime_nsec >= snsec_per_sec) {
+ tk->tkr_raw.xtime_nsec -= snsec_per_sec;
+ tk->raw_time.tv_sec++;
}
- tk->raw_time.tv_nsec = raw_nsecs;
+ tk->raw_time.tv_nsec = tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift;
+ tk->tkr_raw.xtime_nsec -= (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift;
/* Accumulate error between NTP and clock interval */
tk->ntp_error += tk->ntp_tick << shift;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Weiner <[email protected]>
commit 84ad5802a33a4964a49b8f7d24d80a214a096b19 upstream.
The MemAvailable item in /proc/meminfo is to give users a hint of how
much memory is allocatable without causing swapping, so it excludes the
zones' low watermarks as unavailable to userspace.
However, for a userspace allocation, kswapd will actually reclaim until
the free pages hit a combination of the high watermark and the page
allocator's lowmem protection that keeps a certain amount of DMA and
DMA32 memory from userspace as well.
Subtract the full amount we know to be unavailable to userspace from the
number of free pages when calculating MemAvailable.
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/proc/meminfo.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -57,11 +57,8 @@ static int meminfo_proc_show(struct seq_
/*
* Estimate the amount of memory available for userspace allocations,
* without causing swapping.
- *
- * Free memory cannot be taken below the low watermark, before the
- * system starts swapping.
*/
- available = i.freeram - wmark_low;
+ available = i.freeram - totalreserve_pages;
/*
* Not all the page cache can be freed, otherwise the system will
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro <[email protected]>
commit 5aa1437d2d9a068c0334bd7c9dafa8ec4f97f13b upstream.
open file, unlink it, then use ioctl(2) to make it immutable or
append only. Now close it and watch the blocks *not* freed...
Immutable/append-only checks belong in ->setattr().
Note: the bug is old and backport to anything prior to 737f2e93b972
("ext2: convert to use the new truncate convention") will need
these checks lifted into ext2_setattr().
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/ext2/inode.c | 10 ----------
1 file changed, 10 deletions(-)
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1175,21 +1175,11 @@ do_indirects:
static void ext2_truncate_blocks(struct inode *inode, loff_t offset)
{
- /*
- * XXX: it seems like a bug here that we don't allow
- * IS_APPEND inode to have blocks-past-i_size trimmed off.
- * review and fix this.
- *
- * Also would be nice to be able to handle IO errors and such,
- * but that's probably too much to ask.
- */
if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
S_ISLNK(inode->i_mode)))
return;
if (ext2_inode_is_fast_symlink(inode))
return;
- if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
- return;
dax_sem_down_write(EXT2_I(inode));
__ext2_truncate_blocks(inode, offset);
On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
> 4.4-stable review patch. If anyone has any objections, please let me know.
Just one objection: Why does stable care about this (and the previous
patch)? I've checked the stable queue and I don't see anything that would
have these patches as a prerequisite. And on their own, they are only
cleanups without substantial gains.
Honza
>
> ------------------
>
> From: Mel Gorman <[email protected]>
>
> commit ebded02788b5d7c7600f8cff26ae07896d568649 upstream.
>
> In the generic read paths the kernel looks up a page in the page cache
> and if it's up to date, it is used. If not, the page lock is acquired
> to wait for IO to complete and then check the page. If multiple
> processes are waiting on IO, they all serialise against the lock and
> duplicate the checks. This is unnecessary.
>
> The page lock in itself does not give any guarantees to the callers
> about the page state as it can be immediately truncated or reclaimed
> after the page is unlocked. It's sufficient to wait_on_page_locked and
> then continue if the page is up to date on wakeup.
>
> It is possible that a truncated but up-to-date page is returned but the
> reference taken during read prevents it disappearing underneath the
> caller and the data is still valid if PageUptodate.
>
> The overall impact is small as even if processes serialise on the lock,
> the lock section is tiny once the IO is complete. Profiles indicated
> that unlock_page and friends are generally a tiny portion of a
> read-intensive workload. An artificial test was created that had
> instances of dd access a cache-cold file on an ext4 filesystem and
> measure how long the read took.
>
> paralleldd
> 4.4.0 4.4.0
> vanilla avoidlock
> Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%)
> Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%)
> Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%)
> Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%)
> Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%)
> Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%)
> Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%)
> Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%)
> Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%)
> Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%)
>
> The impact is small but intuitively, it makes sense to avoid unnecessary
> calls to lock_page.
>
> Signed-off-by: Mel Gorman <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
> Signed-off-by: Mel Gorman <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> ---
> mm/filemap.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1581,6 +1581,15 @@ find_page:
> index, last_index - index);
> }
> if (!PageUptodate(page)) {
> + /*
> + * See comment in do_read_cache_page on why
> + * wait_on_page_locked is used to avoid unnecessarily
> + * serialisations and why it's safe.
> + */
> + wait_on_page_locked_killable(page);
> + if (PageUptodate(page))
> + goto page_ok;
> +
> if (inode->i_blkbits == PAGE_CACHE_SHIFT ||
> !mapping->a_ops->is_partially_uptodate)
> goto page_not_up_to_date;
> @@ -2253,12 +2262,52 @@ filler:
> if (PageUptodate(page))
> goto out;
>
> + /*
> + * Page is not up to date and may be locked due one of the following
> + * case a: Page is being filled and the page lock is held
> + * case b: Read/write error clearing the page uptodate status
> + * case c: Truncation in progress (page locked)
> + * case d: Reclaim in progress
> + *
> + * Case a, the page will be up to date when the page is unlocked.
> + * There is no need to serialise on the page lock here as the page
> + * is pinned so the lock gives no additional protection. Even if the
> + * the page is truncated, the data is still valid if PageUptodate as
> + * it's a race vs truncate race.
> + * Case b, the page will not be up to date
> + * Case c, the page may be truncated but in itself, the data may still
> + * be valid after IO completes as it's a read vs truncate race. The
> + * operation must restart if the page is not uptodate on unlock but
> + * otherwise serialising on page lock to stabilise the mapping gives
> + * no additional guarantees to the caller as the page lock is
> + * released before return.
> + * Case d, similar to truncation. If reclaim holds the page lock, it
> + * will be a race with remove_mapping that determines if the mapping
> + * is valid on unlock but otherwise the data is valid and there is
> + * no need to serialise with page lock.
> + *
> + * As the page lock gives no additional guarantee, we optimistically
> + * wait on the page to be unlocked and check if it's up to date and
> + * use the page if it is. Otherwise, the page lock is required to
> + * distinguish between the different cases. The motivation is that we
> + * avoid spurious serialisations and wakeups when multiple processes
> + * wait on the same page for IO to complete.
> + */
> + wait_on_page_locked(page);
> + if (PageUptodate(page))
> + goto out;
> +
> + /* Distinguish between all the cases under the safety of the lock */
> lock_page(page);
> +
> + /* Case c or d, restart the operation */
> if (!page->mapping) {
> unlock_page(page);
> page_cache_release(page);
> goto repeat;
> }
> +
> + /* Someone else locked and filled the page in a very small window */
> if (PageUptodate(page)) {
> unlock_page(page);
> goto out;
>
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
> On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
> > 4.4-stable review patch. If anyone has any objections, please let me know.
>
> Just one objection: Why does stable care about this (and the previous
> patch)? I've checked the stable queue and I don't see anything that would
> have these patches as a prerequisite. And on their own, they are only
> cleanups without substantial gains.
There's a small gain here:
> > paralleldd
> > 4.4.0 4.4.0
> > vanilla avoidlock
> > Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%)
> > Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%)
> > Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%)
> > Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%)
> > Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%)
> > Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%)
> > Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%)
> > Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%)
> > Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%)
> > Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%)
> >
> > The impact is small but intuitively, it makes sense to avoid unnecessary
> > calls to lock_page.
Yes, it's small, but it's marked in the SLES kernel as "needs to be
merged into stable", so obviously it matters to someone :)
thanks,
greg k-h
Thu, May 24, 2018 at 4:06 AM Greg Kroah-Hartman
<[email protected]>
wrote:
> On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
> > On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
> > > 4.4-stable review patch. If anyone has any objections, please let me
know.
> >
> > Just one objection: Why does stable care about this (and the previous
> > patch)? I've checked the stable queue and I don't see anything that
would
> > have these patches as a prerequisite. And on their own, they are only
> > cleanups without substantial gains.
> There's a small gain here:
> > > paralleldd
> > > 4.4.0 4.4.0
> > > vanilla avoidlock
> > > Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%)
> > > Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%)
> > > Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%)
> > > Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%)
> > > Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%)
> > > Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%)
> > > Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%)
> > > Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%)
> > > Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%)
> > > Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%)
> > >
> > > The impact is small but intuitively, it makes sense to avoid
unnecessary
> > > calls to lock_page.
> Yes, it's small, but it's marked in the SLES kernel as "needs to be
> merged into stable", so obviously it matters to someone :)
Hmm. I had the same reaction to these two as Jan, but assumed that they
made applying later patches easier, and didn't take the trouble he did to
find that's not so.
I've no wish to be disputatious, but it does seem that the definition of
"stable" has changed, and not necessarily for the better, if it's now a
home for small gains: I thought we left those to upstream.
Hugh
On Thu, May 24, 2018 at 04:17:12AM -0700, Hugh Dickins wrote:
> Thu, May 24, 2018 at 4:06 AM Greg Kroah-Hartman
> <[email protected]>
> wrote:
>
> > On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
> > > On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
> > > > 4.4-stable review patch. If anyone has any objections, please let me
> know.
> > >
> > > Just one objection: Why does stable care about this (and the previous
> > > patch)? I've checked the stable queue and I don't see anything that
> would
> > > have these patches as a prerequisite. And on their own, they are only
> > > cleanups without substantial gains.
>
> > There's a small gain here:
>
> > > > paralleldd
> > > > 4.4.0 4.4.0
> > > > vanilla avoidlock
> > > > Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%)
> > > > Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%)
> > > > Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%)
> > > > Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%)
> > > > Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%)
> > > > Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%)
> > > > Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%)
> > > > Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%)
> > > > Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%)
> > > > Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%)
> > > >
> > > > The impact is small but intuitively, it makes sense to avoid
> unnecessary
> > > > calls to lock_page.
>
> > Yes, it's small, but it's marked in the SLES kernel as "needs to be
> > merged into stable", so obviously it matters to someone :)
>
> Hmm. I had the same reaction to these two as Jan, but assumed that they
> made applying later patches easier, and didn't take the trouble he did to
> find that's not so.
>
> I've no wish to be disputatious, but it does seem that the definition of
> "stable" has changed, and not necessarily for the better, if it's now a
> home for small gains: I thought we left those to upstream.
This is in the SLES kernel for a reason, and again, it's in the section
that says "this should be pushed to stable". So if it's good enough for
the SLES kernel, why isn't it good enough for all users of this kernel
tree?
If you all think it should be dropped in both places, that's fine with
me :)
thanks,
greg k-h
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vinod Koul <[email protected]>
commit 757d12e5849be549076901b0d33c60d5f360269c upstream.
dmaengine has various device callbacks and exposes helper
functions to invoke these. These helpers should check if channel,
device and callback is valid or not before invoking them.
Reported-by: Jon Hunter <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Fabrizio Castro <[email protected]>
Signed-off-by: Jianming Qiao <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/dmaengine.h | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -767,6 +767,9 @@ static inline struct dma_async_tx_descri
sg_dma_address(&sg) = buf;
sg_dma_len(&sg) = len;
+ if (!chan || !chan->device || !chan->device->device_prep_slave_sg)
+ return NULL;
+
return chan->device->device_prep_slave_sg(chan, &sg, 1,
dir, flags, NULL);
}
@@ -775,6 +778,9 @@ static inline struct dma_async_tx_descri
struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len,
enum dma_transfer_direction dir, unsigned long flags)
{
+ if (!chan || !chan->device || !chan->device->device_prep_slave_sg)
+ return NULL;
+
return chan->device->device_prep_slave_sg(chan, sgl, sg_len,
dir, flags, NULL);
}
@@ -786,6 +792,9 @@ static inline struct dma_async_tx_descri
enum dma_transfer_direction dir, unsigned long flags,
struct rio_dma_ext *rio_ext)
{
+ if (!chan || !chan->device || !chan->device->device_prep_slave_sg)
+ return NULL;
+
return chan->device->device_prep_slave_sg(chan, sgl, sg_len,
dir, flags, rio_ext);
}
@@ -796,6 +805,9 @@ static inline struct dma_async_tx_descri
size_t period_len, enum dma_transfer_direction dir,
unsigned long flags)
{
+ if (!chan || !chan->device || !chan->device->device_prep_dma_cyclic)
+ return NULL;
+
return chan->device->device_prep_dma_cyclic(chan, buf_addr, buf_len,
period_len, dir, flags);
}
@@ -804,6 +816,9 @@ static inline struct dma_async_tx_descri
struct dma_chan *chan, struct dma_interleaved_template *xt,
unsigned long flags)
{
+ if (!chan || !chan->device || !chan->device->device_prep_interleaved_dma)
+ return NULL;
+
return chan->device->device_prep_interleaved_dma(chan, xt, flags);
}
@@ -811,7 +826,7 @@ static inline struct dma_async_tx_descri
struct dma_chan *chan, dma_addr_t dest, int value, size_t len,
unsigned long flags)
{
- if (!chan || !chan->device)
+ if (!chan || !chan->device || !chan->device->device_prep_dma_memset)
return NULL;
return chan->device->device_prep_dma_memset(chan, dest, value,
@@ -824,6 +839,9 @@ static inline struct dma_async_tx_descri
struct scatterlist *src_sg, unsigned int src_nents,
unsigned long flags)
{
+ if (!chan || !chan->device || !chan->device->device_prep_dma_sg)
+ return NULL;
+
return chan->device->device_prep_dma_sg(chan, dst_sg, dst_nents,
src_sg, src_nents, flags);
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Remus <[email protected]>
commit fa89adba1941e4f3b213399b81732a5c12fd9131 upstream.
zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's
rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen
adapter ERP action via zfcp_erp_action_enqueue(). Both are separately
processed asynchronously and concurrently.
Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It
calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via
zfcp_dbf_rec_trig(). zfcp_dbf_rec_trig() acquires the DBF REC spin lock
and then iterates with list_for_each() over the adapter's ERP ready list
without holding the ERP lock. This opens a race window in which the
current list entry can be moved to another list, causing list_for_each()
to iterate forever on the wrong list, as the erp_ready_head is never
encountered as terminal condition.
Meanwhile the ERP action can be processed in the ERP thread by
zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP
lock and then calls zfcp_erp_action_to_running() to move the ERP action
from the ready to the running list. zfcp_erp_action_to_running() can
move the ERP action using list_move() just during the aforementioned
race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run().
zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is
held by the infinitely looping kworker, it effectively spins forever.
Example Sequence Diagram:
Process ERP Thread rport_work
------------------- ------------------- -------------------
zfcp_erp_adapter_reopen()
zfcp_erp_adapter_block()
zfcp_scsi_schedule_rports_block()
lock ERP zfcp_scsi_rport_work()
zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER)
list_add_tail() on ready !(rport_task==RPORT_ADD)
wake_up() ERP thread zfcp_scsi_rport_block()
zfcp_dbf_rec_trig() zfcp_erp_strategy() zfcp_dbf_rec_trig()
unlock ERP lock DBF REC
zfcp_erp_wait() lock ERP
| zfcp_erp_action_to_running()
| list_for_each() ready
| list_move() current entry
| ready to running
| zfcp_dbf_rec_run() endless loop over running
| zfcp_dbf_rec_run_lvl()
| lock DBF REC spins forever
Any adapter recovery can trigger this, such as setting the device offline
or reboot.
V4.9 commit 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport
during rport gone") introduced additional tracing of (un)blocking of
rports. It missed that the adapter->erp_lock must be held when calling
zfcp_dbf_rec_trig().
This fix uses the approach formerly introduced by commit aa0fec62391c
("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got
later removed by commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug
tracing for recovery actions.").
Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that
acquires and releases the adapter->erp_lock for read.
Reported-by: Sebastian Ott <[email protected]>
Signed-off-by: Jens Remus <[email protected]>
Fixes: 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport during rport gone")
Cc: <[email protected]> # 2.6.32+
Reviewed-by: Benjamin Block <[email protected]>
Signed-off-by: Steffen Maier <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/scsi/zfcp_dbf.c | 23 ++++++++++++++++++++++-
drivers/s390/scsi/zfcp_ext.h | 5 ++++-
drivers/s390/scsi/zfcp_scsi.c | 14 +++++++-------
3 files changed, 33 insertions(+), 9 deletions(-)
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -3,7 +3,7 @@
*
* Debug traces for zfcp.
*
- * Copyright IBM Corp. 2002, 2017
+ * Copyright IBM Corp. 2002, 2018
*/
#define KMSG_COMPONENT "zfcp"
@@ -287,6 +287,27 @@ void zfcp_dbf_rec_trig(char *tag, struct
spin_unlock_irqrestore(&dbf->rec_lock, flags);
}
+/**
+ * zfcp_dbf_rec_trig_lock - trace event related to triggered recovery with lock
+ * @tag: identifier for event
+ * @adapter: adapter on which the erp_action should run
+ * @port: remote port involved in the erp_action
+ * @sdev: scsi device involved in the erp_action
+ * @want: wanted erp_action
+ * @need: required erp_action
+ *
+ * The adapter->erp_lock must not be held.
+ */
+void zfcp_dbf_rec_trig_lock(char *tag, struct zfcp_adapter *adapter,
+ struct zfcp_port *port, struct scsi_device *sdev,
+ u8 want, u8 need)
+{
+ unsigned long flags;
+
+ read_lock_irqsave(&adapter->erp_lock, flags);
+ zfcp_dbf_rec_trig(tag, adapter, port, sdev, want, need);
+ read_unlock_irqrestore(&adapter->erp_lock, flags);
+}
/**
* zfcp_dbf_rec_run_lvl - trace event related to running recovery
--- a/drivers/s390/scsi/zfcp_ext.h
+++ b/drivers/s390/scsi/zfcp_ext.h
@@ -3,7 +3,7 @@
*
* External function declarations.
*
- * Copyright IBM Corp. 2002, 2016
+ * Copyright IBM Corp. 2002, 2018
*/
#ifndef ZFCP_EXT_H
@@ -34,6 +34,9 @@ extern int zfcp_dbf_adapter_register(str
extern void zfcp_dbf_adapter_unregister(struct zfcp_adapter *);
extern void zfcp_dbf_rec_trig(char *, struct zfcp_adapter *,
struct zfcp_port *, struct scsi_device *, u8, u8);
+extern void zfcp_dbf_rec_trig_lock(char *tag, struct zfcp_adapter *adapter,
+ struct zfcp_port *port,
+ struct scsi_device *sdev, u8 want, u8 need);
extern void zfcp_dbf_rec_run(char *, struct zfcp_erp_action *);
extern void zfcp_dbf_rec_run_lvl(int level, char *tag,
struct zfcp_erp_action *erp);
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -3,7 +3,7 @@
*
* Interface to Linux SCSI midlayer.
*
- * Copyright IBM Corp. 2002, 2017
+ * Copyright IBM Corp. 2002, 2018
*/
#define KMSG_COMPONENT "zfcp"
@@ -616,9 +616,9 @@ static void zfcp_scsi_rport_register(str
ids.port_id = port->d_id;
ids.roles = FC_RPORT_ROLE_FCP_TARGET;
- zfcp_dbf_rec_trig("scpaddy", port->adapter, port, NULL,
- ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD,
- ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD);
+ zfcp_dbf_rec_trig_lock("scpaddy", port->adapter, port, NULL,
+ ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD,
+ ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD);
rport = fc_remote_port_add(port->adapter->scsi_host, 0, &ids);
if (!rport) {
dev_err(&port->adapter->ccw_device->dev,
@@ -640,9 +640,9 @@ static void zfcp_scsi_rport_block(struct
struct fc_rport *rport = port->rport;
if (rport) {
- zfcp_dbf_rec_trig("scpdely", port->adapter, port, NULL,
- ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL,
- ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL);
+ zfcp_dbf_rec_trig_lock("scpdely", port->adapter, port, NULL,
+ ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL,
+ ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL);
fc_remote_port_delete(rport);
port->rport = NULL;
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky <[email protected]>
[ Upstream commit 4253b0e0627ee3461e64c2495c616f1c8f6b127b ]
The nospec-branch.c file is compiled without the gcc options to
generate expoline thunks. The return branch of the sysfs show
functions cpu_show_spectre_v1 and cpu_show_spectre_v2 is an indirect
branch as well. These need to be compiled with expolines.
Move the sysfs functions for spectre reporting to a separate file
and loose an '.' for one of the messages.
Cc: [email protected] # 4.16
Fixes: d424986f1d ("s390: add sysfs attributes for spectre")
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/kernel/Makefile | 1 +
arch/s390/kernel/nospec-branch.c | 18 ------------------
arch/s390/kernel/nospec-sysfs.c | 21 +++++++++++++++++++++
3 files changed, 22 insertions(+), 18 deletions(-)
create mode 100644 arch/s390/kernel/nospec-sysfs.c
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -49,6 +49,7 @@ obj-y += nospec-branch.o
extra-y += head.o head64.o vmlinux.lds
+obj-$(CONFIG_SYSFS) += nospec-sysfs.o
CFLAGS_REMOVE_nospec-branch.o += $(CC_FLAGS_EXPOLINE)
obj-$(CONFIG_MODULES) += s390_ksyms.o module.o
--- a/arch/s390/kernel/nospec-branch.c
+++ b/arch/s390/kernel/nospec-branch.c
@@ -44,24 +44,6 @@ static int __init nospec_report(void)
}
arch_initcall(nospec_report);
-#ifdef CONFIG_SYSFS
-ssize_t cpu_show_spectre_v1(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sprintf(buf, "Mitigation: __user pointer sanitization\n");
-}
-
-ssize_t cpu_show_spectre_v2(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable)
- return sprintf(buf, "Mitigation: execute trampolines\n");
- if (__test_facility(82, S390_lowcore.alt_stfle_fac_list))
- return sprintf(buf, "Mitigation: limited branch prediction.\n");
- return sprintf(buf, "Vulnerable\n");
-}
-#endif
-
#ifdef CONFIG_EXPOLINE
int nospec_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF);
--- /dev/null
+++ b/arch/s390/kernel/nospec-sysfs.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/device.h>
+#include <linux/cpu.h>
+#include <asm/facility.h>
+#include <asm/nospec-branch.h>
+
+ssize_t cpu_show_spectre_v1(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "Mitigation: __user pointer sanitization\n");
+}
+
+ssize_t cpu_show_spectre_v2(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable)
+ return sprintf(buf, "Mitigation: execute trampolines\n");
+ if (__test_facility(82, S390_lowcore.alt_stfle_fac_list))
+ return sprintf(buf, "Mitigation: limited branch prediction\n");
+ return sprintf(buf, "Vulnerable\n");
+}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky <[email protected]>
[ Upstream commit de5cb6eb514ebe241e3edeb290cb41deb380b81d ]
The BPF JIT need safe guarding against spectre v2 in the sk_load_xxx
assembler stubs and the indirect branches generated by the JIT itself
need to be converted to expolines.
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/net/bpf_jit.S | 16 ++++++----
arch/s390/net/bpf_jit_comp.c | 63 +++++++++++++++++++++++++++++++++++++++++--
2 files changed, 71 insertions(+), 8 deletions(-)
--- a/arch/s390/net/bpf_jit.S
+++ b/arch/s390/net/bpf_jit.S
@@ -8,6 +8,7 @@
*/
#include <linux/linkage.h>
+#include <asm/nospec-insn.h>
#include "bpf_jit.h"
/*
@@ -53,7 +54,7 @@ ENTRY(sk_load_##NAME##_pos); \
clg %r3,STK_OFF_HLEN(%r15); /* Offset + SIZE > hlen? */ \
jh sk_load_##NAME##_slow; \
LOAD %r14,-SIZE(%r3,%r12); /* Get data from skb */ \
- b OFF_OK(%r6); /* Return */ \
+ B_EX OFF_OK,%r6; /* Return */ \
\
sk_load_##NAME##_slow:; \
lgr %r2,%r7; /* Arg1 = skb pointer */ \
@@ -63,11 +64,14 @@ sk_load_##NAME##_slow:; \
brasl %r14,skb_copy_bits; /* Get data from skb */ \
LOAD %r14,STK_OFF_TMP(%r15); /* Load from temp bufffer */ \
ltgr %r2,%r2; /* Set cc to (%r2 != 0) */ \
- br %r6; /* Return */
+ BR_EX %r6; /* Return */
sk_load_common(word, 4, llgf) /* r14 = *(u32 *) (skb->data+offset) */
sk_load_common(half, 2, llgh) /* r14 = *(u16 *) (skb->data+offset) */
+ GEN_BR_THUNK %r6
+ GEN_B_THUNK OFF_OK,%r6
+
/*
* Load 1 byte from SKB (optimized version)
*/
@@ -79,7 +83,7 @@ ENTRY(sk_load_byte_pos)
clg %r3,STK_OFF_HLEN(%r15) # Offset >= hlen?
jnl sk_load_byte_slow
llgc %r14,0(%r3,%r12) # Get byte from skb
- b OFF_OK(%r6) # Return OK
+ B_EX OFF_OK,%r6 # Return OK
sk_load_byte_slow:
lgr %r2,%r7 # Arg1 = skb pointer
@@ -89,7 +93,7 @@ sk_load_byte_slow:
brasl %r14,skb_copy_bits # Get data from skb
llgc %r14,STK_OFF_TMP(%r15) # Load result from temp buffer
ltgr %r2,%r2 # Set cc to (%r2 != 0)
- br %r6 # Return cc
+ BR_EX %r6 # Return cc
#define sk_negative_common(NAME, SIZE, LOAD) \
sk_load_##NAME##_slow_neg:; \
@@ -103,7 +107,7 @@ sk_load_##NAME##_slow_neg:; \
jz bpf_error; \
LOAD %r14,0(%r2); /* Get data from pointer */ \
xr %r3,%r3; /* Set cc to zero */ \
- br %r6; /* Return cc */
+ BR_EX %r6; /* Return cc */
sk_negative_common(word, 4, llgf)
sk_negative_common(half, 2, llgh)
@@ -112,4 +116,4 @@ sk_negative_common(byte, 1, llgc)
bpf_error:
# force a return 0 from jit handler
ltgr %r15,%r15 # Set condition code
- br %r6
+ BR_EX %r6
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -24,6 +24,8 @@
#include <linux/bpf.h>
#include <asm/cacheflush.h>
#include <asm/dis.h>
+#include <asm/facility.h>
+#include <asm/nospec-branch.h>
#include "bpf_jit.h"
int bpf_jit_enable __read_mostly;
@@ -41,6 +43,8 @@ struct bpf_jit {
int base_ip; /* Base address for literal pool */
int ret0_ip; /* Address of return 0 */
int exit_ip; /* Address of exit */
+ int r1_thunk_ip; /* Address of expoline thunk for 'br %r1' */
+ int r14_thunk_ip; /* Address of expoline thunk for 'br %r14' */
int tail_call_start; /* Tail call start offset */
int labels[1]; /* Labels for local jumps */
};
@@ -248,6 +252,19 @@ static inline void reg_set_seen(struct b
REG_SET_SEEN(b2); \
})
+#define EMIT6_PCREL_RILB(op, b, target) \
+({ \
+ int rel = (target - jit->prg) / 2; \
+ _EMIT6(op | reg_high(b) << 16 | rel >> 16, rel & 0xffff); \
+ REG_SET_SEEN(b); \
+})
+
+#define EMIT6_PCREL_RIL(op, target) \
+({ \
+ int rel = (target - jit->prg) / 2; \
+ _EMIT6(op | rel >> 16, rel & 0xffff); \
+})
+
#define _EMIT6_IMM(op, imm) \
({ \
unsigned int __imm = (imm); \
@@ -475,8 +492,45 @@ static void bpf_jit_epilogue(struct bpf_
EMIT4(0xb9040000, REG_2, BPF_REG_0);
/* Restore registers */
save_restore_regs(jit, REGS_RESTORE);
+ if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable) {
+ jit->r14_thunk_ip = jit->prg;
+ /* Generate __s390_indirect_jump_r14 thunk */
+ if (test_facility(35)) {
+ /* exrl %r0,.+10 */
+ EMIT6_PCREL_RIL(0xc6000000, jit->prg + 10);
+ } else {
+ /* larl %r1,.+14 */
+ EMIT6_PCREL_RILB(0xc0000000, REG_1, jit->prg + 14);
+ /* ex 0,0(%r1) */
+ EMIT4_DISP(0x44000000, REG_0, REG_1, 0);
+ }
+ /* j . */
+ EMIT4_PCREL(0xa7f40000, 0);
+ }
/* br %r14 */
_EMIT2(0x07fe);
+
+ if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable &&
+ (jit->seen & SEEN_FUNC)) {
+ jit->r1_thunk_ip = jit->prg;
+ /* Generate __s390_indirect_jump_r1 thunk */
+ if (test_facility(35)) {
+ /* exrl %r0,.+10 */
+ EMIT6_PCREL_RIL(0xc6000000, jit->prg + 10);
+ /* j . */
+ EMIT4_PCREL(0xa7f40000, 0);
+ /* br %r1 */
+ _EMIT2(0x07f1);
+ } else {
+ /* larl %r1,.+14 */
+ EMIT6_PCREL_RILB(0xc0000000, REG_1, jit->prg + 14);
+ /* ex 0,S390_lowcore.br_r1_tampoline */
+ EMIT4_DISP(0x44000000, REG_0, REG_0,
+ offsetof(struct _lowcore, br_r1_trampoline));
+ /* j . */
+ EMIT4_PCREL(0xa7f40000, 0);
+ }
+ }
}
/*
@@ -980,8 +1034,13 @@ static noinline int bpf_jit_insn(struct
/* lg %w1,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0004, REG_W1, REG_0, REG_L,
EMIT_CONST_U64(func));
- /* basr %r14,%w1 */
- EMIT2(0x0d00, REG_14, REG_W1);
+ if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable) {
+ /* brasl %r14,__s390_indirect_jump_r1 */
+ EMIT6_PCREL_RILB(0xc0050000, REG_14, jit->r1_thunk_ip);
+ } else {
+ /* basr %r14,%w1 */
+ EMIT2(0x0d00, REG_14, REG_W1);
+ }
/* lgr %b0,%r2: load return value into %b0 */
EMIT4(0xb9040000, BPF_REG_0, REG_2);
if (bpf_helper_changes_skb_data((void *)func)) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Yan <[email protected]>
commit 318aaf34f1179b39fa9c30fa0f3288b645beee39 upstream.
When ata device doing EH, some commands still attached with tasks are
not passed to libata when abort failed or recover failed, so libata did
not handle these commands. After these commands done, sas task is freed,
but ata qc is not freed. This will cause ata qc leak and trigger a
warning like below:
WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
ata_eh_finish+0xb4/0xcc
CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W OE 4.14.0#1
......
Call trace:
[<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
[<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
[<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
[<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
[<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
[<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
[<ffff0000080ebd70>] process_one_work+0x144/0x390
[<ffff0000080ec100>] worker_thread+0x144/0x418
[<ffff0000080f2c98>] kthread+0x10c/0x138
[<ffff0000080855dc>] ret_from_fork+0x10/0x18
If ata qc leaked too many, ata tag allocation will fail and io blocked
for ever.
As suggested by Dan Williams, defer ata device commands to libata and
merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle
ata qcs correctly after this.
Signed-off-by: Jason Yan <[email protected]>
CC: Xiaofei Tan <[email protected]>
CC: John Garry <[email protected]>
CC: Dan Williams <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Cc: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/libsas/sas_scsi_host.c | 33 +++++++++++++--------------------
1 file changed, 13 insertions(+), 20 deletions(-)
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -222,6 +222,7 @@ out_done:
static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
{
struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(cmd->device->host);
+ struct domain_device *dev = cmd_to_domain_dev(cmd);
struct sas_task *task = TO_SAS_TASK(cmd);
/* At this point, we only get called following an actual abort
@@ -230,6 +231,14 @@ static void sas_eh_finish_cmd(struct scs
*/
sas_end_task(cmd, task);
+ if (dev_is_sata(dev)) {
+ /* defer commands to libata so that libata EH can
+ * handle ata qcs correctly
+ */
+ list_move_tail(&cmd->eh_entry, &sas_ha->eh_ata_q);
+ return;
+ }
+
/* now finish the command and move it on to the error
* handler done list, this also takes it off the
* error handler pending list.
@@ -237,22 +246,6 @@ static void sas_eh_finish_cmd(struct scs
scsi_eh_finish_cmd(cmd, &sas_ha->eh_done_q);
}
-static void sas_eh_defer_cmd(struct scsi_cmnd *cmd)
-{
- struct domain_device *dev = cmd_to_domain_dev(cmd);
- struct sas_ha_struct *ha = dev->port->ha;
- struct sas_task *task = TO_SAS_TASK(cmd);
-
- if (!dev_is_sata(dev)) {
- sas_eh_finish_cmd(cmd);
- return;
- }
-
- /* report the timeout to libata */
- sas_end_task(cmd, task);
- list_move_tail(&cmd->eh_entry, &ha->eh_ata_q);
-}
-
static void sas_scsi_clear_queue_lu(struct list_head *error_q, struct scsi_cmnd *my_cmd)
{
struct scsi_cmnd *cmd, *n;
@@ -260,7 +253,7 @@ static void sas_scsi_clear_queue_lu(stru
list_for_each_entry_safe(cmd, n, error_q, eh_entry) {
if (cmd->device->sdev_target == my_cmd->device->sdev_target &&
cmd->device->lun == my_cmd->device->lun)
- sas_eh_defer_cmd(cmd);
+ sas_eh_finish_cmd(cmd);
}
}
@@ -622,12 +615,12 @@ static void sas_eh_handle_sas_errors(str
case TASK_IS_DONE:
SAS_DPRINTK("%s: task 0x%p is done\n", __func__,
task);
- sas_eh_defer_cmd(cmd);
+ sas_eh_finish_cmd(cmd);
continue;
case TASK_IS_ABORTED:
SAS_DPRINTK("%s: task 0x%p is aborted\n",
__func__, task);
- sas_eh_defer_cmd(cmd);
+ sas_eh_finish_cmd(cmd);
continue;
case TASK_IS_AT_LU:
SAS_DPRINTK("task 0x%p is at LU: lu recover\n", task);
@@ -638,7 +631,7 @@ static void sas_eh_handle_sas_errors(str
"recovered\n",
SAS_ADDR(task->dev),
cmd->device->lun);
- sas_eh_defer_cmd(cmd);
+ sas_eh_finish_cmd(cmd);
sas_scsi_clear_queue_lu(work_q, cmd);
goto Again;
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky <[email protected]>
[ Upstream commit c50c84c3ac4d5db683904bdb3257798b6ef980ae ]
The assember code in arch/s390/kernel uses a few more indirect branches
which need to be done with execute trampolines for CONFIG_EXPOLINE=y.
Cc: [email protected] # 4.16
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Reviewed-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/kernel/base.S | 24 ++++++++++++++----------
arch/s390/kernel/reipl.S | 5 ++++-
arch/s390/kernel/swsusp.S | 10 ++++++----
3 files changed, 24 insertions(+), 15 deletions(-)
--- a/arch/s390/kernel/base.S
+++ b/arch/s390/kernel/base.S
@@ -8,18 +8,22 @@
#include <linux/linkage.h>
#include <asm/asm-offsets.h>
+#include <asm/nospec-insn.h>
#include <asm/ptrace.h>
#include <asm/sigp.h>
+ GEN_BR_THUNK %r9
+ GEN_BR_THUNK %r14
+
ENTRY(s390_base_mcck_handler)
basr %r13,0
0: lg %r15,__LC_PANIC_STACK # load panic stack
aghi %r15,-STACK_FRAME_OVERHEAD
larl %r1,s390_base_mcck_handler_fn
- lg %r1,0(%r1)
- ltgr %r1,%r1
+ lg %r9,0(%r1)
+ ltgr %r9,%r9
jz 1f
- basr %r14,%r1
+ BASR_EX %r14,%r9
1: la %r1,4095
lmg %r0,%r15,__LC_GPREGS_SAVE_AREA-4095(%r1)
lpswe __LC_MCK_OLD_PSW
@@ -36,10 +40,10 @@ ENTRY(s390_base_ext_handler)
basr %r13,0
0: aghi %r15,-STACK_FRAME_OVERHEAD
larl %r1,s390_base_ext_handler_fn
- lg %r1,0(%r1)
- ltgr %r1,%r1
+ lg %r9,0(%r1)
+ ltgr %r9,%r9
jz 1f
- basr %r14,%r1
+ BASR_EX %r14,%r9
1: lmg %r0,%r15,__LC_SAVE_AREA_ASYNC
ni __LC_EXT_OLD_PSW+1,0xfd # clear wait state bit
lpswe __LC_EXT_OLD_PSW
@@ -56,10 +60,10 @@ ENTRY(s390_base_pgm_handler)
basr %r13,0
0: aghi %r15,-STACK_FRAME_OVERHEAD
larl %r1,s390_base_pgm_handler_fn
- lg %r1,0(%r1)
- ltgr %r1,%r1
+ lg %r9,0(%r1)
+ ltgr %r9,%r9
jz 1f
- basr %r14,%r1
+ BASR_EX %r14,%r9
lmg %r0,%r15,__LC_SAVE_AREA_SYNC
lpswe __LC_PGM_OLD_PSW
1: lpswe disabled_wait_psw-0b(%r13)
@@ -116,7 +120,7 @@ ENTRY(diag308_reset)
larl %r4,.Lcontinue_psw # Restore PSW flags
lpswe 0(%r4)
.Lcontinue:
- br %r14
+ BR_EX %r14
.align 16
.Lrestart_psw:
.long 0x00080000,0x80000000 + .Lrestart_part2
--- a/arch/s390/kernel/reipl.S
+++ b/arch/s390/kernel/reipl.S
@@ -6,8 +6,11 @@
#include <linux/linkage.h>
#include <asm/asm-offsets.h>
+#include <asm/nospec-insn.h>
#include <asm/sigp.h>
+ GEN_BR_THUNK %r14
+
#
# store_status
#
@@ -62,7 +65,7 @@ ENTRY(store_status)
st %r3,__LC_PSW_SAVE_AREA-SAVE_AREA_BASE + 4(%r1)
larl %r2,store_status
stg %r2,__LC_PSW_SAVE_AREA-SAVE_AREA_BASE + 8(%r1)
- br %r14
+ BR_EX %r14
.section .bss
.align 8
--- a/arch/s390/kernel/swsusp.S
+++ b/arch/s390/kernel/swsusp.S
@@ -12,6 +12,7 @@
#include <asm/ptrace.h>
#include <asm/thread_info.h>
#include <asm/asm-offsets.h>
+#include <asm/nospec-insn.h>
#include <asm/sigp.h>
/*
@@ -23,6 +24,8 @@
* (see below) in the resume process.
* This function runs with disabled interrupts.
*/
+ GEN_BR_THUNK %r14
+
.section .text
ENTRY(swsusp_arch_suspend)
stmg %r6,%r15,__SF_GPRS(%r15)
@@ -102,7 +105,7 @@ ENTRY(swsusp_arch_suspend)
spx 0x318(%r1)
lmg %r6,%r15,STACK_FRAME_OVERHEAD + __SF_GPRS(%r15)
lghi %r2,0
- br %r14
+ BR_EX %r14
/*
* Restore saved memory image to correct place and restore register context.
@@ -196,11 +199,10 @@ pgm_check_entry:
larl %r15,init_thread_union
ahi %r15,1<<(PAGE_SHIFT+THREAD_ORDER)
larl %r2,.Lpanic_string
- larl %r3,_sclp_print_early
lghi %r1,0
sam31
sigp %r1,%r0,SIGP_SET_ARCHITECTURE
- basr %r14,%r3
+ brasl %r14,_sclp_print_early
larl %r3,.Ldisabled_wait_31
lpsw 0(%r3)
4:
@@ -266,7 +268,7 @@ restore_registers:
/* Return 0 */
lmg %r6,%r15,STACK_FRAME_OVERHEAD + __SF_GPRS(%r15)
lghi %r2,0
- br %r14
+ BR_EX %r14
.section .data..nosave,"aw",@progbits
.align 8
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky <[email protected]>
[ Upstream commit 6deaa3bbca804b2a3627fd685f75de64da7be535 ]
The BPF JIT uses a 'b <disp>(%r<x>)' instruction in the definition
of the sk_load_word and sk_load_half functions.
Add support for branch-on-condition instructions contained in the
thunk code of an expoline.
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/include/asm/nospec-insn.h | 57 ++++++++++++++++++++++++++++++++++++
arch/s390/kernel/nospec-branch.c | 25 ++++++++++++---
2 files changed, 77 insertions(+), 5 deletions(-)
--- a/arch/s390/include/asm/nospec-insn.h
+++ b/arch/s390/include/asm/nospec-insn.h
@@ -32,10 +32,18 @@
__THUNK_PROLOG_NAME __s390x_indirect_jump_r\r2\()use_r\r1
.endm
+ .macro __THUNK_PROLOG_BC d0,r1,r2
+ __THUNK_PROLOG_NAME __s390x_indirect_branch_\d0\()_\r2\()use_\r1
+ .endm
+
.macro __THUNK_BR r1,r2
jg __s390x_indirect_jump_r\r2\()use_r\r1
.endm
+ .macro __THUNK_BC d0,r1,r2
+ jg __s390x_indirect_branch_\d0\()_\r2\()use_\r1
+ .endm
+
.macro __THUNK_BRASL r1,r2,r3
brasl \r1,__s390x_indirect_jump_r\r3\()use_r\r2
.endm
@@ -78,6 +86,23 @@
.endif
.endm
+ .macro __DECODE_DRR expand,disp,reg,ruse
+ .set __decode_fail,1
+ .irp r1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
+ .ifc \reg,%r\r1
+ .irp r2,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
+ .ifc \ruse,%r\r2
+ \expand \disp,\r1,\r2
+ .set __decode_fail,0
+ .endif
+ .endr
+ .endif
+ .endr
+ .if __decode_fail == 1
+ .error "__DECODE_DRR failed"
+ .endif
+ .endm
+
.macro __THUNK_EX_BR reg,ruse
# Be very careful when adding instructions to this macro!
# The ALTERNATIVE replacement code has a .+10 which targets
@@ -98,12 +123,30 @@
555: br \reg
.endm
+ .macro __THUNK_EX_BC disp,reg,ruse
+#ifdef CONFIG_HAVE_MARCH_Z10_FEATURES
+ exrl 0,556f
+ j .
+#else
+ larl \ruse,556f
+ ex 0,0(\ruse)
+ j .
+#endif
+556: b \disp(\reg)
+ .endm
+
.macro GEN_BR_THUNK reg,ruse=%r1
__DECODE_RR __THUNK_PROLOG_BR,\reg,\ruse
__THUNK_EX_BR \reg,\ruse
__THUNK_EPILOG
.endm
+ .macro GEN_B_THUNK disp,reg,ruse=%r1
+ __DECODE_DRR __THUNK_PROLOG_BC,\disp,\reg,\ruse
+ __THUNK_EX_BC \disp,\reg,\ruse
+ __THUNK_EPILOG
+ .endm
+
.macro BR_EX reg,ruse=%r1
557: __DECODE_RR __THUNK_BR,\reg,\ruse
.pushsection .s390_indirect_branches,"a",@progbits
@@ -111,6 +154,13 @@
.popsection
.endm
+ .macro B_EX disp,reg,ruse=%r1
+558: __DECODE_DRR __THUNK_BC,\disp,\reg,\ruse
+ .pushsection .s390_indirect_branches,"a",@progbits
+ .long 558b-.
+ .popsection
+ .endm
+
.macro BASR_EX rsave,rtarget,ruse=%r1
559: __DECODE_RRR __THUNK_BRASL,\rsave,\rtarget,\ruse
.pushsection .s390_indirect_branches,"a",@progbits
@@ -122,10 +172,17 @@
.macro GEN_BR_THUNK reg,ruse=%r1
.endm
+ .macro GEN_B_THUNK disp,reg,ruse=%r1
+ .endm
+
.macro BR_EX reg,ruse=%r1
br \reg
.endm
+ .macro B_EX disp,reg,ruse=%r1
+ b \disp(\reg)
+ .endm
+
.macro BASR_EX rsave,rtarget,ruse=%r1
basr \rsave,\rtarget
.endm
--- a/arch/s390/kernel/nospec-branch.c
+++ b/arch/s390/kernel/nospec-branch.c
@@ -94,7 +94,6 @@ static void __init_or_module __nospec_re
s32 *epo;
/* Second part of the instruction replace is always a nop */
- memcpy(insnbuf + 2, (char[]) { 0x47, 0x00, 0x00, 0x00 }, 4);
for (epo = start; epo < end; epo++) {
instr = (u8 *) epo + *epo;
if (instr[0] == 0xc0 && (instr[1] & 0x0f) == 0x04)
@@ -115,18 +114,34 @@ static void __init_or_module __nospec_re
br = thunk + (*(int *)(thunk + 2)) * 2;
else
continue;
- if (br[0] != 0x07 || (br[1] & 0xf0) != 0xf0)
+ /* Check for unconditional branch 0x07f? or 0x47f???? */
+ if ((br[0] & 0xbf) != 0x07 || (br[1] & 0xf0) != 0xf0)
continue;
+
+ memcpy(insnbuf + 2, (char[]) { 0x47, 0x00, 0x07, 0x00 }, 4);
switch (type) {
case BRCL_EXPOLINE:
- /* brcl to thunk, replace with br + nop */
insnbuf[0] = br[0];
insnbuf[1] = (instr[1] & 0xf0) | (br[1] & 0x0f);
+ if (br[0] == 0x47) {
+ /* brcl to b, replace with bc + nopr */
+ insnbuf[2] = br[2];
+ insnbuf[3] = br[3];
+ } else {
+ /* brcl to br, replace with bcr + nop */
+ }
break;
case BRASL_EXPOLINE:
- /* brasl to thunk, replace with basr + nop */
- insnbuf[0] = 0x0d;
insnbuf[1] = (instr[1] & 0xf0) | (br[1] & 0x0f);
+ if (br[0] == 0x47) {
+ /* brasl to b, replace with bas + nopr */
+ insnbuf[0] = 0x4d;
+ insnbuf[2] = br[2];
+ insnbuf[3] = br[3];
+ } else {
+ /* brasl to br, replace with basr + nop */
+ insnbuf[0] = 0x0d;
+ }
break;
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa <[email protected]>
commit 66072c29328717072fd84aaff3e070e3f008ba77 upstream.
syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1]. This
is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().
As far as I can see, it is hfsplus_mark_mdb_dirty() from
hfsplus_new_inode() in hfsplus_fill_super() that calls
queue_delayed_work(). Therefore, I assume that hfsplus_new_inode() does
not fail if queue_delayed_work() was called, and the out_put_hidden_dir
label is the appropriate location to call cancel_delayed_work_sync().
[1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tetsuo Handa <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Al Viro <[email protected]>
Cc: David Howells <[email protected]>
Cc: Ernesto A. Fernandez <[email protected]>
Cc: Vyacheslav Dubeyko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/hfsplus/super.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -585,6 +585,7 @@ static int hfsplus_fill_super(struct sup
return 0;
out_put_hidden_dir:
+ cancel_delayed_work_sync(&sbi->sync_work);
iput(sbi->hidden_dir);
out_put_root:
dput(sb->s_root);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg <[email protected]>
commit a7cfebcb7594a24609268f91299ab85ba064bf82 upstream.
There's currently no limit on wiphy names, other than netlink
message size and memory limitations, but that causes issues when,
for example, the wiphy name is used in a uevent, e.g. in rfkill
where we use the same name for the rfkill instance, and then the
buffer there is "only" 2k for the environment variables.
This was reported by syzkaller, which used a 4k name.
Limit the name to something reasonable, I randomly picked 128.
Reported-by: [email protected]
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/uapi/linux/nl80211.h | 2 ++
net/wireless/core.c | 3 +++
2 files changed, 5 insertions(+)
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -2195,6 +2195,8 @@ enum nl80211_attrs {
#define NL80211_ATTR_KEYS NL80211_ATTR_KEYS
#define NL80211_ATTR_FEATURE_FLAGS NL80211_ATTR_FEATURE_FLAGS
+#define NL80211_WIPHY_NAME_MAXLEN 128
+
#define NL80211_MAX_SUPP_RATES 32
#define NL80211_MAX_SUPP_HT_RATES 77
#define NL80211_MAX_SUPP_REG_RULES 64
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -94,6 +94,9 @@ static int cfg80211_dev_check_name(struc
ASSERT_RTNL();
+ if (strlen(newname) > NL80211_WIPHY_NAME_MAXLEN)
+ return -EINVAL;
+
/* prohibit calling the thing phy%d when %d is not its number */
sscanf(newname, PHY_NAME "%d%n", &wiphy_idx, &taken);
if (taken == strlen(newname) && wiphy_idx != rdev->wiphy_idx) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven <[email protected]>
commit b26a719bdba9aa926ceaadecc66e07623d2b8a53 upstream.
The R-Car GPIO driver handles Runtime PM for requested GPIOs only.
When using a GPIO purely as an interrupt source, no Runtime PM handling
is done, and the GPIO module's clock may not be enabled.
To fix this:
- Add .irq_request_resources() and .irq_release_resources() callbacks
to handle Runtime PM when an interrupt is requested,
- Add irq_bus_lock() and sync_unlock() callbacks to handle Runtime PM
when e.g. disabling/enabling an interrupt, or configuring the
interrupt type.
Fixes: d5c3d84657db57bd "net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS"
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
[fabrizio: cherry-pick to v4.4.y. Use container_of instead of
gpiochip_get_data.]
Signed-off-by: Fabrizio Castro <[email protected]>
Reviewed-by: Biju Das <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpio/gpio-rcar.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
--- a/drivers/gpio/gpio-rcar.c
+++ b/drivers/gpio/gpio-rcar.c
@@ -200,6 +200,48 @@ static int gpio_rcar_irq_set_wake(struct
return 0;
}
+static void gpio_rcar_irq_bus_lock(struct irq_data *d)
+{
+ struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+ struct gpio_rcar_priv *p = container_of(gc, struct gpio_rcar_priv,
+ gpio_chip);
+
+ pm_runtime_get_sync(&p->pdev->dev);
+}
+
+static void gpio_rcar_irq_bus_sync_unlock(struct irq_data *d)
+{
+ struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+ struct gpio_rcar_priv *p = container_of(gc, struct gpio_rcar_priv,
+ gpio_chip);
+
+ pm_runtime_put(&p->pdev->dev);
+}
+
+
+static int gpio_rcar_irq_request_resources(struct irq_data *d)
+{
+ struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+ struct gpio_rcar_priv *p = container_of(gc, struct gpio_rcar_priv,
+ gpio_chip);
+ int error;
+
+ error = pm_runtime_get_sync(&p->pdev->dev);
+ if (error < 0)
+ return error;
+
+ return 0;
+}
+
+static void gpio_rcar_irq_release_resources(struct irq_data *d)
+{
+ struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+ struct gpio_rcar_priv *p = container_of(gc, struct gpio_rcar_priv,
+ gpio_chip);
+
+ pm_runtime_put(&p->pdev->dev);
+}
+
static irqreturn_t gpio_rcar_irq_handler(int irq, void *dev_id)
{
struct gpio_rcar_priv *p = dev_id;
@@ -460,6 +502,10 @@ static int gpio_rcar_probe(struct platfo
irq_chip->irq_unmask = gpio_rcar_irq_enable;
irq_chip->irq_set_type = gpio_rcar_irq_set_type;
irq_chip->irq_set_wake = gpio_rcar_irq_set_wake;
+ irq_chip->irq_bus_lock = gpio_rcar_irq_bus_lock;
+ irq_chip->irq_bus_sync_unlock = gpio_rcar_irq_bus_sync_unlock;
+ irq_chip->irq_request_resources = gpio_rcar_irq_request_resources;
+ irq_chip->irq_release_resources = gpio_rcar_irq_release_resources;
irq_chip->flags = IRQCHIP_SET_TYPE_MASKED | IRQCHIP_MASK_ON_SUSPEND;
ret = gpiochip_add(gpio_chip);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stewart Smith <[email protected]>
commit 786842b62f81f20d14894925e8c225328ee8144b upstream.
The OpenPower Abstraction Layer firmware went through a couple
of iterations in the lab before being released. What we now know
as OPAL advertises itself as OPALv3.
OPALv2 and OPALv1 never made it outside the lab, and the possibility
of anyone at all ever building a mainline kernel today and expecting
it to boot on such hardware is zero.
Signed-off-by: Stewart Smith <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/powerpc/platforms/powernv/opal.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -103,11 +103,8 @@ int __init early_init_dt_scan_opal(unsig
powerpc_firmware_features |= FW_FEATURE_OPALv2;
powerpc_firmware_features |= FW_FEATURE_OPALv3;
pr_info("OPAL V3 detected !\n");
- } else if (of_flat_dt_is_compatible(node, "ibm,opal-v2")) {
- powerpc_firmware_features |= FW_FEATURE_OPALv2;
- pr_info("OPAL V2 detected !\n");
} else {
- pr_info("OPAL V1 detected !\n");
+ panic("OPAL != V3 detected, no longer supported.\n");
}
/* Reinit all cores with the right endian */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa <[email protected]>
commit a466ef76b815b86748d9870ef2a430af7b39c710 upstream.
>From ff82bedd3e12f0d3353282054ae48c3bd8c72012 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <[email protected]>
Date: Wed, 9 May 2018 12:12:39 +0900
Subject: [PATCH 4.4 92/92] x86/kexec: Avoid double free_page() upon do_kexec_load() failure
syzbot is reporting crashes after memory allocation failure inside
do_kexec_load() [1]. This is because free_transition_pgtable() is called
by both init_transition_pgtable() and machine_kexec_cleanup() when memory
allocation failed inside init_transition_pgtable().
Regarding 32bit code, machine_kexec_free_page_tables() is called by both
machine_kexec_alloc_page_tables() and machine_kexec_cleanup() when memory
allocation failed inside machine_kexec_alloc_page_tables().
Fix this by leaving the error handling to machine_kexec_cleanup()
(and optionally setting NULL after free_page()).
[1] https://syzkaller.appspot.com/bug?id=91e52396168cf2bdd572fe1e1bc0bc645c1c6b40
Fixes: f5deb79679af6eb4 ("x86: kexec: Use one page table in x86_64 machine_kexec")
Fixes: 92be3d6bdf2cb349 ("kexec/i386: allocate page table pages dynamically")
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Huang Ying <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: H. Peter Anvin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/machine_kexec_32.c | 6 +++++-
arch/x86/kernel/machine_kexec_64.c | 4 +++-
2 files changed, 8 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -71,12 +71,17 @@ static void load_segments(void)
static void machine_kexec_free_page_tables(struct kimage *image)
{
free_page((unsigned long)image->arch.pgd);
+ image->arch.pgd = NULL;
#ifdef CONFIG_X86_PAE
free_page((unsigned long)image->arch.pmd0);
+ image->arch.pmd0 = NULL;
free_page((unsigned long)image->arch.pmd1);
+ image->arch.pmd1 = NULL;
#endif
free_page((unsigned long)image->arch.pte0);
+ image->arch.pte0 = NULL;
free_page((unsigned long)image->arch.pte1);
+ image->arch.pte1 = NULL;
}
static int machine_kexec_alloc_page_tables(struct kimage *image)
@@ -93,7 +98,6 @@ static int machine_kexec_alloc_page_tabl
!image->arch.pmd0 || !image->arch.pmd1 ||
#endif
!image->arch.pte0 || !image->arch.pte1) {
- machine_kexec_free_page_tables(image);
return -ENOMEM;
}
return 0;
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -37,8 +37,11 @@ static struct kexec_file_ops *kexec_file
static void free_transition_pgtable(struct kimage *image)
{
free_page((unsigned long)image->arch.pud);
+ image->arch.pud = NULL;
free_page((unsigned long)image->arch.pmd);
+ image->arch.pmd = NULL;
free_page((unsigned long)image->arch.pte);
+ image->arch.pte = NULL;
}
static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
@@ -79,7 +82,6 @@ static int init_transition_pgtable(struc
set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
return 0;
err:
- free_transition_pgtable(image);
return result;
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky <[email protected]>
[ Upstream commit 23a4d7fd34856da8218c4cfc23dba7a6ec0a423a ]
The return from the ftrace_stub, _mcount, ftrace_caller and
return_to_handler functions is done with "br %r14" and "br %r1".
These are indirect branches as well and need to use execute
trampolines for CONFIG_EXPOLINE=y.
The ftrace_caller function is a special case as it returns to the
start of a function and may only use %r0 and %r1. For a pre z10
machine the standard execute trampoline uses a LARL + EX to do
this, but this requires *two* registers in the range %r1..%r15.
To get around this the 'br %r1' located in the lowcore is used,
then the EX instruction does not need an address register.
But the lowcore trick may only be used for pre z14 machines,
with noexec=on the mapping for the first page may not contain
instructions. The solution for that is an ALTERNATIVE in the
expoline THUNK generated by 'GEN_BR_THUNK %r1' to switch to
EXRL, this relies on the fact that a machine that supports
noexec=on has EXRL as well.
Cc: [email protected] # 4.16
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/include/asm/nospec-insn.h | 11 +++++++++++
arch/s390/kernel/asm-offsets.c | 1 +
arch/s390/kernel/mcount.S | 14 +++++++++-----
3 files changed, 21 insertions(+), 5 deletions(-)
--- a/arch/s390/include/asm/nospec-insn.h
+++ b/arch/s390/include/asm/nospec-insn.h
@@ -2,6 +2,9 @@
#ifndef _ASM_S390_NOSPEC_ASM_H
#define _ASM_S390_NOSPEC_ASM_H
+#include <asm/alternative-asm.h>
+#include <asm/asm-offsets.h>
+
#ifdef __ASSEMBLY__
#ifdef CONFIG_EXPOLINE
@@ -76,13 +79,21 @@
.endm
.macro __THUNK_EX_BR reg,ruse
+ # Be very careful when adding instructions to this macro!
+ # The ALTERNATIVE replacement code has a .+10 which targets
+ # the "br \reg" after the code has been patched.
#ifdef CONFIG_HAVE_MARCH_Z10_FEATURES
exrl 0,555f
j .
#else
+ .ifc \reg,%r1
+ ALTERNATIVE "ex %r0,_LC_BR_R1", ".insn ril,0xc60000000000,0,.+10", 35
+ j .
+ .else
larl \ruse,555f
ex 0,0(\ruse)
j .
+ .endif
#endif
555: br \reg
.endm
--- a/arch/s390/kernel/asm-offsets.c
+++ b/arch/s390/kernel/asm-offsets.c
@@ -170,6 +170,7 @@ int main(void)
OFFSET(__LC_MACHINE_FLAGS, _lowcore, machine_flags);
OFFSET(__LC_GMAP, _lowcore, gmap);
OFFSET(__LC_PASTE, _lowcore, paste);
+ OFFSET(__LC_BR_R1, _lowcore, br_r1_trampoline);
/* software defined ABI-relevant lowcore locations 0xe00 - 0xe20 */
OFFSET(__LC_DUMP_REIPL, _lowcore, ipib);
/* hardware defined lowcore locations 0x1000 - 0x18ff */
--- a/arch/s390/kernel/mcount.S
+++ b/arch/s390/kernel/mcount.S
@@ -8,12 +8,16 @@
#include <linux/linkage.h>
#include <asm/asm-offsets.h>
#include <asm/ftrace.h>
+#include <asm/nospec-insn.h>
#include <asm/ptrace.h>
+ GEN_BR_THUNK %r1
+ GEN_BR_THUNK %r14
+
.section .kprobes.text, "ax"
ENTRY(ftrace_stub)
- br %r14
+ BR_EX %r14
#define STACK_FRAME_SIZE (STACK_FRAME_OVERHEAD + __PT_SIZE)
#define STACK_PTREGS (STACK_FRAME_OVERHEAD)
@@ -21,7 +25,7 @@ ENTRY(ftrace_stub)
#define STACK_PTREGS_PSW (STACK_PTREGS + __PT_PSW)
ENTRY(_mcount)
- br %r14
+ BR_EX %r14
ENTRY(ftrace_caller)
.globl ftrace_regs_caller
@@ -49,7 +53,7 @@ ENTRY(ftrace_caller)
#endif
lgr %r3,%r14
la %r5,STACK_PTREGS(%r15)
- basr %r14,%r1
+ BASR_EX %r14,%r1
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
# The j instruction gets runtime patched to a nop instruction.
# See ftrace_enable_ftrace_graph_caller.
@@ -64,7 +68,7 @@ ftrace_graph_caller_end:
#endif
lg %r1,(STACK_PTREGS_PSW+8)(%r15)
lmg %r2,%r15,(STACK_PTREGS_GPRS+2*8)(%r15)
- br %r1
+ BR_EX %r1
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -77,6 +81,6 @@ ENTRY(return_to_handler)
aghi %r15,STACK_FRAME_OVERHEAD
lgr %r14,%r2
lmg %r2,%r5,32(%r15)
- br %r14
+ BR_EX %r14
#endif
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Waiman Long <[email protected]>
commit c7be96af89d4b53211862d8599b2430e8900ed92 upstream.
When running certain database workload on a high-end system with many
CPUs, it was found that spinlock contention in the sigprocmask syscalls
became a significant portion of the overall CPU cycles as shown below.
9.30% 9.30% 905387 dataserver /proc/kcore 0x7fff8163f4d2
[k] _raw_spin_lock_irq
|
---_raw_spin_lock_irq
|
|--99.34%-- __set_current_blocked
| sigprocmask
| sys_rt_sigprocmask
| system_call_fastpath
| |
| |--50.63%-- __swapcontext
| | |
| | |--99.91%-- upsleepgeneric
| |
| |--49.36%-- __setcontext
| | ktskRun
Looking further into the swapcontext function in glibc, it was found that
the function always call sigprocmask() without checking if there are
changes in the signal mask.
A check was added to the __set_current_blocked() function to avoid taking
the sighand->siglock spinlock if there is no change in the signal mask.
This will prevent unneeded spinlock contention when many threads are
trying to call sigprocmask().
With this patch applied, the spinlock contention in sigprocmask() was
gone.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Waiman Long <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stas Sergeev <[email protected]>
Cc: Scott J Norton <[email protected]>
Cc: Douglas Hatch <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/signal.h | 17 +++++++++++++++++
kernel/signal.c | 7 +++++++
2 files changed, 24 insertions(+)
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -97,6 +97,23 @@ static inline int sigisemptyset(sigset_t
}
}
+static inline int sigequalsets(const sigset_t *set1, const sigset_t *set2)
+{
+ switch (_NSIG_WORDS) {
+ case 4:
+ return (set1->sig[3] == set2->sig[3]) &&
+ (set1->sig[2] == set2->sig[2]) &&
+ (set1->sig[1] == set2->sig[1]) &&
+ (set1->sig[0] == set2->sig[0]);
+ case 2:
+ return (set1->sig[1] == set2->sig[1]) &&
+ (set1->sig[0] == set2->sig[0]);
+ case 1:
+ return set1->sig[0] == set2->sig[0];
+ }
+ return 0;
+}
+
#define sigmask(sig) (1UL << ((sig) - 1))
#ifndef __HAVE_ARCH_SIG_SETOPS
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2495,6 +2495,13 @@ void __set_current_blocked(const sigset_
{
struct task_struct *tsk = current;
+ /*
+ * In case the signal mask hasn't changed, there is nothing we need
+ * to do. The current->blocked shouldn't be modified by other task.
+ */
+ if (sigequalsets(&tsk->blocked, newset))
+ return;
+
spin_lock_irq(&tsk->sighand->siglock);
__set_task_blocked(tsk, newset);
spin_unlock_irq(&tsk->sighand->siglock);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mel Gorman <[email protected]>
commit ebded02788b5d7c7600f8cff26ae07896d568649 upstream.
In the generic read paths the kernel looks up a page in the page cache
and if it's up to date, it is used. If not, the page lock is acquired
to wait for IO to complete and then check the page. If multiple
processes are waiting on IO, they all serialise against the lock and
duplicate the checks. This is unnecessary.
The page lock in itself does not give any guarantees to the callers
about the page state as it can be immediately truncated or reclaimed
after the page is unlocked. It's sufficient to wait_on_page_locked and
then continue if the page is up to date on wakeup.
It is possible that a truncated but up-to-date page is returned but the
reference taken during read prevents it disappearing underneath the
caller and the data is still valid if PageUptodate.
The overall impact is small as even if processes serialise on the lock,
the lock section is tiny once the IO is complete. Profiles indicated
that unlock_page and friends are generally a tiny portion of a
read-intensive workload. An artificial test was created that had
instances of dd access a cache-cold file on an ext4 filesystem and
measure how long the read took.
paralleldd
4.4.0 4.4.0
vanilla avoidlock
Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%)
Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%)
Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%)
Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%)
Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%)
Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%)
Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%)
Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%)
Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%)
Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%)
The impact is small but intuitively, it makes sense to avoid unnecessary
calls to lock_page.
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/filemap.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1581,6 +1581,15 @@ find_page:
index, last_index - index);
}
if (!PageUptodate(page)) {
+ /*
+ * See comment in do_read_cache_page on why
+ * wait_on_page_locked is used to avoid unnecessarily
+ * serialisations and why it's safe.
+ */
+ wait_on_page_locked_killable(page);
+ if (PageUptodate(page))
+ goto page_ok;
+
if (inode->i_blkbits == PAGE_CACHE_SHIFT ||
!mapping->a_ops->is_partially_uptodate)
goto page_not_up_to_date;
@@ -2253,12 +2262,52 @@ filler:
if (PageUptodate(page))
goto out;
+ /*
+ * Page is not up to date and may be locked due one of the following
+ * case a: Page is being filled and the page lock is held
+ * case b: Read/write error clearing the page uptodate status
+ * case c: Truncation in progress (page locked)
+ * case d: Reclaim in progress
+ *
+ * Case a, the page will be up to date when the page is unlocked.
+ * There is no need to serialise on the page lock here as the page
+ * is pinned so the lock gives no additional protection. Even if the
+ * the page is truncated, the data is still valid if PageUptodate as
+ * it's a race vs truncate race.
+ * Case b, the page will not be up to date
+ * Case c, the page may be truncated but in itself, the data may still
+ * be valid after IO completes as it's a read vs truncate race. The
+ * operation must restart if the page is not uptodate on unlock but
+ * otherwise serialising on page lock to stabilise the mapping gives
+ * no additional guarantees to the caller as the page lock is
+ * released before return.
+ * Case d, similar to truncation. If reclaim holds the page lock, it
+ * will be a race with remove_mapping that determines if the mapping
+ * is valid on unlock but otherwise the data is valid and there is
+ * no need to serialise with page lock.
+ *
+ * As the page lock gives no additional guarantee, we optimistically
+ * wait on the page to be unlocked and check if it's up to date and
+ * use the page if it is. Otherwise, the page lock is required to
+ * distinguish between the different cases. The motivation is that we
+ * avoid spurious serialisations and wakeups when multiple processes
+ * wait on the same page for IO to complete.
+ */
+ wait_on_page_locked(page);
+ if (PageUptodate(page))
+ goto out;
+
+ /* Distinguish between all the cases under the safety of the lock */
lock_page(page);
+
+ /* Case c or d, restart the operation */
if (!page->mapping) {
unlock_page(page);
page_cache_release(page);
goto repeat;
}
+
+ /* Someone else locked and filled the page in a very small window */
if (PageUptodate(page)) {
unlock_page(page);
goto out;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mel Gorman <[email protected]>
commit 32b635298ff4e991d8d8f64dc23782b02eec29c3 upstream.
do_read_cache_page and __read_cache_page duplicate page filler code when
filling the page for the first time. This patch simply removes the
duplicate logic.
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/filemap.c | 43 ++++++++++++-------------------------------
1 file changed, 12 insertions(+), 31 deletions(-)
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2215,7 +2215,7 @@ static struct page *wait_on_page_read(st
return page;
}
-static struct page *__read_cache_page(struct address_space *mapping,
+static struct page *do_read_cache_page(struct address_space *mapping,
pgoff_t index,
int (*filler)(void *, struct page *),
void *data,
@@ -2237,31 +2237,19 @@ repeat:
/* Presumably ENOMEM for radix tree node */
return ERR_PTR(err);
}
+
+filler:
err = filler(data, page);
if (err < 0) {
page_cache_release(page);
- page = ERR_PTR(err);
- } else {
- page = wait_on_page_read(page);
+ return ERR_PTR(err);
}
- }
- return page;
-}
-
-static struct page *do_read_cache_page(struct address_space *mapping,
- pgoff_t index,
- int (*filler)(void *, struct page *),
- void *data,
- gfp_t gfp)
-{
- struct page *page;
- int err;
-
-retry:
- page = __read_cache_page(mapping, index, filler, data, gfp);
- if (IS_ERR(page))
- return page;
+ page = wait_on_page_read(page);
+ if (IS_ERR(page))
+ return page;
+ goto out;
+ }
if (PageUptodate(page))
goto out;
@@ -2269,21 +2257,14 @@ retry:
if (!page->mapping) {
unlock_page(page);
page_cache_release(page);
- goto retry;
+ goto repeat;
}
if (PageUptodate(page)) {
unlock_page(page);
goto out;
}
- err = filler(data, page);
- if (err < 0) {
- page_cache_release(page);
- return ERR_PTR(err);
- } else {
- page = wait_on_page_read(page);
- if (IS_ERR(page))
- return page;
- }
+ goto filler;
+
out:
mark_page_accessed(page);
return page;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Davydov <[email protected]>
commit 316bda0e6cc5f36f94b4af8bded16d642c90ad75 upstream.
We assume there is enough inactive page cache if the size of inactive
file lru is greater than the size of active file lru, in which case we
force-scan file lru ignoring anonymous pages. While this logic works
fine when there are plenty of page cache pages, it fails if the size of
file lru is small (several MB): in this case (lru_size >> prio) will be
0 for normal scan priorities, as a result, if inactive file lru happens
to be larger than active file lru, anonymous pages of a cgroup will
never get evicted unless the system experiences severe memory pressure,
even if there are gigabytes of unused anonymous memory there, which is
unfair in respect to other cgroups, whose workloads might be page cache
oriented.
This patch attempts to fix this by elaborating the "enough inactive page
cache" check: it makes it not only check that inactive lru size > active
lru size, but also that we will scan something from the cgroup at the
current scan priority. If these conditions do not hold, we proceed to
SCAN_FRACT as usual.
Signed-off-by: Vladimir Davydov <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/vmscan.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2057,10 +2057,16 @@ static void get_scan_count(struct lruvec
}
/*
- * There is enough inactive page cache, do not reclaim
- * anything from the anonymous working set right now.
+ * If there is enough inactive page cache, i.e. if the size of the
+ * inactive list is greater than that of the active list *and* the
+ * inactive list actually has some pages to scan on this priority, we
+ * do not reclaim anything from the anonymous working set right now.
+ * Without the second condition we could end up never scanning an
+ * lruvec even if it has plenty of old anonymous pages unless the
+ * system is under heavy pressure.
*/
- if (!inactive_file_is_low(lruvec)) {
+ if (!inactive_file_is_low(lruvec) &&
+ get_lru_size(lruvec, LRU_INACTIVE_FILE) >> sc->priority) {
scan_balance = SCAN_FILE;
goto out;
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <[email protected]>
[ Upstream commit 7f582b248d0a86bae5788c548d7bb5bca6f7691a ]
syzkaller found a reliable way to crash the host, hitting a BUG()
in __tcp_retransmit_skb()
Malicous MSG_FASTOPEN is the root cause. We need to purge write queue
in tcp_connect_init() at the point we init snd_una/write_seq.
This patch also replaces the BUG() by a less intrusive WARN_ON_ONCE()
kernel BUG at net/ipv4/tcp_output.c:2837!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 5276 Comm: syz-executor0 Not tainted 4.17.0-rc3+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__tcp_retransmit_skb+0x2992/0x2eb0 net/ipv4/tcp_output.c:2837
RSP: 0000:ffff8801dae06ff8 EFLAGS: 00010206
RAX: ffff8801b9fe61c0 RBX: 00000000ffc18a16 RCX: ffffffff864e1a49
RDX: 0000000000000100 RSI: ffffffff864e2e12 RDI: 0000000000000005
RBP: ffff8801dae073a0 R08: ffff8801b9fe61c0 R09: ffffed0039c40dd2
R10: ffffed0039c40dd2 R11: ffff8801ce206e93 R12: 00000000421eeaad
R13: ffff8801ce206d4e R14: ffff8801ce206cc0 R15: ffff8801cd4f4a80
FS: 0000000000000000(0000) GS:ffff8801dae00000(0063) knlGS:00000000096bc900
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000020000000 CR3: 00000001c47b6000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
tcp_retransmit_skb+0x2e/0x250 net/ipv4/tcp_output.c:2923
tcp_retransmit_timer+0xc50/0x3060 net/ipv4/tcp_timer.c:488
tcp_write_timer_handler+0x339/0x960 net/ipv4/tcp_timer.c:573
tcp_write_timer+0x111/0x1d0 net/ipv4/tcp_timer.c:593
call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
expire_timers kernel/time/timer.c:1363 [inline]
__run_timers+0x79e/0xc50 kernel/time/timer.c:1666
run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
__do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1d1/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:525 [inline]
smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Neal Cardwell <[email protected]>
Reported-by: syzbot <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_output.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2587,8 +2587,10 @@ int __tcp_retransmit_skb(struct sock *sk
return -EBUSY;
if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) {
- if (before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))
- BUG();
+ if (unlikely(before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))) {
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
return -ENOMEM;
}
@@ -3117,6 +3119,7 @@ static void tcp_connect_init(struct sock
sock_reset_flag(sk, SOCK_DONE);
tp->snd_wnd = 0;
tcp_init_wl(tp, 0);
+ tcp_write_queue_purge(sk);
tp->snd_una = tp->write_seq;
tp->snd_sml = tp->write_seq;
tp->snd_up = tp->write_seq;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky <[email protected]>
[ Upstream commit 6dd85fbb87d1d6b87a3b1f02ca28d7b2abd2e7ba ]
To be able to use the expoline branches in different assembler
files move the associated macros from entry.S to a new header
nospec-insn.h.
While we are at it make the macros a bit nicer to use.
Cc: [email protected] # 4.16
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/include/asm/nospec-insn.h | 125 ++++++++++++++++++++++++++++++++++++
arch/s390/kernel/entry.S | 105 ++++++------------------------
2 files changed, 149 insertions(+), 81 deletions(-)
create mode 100644 arch/s390/include/asm/nospec-insn.h
--- /dev/null
+++ b/arch/s390/include/asm/nospec-insn.h
@@ -0,0 +1,125 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_S390_NOSPEC_ASM_H
+#define _ASM_S390_NOSPEC_ASM_H
+
+#ifdef __ASSEMBLY__
+
+#ifdef CONFIG_EXPOLINE
+
+/*
+ * The expoline macros are used to create thunks in the same format
+ * as gcc generates them. The 'comdat' section flag makes sure that
+ * the various thunks are merged into a single copy.
+ */
+ .macro __THUNK_PROLOG_NAME name
+ .pushsection .text.\name,"axG",@progbits,\name,comdat
+ .globl \name
+ .hidden \name
+ .type \name,@function
+\name:
+ .cfi_startproc
+ .endm
+
+ .macro __THUNK_EPILOG
+ .cfi_endproc
+ .popsection
+ .endm
+
+ .macro __THUNK_PROLOG_BR r1,r2
+ __THUNK_PROLOG_NAME __s390x_indirect_jump_r\r2\()use_r\r1
+ .endm
+
+ .macro __THUNK_BR r1,r2
+ jg __s390x_indirect_jump_r\r2\()use_r\r1
+ .endm
+
+ .macro __THUNK_BRASL r1,r2,r3
+ brasl \r1,__s390x_indirect_jump_r\r3\()use_r\r2
+ .endm
+
+ .macro __DECODE_RR expand,reg,ruse
+ .set __decode_fail,1
+ .irp r1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
+ .ifc \reg,%r\r1
+ .irp r2,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
+ .ifc \ruse,%r\r2
+ \expand \r1,\r2
+ .set __decode_fail,0
+ .endif
+ .endr
+ .endif
+ .endr
+ .if __decode_fail == 1
+ .error "__DECODE_RR failed"
+ .endif
+ .endm
+
+ .macro __DECODE_RRR expand,rsave,rtarget,ruse
+ .set __decode_fail,1
+ .irp r1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
+ .ifc \rsave,%r\r1
+ .irp r2,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
+ .ifc \rtarget,%r\r2
+ .irp r3,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
+ .ifc \ruse,%r\r3
+ \expand \r1,\r2,\r3
+ .set __decode_fail,0
+ .endif
+ .endr
+ .endif
+ .endr
+ .endif
+ .endr
+ .if __decode_fail == 1
+ .error "__DECODE_RRR failed"
+ .endif
+ .endm
+
+ .macro __THUNK_EX_BR reg,ruse
+#ifdef CONFIG_HAVE_MARCH_Z10_FEATURES
+ exrl 0,555f
+ j .
+#else
+ larl \ruse,555f
+ ex 0,0(\ruse)
+ j .
+#endif
+555: br \reg
+ .endm
+
+ .macro GEN_BR_THUNK reg,ruse=%r1
+ __DECODE_RR __THUNK_PROLOG_BR,\reg,\ruse
+ __THUNK_EX_BR \reg,\ruse
+ __THUNK_EPILOG
+ .endm
+
+ .macro BR_EX reg,ruse=%r1
+557: __DECODE_RR __THUNK_BR,\reg,\ruse
+ .pushsection .s390_indirect_branches,"a",@progbits
+ .long 557b-.
+ .popsection
+ .endm
+
+ .macro BASR_EX rsave,rtarget,ruse=%r1
+559: __DECODE_RRR __THUNK_BRASL,\rsave,\rtarget,\ruse
+ .pushsection .s390_indirect_branches,"a",@progbits
+ .long 559b-.
+ .popsection
+ .endm
+
+#else
+ .macro GEN_BR_THUNK reg,ruse=%r1
+ .endm
+
+ .macro BR_EX reg,ruse=%r1
+ br \reg
+ .endm
+
+ .macro BASR_EX rsave,rtarget,ruse=%r1
+ basr \rsave,\rtarget
+ .endm
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_S390_NOSPEC_ASM_H */
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -23,6 +23,7 @@
#include <asm/vx-insn.h>
#include <asm/setup.h>
#include <asm/nmi.h>
+#include <asm/nospec-insn.h>
__PT_R0 = __PT_GPRS
__PT_R1 = __PT_GPRS + 8
@@ -225,74 +226,16 @@ _PIF_WORK = (_PIF_PER_TRAP)
.popsection
.endm
-#ifdef CONFIG_EXPOLINE
-
- .macro GEN_BR_THUNK name,reg,tmp
- .section .text.\name,"axG",@progbits,\name,comdat
- .globl \name
- .hidden \name
- .type \name,@function
-\name:
- .cfi_startproc
-#ifdef CONFIG_HAVE_MARCH_Z10_FEATURES
- exrl 0,0f
-#else
- larl \tmp,0f
- ex 0,0(\tmp)
-#endif
- j .
-0: br \reg
- .cfi_endproc
- .endm
-
- GEN_BR_THUNK __s390x_indirect_jump_r1use_r9,%r9,%r1
- GEN_BR_THUNK __s390x_indirect_jump_r1use_r14,%r14,%r1
- GEN_BR_THUNK __s390x_indirect_jump_r11use_r14,%r14,%r11
-
- .macro BASR_R14_R9
-0: brasl %r14,__s390x_indirect_jump_r1use_r9
- .pushsection .s390_indirect_branches,"a",@progbits
- .long 0b-.
- .popsection
- .endm
-
- .macro BR_R1USE_R14
-0: jg __s390x_indirect_jump_r1use_r14
- .pushsection .s390_indirect_branches,"a",@progbits
- .long 0b-.
- .popsection
- .endm
-
- .macro BR_R11USE_R14
-0: jg __s390x_indirect_jump_r11use_r14
- .pushsection .s390_indirect_branches,"a",@progbits
- .long 0b-.
- .popsection
- .endm
-
-#else /* CONFIG_EXPOLINE */
-
- .macro BASR_R14_R9
- basr %r14,%r9
- .endm
-
- .macro BR_R1USE_R14
- br %r14
- .endm
-
- .macro BR_R11USE_R14
- br %r14
- .endm
-
-#endif /* CONFIG_EXPOLINE */
-
+ GEN_BR_THUNK %r9
+ GEN_BR_THUNK %r14
+ GEN_BR_THUNK %r14,%r11
.section .kprobes.text, "ax"
ENTRY(__bpon)
.globl __bpon
BPON
- BR_R1USE_R14
+ BR_EX %r14
/*
* Scheduler resume function, called by switch_to
@@ -322,7 +265,7 @@ ENTRY(__switch_to)
TSTMSK __LC_MACHINE_FLAGS,MACHINE_FLAG_LPP
jz 0f
.insn s,0xb2800000,__LC_LPP # set program parameter
-0: BR_R1USE_R14
+0: BR_EX %r14
.L__critical_start:
@@ -388,7 +331,7 @@ sie_exit:
xgr %r5,%r5
lmg %r6,%r14,__SF_GPRS(%r15) # restore kernel registers
lg %r2,__SF_EMPTY+16(%r15) # return exit reason code
- BR_R1USE_R14
+ BR_EX %r14
.Lsie_fault:
lghi %r14,-EFAULT
stg %r14,__SF_EMPTY+16(%r15) # set exit reason code
@@ -445,7 +388,7 @@ ENTRY(system_call)
lgf %r9,0(%r8,%r10) # get system call add.
TSTMSK __TI_flags(%r12),_TIF_TRACE
jnz .Lsysc_tracesys
- BASR_R14_R9 # call sys_xxxx
+ BASR_EX %r14,%r9 # call sys_xxxx
stg %r2,__PT_R2(%r11) # store return value
.Lsysc_return:
@@ -585,7 +528,7 @@ ENTRY(system_call)
lmg %r3,%r7,__PT_R3(%r11)
stg %r7,STACK_FRAME_OVERHEAD(%r15)
lg %r2,__PT_ORIG_GPR2(%r11)
- BASR_R14_R9 # call sys_xxx
+ BASR_EX %r14,%r9 # call sys_xxx
stg %r2,__PT_R2(%r11) # store return value
.Lsysc_tracenogo:
TSTMSK __TI_flags(%r12),_TIF_TRACE
@@ -609,7 +552,7 @@ ENTRY(ret_from_fork)
lmg %r9,%r10,__PT_R9(%r11) # load gprs
ENTRY(kernel_thread_starter)
la %r2,0(%r10)
- BASR_R14_R9
+ BASR_EX %r14,%r9
j .Lsysc_tracenogo
/*
@@ -685,7 +628,7 @@ ENTRY(pgm_check_handler)
je .Lpgm_return
lgf %r9,0(%r10,%r1) # load address of handler routine
lgr %r2,%r11 # pass pointer to pt_regs
- BASR_R14_R9 # branch to interrupt-handler
+ BASR_EX %r14,%r9 # branch to interrupt-handler
.Lpgm_return:
LOCKDEP_SYS_EXIT
tm __PT_PSW+1(%r11),0x01 # returning to user ?
@@ -962,7 +905,7 @@ ENTRY(psw_idle)
stpt __TIMER_IDLE_ENTER(%r2)
.Lpsw_idle_lpsw:
lpswe __SF_EMPTY(%r15)
- BR_R1USE_R14
+ BR_EX %r14
.Lpsw_idle_end:
/*
@@ -1007,7 +950,7 @@ ENTRY(save_fpu_regs)
.Lsave_fpu_regs_done:
oi __LC_CPU_FLAGS+7,_CIF_FPU
.Lsave_fpu_regs_exit:
- BR_R1USE_R14
+ BR_EX %r14
.Lsave_fpu_regs_end:
/*
@@ -1054,7 +997,7 @@ load_fpu_regs:
.Lload_fpu_regs_done:
ni __LC_CPU_FLAGS+7,255-_CIF_FPU
.Lload_fpu_regs_exit:
- BR_R1USE_R14
+ BR_EX %r14
.Lload_fpu_regs_end:
.L__critical_end:
@@ -1227,7 +1170,7 @@ cleanup_critical:
jl 0f
clg %r9,BASED(.Lcleanup_table+104) # .Lload_fpu_regs_end
jl .Lcleanup_load_fpu_regs
-0: BR_R11USE_R14
+0: BR_EX %r14
.align 8
.Lcleanup_table:
@@ -1257,7 +1200,7 @@ cleanup_critical:
ni __SIE_PROG0C+3(%r9),0xfe # no longer in SIE
lctlg %c1,%c1,__LC_USER_ASCE # load primary asce
larl %r9,sie_exit # skip forward to sie_exit
- BR_R11USE_R14
+ BR_EX %r14
#endif
.Lcleanup_system_call:
@@ -1315,7 +1258,7 @@ cleanup_critical:
stg %r15,56(%r11) # r15 stack pointer
# set new psw address and exit
larl %r9,.Lsysc_do_svc
- BR_R11USE_R14
+ BR_EX %r14,%r11
.Lcleanup_system_call_insn:
.quad system_call
.quad .Lsysc_stmg
@@ -1325,7 +1268,7 @@ cleanup_critical:
.Lcleanup_sysc_tif:
larl %r9,.Lsysc_tif
- BR_R11USE_R14
+ BR_EX %r14,%r11
.Lcleanup_sysc_restore:
# check if stpt has been executed
@@ -1342,14 +1285,14 @@ cleanup_critical:
mvc 0(64,%r11),__PT_R8(%r9)
lmg %r0,%r7,__PT_R0(%r9)
1: lmg %r8,%r9,__LC_RETURN_PSW
- BR_R11USE_R14
+ BR_EX %r14,%r11
.Lcleanup_sysc_restore_insn:
.quad .Lsysc_exit_timer
.quad .Lsysc_done - 4
.Lcleanup_io_tif:
larl %r9,.Lio_tif
- BR_R11USE_R14
+ BR_EX %r14,%r11
.Lcleanup_io_restore:
# check if stpt has been executed
@@ -1363,7 +1306,7 @@ cleanup_critical:
mvc 0(64,%r11),__PT_R8(%r9)
lmg %r0,%r7,__PT_R0(%r9)
1: lmg %r8,%r9,__LC_RETURN_PSW
- BR_R11USE_R14
+ BR_EX %r14,%r11
.Lcleanup_io_restore_insn:
.quad .Lio_exit_timer
.quad .Lio_done - 4
@@ -1415,17 +1358,17 @@ cleanup_critical:
# prepare return psw
nihh %r8,0xfcfd # clear irq & wait state bits
lg %r9,48(%r11) # return from psw_idle
- BR_R11USE_R14
+ BR_EX %r14,%r11
.Lcleanup_idle_insn:
.quad .Lpsw_idle_lpsw
.Lcleanup_save_fpu_regs:
larl %r9,save_fpu_regs
- BR_R11USE_R14
+ BR_EX %r14,%r11
.Lcleanup_load_fpu_regs:
larl %r9,load_fpu_regs
- BR_R11USE_R14
+ BR_EX %r14,%r11
/*
* Integer constants
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Herrenschmidt <[email protected]>
commit 349524bc0da698ec77f2057cf4a4948eb6349265 upstream.
This causes warnings from cpufreq mutex code. This is also rather
unnecessary and ineffective. If we really want to prevent concurrent
unplug, we could take the unplug read lock but I don't see this being
critical.
Fixes: cd77b5ce208c ("powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo")
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Acked-by: Michal Suchanek <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/powerpc/kernel/setup-common.c | 11 -----------
1 file changed, 11 deletions(-)
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -217,14 +217,6 @@ static int show_cpuinfo(struct seq_file
unsigned short maj;
unsigned short min;
- /* We only show online cpus: disable preempt (overzealous, I
- * knew) to prevent cpu going down. */
- preempt_disable();
- if (!cpu_online(cpu_id)) {
- preempt_enable();
- return 0;
- }
-
#ifdef CONFIG_SMP
pvr = per_cpu(cpu_pvr, cpu_id);
#else
@@ -329,9 +321,6 @@ static int show_cpuinfo(struct seq_file
#ifdef CONFIG_SMP
seq_printf(m, "\n");
#endif
-
- preempt_enable();
-
/* If this is the last cpu, print the summary */
if (cpumask_next(cpu_id, cpu_online_mask) >= nr_cpu_ids)
show_cpuinfo_summary(m);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liu Bo <[email protected]>
commit 02a3307aa9c20b4f6626255b028f07f6cfa16feb upstream.
If a btree block, aka. extent buffer, is not available in the extent
buffer cache, it'll be read out from the disk instead, i.e.
btrfs_search_slot()
read_block_for_search() # hold parent and its lock, go to read child
btrfs_release_path()
read_tree_block() # read child
Unfortunately, the parent lock got released before reading child, so
commit 5bdd3536cbbe ("Btrfs: Fix block generation verification race") had
used 0 as parent transid to read the child block. It forces
read_tree_block() not to check if parent transid is different with the
generation id of the child that it reads out from disk.
A simple PoC is included in btrfs/124,
0. A two-disk raid1 btrfs,
1. Right after mkfs.btrfs, block A is allocated to be device tree's root.
2. Mount this filesystem and put it in use, after a while, device tree's
root got COW but block A hasn't been allocated/overwritten yet.
3. Umount it and reload the btrfs module to remove both disks from the
global @fs_devices list.
4. mount -odegraded dev1 and write some data, so now block A is allocated
to be a leaf in checksum tree. Note that only dev1 has the latest
metadata of this filesystem.
5. Umount it and mount it again normally (with both disks), since raid1
can pick up one disk by the writer task's pid, if btrfs_search_slot()
needs to read block A, dev2 which does NOT have the latest metadata
might be read for block A, then we got a stale block A.
6. As parent transid is not checked, block A is marked as uptodate and
put into the extent buffer cache, so the future search won't bother
to read disk again, which means it'll make changes on this stale
one and make it dirty and flush it onto disk.
To avoid the problem, parent transid needs to be passed to
read_tree_block().
In order to get a valid parent transid, we need to hold the parent's
lock until finishing reading child.
This patch needs to be slightly adapted for stable kernels, the
&first_key parameter added to read_tree_block() is from 4.16+
(581c1760415c4). The fix is to replace 0 by 'gen'.
Fixes: 5bdd3536cbbe ("Btrfs: Fix block generation verification race")
CC: [email protected] # 4.4+
Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
[ update changelog ]
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/ctree.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2497,10 +2497,8 @@ read_block_for_search(struct btrfs_trans
if (p->reada)
reada_for_search(root, p, level, slot, key->objectid);
- btrfs_release_path(p);
-
ret = -EAGAIN;
- tmp = read_tree_block(root, blocknr, 0);
+ tmp = read_tree_block(root, blocknr, gen);
if (!IS_ERR(tmp)) {
/*
* If the read above didn't mark this buffer up to date,
@@ -2512,6 +2510,8 @@ read_block_for_search(struct btrfs_trans
ret = -EIO;
free_extent_buffer(tmp);
}
+
+ btrfs_release_path(p);
return ret;
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana <[email protected]>
commit 9a8fca62aacc1599fea8e813d01e1955513e4fad upstream.
If a file has xattrs, we fsync it, to ensure we clear the flags
BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its
inode, the current transaction commits and then we fsync it (without
either of those bits being set in its inode), we end up not logging
all its xattrs. This results in deleting all xattrs when replying the
log after a power failure.
Trivial reproducer
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ touch /mnt/foobar
$ setfattr -n user.xa -v qwerty /mnt/foobar
$ xfs_io -c "fsync" /mnt/foobar
$ sync
$ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar
$ xfs_io -c "fsync" /mnt/foobar
<power failure>
$ mount /dev/sdb /mnt
$ getfattr --absolute-names --dump /mnt/foobar
<empty output>
$
So fix this by making sure all xattrs are logged if we log a file's inode
item and neither the flags BTRFS_INODE_NEEDS_FULL_SYNC nor
BTRFS_INODE_COPY_EVERYTHING were set in the inode.
Fixes: 36283bf777d9 ("Btrfs: fix fsync xattr loss in the fast fsync path")
Cc: <[email protected]> # 4.2+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/tree-log.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4568,6 +4568,7 @@ static int btrfs_log_inode(struct btrfs_
struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
u64 logged_isize = 0;
bool need_log_inode_item = true;
+ bool xattrs_logged = false;
path = btrfs_alloc_path();
if (!path)
@@ -4808,6 +4809,7 @@ next_slot:
err = btrfs_log_all_xattrs(trans, root, inode, path, dst_path);
if (err)
goto out_unlock;
+ xattrs_logged = true;
if (max_key.type >= BTRFS_EXTENT_DATA_KEY && !fast_search) {
btrfs_release_path(path);
btrfs_release_path(dst_path);
@@ -4820,6 +4822,11 @@ log_extents:
btrfs_release_path(dst_path);
if (need_log_inode_item) {
err = log_inode_item(trans, log, dst_path, inode);
+ if (!err && !xattrs_logged) {
+ err = btrfs_log_all_xattrs(trans, root, inode, path,
+ dst_path);
+ btrfs_release_path(path);
+ }
if (err)
goto out_unlock;
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn <[email protected]>
[ Upstream commit 113f99c3358564a0647d444c2ae34e8b1abfd5b9 ]
Device features may change during transmission. In particular with
corking, a device may toggle scatter-gather in between allocating
and writing to an skb.
Do not unconditionally assume that !NETIF_F_SG at write time implies
that the same held at alloc time and thus the skb has sufficient
tailroom.
This issue predates git history.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_output.c | 3 ++-
net/ipv6/ip6_output.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1062,7 +1062,8 @@ alloc_new_skb:
if (copy > length)
copy = length;
- if (!(rt->dst.dev->features&NETIF_F_SG)) {
+ if (!(rt->dst.dev->features&NETIF_F_SG) &&
+ skb_tailroom(skb) >= copy) {
unsigned int off;
off = skb->len;
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1529,7 +1529,8 @@ alloc_new_skb:
if (copy > length)
copy = length;
- if (!(rt->dst.dev->features&NETIF_F_SG)) {
+ if (!(rt->dst.dev->features&NETIF_F_SG) &&
+ skb_tailroom(skb) >= copy) {
unsigned int off;
off = skb->len;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <[email protected]>
[ Upstream commit 9709020c86f6bf8439ca3effc58cfca49a5de192 ]
We must not call sock_diag_has_destroy_listeners(sk) on a socket
that has no reference on net structure.
BUG: KASAN: use-after-free in sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
BUG: KASAN: use-after-free in __sk_free+0x329/0x340 net/core/sock.c:1609
Read of size 8 at addr ffff88018a02e3a0 by task swapper/1/0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc5+ #54
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
__sk_free+0x329/0x340 net/core/sock.c:1609
sk_free+0x42/0x50 net/core/sock.c:1623
sock_put include/net/sock.h:1664 [inline]
reqsk_free include/net/request_sock.h:116 [inline]
reqsk_put include/net/request_sock.h:124 [inline]
inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:672 [inline]
reqsk_timer_handler+0xe27/0x10e0 net/ipv4/inet_connection_sock.c:739
call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
expire_timers kernel/time/timer.c:1363 [inline]
__run_timers+0x79e/0xc50 kernel/time/timer.c:1666
run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
__do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1d1/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:525 [inline]
smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
</IRQ>
RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
RSP: 0018:ffff8801d9ae7c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: 1ffff1003b35cf8a RCX: 0000000000000000
RDX: 1ffffffff11a30d0 RSI: 0000000000000001 RDI: ffffffff88d18680
RBP: ffff8801d9ae7c38 R08: ffffed003b5e46c3 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: ffff8801d9ae7cf0 R14: ffffffff897bef20 R15: 0000000000000000
arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
default_idle+0xc2/0x440 arch/x86/kernel/process.c:354
arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345
default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
cpuidle_idle_call kernel/sched/idle.c:153 [inline]
do_idle+0x395/0x560 kernel/sched/idle.c:262
cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368
start_secondary+0x426/0x5b0 arch/x86/kernel/smpboot.c:269
secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
Allocated by task 4557:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
kmem_cache_zalloc include/linux/slab.h:691 [inline]
net_alloc net/core/net_namespace.c:383 [inline]
copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423
create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206
ksys_unshare+0x708/0xf90 kernel/fork.c:2408
__do_sys_unshare kernel/fork.c:2476 [inline]
__se_sys_unshare kernel/fork.c:2474 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2474
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 69:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
net_free net/core/net_namespace.c:399 [inline]
net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406
net_drop_ns net/core/net_namespace.c:405 [inline]
cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541
process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
kthread+0x345/0x410 kernel/kthread.c:240
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
The buggy address belongs to the object at ffff88018a02c140
which belongs to the cache net_namespace of size 8832
The buggy address is located 8800 bytes inside of
8832-byte region [ffff88018a02c140, ffff88018a02e3c0)
The buggy address belongs to the page:
page:ffffea0006280b00 count:1 mapcount:0 mapping:ffff88018a02c140 index:0x0 compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffff88018a02c140 0000000000000000 0000000100000001
raw: ffffea00062a1320 ffffea0006268020 ffff8801d9bdde40 0000000000000000
page dumped because: kasan: bad access detected
Fixes: b922622ec6ef ("sock_diag: don't broadcast kernel sockets")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Craig Gallek <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/sock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1474,7 +1474,7 @@ void sk_destruct(struct sock *sk)
static void __sk_free(struct sock *sk)
{
- if (unlikely(sock_diag_has_destroy_listeners(sk) && sk->sk_net_refcnt))
+ if (unlikely(sk->sk_net_refcnt && sock_diag_has_destroy_listeners(sk)))
sock_diag_broadcast_destroy(sk);
else
sk_destruct(sk);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anand Jain <[email protected]>
commit 02ee654d3a04563c67bfe658a05384548b9bb105 upstream.
We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance()
only, which isn't called during the remount. So when resuming from
the paused balance we hit the bug:
kernel: kernel BUG at fs/btrfs/volumes.c:3890!
::
kernel: balance_kthread+0x51/0x60 [btrfs]
kernel: kthread+0x111/0x130
::
kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8
Reproducer:
On a mounted filesystem:
btrfs balance start --full-balance /btrfs
btrfs balance pause /btrfs
mount -o remount,ro /dev/sdb /btrfs
mount -o remount,rw /dev/sdb /btrfs
To fix this set the BTRFS_BALANCE_RESUME flag in
btrfs_resume_balance_async().
CC: [email protected] # 4.4+
Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/volumes.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3850,6 +3850,15 @@ int btrfs_resume_balance_async(struct bt
return 0;
}
+ /*
+ * A ro->rw remount sequence should continue with the paused balance
+ * regardless of who pauses it, system or the user as of now, so set
+ * the resume flag.
+ */
+ spin_lock(&fs_info->balance_lock);
+ fs_info->balance_ctl->flags |= BTRFS_BALANCE_RESUME;
+ spin_unlock(&fs_info->balance_lock);
+
tsk = kthread_run(balance_kthread, fs_info, "btrfs-balance");
return PTR_ERR_OR_ZERO(tsk);
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn <[email protected]>
[ Upstream commit b84bbaf7a6c8cca24f8acf25a2c8e46913a947ba ]
Packet sockets allow construction of packets shorter than
dev->hard_header_len to accommodate protocols with variable length
link layer headers. These packets are padded to dev->hard_header_len,
because some device drivers interpret that as a minimum packet size.
packet_snd reserves dev->hard_header_len bytes on allocation.
SOCK_DGRAM sockets call skb_push in dev_hard_header() to ensure that
link layer headers are stored in the reserved range. SOCK_RAW sockets
do the same in tpacket_snd, but not in packet_snd.
Syzbot was able to send a zero byte packet to a device with massive
116B link layer header, causing padding to cross over into skb_shinfo.
Fix this by writing from the start of the llheader reserved range also
in the case of packet_snd/SOCK_RAW.
Update skb_set_network_header to the new offset. This also corrects
it for SOCK_DGRAM, where it incorrectly double counted reserve due to
the skb_push in dev_hard_header.
Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
Reported-by: [email protected]
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/packet/af_packet.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2771,13 +2771,15 @@ static int packet_snd(struct socket *soc
if (skb == NULL)
goto out_unlock;
- skb_set_network_header(skb, reserve);
+ skb_reset_network_header(skb);
err = -EINVAL;
if (sock->type == SOCK_DGRAM) {
offset = dev_hard_header(skb, dev, ntohs(proto), addr, NULL, len);
if (unlikely(offset < 0))
goto out_free;
+ } else if (reserve) {
+ skb_push(skb, reserve);
}
/* Returns -EFAULT on error */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu <[email protected]>
commit 0d73c3f8e7f6ee2aab1bb350f60c180f5ae21a2c upstream.
Since do_undefinstr() uses get_user to get the undefined
instruction, it can be called before kprobes processes
recursive check. This can cause an infinit recursive
exception.
Prohibit probing on get_user functions.
Fixes: 24ba613c9d6c ("ARM kprobes: core code")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: [email protected]
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/include/asm/assembler.h | 10 ++++++++++
arch/arm/lib/getuser.S | 10 ++++++++++
2 files changed, 20 insertions(+)
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -530,4 +530,14 @@ THUMB( orr \reg , \reg , #PSR_T_BIT )
#endif
.endm
+#ifdef CONFIG_KPROBES
+#define _ASM_NOKPROBE(entry) \
+ .pushsection "_kprobe_blacklist", "aw" ; \
+ .balign 4 ; \
+ .long entry; \
+ .popsection
+#else
+#define _ASM_NOKPROBE(entry)
+#endif
+
#endif /* __ASM_ASSEMBLER_H__ */
--- a/arch/arm/lib/getuser.S
+++ b/arch/arm/lib/getuser.S
@@ -38,6 +38,7 @@ ENTRY(__get_user_1)
mov r0, #0
ret lr
ENDPROC(__get_user_1)
+_ASM_NOKPROBE(__get_user_1)
ENTRY(__get_user_2)
check_uaccess r0, 2, r1, r2, __get_user_bad
@@ -58,6 +59,7 @@ rb .req r0
mov r0, #0
ret lr
ENDPROC(__get_user_2)
+_ASM_NOKPROBE(__get_user_2)
ENTRY(__get_user_4)
check_uaccess r0, 4, r1, r2, __get_user_bad
@@ -65,6 +67,7 @@ ENTRY(__get_user_4)
mov r0, #0
ret lr
ENDPROC(__get_user_4)
+_ASM_NOKPROBE(__get_user_4)
ENTRY(__get_user_8)
check_uaccess r0, 8, r1, r2, __get_user_bad8
@@ -78,6 +81,7 @@ ENTRY(__get_user_8)
mov r0, #0
ret lr
ENDPROC(__get_user_8)
+_ASM_NOKPROBE(__get_user_8)
#ifdef __ARMEB__
ENTRY(__get_user_32t_8)
@@ -91,6 +95,7 @@ ENTRY(__get_user_32t_8)
mov r0, #0
ret lr
ENDPROC(__get_user_32t_8)
+_ASM_NOKPROBE(__get_user_32t_8)
ENTRY(__get_user_64t_1)
check_uaccess r0, 1, r1, r2, __get_user_bad8
@@ -98,6 +103,7 @@ ENTRY(__get_user_64t_1)
mov r0, #0
ret lr
ENDPROC(__get_user_64t_1)
+_ASM_NOKPROBE(__get_user_64t_1)
ENTRY(__get_user_64t_2)
check_uaccess r0, 2, r1, r2, __get_user_bad8
@@ -114,6 +120,7 @@ rb .req r0
mov r0, #0
ret lr
ENDPROC(__get_user_64t_2)
+_ASM_NOKPROBE(__get_user_64t_2)
ENTRY(__get_user_64t_4)
check_uaccess r0, 4, r1, r2, __get_user_bad8
@@ -121,6 +128,7 @@ ENTRY(__get_user_64t_4)
mov r0, #0
ret lr
ENDPROC(__get_user_64t_4)
+_ASM_NOKPROBE(__get_user_64t_4)
#endif
__get_user_bad8:
@@ -131,6 +139,8 @@ __get_user_bad:
ret lr
ENDPROC(__get_user_bad)
ENDPROC(__get_user_bad8)
+_ASM_NOKPROBE(__get_user_bad)
+_ASM_NOKPROBE(__get_user_bad8)
.pushsection __ex_table, "a"
.long 1b, __get_user_bad
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu <[email protected]>
commit eb0146daefdde65665b7f076fbff7b49dade95b9 upstream.
Prohibit kprobes on do_undefinstr because kprobes on
arm is implemented by undefined instruction. This means
if we probe do_undefinstr(), it can cause infinit
recursive exception.
Fixes: 24ba613c9d6c ("ARM kprobes: core code")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: [email protected]
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/kernel/traps.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -19,6 +19,7 @@
#include <linux/uaccess.h>
#include <linux/hardirq.h>
#include <linux/kdebug.h>
+#include <linux/kprobes.h>
#include <linux/module.h>
#include <linux/kexec.h>
#include <linux/bug.h>
@@ -395,7 +396,8 @@ void unregister_undef_hook(struct undef_
raw_spin_unlock_irqrestore(&undef_lock, flags);
}
-static int call_undef_hook(struct pt_regs *regs, unsigned int instr)
+static nokprobe_inline
+int call_undef_hook(struct pt_regs *regs, unsigned int instr)
{
struct undef_hook *hook;
unsigned long flags;
@@ -468,6 +470,7 @@ die_sig:
arm_notify_die("Oops - undefined instruction", regs, &info, 0, 6);
}
+NOKPROBE_SYMBOL(do_undefinstr)
/*
* Handle FIQ similarly to NMI on x86 systems.
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu <[email protected]>
commit 70948c05fdde0aac32f9667856a88725c192fa40 upstream.
Prohibit probing on optimized_callback() because
it is called from kprobes itself. If we put a kprobes
on it, that will cause a recursive call loop.
Mark it NOKPROBE_SYMBOL.
Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: [email protected]
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/probes/kprobes/opt-arm.c | 1 +
1 file changed, 1 insertion(+)
--- a/arch/arm/probes/kprobes/opt-arm.c
+++ b/arch/arm/probes/kprobes/opt-arm.c
@@ -192,6 +192,7 @@ optimized_callback(struct optimized_kpro
local_irq_restore(flags);
}
+NOKPROBE_SYMBOL(optimized_callback)
int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *orig)
{
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dexuan Cui <[email protected]>
commit 5596fe34495cf0f645f417eb928ef224df3e3cb4 upstream.
for_each_cpu() unintuitively reports CPU0 as set independent of the actual
cpumask content on UP kernels. This causes an unexpected PIT interrupt
storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as
a result, the virtual machine can suffer from a strange random delay of 1~20
minutes during boot-up, and sometimes it can hang forever.
Protect if by checking whether the cpumask is empty before entering the
for_each_cpu() loop.
[ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ]
Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Josh Poulson <[email protected]>
Cc: "Michael Kelley (EOSG)" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: [email protected]
Cc: Rakib Mullick <[email protected]>
Cc: Jork Loeser <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: KY Srinivasan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/time/tick-broadcast.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -610,6 +610,14 @@ static void tick_handle_oneshot_broadcas
now = ktime_get();
/* Find all expired events */
for_each_cpu(cpu, tick_broadcast_oneshot_mask) {
+ /*
+ * Required for !SMP because for_each_cpu() reports
+ * unconditionally CPU0 as set on UP kernels.
+ */
+ if (!IS_ENABLED(CONFIG_SMP) &&
+ cpumask_empty(tick_broadcast_oneshot_mask))
+ break;
+
td = &per_cpu(tick_cpu_device, cpu);
if (td->evtdev->next_event.tv64 <= now.tv64) {
cpumask_set_cpu(cpu, tmpmask);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel <[email protected]>
commit 0b3225ab9407f557a8e20f23f37aa7236c10a9b1 upstream.
Mixed mode allows a kernel built for x86_64 to interact with 32-bit
EFI firmware, but requires us to define all struct definitions carefully
when it comes to pointer sizes.
'struct efi_pci_io_protocol_32' currently uses a 'void *' for the
'romimage' field, which will be interpreted as a 64-bit field
on such kernels, potentially resulting in bogus memory references
and subsequent crashes.
Tested-by: Hans de Goede <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Cc: <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/boot/compressed/eboot.c | 6 ++++--
include/linux/efi.h | 8 ++++----
2 files changed, 8 insertions(+), 6 deletions(-)
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -364,7 +364,8 @@ __setup_efi_pci32(efi_pci_io_protocol_32
if (status != EFI_SUCCESS)
goto free_struct;
- memcpy(rom->romdata, pci->romimage, pci->romsize);
+ memcpy(rom->romdata, (void *)(unsigned long)pci->romimage,
+ pci->romsize);
return status;
free_struct:
@@ -470,7 +471,8 @@ __setup_efi_pci64(efi_pci_io_protocol_64
if (status != EFI_SUCCESS)
goto free_struct;
- memcpy(rom->romdata, pci->romimage, pci->romsize);
+ memcpy(rom->romdata, (void *)(unsigned long)pci->romimage,
+ pci->romsize);
return status;
free_struct:
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -364,8 +364,8 @@ typedef struct {
u32 attributes;
u32 get_bar_attributes;
u32 set_bar_attributes;
- uint64_t romsize;
- void *romimage;
+ u64 romsize;
+ u32 romimage;
} efi_pci_io_protocol_32;
typedef struct {
@@ -384,8 +384,8 @@ typedef struct {
u64 attributes;
u64 get_bar_attributes;
u64 set_bar_attributes;
- uint64_t romsize;
- void *romimage;
+ u64 romsize;
+ u64 romimage;
} efi_pci_io_protocol_64;
typedef struct {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hendrik Brueckner <[email protected]>
commit 4bbaf2584b86b0772413edeac22ff448f36351b1 upstream.
Correct a trinity finding for the perf_event_open() system call with
a perf event attribute structure that uses a frequency but has the
sampling frequency set to zero. This causes a FP divide exception during
the sample rate initialization for the hardware sampling facility.
Fixes: 8c069ff4bd606 ("s390/perf: add support for the CPU-Measurement Sampling Facility")
Cc: [email protected] # 3.14+
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/kernel/perf_cpum_sf.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -744,6 +744,10 @@ static int __hw_perf_event_init(struct p
*/
rate = 0;
if (attr->freq) {
+ if (!attr->sample_freq) {
+ err = -EINVAL;
+ goto out;
+ }
rate = freq_to_sample_rate(&si, attr->sample_freq);
rate = hw_limit_rate(&si, rate);
attr->freq = 0;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stewart Smith <[email protected]>
commit e4d54f71d29997344b4c4c8d47708240f9f23a5c upstream.
Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is
just OPALv3, with nobody ever expecting anything on pre-OPALv3 to
be cared about or supported by mainline kernels.
So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL
exclusively.
Signed-off-by: Stewart Smith <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/powerpc/include/asm/firmware.h | 3 -
arch/powerpc/platforms/powernv/eeh-powernv.c | 4 -
arch/powerpc/platforms/powernv/idle.c | 2
arch/powerpc/platforms/powernv/opal-xscom.c | 2
arch/powerpc/platforms/powernv/opal.c | 25 ++++-----
arch/powerpc/platforms/powernv/pci-ioda.c | 2
arch/powerpc/platforms/powernv/setup.c | 8 +-
arch/powerpc/platforms/powernv/smp.c | 74 +++++++++++----------------
drivers/cpufreq/powernv-cpufreq.c | 2
drivers/cpuidle/cpuidle-powernv.c | 2
10 files changed, 54 insertions(+), 70 deletions(-)
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -51,7 +51,6 @@
#define FW_FEATURE_BEST_ENERGY ASM_CONST(0x0000000080000000)
#define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0000000100000000)
#define FW_FEATURE_PRRN ASM_CONST(0x0000000200000000)
-#define FW_FEATURE_OPALv3 ASM_CONST(0x0000000400000000)
#ifndef __ASSEMBLY__
@@ -69,7 +68,7 @@ enum {
FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN,
FW_FEATURE_PSERIES_ALWAYS = 0,
- FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_OPALv3,
+ FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
FW_FEATURE_POWERNV_ALWAYS = 0,
FW_FEATURE_PS3_POSSIBLE = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
FW_FEATURE_PS3_ALWAYS = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -48,8 +48,8 @@ static int pnv_eeh_init(void)
struct pci_controller *hose;
struct pnv_phb *phb;
- if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
- pr_warn("%s: OPALv3 is required !\n",
+ if (!firmware_has_feature(FW_FEATURE_OPAL)) {
+ pr_warn("%s: OPAL is required !\n",
__func__);
return -EINVAL;
}
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -242,7 +242,7 @@ static int __init pnv_init_idle_states(v
if (cpuidle_disable != IDLE_NO_OVERRIDE)
goto out;
- if (!firmware_has_feature(FW_FEATURE_OPALv3))
+ if (!firmware_has_feature(FW_FEATURE_OPAL))
goto out;
power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
--- a/arch/powerpc/platforms/powernv/opal-xscom.c
+++ b/arch/powerpc/platforms/powernv/opal-xscom.c
@@ -126,7 +126,7 @@ static const struct scom_controller opal
static int opal_xscom_init(void)
{
- if (firmware_has_feature(FW_FEATURE_OPALv3))
+ if (firmware_has_feature(FW_FEATURE_OPAL))
scom_init(&opal_scom_controller);
return 0;
}
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -98,10 +98,9 @@ int __init early_init_dt_scan_opal(unsig
pr_debug("OPAL Entry = 0x%llx (sizep=%p runtimesz=%d)\n",
opal.size, sizep, runtimesz);
- powerpc_firmware_features |= FW_FEATURE_OPAL;
if (of_flat_dt_is_compatible(node, "ibm,opal-v3")) {
- powerpc_firmware_features |= FW_FEATURE_OPALv3;
- pr_info("OPAL V3 detected !\n");
+ powerpc_firmware_features |= FW_FEATURE_OPAL;
+ pr_info("OPAL detected !\n");
} else {
panic("OPAL != V3 detected, no longer supported.\n");
}
@@ -348,17 +347,15 @@ int opal_put_chars(uint32_t vtermno, con
* enough room and be done with it
*/
spin_lock_irqsave(&opal_write_lock, flags);
- if (firmware_has_feature(FW_FEATURE_OPALv3)) {
- rc = opal_console_write_buffer_space(vtermno, &olen);
- len = be64_to_cpu(olen);
- if (rc || len < total_len) {
- spin_unlock_irqrestore(&opal_write_lock, flags);
- /* Closed -> drop characters */
- if (rc)
- return total_len;
- opal_poll_events(NULL);
- return -EAGAIN;
- }
+ rc = opal_console_write_buffer_space(vtermno, &olen);
+ len = be64_to_cpu(olen);
+ if (rc || len < total_len) {
+ spin_unlock_irqrestore(&opal_write_lock, flags);
+ /* Closed -> drop characters */
+ if (rc)
+ return total_len;
+ opal_poll_events(NULL);
+ return -EAGAIN;
}
/* We still try to handle partial completions, though they
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -344,7 +344,7 @@ static void __init pnv_ioda_parse_m64_wi
return;
}
- if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
+ if (!firmware_has_feature(FW_FEATURE_OPAL)) {
pr_info(" Firmware too old to support M64 window\n");
return;
}
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -140,8 +140,8 @@ static void pnv_show_cpuinfo(struct seq_
if (root)
model = of_get_property(root, "model", NULL);
seq_printf(m, "machine\t\t: PowerNV %s\n", model);
- if (firmware_has_feature(FW_FEATURE_OPALv3))
- seq_printf(m, "firmware\t: OPAL v3\n");
+ if (firmware_has_feature(FW_FEATURE_OPAL))
+ seq_printf(m, "firmware\t: OPAL\n");
else
seq_printf(m, "firmware\t: BML\n");
of_node_put(root);
@@ -270,9 +270,9 @@ static void pnv_kexec_cpu_down(int crash
{
xics_kexec_teardown_cpu(secondary);
- /* On OPAL v3, we return all CPUs to firmware */
+ /* On OPAL, we return all CPUs to firmware */
- if (!firmware_has_feature(FW_FEATURE_OPALv3))
+ if (!firmware_has_feature(FW_FEATURE_OPAL))
return;
if (secondary) {
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -61,14 +61,15 @@ static int pnv_smp_kick_cpu(int nr)
unsigned long start_here =
__pa(ppc_function_entry(generic_secondary_smp_init));
long rc;
+ uint8_t status;
BUG_ON(nr < 0 || nr >= NR_CPUS);
/*
- * If we already started or OPALv3 is not supported, we just
+ * If we already started or OPAL is not supported, we just
* kick the CPU via the PACA
*/
- if (paca[nr].cpu_start || !firmware_has_feature(FW_FEATURE_OPALv3))
+ if (paca[nr].cpu_start || !firmware_has_feature(FW_FEATURE_OPAL))
goto kick;
/*
@@ -77,55 +78,42 @@ static int pnv_smp_kick_cpu(int nr)
* first time. OPAL v3 allows us to query OPAL to know if it
* has the CPUs, so we do that
*/
- if (firmware_has_feature(FW_FEATURE_OPALv3)) {
- uint8_t status;
-
- rc = opal_query_cpu_status(pcpu, &status);
- if (rc != OPAL_SUCCESS) {
- pr_warn("OPAL Error %ld querying CPU %d state\n",
- rc, nr);
- return -ENODEV;
- }
+ rc = opal_query_cpu_status(pcpu, &status);
+ if (rc != OPAL_SUCCESS) {
+ pr_warn("OPAL Error %ld querying CPU %d state\n", rc, nr);
+ return -ENODEV;
+ }
- /*
- * Already started, just kick it, probably coming from
- * kexec and spinning
- */
- if (status == OPAL_THREAD_STARTED)
- goto kick;
+ /*
+ * Already started, just kick it, probably coming from
+ * kexec and spinning
+ */
+ if (status == OPAL_THREAD_STARTED)
+ goto kick;
- /*
- * Available/inactive, let's kick it
- */
- if (status == OPAL_THREAD_INACTIVE) {
- pr_devel("OPAL: Starting CPU %d (HW 0x%x)...\n",
- nr, pcpu);
- rc = opal_start_cpu(pcpu, start_here);
- if (rc != OPAL_SUCCESS) {
- pr_warn("OPAL Error %ld starting CPU %d\n",
- rc, nr);
- return -ENODEV;
- }
- } else {
- /*
- * An unavailable CPU (or any other unknown status)
- * shouldn't be started. It should also
- * not be in the possible map but currently it can
- * happen
- */
- pr_devel("OPAL: CPU %d (HW 0x%x) is unavailable"
- " (status %d)...\n", nr, pcpu, status);
+ /*
+ * Available/inactive, let's kick it
+ */
+ if (status == OPAL_THREAD_INACTIVE) {
+ pr_devel("OPAL: Starting CPU %d (HW 0x%x)...\n", nr, pcpu);
+ rc = opal_start_cpu(pcpu, start_here);
+ if (rc != OPAL_SUCCESS) {
+ pr_warn("OPAL Error %ld starting CPU %d\n", rc, nr);
return -ENODEV;
}
} else {
/*
- * On OPAL v2, we just kick it and hope for the best,
- * we must not test the error from opal_start_cpu() or
- * we would fail to get CPUs from kexec.
+ * An unavailable CPU (or any other unknown status)
+ * shouldn't be started. It should also
+ * not be in the possible map but currently it can
+ * happen
*/
- opal_start_cpu(pcpu, start_here);
+ pr_devel("OPAL: CPU %d (HW 0x%x) is unavailable"
+ " (status %d)...\n", nr, pcpu, status);
+ return -ENODEV;
}
- kick:
+
+kick:
return smp_generic_kick_cpu(nr);
}
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -592,7 +592,7 @@ static int __init powernv_cpufreq_init(v
int rc = 0;
/* Don't probe on pseries (guest) platforms */
- if (!firmware_has_feature(FW_FEATURE_OPALv3))
+ if (!firmware_has_feature(FW_FEATURE_OPAL))
return -ENODEV;
/* Discover pstates from device tree and init */
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -282,7 +282,7 @@ static int powernv_idle_probe(void)
if (cpuidle_disable != IDLE_NO_OVERRIDE)
return -ENODEV;
- if (firmware_has_feature(FW_FEATURE_OPALv3)) {
+ if (firmware_has_feature(FW_FEATURE_OPAL)) {
cpuidle_state_table = powernv_states;
/* Device tree can indicate more idle states */
max_idle_state = powernv_add_idle_states();
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky <[email protected]>
commit 9f18fff63cfd6f559daa1eaae60640372c65f84b upstream.
The inline assembly to call __do_softirq on the irq stack uses
an indirect branch. This can be replaced with a normal relative
branch.
Cc: [email protected] # 4.16
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Reviewed-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/kernel/irq.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -173,10 +173,9 @@ void do_softirq_own_stack(void)
new -= STACK_FRAME_OVERHEAD;
((struct stack_frame *) new)->back_chain = old;
asm volatile(" la 15,0(%0)\n"
- " basr 14,%2\n"
+ " brasl 14,__do_softirq\n"
" la 15,0(%1)\n"
- : : "a" (new), "a" (old),
- "a" (__do_softirq)
+ : : "a" (new), "a" (old)
: "0", "1", "2", "3", "4", "5", "14",
"cc", "memory" );
} else {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mateusz Guzik <[email protected]>
commit a3b609ef9f8b1dbfe97034ccad6cd3fe71fbe7ab upstream.
Only functions doing more than one read are modified. Consumeres
happened to deal with possibly changing data, but it does not seem like
a good thing to rely on.
Signed-off-by: Mateusz Guzik <[email protected]>
Acked-by: Cyrill Gorcunov <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Jarod Wilson <[email protected]>
Cc: Jan Stancek <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/proc/base.c | 13 ++++++++++---
mm/util.c | 16 ++++++++++++----
2 files changed, 22 insertions(+), 7 deletions(-)
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -953,6 +953,7 @@ static ssize_t environ_read(struct file
unsigned long src = *ppos;
int ret = 0;
struct mm_struct *mm = file->private_data;
+ unsigned long env_start, env_end;
/* Ensure the process spawned far enough to have an environment. */
if (!mm || !mm->env_end)
@@ -965,19 +966,25 @@ static ssize_t environ_read(struct file
ret = 0;
if (!atomic_inc_not_zero(&mm->mm_users))
goto free;
+
+ down_read(&mm->mmap_sem);
+ env_start = mm->env_start;
+ env_end = mm->env_end;
+ up_read(&mm->mmap_sem);
+
while (count > 0) {
size_t this_len, max_len;
int retval;
- if (src >= (mm->env_end - mm->env_start))
+ if (src >= (env_end - env_start))
break;
- this_len = mm->env_end - (mm->env_start + src);
+ this_len = env_end - (env_start + src);
max_len = min_t(size_t, PAGE_SIZE, count);
this_len = min(max_len, this_len);
- retval = access_remote_vm(mm, (mm->env_start + src),
+ retval = access_remote_vm(mm, (env_start + src),
page, this_len, 0);
if (retval <= 0) {
--- a/mm/util.c
+++ b/mm/util.c
@@ -428,17 +428,25 @@ int get_cmdline(struct task_struct *task
int res = 0;
unsigned int len;
struct mm_struct *mm = get_task_mm(task);
+ unsigned long arg_start, arg_end, env_start, env_end;
if (!mm)
goto out;
if (!mm->arg_end)
goto out_mm; /* Shh! No looking before we're done */
- len = mm->arg_end - mm->arg_start;
+ down_read(&mm->mmap_sem);
+ arg_start = mm->arg_start;
+ arg_end = mm->arg_end;
+ env_start = mm->env_start;
+ env_end = mm->env_end;
+ up_read(&mm->mmap_sem);
+
+ len = arg_end - arg_start;
if (len > buflen)
len = buflen;
- res = access_process_vm(task, mm->arg_start, buffer, len, 0);
+ res = access_process_vm(task, arg_start, buffer, len, 0);
/*
* If the nul at the end of args has been overwritten, then
@@ -449,10 +457,10 @@ int get_cmdline(struct task_struct *task
if (len < res) {
res = len;
} else {
- len = mm->env_end - mm->env_start;
+ len = env_end - env_start;
if (len > buflen - res)
len = buflen - res;
- res += access_process_vm(task, mm->env_start,
+ res += access_process_vm(task, env_start,
buffer+res, len, 0);
res = strnlen(buffer, res);
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann <[email protected]>
commit e521813468f786271a87e78e8644243bead48fad upstream.
Ever since CQ/QAOB support was added, calling qdio_free() straight after
qdio_alloc() results in qdio_release_memory() accessing uninitialized
memory (ie. q->u.out.use_cq and q->u.out.aobs). Followed by a
kmem_cache_free() on the random AOB addresses.
For older kernels that don't have 6e30c549f6ca, the same applies if
qdio_establish() fails in the DEV_STATE_ONLINE check.
While initializing q->u.out.use_cq would be enough to fix this
particular bug, the more future-proof change is to just zero-alloc the
whole struct.
Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks")
Cc: <[email protected]> #v3.2+
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/s390/cio/qdio_setup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/s390/cio/qdio_setup.c
+++ b/drivers/s390/cio/qdio_setup.c
@@ -140,7 +140,7 @@ static int __qdio_allocate_qs(struct qdi
int i;
for (i = 0; i < nr_queues; i++) {
- q = kmem_cache_alloc(qdio_q_cache, GFP_KERNEL);
+ q = kmem_cache_zalloc(qdio_q_cache, GFP_KERNEL);
if (!q)
return -ENOMEM;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivas Pandruvada <[email protected]>
commit 7791e4aa59ad724e0b4c8b4dea547a5735108972 upstream.
If the processor supports HWP, enable it by default without checking
for the cpu model. This will allow to enable HWP in all supported
processors without driver change.
Signed-off-by: Srinivas Pandruvada <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Thomas Renninger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/cpufreq/intel_pstate.c | 34 ++++++++++++++++++++++------------
1 file changed, 22 insertions(+), 12 deletions(-)
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1361,6 +1361,11 @@ static inline bool intel_pstate_platform
static inline bool intel_pstate_has_acpi_ppc(void) { return false; }
#endif /* CONFIG_ACPI */
+static const struct x86_cpu_id hwp_support_ids[] __initconst = {
+ { X86_VENDOR_INTEL, 6, X86_MODEL_ANY, X86_FEATURE_HWP },
+ {}
+};
+
static int __init intel_pstate_init(void)
{
int cpu, rc = 0;
@@ -1370,17 +1375,16 @@ static int __init intel_pstate_init(void
if (no_load)
return -ENODEV;
+ if (x86_match_cpu(hwp_support_ids) && !no_hwp) {
+ copy_cpu_funcs(&core_params.funcs);
+ hwp_active++;
+ goto hwp_cpu_matched;
+ }
+
id = x86_match_cpu(intel_pstate_cpu_ids);
if (!id)
return -ENODEV;
- /*
- * The Intel pstate driver will be ignored if the platform
- * firmware has its own power management modes.
- */
- if (intel_pstate_platform_pwr_mgmt_exists())
- return -ENODEV;
-
cpu_def = (struct cpu_defaults *)id->driver_data;
copy_pid_params(&cpu_def->pid_policy);
@@ -1389,17 +1393,20 @@ static int __init intel_pstate_init(void
if (intel_pstate_msrs_not_valid())
return -ENODEV;
+hwp_cpu_matched:
+ /*
+ * The Intel pstate driver will be ignored if the platform
+ * firmware has its own power management modes.
+ */
+ if (intel_pstate_platform_pwr_mgmt_exists())
+ return -ENODEV;
+
pr_info("Intel P-state driver initializing.\n");
all_cpu_data = vzalloc(sizeof(void *) * num_possible_cpus());
if (!all_cpu_data)
return -ENOMEM;
- if (static_cpu_has_safe(X86_FEATURE_HWP) && !no_hwp) {
- pr_info("intel_pstate: HWP enabled\n");
- hwp_active++;
- }
-
if (!hwp_active && hwp_only)
goto out;
@@ -1410,6 +1417,9 @@ static int __init intel_pstate_init(void
intel_pstate_debug_expose_params();
intel_pstate_sysfs_expose_params();
+ if (hwp_active)
+ pr_info("intel_pstate: HWP enabled\n");
+
return rc;
out:
get_online_cpus();
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicholas Piggin <[email protected]>
commit c1d2a31397ec51f0370f6bd17b19b39152c263cb upstream.
Similarly to opal_event_shutdown, opal_nvram_write can be called in
the crash path with irqs disabled. Special case the delay to avoid
sleeping in invalid context.
Fixes: 3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops")
Cc: [email protected] # v3.2
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/powerpc/platforms/powernv/opal-nvram.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/arch/powerpc/platforms/powernv/opal-nvram.c
+++ b/arch/powerpc/platforms/powernv/opal-nvram.c
@@ -44,6 +44,10 @@ static ssize_t opal_nvram_read(char *buf
return count;
}
+/*
+ * This can be called in the panic path with interrupts off, so use
+ * mdelay in that case.
+ */
static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index)
{
s64 rc = OPAL_BUSY;
@@ -58,10 +62,16 @@ static ssize_t opal_nvram_write(char *bu
while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
rc = opal_write_nvram(__pa(buf), count, off);
if (rc == OPAL_BUSY_EVENT) {
- msleep(OPAL_BUSY_DELAY_MS);
+ if (in_interrupt() || irqs_disabled())
+ mdelay(OPAL_BUSY_DELAY_MS);
+ else
+ msleep(OPAL_BUSY_DELAY_MS);
opal_poll_events(NULL);
} else if (rc == OPAL_BUSY) {
- msleep(OPAL_BUSY_DELAY_MS);
+ if (in_interrupt() || irqs_disabled())
+ mdelay(OPAL_BUSY_DELAY_MS);
+ else
+ msleep(OPAL_BUSY_DELAY_MS);
}
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Janis Danisevskis <[email protected]>
commit 1b3044e39a89cb1d4d5313da477e8dfea2b5232d upstream.
The PR_DUMPABLE flag causes the pid related paths of the proc file
system to be owned by ROOT.
The implementation of pthread_set/getname_np however needs access to
/proc/<pid>/task/<tid>/comm. If PR_DUMPABLE is false this
implementation is locked out.
This patch installs a special permission function for the file "comm"
that grants read and write access to all threads of the same group
regardless of the ownership of the inode. For all other threads the
function falls back to the generic inode permission check.
[[email protected]: fix spello in comment]
Signed-off-by: Janis Danisevskis <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Minfei Huang <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Calvin Owens <[email protected]>
Cc: Jann Horn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/proc/base.c | 42 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 41 insertions(+), 1 deletion(-)
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3083,6 +3083,44 @@ int proc_pid_readdir(struct file *file,
}
/*
+ * proc_tid_comm_permission is a special permission function exclusively
+ * used for the node /proc/<pid>/task/<tid>/comm.
+ * It bypasses generic permission checks in the case where a task of the same
+ * task group attempts to access the node.
+ * The rationale behind this is that glibc and bionic access this node for
+ * cross thread naming (pthread_set/getname_np(!self)). However, if
+ * PR_SET_DUMPABLE gets set to 0 this node among others becomes uid=0 gid=0,
+ * which locks out the cross thread naming implementation.
+ * This function makes sure that the node is always accessible for members of
+ * same thread group.
+ */
+static int proc_tid_comm_permission(struct inode *inode, int mask)
+{
+ bool is_same_tgroup;
+ struct task_struct *task;
+
+ task = get_proc_task(inode);
+ if (!task)
+ return -ESRCH;
+ is_same_tgroup = same_thread_group(current, task);
+ put_task_struct(task);
+
+ if (likely(is_same_tgroup && !(mask & MAY_EXEC))) {
+ /* This file (/proc/<pid>/task/<tid>/comm) can always be
+ * read or written by the members of the corresponding
+ * thread group.
+ */
+ return 0;
+ }
+
+ return generic_permission(inode, mask);
+}
+
+static const struct inode_operations proc_tid_comm_inode_operations = {
+ .permission = proc_tid_comm_permission,
+};
+
+/*
* Tasks
*/
static const struct pid_entry tid_base_stuff[] = {
@@ -3100,7 +3138,9 @@ static const struct pid_entry tid_base_s
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
#endif
- REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
+ NOD("comm", S_IFREG|S_IRUGO|S_IWUSR,
+ &proc_tid_comm_inode_operations,
+ &proc_pid_set_comm_operations, {}),
#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
ONE("syscall", S_IRUSR, proc_pid_syscall),
#endif
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stewart Smith <[email protected]>
commit 7261aafc095763b119136a562540dea7b1ccf657 upstream.
OPALv2 only ever existed in the lab and didn't escape to the world.
All OPAL systems in the wild are OPALv3.
The probability of there being an OPALv2 system still powered on
anywhere inside IBM is approximately zero, let alone anyone
expecting to run mainline kernels.
So, start to remove references to OPALv2.
Signed-off-by: Stewart Smith <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/powerpc/include/asm/firmware.h | 4 +---
arch/powerpc/platforms/powernv/opal.c | 8 ++------
arch/powerpc/platforms/powernv/setup.c | 4 ----
arch/powerpc/platforms/powernv/smp.c | 4 ++--
4 files changed, 5 insertions(+), 15 deletions(-)
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -47,7 +47,6 @@
#define FW_FEATURE_VPHN ASM_CONST(0x0000000004000000)
#define FW_FEATURE_XCMO ASM_CONST(0x0000000008000000)
#define FW_FEATURE_OPAL ASM_CONST(0x0000000010000000)
-#define FW_FEATURE_OPALv2 ASM_CONST(0x0000000020000000)
#define FW_FEATURE_SET_MODE ASM_CONST(0x0000000040000000)
#define FW_FEATURE_BEST_ENERGY ASM_CONST(0x0000000080000000)
#define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0000000100000000)
@@ -70,8 +69,7 @@ enum {
FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN,
FW_FEATURE_PSERIES_ALWAYS = 0,
- FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_OPALv2 |
- FW_FEATURE_OPALv3,
+ FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_OPALv3,
FW_FEATURE_POWERNV_ALWAYS = 0,
FW_FEATURE_PS3_POSSIBLE = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
FW_FEATURE_PS3_ALWAYS = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -100,7 +100,6 @@ int __init early_init_dt_scan_opal(unsig
powerpc_firmware_features |= FW_FEATURE_OPAL;
if (of_flat_dt_is_compatible(node, "ibm,opal-v3")) {
- powerpc_firmware_features |= FW_FEATURE_OPALv2;
powerpc_firmware_features |= FW_FEATURE_OPALv3;
pr_info("OPAL V3 detected !\n");
} else {
@@ -349,7 +348,7 @@ int opal_put_chars(uint32_t vtermno, con
* enough room and be done with it
*/
spin_lock_irqsave(&opal_write_lock, flags);
- if (firmware_has_feature(FW_FEATURE_OPALv2)) {
+ if (firmware_has_feature(FW_FEATURE_OPALv3)) {
rc = opal_console_write_buffer_space(vtermno, &olen);
len = be64_to_cpu(olen);
if (rc || len < total_len) {
@@ -693,10 +692,7 @@ static int __init opal_init(void)
}
/* Register OPAL consoles if any ports */
- if (firmware_has_feature(FW_FEATURE_OPALv2))
- consoles = of_find_node_by_path("/ibm,opal/consoles");
- else
- consoles = of_node_get(opal_node);
+ consoles = of_find_node_by_path("/ibm,opal/consoles");
if (consoles) {
for_each_child_of_node(consoles, np) {
if (strcmp(np->name, "serial"))
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -142,10 +142,6 @@ static void pnv_show_cpuinfo(struct seq_
seq_printf(m, "machine\t\t: PowerNV %s\n", model);
if (firmware_has_feature(FW_FEATURE_OPALv3))
seq_printf(m, "firmware\t: OPAL v3\n");
- else if (firmware_has_feature(FW_FEATURE_OPALv2))
- seq_printf(m, "firmware\t: OPAL v2\n");
- else if (firmware_has_feature(FW_FEATURE_OPAL))
- seq_printf(m, "firmware\t: OPAL v1\n");
else
seq_printf(m, "firmware\t: BML\n");
of_node_put(root);
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -65,10 +65,10 @@ static int pnv_smp_kick_cpu(int nr)
BUG_ON(nr < 0 || nr >= NR_CPUS);
/*
- * If we already started or OPALv2 is not supported, we just
+ * If we already started or OPALv3 is not supported, we just
* kick the CPU via the PACA
*/
- if (paca[nr].cpu_start || !firmware_has_feature(FW_FEATURE_OPALv2))
+ if (paca[nr].cpu_start || !firmware_has_feature(FW_FEATURE_OPALv3))
goto kick;
/*
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt (VMware) <[email protected]>
commit 45dd9b0666a162f8e4be76096716670cf1741f0e upstream.
Doing an audit of trace events, I discovered two trace events in the xen
subsystem that use a hack to create zero data size trace events. This is not
what trace events are for. Trace events add memory footprint overhead, and
if all you need to do is see if a function is hit or not, simply make that
function noinline and use function tracer filtering.
Worse yet, the hack used was:
__array(char, x, 0)
Which creates a static string of zero in length. There's assumptions about
such constructs in ftrace that this is a dynamic string that is nul
terminated. This is not the case with these tracepoints and can cause
problems in various parts of ftrace.
Nuke the trace events!
Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 95a7d76897c1e ("xen/mmu: Use Xen specific TLB flush instead of the generic one.")
Reviewed-by: Juergen Gross <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/xen/mmu.c | 4 ----
include/trace/events/xen.h | 16 ----------------
2 files changed, 20 deletions(-)
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1316,8 +1316,6 @@ void xen_flush_tlb_all(void)
struct mmuext_op *op;
struct multicall_space mcs;
- trace_xen_mmu_flush_tlb_all(0);
-
preempt_disable();
mcs = xen_mc_entry(sizeof(*op));
@@ -1335,8 +1333,6 @@ static void xen_flush_tlb(void)
struct mmuext_op *op;
struct multicall_space mcs;
- trace_xen_mmu_flush_tlb(0);
-
preempt_disable();
mcs = xen_mc_entry(sizeof(*op));
--- a/include/trace/events/xen.h
+++ b/include/trace/events/xen.h
@@ -377,22 +377,6 @@ DECLARE_EVENT_CLASS(xen_mmu_pgd,
DEFINE_XEN_MMU_PGD_EVENT(xen_mmu_pgd_pin);
DEFINE_XEN_MMU_PGD_EVENT(xen_mmu_pgd_unpin);
-TRACE_EVENT(xen_mmu_flush_tlb_all,
- TP_PROTO(int x),
- TP_ARGS(x),
- TP_STRUCT__entry(__array(char, x, 0)),
- TP_fast_assign((void)x),
- TP_printk("%s", "")
- );
-
-TRACE_EVENT(xen_mmu_flush_tlb,
- TP_PROTO(int x),
- TP_ARGS(x),
- TP_STRUCT__entry(__array(char, x, 0)),
- TP_fast_assign((void)x),
- TP_printk("%s", "")
- );
-
TRACE_EVENT(xen_mmu_flush_tlb_single,
TP_PROTO(unsigned long addr),
TP_ARGS(addr),
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <[email protected]>
[ Upstream commit 2c5d5b13c6eb79f5677e206b8aad59b3a2097f60 ]
syzbot loves to set very small mtu on devices, since it brings joy.
We must make llc_ui_sendmsg() fool proof.
usercopy: Kernel memory overwrite attempt detected to wrapped address (offset 0, size 18446612139802320068)!
kernel BUG at mm/usercopy.c:100!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 17464 Comm: syz-executor1 Not tainted 4.17.0-rc3+ #36
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:usercopy_abort+0xbb/0xbd mm/usercopy.c:88
RSP: 0018:ffff8801868bf800 EFLAGS: 00010282
RAX: 000000000000006c RBX: ffffffff87d2fb00 RCX: 0000000000000000
RDX: 000000000000006c RSI: ffffffff81610731 RDI: ffffed0030d17ef6
RBP: ffff8801868bf858 R08: ffff88018daa4200 R09: ffffed003b5c4fb0
R10: ffffed003b5c4fb0 R11: ffff8801dae27d87 R12: ffffffff87d2f8e0
R13: ffffffff87d2f7a0 R14: ffffffff87d2f7a0 R15: ffffffff87d2f7a0
FS: 00007f56a14ac700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2bc21000 CR3: 00000001abeb1000 CR4: 00000000001426f0
DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000030602
Call Trace:
check_bogus_address mm/usercopy.c:153 [inline]
__check_object_size+0x5d9/0x5d9 mm/usercopy.c:256
check_object_size include/linux/thread_info.h:108 [inline]
check_copy_size include/linux/thread_info.h:139 [inline]
copy_from_iter_full include/linux/uio.h:121 [inline]
memcpy_from_msg include/linux/skbuff.h:3305 [inline]
llc_ui_sendmsg+0x4b1/0x1530 net/llc/af_llc.c:941
sock_sendmsg_nosec net/socket.c:629 [inline]
sock_sendmsg+0xd5/0x120 net/socket.c:639
__sys_sendto+0x3d7/0x670 net/socket.c:1789
__do_sys_sendto net/socket.c:1801 [inline]
__se_sys_sendto net/socket.c:1797 [inline]
__x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455979
RSP: 002b:00007f56a14abc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f56a14ac6d4 RCX: 0000000000455979
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000018
RBP: 000000000072bea0 R08: 00000000200012c0 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000548 R14: 00000000006fbf60 R15: 0000000000000000
Code: 55 c0 e8 c0 55 bb ff ff 75 c8 48 8b 55 c0 4d 89 f9 ff 75 d0 4d 89 e8 48 89 d9 4c 89 e6 41 56 48 c7 c7 80 fa d2 87 e8 a0 0b a3 ff <0f> 0b e8 95 55 bb ff e8 c0 a8 f7 ff 8b 95 14 ff ff ff 4d 89 e8
RIP: usercopy_abort+0xbb/0xbd mm/usercopy.c:88 RSP: ffff8801868bf800
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/llc/af_llc.c | 3 +++
1 file changed, 3 insertions(+)
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -926,6 +926,9 @@ static int llc_ui_sendmsg(struct socket
if (size > llc->dev->mtu)
size = llc->dev->mtu;
copied = size - hdrlen;
+ rc = -EINVAL;
+ if (copied < 0)
+ goto release;
release_sock(sk);
skb = sock_alloc_send_skb(sk, size, noblock, &rc);
lock_sock(sk);
On Thu 24-05-18 13:28:41, Greg KH wrote:
> On Thu, May 24, 2018 at 04:17:12AM -0700, Hugh Dickins wrote:
> > Thu, May 24, 2018 at 4:06 AM Greg Kroah-Hartman
> > <[email protected]>
> > wrote:
> >
> > > On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
> > > > On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
> > > > > 4.4-stable review patch. If anyone has any objections, please let me
> > know.
> > > >
> > > > Just one objection: Why does stable care about this (and the previous
> > > > patch)? I've checked the stable queue and I don't see anything that
> > would
> > > > have these patches as a prerequisite. And on their own, they are only
> > > > cleanups without substantial gains.
> >
> > > There's a small gain here:
> >
> > > > > paralleldd
> > > > > 4.4.0 4.4.0
> > > > > vanilla avoidlock
> > > > > Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%)
> > > > > Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%)
> > > > > Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%)
> > > > > Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%)
> > > > > Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%)
> > > > > Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%)
> > > > > Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%)
> > > > > Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%)
> > > > > Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%)
> > > > > Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%)
> > > > >
> > > > > The impact is small but intuitively, it makes sense to avoid
> > unnecessary
> > > > > calls to lock_page.
> >
> > > Yes, it's small, but it's marked in the SLES kernel as "needs to be
> > > merged into stable", so obviously it matters to someone :)
> >
> > Hmm. I had the same reaction to these two as Jan, but assumed that they
> > made applying later patches easier, and didn't take the trouble he did to
> > find that's not so.
> >
> > I've no wish to be disputatious, but it does seem that the definition of
> > "stable" has changed, and not necessarily for the better, if it's now a
> > home for small gains: I thought we left those to upstream.
>
> This is in the SLES kernel for a reason, and again, it's in the section
> that says "this should be pushed to stable". So if it's good enough for
> the SLES kernel, why isn't it good enough for all users of this kernel
> tree?
Heh, fair enough. I guess Mel in the end didn't find patches worthy enough
to be pushed to stable tree. But at least now I know they are well tested
with 4.4 base so they should do no harm in the stable tree so my stance is
closer to neutral.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Brivio <[email protected]>
[ Upstream commit 72f17baf2352ded6a1d3f4bb2d15da8c678cd2cb ]
If an OVS_ATTR_NESTED attribute type is found while walking
through netlink attributes, we call nlattr_set() recursively
passing the length table for the following nested attributes, if
different from the current one.
However, once we're done with those sub-nested attributes, we
should continue walking through attributes using the current
table, instead of using the one related to the sub-nested
attributes.
For example, given this sequence:
1 OVS_KEY_ATTR_PRIORITY
2 OVS_KEY_ATTR_TUNNEL
3 OVS_TUNNEL_KEY_ATTR_ID
4 OVS_TUNNEL_KEY_ATTR_IPV4_SRC
5 OVS_TUNNEL_KEY_ATTR_IPV4_DST
6 OVS_TUNNEL_KEY_ATTR_TTL
7 OVS_TUNNEL_KEY_ATTR_TP_SRC
8 OVS_TUNNEL_KEY_ATTR_TP_DST
9 OVS_KEY_ATTR_IN_PORT
10 OVS_KEY_ATTR_SKB_MARK
11 OVS_KEY_ATTR_MPLS
we switch to the 'ovs_tunnel_key_lens' table on attribute #3,
and we don't switch back to 'ovs_key_lens' while setting
attributes #9 to #11 in the sequence. As OVS_KEY_ATTR_MPLS
evaluates to 21, and the array size of 'ovs_tunnel_key_lens' is
15, we also get this kind of KASan splat while accessing the
wrong table:
[ 7654.586496] ==================================================================
[ 7654.594573] BUG: KASAN: global-out-of-bounds in nlattr_set+0x164/0xde9 [openvswitch]
[ 7654.603214] Read of size 4 at addr ffffffffc169ecf0 by task handler29/87430
[ 7654.610983]
[ 7654.612644] CPU: 21 PID: 87430 Comm: handler29 Kdump: loaded Not tainted 3.10.0-866.el7.test.x86_64 #1
[ 7654.623030] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
[ 7654.631379] Call Trace:
[ 7654.634108] [<ffffffffb65a7c50>] dump_stack+0x19/0x1b
[ 7654.639843] [<ffffffffb53ff373>] print_address_description+0x33/0x290
[ 7654.647129] [<ffffffffc169b37b>] ? nlattr_set+0x164/0xde9 [openvswitch]
[ 7654.654607] [<ffffffffb53ff812>] kasan_report.part.3+0x242/0x330
[ 7654.661406] [<ffffffffb53ff9b4>] __asan_report_load4_noabort+0x34/0x40
[ 7654.668789] [<ffffffffc169b37b>] nlattr_set+0x164/0xde9 [openvswitch]
[ 7654.676076] [<ffffffffc167ef68>] ovs_nla_get_match+0x10c8/0x1900 [openvswitch]
[ 7654.684234] [<ffffffffb61e9cc8>] ? genl_rcv+0x28/0x40
[ 7654.689968] [<ffffffffb61e7733>] ? netlink_unicast+0x3f3/0x590
[ 7654.696574] [<ffffffffc167dea0>] ? ovs_nla_put_tunnel_info+0xb0/0xb0 [openvswitch]
[ 7654.705122] [<ffffffffb4f41b50>] ? unwind_get_return_address+0xb0/0xb0
[ 7654.712503] [<ffffffffb65d9355>] ? system_call_fastpath+0x1c/0x21
[ 7654.719401] [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370
[ 7654.726298] [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370
[ 7654.733195] [<ffffffffb53fe4b5>] ? kasan_unpoison_shadow+0x35/0x50
[ 7654.740187] [<ffffffffb53fe62a>] ? kasan_kmalloc+0xaa/0xe0
[ 7654.746406] [<ffffffffb53fec32>] ? kasan_slab_alloc+0x12/0x20
[ 7654.752914] [<ffffffffb53fe711>] ? memset+0x31/0x40
[ 7654.758456] [<ffffffffc165bf92>] ovs_flow_cmd_new+0x2b2/0xf00 [openvswitch]
[snip]
[ 7655.132484] The buggy address belongs to the variable:
[ 7655.138226] ovs_tunnel_key_lens+0xf0/0xffffffffffffd400 [openvswitch]
[ 7655.145507]
[ 7655.147166] Memory state around the buggy address:
[ 7655.152514] ffffffffc169eb80: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa
[ 7655.160585] ffffffffc169ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 7655.168644] >ffffffffc169ec80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa
[ 7655.176701] ^
[ 7655.184372] ffffffffc169ed00: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 05
[ 7655.192431] ffffffffc169ed80: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
[ 7655.200490] ==================================================================
Reported-by: Hangbin Liu <[email protected]>
Fixes: 982b52700482 ("openvswitch: Fix mask generation for nested attributes.")
Signed-off-by: Stefano Brivio <[email protected]>
Reviewed-by: Sabrina Dubroca <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/openvswitch/flow_netlink.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1141,13 +1141,10 @@ static void nlattr_set(struct nlattr *at
/* The nlattr stream should already have been validated */
nla_for_each_nested(nla, attr, rem) {
- if (tbl[nla_type(nla)].len == OVS_ATTR_NESTED) {
- if (tbl[nla_type(nla)].next)
- tbl = tbl[nla_type(nla)].next;
- nlattr_set(nla, val, tbl);
- } else {
+ if (tbl[nla_type(nla)].len == OVS_ATTR_NESTED)
+ nlattr_set(nla, val, tbl[nla_type(nla)].next ? : tbl);
+ else
memset(nla_data(nla), val, nla_len(nla));
- }
if (nla_type(nla) == OVS_KEY_ATTR_CT_STATE)
*(u32 *)nla_data(nla) &= CT_SUPPORTED_MASK;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiner Kallweit <[email protected]>
[ Upstream commit 3148dedfe79e422f448a10250d3e2cdf8b7ee617 ]
Since commit a92a08499b1f "r8169: improve runtime pm in general and
suspend unused ports" interfaces w/o link are runtime-suspended after
10s. On systems where drivers take longer to load this can lead to the
situation that the interface is runtime-suspended already when it's
initially brought up.
This shouldn't be a problem because rtl_open() resumes MAC/PHY.
However with at least one chip version the interface doesn't properly
come up, as reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=199549
The vendor driver uses a delay to give certain chip versions some
time to resume before starting the PHY configuration. So let's do
the same. I don't know which chip versions may be affected,
therefore apply this delay always.
This patch was reported to fix the issue for RTL8168h.
I was able to reproduce the issue on an Asus H310I-Plus which also
uses a RTL8168h. Also in my case the patch fixed the issue.
Reported-by: Slava Kardakov <[email protected]>
Tested-by: Slava Kardakov <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/realtek/r8169.c | 3 +++
1 file changed, 3 insertions(+)
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4832,6 +4832,9 @@ static void rtl_pll_power_down(struct rt
static void rtl_pll_power_up(struct rtl8169_private *tp)
{
rtl_generic_op(tp, tp->pll_power_ops.up);
+
+ /* give MAC/PHY some time to resume */
+ msleep(20);
}
static void rtl_init_pll_power_ops(struct rtl8169_private *tp)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Shevchenko <[email protected]>
commit efc4a13724b852ddaa3358402a8dec024ffbcb17 upstream.
Currently the 32-bit device address only is supported for DMA. However,
starting from Intel Sunrisepoint PCH the DMA address of the device FIFO
can be 64-bit.
Change the respective variable to be compatible with DMA engine
expectations, i.e. to phys_addr_t.
Fixes: 34cadd9c1bcb ("spi: pxa2xx: Add support for Intel Sunrisepoint")
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/spi/spi-pxa2xx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/spi/spi-pxa2xx.h
+++ b/drivers/spi/spi-pxa2xx.h
@@ -38,7 +38,7 @@ struct driver_data {
/* SSP register addresses */
void __iomem *ioaddr;
- u32 ssdr_physical;
+ phys_addr_t ssdr_physical;
/* SSP masks*/
u32 dma_cr1;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede <[email protected]>
commit c8beccc19b92f5172994c0732db689c08f4f98e5 upstream.
Power-saving is causing loud plops on the Lenovo C50 All in one, add it
to the blacklist.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1572975
Signed-off-by: Hans de Goede <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
sound/pci/hda/hda_intel.c | 2 ++
1 file changed, 2 insertions(+)
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2072,6 +2072,8 @@ static struct snd_pci_quirk power_save_b
SND_PCI_QUIRK(0x1849, 0x0c0c, "Asrock B85M-ITX", 0),
/* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
SND_PCI_QUIRK(0x1043, 0x8733, "Asus Prime X370-Pro", 0),
+ /* https://bugzilla.redhat.com/show_bug.cgi?id=1572975 */
+ SND_PCI_QUIRK(0x17aa, 0x36a7, "Lenovo C50 All in one", 0),
/* https://bugzilla.kernel.org/show_bug.cgi?id=198611 */
SND_PCI_QUIRK(0x17aa, 0x2227, "Lenovo X1 Carbon 3rd Gen", 0),
{}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wenwen Wang <[email protected]>
commit 3f12888dfae2a48741c4caa9214885b3aaf350f9 upstream.
In snd_ctl_elem_add_compat(), the fields of the struct 'data' need to be
copied from the corresponding fields of the struct 'data32' in userspace.
This is achieved by invoking copy_from_user() and get_user() functions. The
problem here is that the 'type' field is copied twice. One is by
copy_from_user() and one is by get_user(). Given that the 'type' field is
not used between the two copies, the second copy is *completely* redundant
and should be removed for better performance and cleanup. Also, these two
copies can cause inconsistent data: as the struct 'data32' resides in
userspace and a malicious userspace process can race to change the 'type'
field between the two copies to cause inconsistent data. Depending on how
the data is used in the future, such an inconsistency may cause potential
security risks.
For above reasons, we should take out the second copy.
Signed-off-by: Wenwen Wang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
sound/core/control_compat.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/sound/core/control_compat.c
+++ b/sound/core/control_compat.c
@@ -400,8 +400,7 @@ static int snd_ctl_elem_add_compat(struc
if (copy_from_user(&data->id, &data32->id, sizeof(data->id)) ||
copy_from_user(&data->type, &data32->type, 3 * sizeof(u32)))
goto error;
- if (get_user(data->owner, &data32->owner) ||
- get_user(data->type, &data32->type))
+ if (get_user(data->owner, &data32->owner))
goto error;
switch (data->type) {
case SNDRV_CTL_ELEM_TYPE_BOOLEAN:
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long <[email protected]>
[ Upstream commit d625329b06e46bd20baf9ee40847d11982569204 ]
Since sctp ipv6 socket also supports v4 addrs, it's possible to
compare two v4 addrs in pf v6 .cmp_addr, sctp_inet6_cmp_addr.
However after Commit 1071ec9d453a ("sctp: do not check port in
sctp_inet6_cmp_addr"), it no longer calls af1->cmp_addr, which
in this case is sctp_v4_cmp_addr, but calls __sctp_v6_cmp_addr
where it handles them as two v6 addrs. It would cause a out of
bounds crash.
syzbot found this crash when trying to bind two v4 addrs to a
v6 socket.
This patch fixes it by adding the process for two v4 addrs in
sctp_inet6_cmp_addr.
Fixes: 1071ec9d453a ("sctp: do not check port in sctp_inet6_cmp_addr")
Reported-by: [email protected]
Signed-off-by: Xin Long <[email protected]>
Acked-by: Neil Horman <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sctp/ipv6.c | 3 +++
1 file changed, 3 insertions(+)
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -863,6 +863,9 @@ static int sctp_inet6_cmp_addr(const uni
if (sctp_is_any(sk, addr1) || sctp_is_any(sk, addr2))
return 1;
+ if (addr1->sa.sa_family == AF_INET && addr2->sa.sa_family == AF_INET)
+ return addr1->v4.sin_addr.s_addr == addr2->v4.sin_addr.s_addr;
+
return __sctp_v6_cmp_addr(addr1, addr2);
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan (Samsung OSG) <[email protected]>
commit 22076557b07c12086eeb16b8ce2b0b735f7a27e7 upstream.
usbip_host updates device status without holding lock from stub probe,
disconnect and rebind code paths. When multiple requests to import a
device are received, these unprotected code paths step all over each
other and drive fails with NULL-ptr deref and use-after-free errors.
The driver uses a table lock to protect the busid array for adding and
deleting busids to the table. However, the probe, disconnect and rebind
paths get the busid table entry and update the status without holding
the busid table lock. Add a new finer grain lock to protect the busid
entry. This new lock will be held to search and update the busid entry
fields from get_busid_idx(), add_match_busid() and del_match_busid().
match_busid_show() does the same to access the busid entry fields.
get_busid_priv() changed to return the pointer to the busid entry holding
the busid lock. stub_probe(), stub_disconnect() and stub_device_rebind()
call put_busid_priv() to release the busid lock before returning. This
changes fixes the unprotected code paths eliminating the race conditions
in updating the busid entries.
Reported-by: Jakub Jirasek
Signed-off-by: Shuah Khan (Samsung OSG) <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/usb/usbip/stub.h | 2 ++
drivers/usb/usbip/stub_dev.c | 33 +++++++++++++++++++++++----------
drivers/usb/usbip/stub_main.c | 40 +++++++++++++++++++++++++++++++++++-----
3 files changed, 60 insertions(+), 15 deletions(-)
--- a/drivers/usb/usbip/stub.h
+++ b/drivers/usb/usbip/stub.h
@@ -88,6 +88,7 @@ struct bus_id_priv {
struct stub_device *sdev;
struct usb_device *udev;
char shutdown_busid;
+ spinlock_t busid_lock;
};
/* stub_priv is allocated from stub_priv_cache */
@@ -98,6 +99,7 @@ extern struct usb_device_driver stub_dri
/* stub_main.c */
struct bus_id_priv *get_busid_priv(const char *busid);
+void put_busid_priv(struct bus_id_priv *bid);
int del_match_busid(char *busid);
void stub_device_cleanup_urbs(struct stub_device *sdev);
--- a/drivers/usb/usbip/stub_dev.c
+++ b/drivers/usb/usbip/stub_dev.c
@@ -314,7 +314,7 @@ static int stub_probe(struct usb_device
struct stub_device *sdev = NULL;
const char *udev_busid = dev_name(&udev->dev);
struct bus_id_priv *busid_priv;
- int rc;
+ int rc = 0;
dev_dbg(&udev->dev, "Enter probe\n");
@@ -331,13 +331,15 @@ static int stub_probe(struct usb_device
* other matched drivers by the driver core.
* See driver_probe_device() in driver/base/dd.c
*/
- return -ENODEV;
+ rc = -ENODEV;
+ goto call_put_busid_priv;
}
if (udev->descriptor.bDeviceClass == USB_CLASS_HUB) {
dev_dbg(&udev->dev, "%s is a usb hub device... skip!\n",
udev_busid);
- return -ENODEV;
+ rc = -ENODEV;
+ goto call_put_busid_priv;
}
if (!strcmp(udev->bus->bus_name, "vhci_hcd")) {
@@ -345,13 +347,16 @@ static int stub_probe(struct usb_device
"%s is attached on vhci_hcd... skip!\n",
udev_busid);
- return -ENODEV;
+ rc = -ENODEV;
+ goto call_put_busid_priv;
}
/* ok, this is my device */
sdev = stub_device_alloc(udev);
- if (!sdev)
- return -ENOMEM;
+ if (!sdev) {
+ rc = -ENOMEM;
+ goto call_put_busid_priv;
+ }
dev_info(&udev->dev,
"usbip-host: register new device (bus %u dev %u)\n",
@@ -383,7 +388,9 @@ static int stub_probe(struct usb_device
}
busid_priv->status = STUB_BUSID_ALLOC;
- return 0;
+ rc = 0;
+ goto call_put_busid_priv;
+
err_files:
usb_hub_release_port(udev->parent, udev->portnum,
(struct usb_dev_state *) udev);
@@ -394,6 +401,9 @@ err_port:
busid_priv->sdev = NULL;
stub_device_free(sdev);
+
+call_put_busid_priv:
+ put_busid_priv(busid_priv);
return rc;
}
@@ -432,7 +442,7 @@ static void stub_disconnect(struct usb_d
/* get stub_device */
if (!sdev) {
dev_err(&udev->dev, "could not get device");
- return;
+ goto call_put_busid_priv;
}
dev_set_drvdata(&udev->dev, NULL);
@@ -447,12 +457,12 @@ static void stub_disconnect(struct usb_d
(struct usb_dev_state *) udev);
if (rc) {
dev_dbg(&udev->dev, "unable to release port\n");
- return;
+ goto call_put_busid_priv;
}
/* If usb reset is called from event handler */
if (busid_priv->sdev->ud.eh == current)
- return;
+ goto call_put_busid_priv;
/* shutdown the current connection */
shutdown_busid(busid_priv);
@@ -465,6 +475,9 @@ static void stub_disconnect(struct usb_d
if (busid_priv->status == STUB_BUSID_ALLOC)
busid_priv->status = STUB_BUSID_ADDED;
+
+call_put_busid_priv:
+ put_busid_priv(busid_priv);
}
#ifdef CONFIG_PM
--- a/drivers/usb/usbip/stub_main.c
+++ b/drivers/usb/usbip/stub_main.c
@@ -40,6 +40,8 @@ static spinlock_t busid_table_lock;
static void init_busid_table(void)
{
+ int i;
+
/*
* This also sets the bus_table[i].status to
* STUB_BUSID_OTHER, which is 0.
@@ -47,6 +49,9 @@ static void init_busid_table(void)
memset(busid_table, 0, sizeof(busid_table));
spin_lock_init(&busid_table_lock);
+
+ for (i = 0; i < MAX_BUSID; i++)
+ spin_lock_init(&busid_table[i].busid_lock);
}
/*
@@ -58,15 +63,20 @@ static int get_busid_idx(const char *bus
int i;
int idx = -1;
- for (i = 0; i < MAX_BUSID; i++)
+ for (i = 0; i < MAX_BUSID; i++) {
+ spin_lock(&busid_table[i].busid_lock);
if (busid_table[i].name[0])
if (!strncmp(busid_table[i].name, busid, BUSID_SIZE)) {
idx = i;
+ spin_unlock(&busid_table[i].busid_lock);
break;
}
+ spin_unlock(&busid_table[i].busid_lock);
+ }
return idx;
}
+/* Returns holding busid_lock. Should call put_busid_priv() to unlock */
struct bus_id_priv *get_busid_priv(const char *busid)
{
int idx;
@@ -74,13 +84,21 @@ struct bus_id_priv *get_busid_priv(const
spin_lock(&busid_table_lock);
idx = get_busid_idx(busid);
- if (idx >= 0)
+ if (idx >= 0) {
bid = &(busid_table[idx]);
+ /* get busid_lock before returning */
+ spin_lock(&bid->busid_lock);
+ }
spin_unlock(&busid_table_lock);
return bid;
}
+void put_busid_priv(struct bus_id_priv *bid)
+{
+ spin_unlock(&bid->busid_lock);
+}
+
static int add_match_busid(char *busid)
{
int i;
@@ -93,15 +111,19 @@ static int add_match_busid(char *busid)
goto out;
}
- for (i = 0; i < MAX_BUSID; i++)
+ for (i = 0; i < MAX_BUSID; i++) {
+ spin_lock(&busid_table[i].busid_lock);
if (!busid_table[i].name[0]) {
strlcpy(busid_table[i].name, busid, BUSID_SIZE);
if ((busid_table[i].status != STUB_BUSID_ALLOC) &&
(busid_table[i].status != STUB_BUSID_REMOV))
busid_table[i].status = STUB_BUSID_ADDED;
ret = 0;
+ spin_unlock(&busid_table[i].busid_lock);
break;
}
+ spin_unlock(&busid_table[i].busid_lock);
+ }
out:
spin_unlock(&busid_table_lock);
@@ -122,6 +144,8 @@ int del_match_busid(char *busid)
/* found */
ret = 0;
+ spin_lock(&busid_table[idx].busid_lock);
+
if (busid_table[idx].status == STUB_BUSID_OTHER)
memset(busid_table[idx].name, 0, BUSID_SIZE);
@@ -129,6 +153,7 @@ int del_match_busid(char *busid)
(busid_table[idx].status != STUB_BUSID_ADDED))
busid_table[idx].status = STUB_BUSID_REMOV;
+ spin_unlock(&busid_table[idx].busid_lock);
out:
spin_unlock(&busid_table_lock);
@@ -141,9 +166,12 @@ static ssize_t show_match_busid(struct d
char *out = buf;
spin_lock(&busid_table_lock);
- for (i = 0; i < MAX_BUSID; i++)
+ for (i = 0; i < MAX_BUSID; i++) {
+ spin_lock(&busid_table[i].busid_lock);
if (busid_table[i].name[0])
out += sprintf(out, "%s ", busid_table[i].name);
+ spin_unlock(&busid_table[i].busid_lock);
+ }
spin_unlock(&busid_table_lock);
out += sprintf(out, "\n");
@@ -219,7 +247,7 @@ static void stub_device_rebind(void)
}
spin_unlock(&busid_table_lock);
- /* now run rebind */
+ /* now run rebind - no need to hold locks. driver files are removed */
for (i = 0; i < MAX_BUSID; i++) {
if (busid_table[i].name[0] &&
busid_table[i].shutdown_busid) {
@@ -249,6 +277,8 @@ static ssize_t rebind_store(struct devic
/* mark the device for deletion so probe ignores it during rescan */
bid->status = STUB_BUSID_OTHER;
+ /* release the busid lock */
+ put_busid_priv(bid);
ret = do_rebind((char *) buf, bid);
if (ret < 0)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Chapman <[email protected]>
commit de3b58bc359a861d5132300f53f95e83f71954b3 upstream.
Revert commit 820da5357572 ("l2tp: fix missing print session offset
info"). The peer_offset parameter is removed.
Signed-off-by: James Chapman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Cc: Guillaume Nault <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/l2tp/l2tp_netlink.c | 2 --
1 file changed, 2 deletions(-)
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -732,8 +732,6 @@ static int l2tp_nl_session_send(struct s
if ((session->ifname[0] &&
nla_put_string(skb, L2TP_ATTR_IFNAME, session->ifname)) ||
- (session->offset &&
- nla_put_u16(skb, L2TP_ATTR_OFFSET, session->offset)) ||
(session->cookie_len &&
nla_put(skb, L2TP_ATTR_COOKIE, session->cookie_len,
&session->cookie[0])) ||
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan (Samsung OSG) <[email protected]>
commit 1e180f167d4e413afccbbb4a421b48b2de832549 upstream.
Device is left in the busid_table after unbind and rebind. Rebind
initiates usb bus scan and the original driver claims the device.
After rescan the device should be deleted from the busid_table as
it no longer belongs to usbip_host.
Fix it to delete the device after device_attach() succeeds.
Signed-off-by: Shuah Khan (Samsung OSG) <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/usb/usbip/stub_main.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/usb/usbip/stub_main.c
+++ b/drivers/usb/usbip/stub_main.c
@@ -201,6 +201,9 @@ static ssize_t rebind_store(struct devic
if (!bid)
return -ENODEV;
+ /* mark the device for deletion so probe ignores it during rescan */
+ bid->status = STUB_BUSID_OTHER;
+
/* device_attach() callers should hold parent lock for USB */
if (bid->udev->dev.parent)
device_lock(bid->udev->dev.parent);
@@ -212,6 +215,9 @@ static ssize_t rebind_store(struct devic
return ret;
}
+ /* delete device from busid_table */
+ del_match_busid((char *) buf);
+
return count;
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Federico Cuello <[email protected]>
commit 21493316a3c4598f308d5a9fa31cc74639c4caff upstream.
Currently it's not possible to set volume lower than 26% (it just mutes).
Also fixes this warning:
Warning! Unlikely big volume range (=9472), cval->res is probably wrong.
[13] FU [PCM Playback Volume] ch = 2, val = -9473/-1/1
, and volume works fine for full range.
Signed-off-by: Federico Cuello <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
sound/usb/mixer.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -904,6 +904,14 @@ static void volume_control_quirks(struct
}
break;
+ case USB_ID(0x0d8c, 0x0103):
+ if (!strcmp(kctl->id.name, "PCM Playback Volume")) {
+ usb_audio_info(chip,
+ "set volume quirk for CM102-A+/102S+\n");
+ cval->min = -256;
+ }
+ break;
+
case USB_ID(0x0471, 0x0101):
case USB_ID(0x0471, 0x0104):
case USB_ID(0x0471, 0x0105):
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby <[email protected]>
commit d70ef22892ed6c066e51e118b225923c9b74af34 upstream.
sign_extend32 counts the sign bit parameter from 0, not from 1. So we
have to use "11" for 12th bit, not "12".
This mistake means we have not allowed negative op and cmp args since
commit 30d6e0a4190d ("futex: Remove duplicated code and fix undefined
behaviour") till now.
Fixes: 30d6e0a4190d ("futex: Remove duplicated code and fix undefined behaviour")
Signed-off-by: Jiri Slaby <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Darren Hart <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/futex.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1457,8 +1457,8 @@ static int futex_atomic_op_inuser(unsign
{
unsigned int op = (encoded_op & 0x70000000) >> 28;
unsigned int cmp = (encoded_op & 0x0f000000) >> 24;
- int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 12);
- int cmparg = sign_extend32(encoded_op & 0x00000fff, 12);
+ int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 11);
+ int cmparg = sign_extend32(encoded_op & 0x00000fff, 11);
int oldval, ret;
if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan <[email protected]>
commit 28b68acc4a88dcf91fd1dcf2577371dc9bf574cc upstream.
Refine probe and disconnect debug msgs to be useful and say what is
in progress.
Signed-off-by: Shuah Khan <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/usb/usbip/stub_dev.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/usb/usbip/stub_dev.c
+++ b/drivers/usb/usbip/stub_dev.c
@@ -316,7 +316,7 @@ static int stub_probe(struct usb_device
struct bus_id_priv *busid_priv;
int rc;
- dev_dbg(&udev->dev, "Enter\n");
+ dev_dbg(&udev->dev, "Enter probe\n");
/* check we should claim or not by busid_table */
busid_priv = get_busid_priv(udev_busid);
@@ -419,7 +419,7 @@ static void stub_disconnect(struct usb_d
struct bus_id_priv *busid_priv;
int rc;
- dev_dbg(&udev->dev, "Enter\n");
+ dev_dbg(&udev->dev, "Enter disconnect\n");
busid_priv = get_busid_priv(udev_busid);
if (!busid_priv) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan (Samsung OSG) <[email protected]>
commit 7510df3f29d44685bab7b1918b61a8ccd57126a9 upstream.
After removing usbip_host module, devices it releases are left without
a driver. For example, when a keyboard or a mass storage device are
bound to usbip_host when it is removed, these devices are no longer
bound to any driver.
Fix it to run device_attach() from the module exit routine to restore
the devices to their original drivers. This includes cleanup changes
and moving device_attach() code to a common routine to be called from
rebind_store() and usbip_host_exit().
Signed-off-by: Shuah Khan (Samsung OSG) <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/usb/usbip/stub_dev.c | 6 ----
drivers/usb/usbip/stub_main.c | 60 +++++++++++++++++++++++++++++++++++-------
2 files changed, 52 insertions(+), 14 deletions(-)
--- a/drivers/usb/usbip/stub_dev.c
+++ b/drivers/usb/usbip/stub_dev.c
@@ -463,12 +463,8 @@ static void stub_disconnect(struct usb_d
busid_priv->sdev = NULL;
stub_device_free(sdev);
- if (busid_priv->status == STUB_BUSID_ALLOC) {
+ if (busid_priv->status == STUB_BUSID_ALLOC)
busid_priv->status = STUB_BUSID_ADDED;
- } else {
- busid_priv->status = STUB_BUSID_OTHER;
- del_match_busid((char *)udev_busid);
- }
}
#ifdef CONFIG_PM
--- a/drivers/usb/usbip/stub_main.c
+++ b/drivers/usb/usbip/stub_main.c
@@ -28,6 +28,7 @@
#define DRIVER_DESC "USB/IP Host Driver"
struct kmem_cache *stub_priv_cache;
+
/*
* busid_tables defines matching busids that usbip can grab. A user can change
* dynamically what device is locally used and what device is exported to a
@@ -184,6 +185,51 @@ static ssize_t store_match_busid(struct
static DRIVER_ATTR(match_busid, S_IRUSR | S_IWUSR, show_match_busid,
store_match_busid);
+static int do_rebind(char *busid, struct bus_id_priv *busid_priv)
+{
+ int ret;
+
+ /* device_attach() callers should hold parent lock for USB */
+ if (busid_priv->udev->dev.parent)
+ device_lock(busid_priv->udev->dev.parent);
+ ret = device_attach(&busid_priv->udev->dev);
+ if (busid_priv->udev->dev.parent)
+ device_unlock(busid_priv->udev->dev.parent);
+ if (ret < 0) {
+ dev_err(&busid_priv->udev->dev, "rebind failed\n");
+ return ret;
+ }
+ return 0;
+}
+
+static void stub_device_rebind(void)
+{
+#if IS_MODULE(CONFIG_USBIP_HOST)
+ struct bus_id_priv *busid_priv;
+ int i;
+
+ /* update status to STUB_BUSID_OTHER so probe ignores the device */
+ spin_lock(&busid_table_lock);
+ for (i = 0; i < MAX_BUSID; i++) {
+ if (busid_table[i].name[0] &&
+ busid_table[i].shutdown_busid) {
+ busid_priv = &(busid_table[i]);
+ busid_priv->status = STUB_BUSID_OTHER;
+ }
+ }
+ spin_unlock(&busid_table_lock);
+
+ /* now run rebind */
+ for (i = 0; i < MAX_BUSID; i++) {
+ if (busid_table[i].name[0] &&
+ busid_table[i].shutdown_busid) {
+ busid_priv = &(busid_table[i]);
+ do_rebind(busid_table[i].name, busid_priv);
+ }
+ }
+#endif
+}
+
static ssize_t rebind_store(struct device_driver *dev, const char *buf,
size_t count)
{
@@ -204,16 +250,9 @@ static ssize_t rebind_store(struct devic
/* mark the device for deletion so probe ignores it during rescan */
bid->status = STUB_BUSID_OTHER;
- /* device_attach() callers should hold parent lock for USB */
- if (bid->udev->dev.parent)
- device_lock(bid->udev->dev.parent);
- ret = device_attach(&bid->udev->dev);
- if (bid->udev->dev.parent)
- device_unlock(bid->udev->dev.parent);
- if (ret < 0) {
- dev_err(&bid->udev->dev, "rebind failed\n");
+ ret = do_rebind((char *) buf, bid);
+ if (ret < 0)
return ret;
- }
/* delete device from busid_table */
del_match_busid((char *) buf);
@@ -339,6 +378,9 @@ static void __exit usbip_host_exit(void)
*/
usb_deregister_device_driver(&stub_driver);
+ /* initiate scan to attach devices */
+ stub_device_rebind();
+
kmem_cache_destroy(stub_priv_cache);
}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Kerrisk (man-pages) <[email protected]>
commit 086e774a57fba4695f14383c0818994c0b31da7c upstream.
This is a patch that provides behavior that is more consistent, and
probably less surprising to users. I consider the change optional, and
welcome opinions about whether it should be applied.
By default, pipes are created with a capacity of 64 kiB. However,
/proc/sys/fs/pipe-max-size may be set smaller than this value. In this
scenario, an unprivileged user could thus create a pipe whose initial
capacity exceeds the limit. Therefore, it seems logical to cap the
initial pipe capacity according to the value of pipe-max-size.
The test program shown earlier in this patch series can be used to
demonstrate the effect of the change brought about with this patch:
# cat /proc/sys/fs/pipe-max-size
1048576
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536
# echo 10000 > /proc/sys/fs/pipe-max-size
# cat /proc/sys/fs/pipe-max-size
16384
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 16384
# ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536
The last two executions of 'test_F_SETPIPE_SZ' show that pipe-max-size
caps the initial allocation for a new pipe for unprivileged users, but
not for privileged users.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michael Kerrisk <[email protected]>
Reviewed-by: Vegard Nossum <[email protected]>
Cc: Willy Tarreau <[email protected]>
Cc: <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Daniel Sangorrin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/pipe.c | 3 +++
1 file changed, 3 insertions(+)
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -616,6 +616,9 @@ struct pipe_inode_info *alloc_pipe_info(
unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
struct user_struct *user = get_current_user();
+ if (pipe_bufs * PAGE_SIZE > pipe_max_size && !capable(CAP_SYS_RESOURCE))
+ pipe_bufs = pipe_max_size >> PAGE_SHIFT;
+
if (!too_many_pipe_buffers_hard(user)) {
if (too_many_pipe_buffers_soft(user))
pipe_bufs = 1;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby <[email protected]>
commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream.
There is code duplicated over all architecture's headers for
futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
and comparison of the result.
Remove this duplication and leave up to the arches only the needed
assembly which is now in arch_futex_atomic_op_inuser.
This effectively distributes the Will Deacon's arm64 fix for undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.
And as suggested by Thomas, check for negative oparg too, because it was
also reported to cause undefined behaviour report.
Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
remove pointless access_ok() checks") as access_ok there returns true.
We introduce it back to the helper for the sake of simplicity (it gets
optimized away anyway).
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Russell King <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Acked-by: Heiko Carstens <[email protected]> [s390]
Acked-by: Chris Metcalf <[email protected]> [for tile]
Reviewed-by: Darren Hart (VMware) <[email protected]>
Reviewed-by: Will Deacon <[email protected]> [core/arm64]
Cc: [email protected]
Cc: Rich Felker <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: [email protected]
Cc: Jonas Bonn <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Yoshinori Sato <[email protected]>
Cc: [email protected]
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: [email protected]
Cc: Fenghua Yu <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Cc: Stefan Kristiansson <[email protected]>
Cc: [email protected]
Cc: Ivan Kokshaysky <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: [email protected]
Cc: Richard Henderson <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected]
Cc: Vineet Gupta <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: [email protected]
Cc: Martin Schwidefsky <[email protected]>
Cc: [email protected]
Cc: "David S. Miller" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: Ben Hutchings <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/alpha/include/asm/futex.h | 26 +++---------------
arch/arc/include/asm/futex.h | 40 +++-------------------------
arch/arm/include/asm/futex.h | 26 ++----------------
arch/arm64/include/asm/futex.h | 26 ++----------------
arch/frv/include/asm/futex.h | 3 +-
arch/frv/kernel/futex.c | 27 ++-----------------
arch/hexagon/include/asm/futex.h | 38 ++-------------------------
arch/ia64/include/asm/futex.h | 25 ++----------------
arch/microblaze/include/asm/futex.h | 38 ++-------------------------
arch/mips/include/asm/futex.h | 25 ++----------------
arch/parisc/include/asm/futex.h | 25 ++----------------
arch/powerpc/include/asm/futex.h | 26 +++---------------
arch/s390/include/asm/futex.h | 23 +++-------------
arch/sh/include/asm/futex.h | 26 ++----------------
arch/sparc/include/asm/futex_64.h | 26 +++---------------
arch/tile/include/asm/futex.h | 40 +++-------------------------
arch/x86/include/asm/futex.h | 40 +++-------------------------
arch/xtensa/include/asm/futex.h | 27 +++----------------
include/asm-generic/futex.h | 50 ++++++------------------------------
kernel/futex.c | 39 ++++++++++++++++++++++++++++
20 files changed, 126 insertions(+), 470 deletions(-)
--- a/arch/alpha/include/asm/futex.h
+++ b/arch/alpha/include/asm/futex.h
@@ -29,18 +29,10 @@
: "r" (uaddr), "r"(oparg) \
: "memory")
-static inline int futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
+static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
+ u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
pagefault_disable();
@@ -66,17 +58,9 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/arc/include/asm/futex.h
+++ b/arch/arc/include/asm/futex.h
@@ -73,20 +73,11 @@
#endif
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
+static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
+ u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
- return -EFAULT;
-
#ifndef CONFIG_ARC_HAS_LLSC
preempt_disable(); /* to guarantee atomic r-m-w of futex op */
#endif
@@ -118,30 +109,9 @@ static inline int futex_atomic_op_inuser
preempt_enable();
#endif
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ:
- ret = (oldval == cmparg);
- break;
- case FUTEX_OP_CMP_NE:
- ret = (oldval != cmparg);
- break;
- case FUTEX_OP_CMP_LT:
- ret = (oldval < cmparg);
- break;
- case FUTEX_OP_CMP_GE:
- ret = (oldval >= cmparg);
- break;
- case FUTEX_OP_CMP_LE:
- ret = (oldval <= cmparg);
- break;
- case FUTEX_OP_CMP_GT:
- ret = (oldval > cmparg);
- break;
- default:
- ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -128,20 +128,10 @@ futex_atomic_cmpxchg_inatomic(u32 *uval,
#endif /* !SMP */
static inline int
-futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
+arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret, tmp;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
-
#ifndef CONFIG_SMP
preempt_disable();
#endif
@@ -172,17 +162,9 @@ futex_atomic_op_inuser (int encoded_op,
preempt_enable();
#endif
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -53,20 +53,10 @@
: "memory")
static inline int
-futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)
+arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (int)(encoded_op << 8) >> 20;
- int cmparg = (int)(encoded_op << 20) >> 20;
int oldval = 0, ret, tmp;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1U << (oparg & 0x1f);
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
-
pagefault_disable();
switch (op) {
@@ -96,17 +86,9 @@ futex_atomic_op_inuser(unsigned int enco
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/frv/include/asm/futex.h
+++ b/arch/frv/include/asm/futex.h
@@ -7,7 +7,8 @@
#include <asm/errno.h>
#include <asm/uaccess.h>
-extern int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr);
+extern int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
+ u32 __user *uaddr);
static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
--- a/arch/frv/kernel/futex.c
+++ b/arch/frv/kernel/futex.c
@@ -186,20 +186,10 @@ static inline int atomic_futex_op_xchg_x
/*
* do the futex operations
*/
-int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
+int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
-
pagefault_disable();
switch (op) {
@@ -225,18 +215,9 @@ int futex_atomic_op_inuser(int encoded_o
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS; break;
- }
- }
+ if (!ret)
+ *oval = oldval;
return ret;
-} /* end futex_atomic_op_inuser() */
+} /* end arch_futex_atomic_op_inuser() */
--- a/arch/hexagon/include/asm/futex.h
+++ b/arch/hexagon/include/asm/futex.h
@@ -31,18 +31,9 @@
static inline int
-futex_atomic_op_inuser(int encoded_op, int __user *uaddr)
+arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
- return -EFAULT;
pagefault_disable();
@@ -72,30 +63,9 @@ futex_atomic_op_inuser(int encoded_op, i
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ:
- ret = (oldval == cmparg);
- break;
- case FUTEX_OP_CMP_NE:
- ret = (oldval != cmparg);
- break;
- case FUTEX_OP_CMP_LT:
- ret = (oldval < cmparg);
- break;
- case FUTEX_OP_CMP_GE:
- ret = (oldval >= cmparg);
- break;
- case FUTEX_OP_CMP_LE:
- ret = (oldval <= cmparg);
- break;
- case FUTEX_OP_CMP_GT:
- ret = (oldval > cmparg);
- break;
- default:
- ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/ia64/include/asm/futex.h
+++ b/arch/ia64/include/asm/futex.h
@@ -45,18 +45,9 @@ do { \
} while (0)
static inline int
-futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
+arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (! access_ok (VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
pagefault_disable();
@@ -84,17 +75,9 @@ futex_atomic_op_inuser (int encoded_op,
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/microblaze/include/asm/futex.h
+++ b/arch/microblaze/include/asm/futex.h
@@ -29,18 +29,9 @@
})
static inline int
-futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
+arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
pagefault_disable();
@@ -66,30 +57,9 @@ futex_atomic_op_inuser(int encoded_op, u
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ:
- ret = (oldval == cmparg);
- break;
- case FUTEX_OP_CMP_NE:
- ret = (oldval != cmparg);
- break;
- case FUTEX_OP_CMP_LT:
- ret = (oldval < cmparg);
- break;
- case FUTEX_OP_CMP_GE:
- ret = (oldval >= cmparg);
- break;
- case FUTEX_OP_CMP_LE:
- ret = (oldval <= cmparg);
- break;
- case FUTEX_OP_CMP_GT:
- ret = (oldval > cmparg);
- break;
- default:
- ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/mips/include/asm/futex.h
+++ b/arch/mips/include/asm/futex.h
@@ -83,18 +83,9 @@
}
static inline int
-futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
+arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (! access_ok (VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
pagefault_disable();
@@ -125,17 +116,9 @@ futex_atomic_op_inuser(int encoded_op, u
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/parisc/include/asm/futex.h
+++ b/arch/parisc/include/asm/futex.h
@@ -32,20 +32,11 @@ _futex_spin_unlock_irqrestore(u32 __user
}
static inline int
-futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
+arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
unsigned long int flags;
u32 val;
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr)))
- return -EFAULT;
pagefault_disable();
@@ -98,17 +89,9 @@ futex_atomic_op_inuser (int encoded_op,
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/powerpc/include/asm/futex.h
+++ b/arch/powerpc/include/asm/futex.h
@@ -31,18 +31,10 @@
: "b" (uaddr), "i" (-EFAULT), "r" (oparg) \
: "cr0", "memory")
-static inline int futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
+static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
+ u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (! access_ok (VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
pagefault_disable();
@@ -68,17 +60,9 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/s390/include/asm/futex.h
+++ b/arch/s390/include/asm/futex.h
@@ -21,17 +21,12 @@
: "0" (-EFAULT), "d" (oparg), "a" (uaddr), \
"m" (*uaddr) : "cc");
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
+static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
+ u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, newval, ret;
load_kernel_asce();
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
pagefault_disable();
switch (op) {
@@ -60,17 +55,9 @@ static inline int futex_atomic_op_inuser
}
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/sh/include/asm/futex.h
+++ b/arch/sh/include/asm/futex.h
@@ -10,20 +10,11 @@
/* XXX: UP variants, fix for SH-4A and SMP.. */
#include <asm/futex-irq.h>
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
+static inline int arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval,
+ u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
-
pagefault_disable();
switch (op) {
@@ -49,17 +40,8 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
return ret;
}
--- a/arch/sparc/include/asm/futex_64.h
+++ b/arch/sparc/include/asm/futex_64.h
@@ -29,22 +29,14 @@
: "r" (uaddr), "r" (oparg), "i" (-EFAULT) \
: "memory")
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
+static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
+ u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret, tem;
- if (unlikely(!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))))
- return -EFAULT;
if (unlikely((((unsigned long) uaddr) & 0x3UL)))
return -EINVAL;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
pagefault_disable();
switch (op) {
@@ -69,17 +61,9 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/tile/include/asm/futex.h
+++ b/arch/tile/include/asm/futex.h
@@ -106,12 +106,9 @@
lock = __atomic_hashed_lock((int __force *)uaddr)
#endif
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
+static inline int arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval,
+ u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int uninitialized_var(val), ret;
__futex_prolog();
@@ -119,12 +116,6 @@ static inline int futex_atomic_op_inuser
/* The 32-bit futex code makes this assumption, so validate it here. */
BUILD_BUG_ON(sizeof(atomic_t) != sizeof(int));
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
-
pagefault_disable();
switch (op) {
case FUTEX_OP_SET:
@@ -148,30 +139,9 @@ static inline int futex_atomic_op_inuser
}
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ:
- ret = (val == cmparg);
- break;
- case FUTEX_OP_CMP_NE:
- ret = (val != cmparg);
- break;
- case FUTEX_OP_CMP_LT:
- ret = (val < cmparg);
- break;
- case FUTEX_OP_CMP_GE:
- ret = (val >= cmparg);
- break;
- case FUTEX_OP_CMP_LE:
- ret = (val <= cmparg);
- break;
- case FUTEX_OP_CMP_GT:
- ret = (val > cmparg);
- break;
- default:
- ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = val;
+
return ret;
}
--- a/arch/x86/include/asm/futex.h
+++ b/arch/x86/include/asm/futex.h
@@ -41,20 +41,11 @@
"+m" (*uaddr), "=&r" (tem) \
: "r" (oparg), "i" (-EFAULT), "1" (0))
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
+static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
+ u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret, tem;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
-
pagefault_disable();
switch (op) {
@@ -80,30 +71,9 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ:
- ret = (oldval == cmparg);
- break;
- case FUTEX_OP_CMP_NE:
- ret = (oldval != cmparg);
- break;
- case FUTEX_OP_CMP_LT:
- ret = (oldval < cmparg);
- break;
- case FUTEX_OP_CMP_GE:
- ret = (oldval >= cmparg);
- break;
- case FUTEX_OP_CMP_LE:
- ret = (oldval <= cmparg);
- break;
- case FUTEX_OP_CMP_GT:
- ret = (oldval > cmparg);
- break;
- default:
- ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/arch/xtensa/include/asm/futex.h
+++ b/arch/xtensa/include/asm/futex.h
@@ -44,18 +44,10 @@
: "r" (uaddr), "I" (-EFAULT), "r" (oparg) \
: "memory")
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
+static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
+ u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
#if !XCHAL_HAVE_S32C1I
return -ENOSYS;
@@ -89,19 +81,10 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (ret)
- return ret;
-
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: return (oldval == cmparg);
- case FUTEX_OP_CMP_NE: return (oldval != cmparg);
- case FUTEX_OP_CMP_LT: return (oldval < cmparg);
- case FUTEX_OP_CMP_GE: return (oldval >= cmparg);
- case FUTEX_OP_CMP_LE: return (oldval <= cmparg);
- case FUTEX_OP_CMP_GT: return (oldval > cmparg);
- }
+ if (!ret)
+ *oval = oldval;
- return -ENOSYS;
+ return ret;
}
static inline int
--- a/include/asm-generic/futex.h
+++ b/include/asm-generic/futex.h
@@ -13,7 +13,7 @@
*/
/**
- * futex_atomic_op_inuser() - Atomic arithmetic operation with constant
+ * arch_futex_atomic_op_inuser() - Atomic arithmetic operation with constant
* argument and comparison of the previous
* futex value with another constant.
*
@@ -25,18 +25,11 @@
* <0 - On error
*/
static inline int
-futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
+arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval, ret;
u32 tmp;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
preempt_disable();
pagefault_disable();
@@ -74,17 +67,9 @@ out_pagefault_enable:
pagefault_enable();
preempt_enable();
- if (ret == 0) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (ret == 0)
+ *oval = oldval;
+
return ret;
}
@@ -126,18 +111,9 @@ futex_atomic_cmpxchg_inatomic(u32 *uval,
#else
static inline int
-futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
+arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, u32 __user *uaddr)
{
- int op = (encoded_op >> 28) & 7;
- int cmp = (encoded_op >> 24) & 15;
- int oparg = (encoded_op << 8) >> 20;
- int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
- oparg = 1 << oparg;
-
- if (! access_ok (VERIFY_WRITE, uaddr, sizeof(u32)))
- return -EFAULT;
pagefault_disable();
@@ -153,17 +129,9 @@ futex_atomic_op_inuser (int encoded_op,
pagefault_enable();
- if (!ret) {
- switch (cmp) {
- case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
- case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
- case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
- case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
- case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
- case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
- default: ret = -ENOSYS;
- }
- }
+ if (!ret)
+ *oval = oldval;
+
return ret;
}
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1453,6 +1453,45 @@ out:
return ret;
}
+static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)
+{
+ unsigned int op = (encoded_op & 0x70000000) >> 28;
+ unsigned int cmp = (encoded_op & 0x0f000000) >> 24;
+ int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 12);
+ int cmparg = sign_extend32(encoded_op & 0x00000fff, 12);
+ int oldval, ret;
+
+ if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
+ if (oparg < 0 || oparg > 31)
+ return -EINVAL;
+ oparg = 1 << oparg;
+ }
+
+ if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
+ return -EFAULT;
+
+ ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
+ if (ret)
+ return ret;
+
+ switch (cmp) {
+ case FUTEX_OP_CMP_EQ:
+ return oldval == cmparg;
+ case FUTEX_OP_CMP_NE:
+ return oldval != cmparg;
+ case FUTEX_OP_CMP_LT:
+ return oldval < cmparg;
+ case FUTEX_OP_CMP_GE:
+ return oldval >= cmparg;
+ case FUTEX_OP_CMP_LE:
+ return oldval <= cmparg;
+ case FUTEX_OP_CMP_GT:
+ return oldval > cmparg;
+ default:
+ return -ENOSYS;
+ }
+}
+
/*
* Wake up all waiters hashed on the physical page that is mapped
* to this virtual address:
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: zhongjiang <[email protected]>
commit dd83c161fbcc5d8be637ab159c0de015cbff5ba4 upstream.
wait4(-2147483648, 0x20, 0, 0xdd0000) triggers:
UBSAN: Undefined behaviour in kernel/exit.c:1651:9
The related calltrace is as follows:
negation of -2147483648 cannot be represented in type 'int':
CPU: 9 PID: 16482 Comm: zj Tainted: G B ---- ------- 3.10.0-327.53.58.71.x86_64+ #66
Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011
Call Trace:
dump_stack+0x19/0x1b
ubsan_epilogue+0xd/0x50
__ubsan_handle_negate_overflow+0x109/0x14e
SyS_wait4+0x1cb/0x1e0
system_call_fastpath+0x16/0x1b
Exclude the overflow to avoid the UBSAN warning.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: zhongjiang <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Xishi Qiu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/exit.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1608,6 +1608,10 @@ SYSCALL_DEFINE4(wait4, pid_t, upid, int
__WNOTHREAD|__WCLONE|__WALL))
return -EINVAL;
+ /* -INT_MIN is not defined */
+ if (upid == INT_MIN)
+ return -ESRCH;
+
if (upid == -1)
type = PIDTYPE_MAX;
else if (upid < 0) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mel Gorman <[email protected]>
commit 48fb6f4db940e92cfb16cd878cddd59ea6120d06 upstream.
Commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in
get_futex_key()") removed an unnecessary lock_page() with the
side-effect that page->mapping needed to be treated very carefully.
Two defensive warnings were added in case any assumption was missed and
the first warning assumed a correct application would not alter a
mapping backing a futex key. Since merging, it has not triggered for
any unexpected case but Mark Rutland reported the following bug
triggering due to the first warning.
kernel BUG at kernel/futex.c:679!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
Hardware name: linux,dummy-virt (DT)
task: ffff80001e271780 task.stack: ffff000010908000
PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145
The fact that it's a bug instead of a warning was due to an unrelated
arm64 problem, but the warning itself triggered because the underlying
mapping changed.
This is an application issue but from a kernel perspective it's a
recoverable situation and the warning is unnecessary so this patch
removes the warning. The warning may potentially be triggered with the
following test program from Mark although it may be necessary to adjust
NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
system.
#include <linux/futex.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>
#define NR_FUTEX_THREADS 16
pthread_t threads[NR_FUTEX_THREADS];
void *mem;
#define MEM_PROT (PROT_READ | PROT_WRITE)
#define MEM_SIZE 65536
static int futex_wrapper(int *uaddr, int op, int val,
const struct timespec *timeout,
int *uaddr2, int val3)
{
syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}
void *poll_futex(void *unused)
{
for (;;) {
futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
}
}
int main(int argc, char *argv[])
{
int i;
mem = mmap(NULL, MEM_SIZE, MEM_PROT,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
printf("Mapping @ %p\n", mem);
printf("Creating futex threads...\n");
for (i = 0; i < NR_FUTEX_THREADS; i++)
pthread_create(&threads[i], NULL, poll_futex, NULL);
printf("Flipping mapping...\n");
for (;;) {
mmap(mem, MEM_SIZE, MEM_PROT,
MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}
return 0;
}
Reported-and-tested-by: Mark Rutland <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected] # 4.7+
Signed-off-by: Linus Torvalds <[email protected]>
Cc: Ben Hutchings <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/futex.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -666,13 +666,14 @@ again:
* this reference was taken by ihold under the page lock
* pinning the inode in place so i_lock was unnecessary. The
* only way for this check to fail is if the inode was
- * truncated in parallel so warn for now if this happens.
+ * truncated in parallel which is almost certainly an
+ * application bug. In such a case, just retry.
*
* We are not calling into get_futex_key_refs() in file-backed
* cases, therefore a successful atomic_inc return below will
* guarantee that get_futex_key() will still imply smp_mb(); (B).
*/
- if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
+ if (!atomic_inc_not_zero(&inode->i_count)) {
rcu_read_unlock();
put_page(page_head);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel <[email protected]>
commit 30b5ba5cf333cc650e474eaf2cc1ae91bc7cf89f upstream.
Implement a macro mov_q that can be used to move an immediate constant
into a 64-bit register, using between 2 and 4 movz/movk instructions
(depending on the operand)
Acked-by: Catalin Marinas <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm64/include/asm/assembler.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -204,4 +204,24 @@ lr .req x30 // link register
.size __pi_##x, . - x; \
ENDPROC(x)
+ /*
+ * mov_q - move an immediate constant into a 64-bit register using
+ * between 2 and 4 movz/movk instructions (depending on the
+ * magnitude and sign of the operand)
+ */
+ .macro mov_q, reg, val
+ .if (((\val) >> 31) == 0 || ((\val) >> 31) == 0x1ffffffff)
+ movz \reg, :abs_g1_s:\val
+ .else
+ .if (((\val) >> 47) == 0 || ((\val) >> 47) == 0x1ffff)
+ movz \reg, :abs_g2_s:\val
+ .else
+ movz \reg, :abs_g3:\val
+ movk \reg, :abs_g2_nc:\val
+ .endif
+ movk \reg, :abs_g1_nc:\val
+ .endif
+ movk \reg, :abs_g0_nc:\val
+ .endm
+
#endif /* __ASM_ASSEMBLER_H */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long <[email protected]>
[ Upstream commit 59d8d4434f429b4fa8a346fd889058bda427a837 ]
Now sctp only delays the authentication for the normal cookie-echo
chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But
for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does
authentication first based on the old asoc, which will definitely
fail due to the different auth info in the old asoc.
The duplicated cookie-echo chunk will create a new asoc with the
auth info from this chunk, and the authentication should also be
done with the new asoc's auth info for all of the collision 'A',
'B' and 'D'. Otherwise, the duplicated cookie-echo chunk with auth
will never pass the authentication and create the new connection.
This issue exists since very beginning, and this fix is to make
sctp_assoc_bh_rcv() follow the way sctp_endpoint_bh_rcv() does
for the normal cookie-echo chunk to delay the authentication.
While at it, remove the unused params from sctp_sf_authenticate()
and define sctp_auth_chunk_verify() used for all the places that
do the delayed authentication.
v1->v2:
fix the typo in changelog as Marcelo noticed.
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sctp/associola.c | 30 ++++++++++++++++
net/sctp/sm_statefuns.c | 87 ++++++++++++++++++++++++++----------------------
2 files changed, 77 insertions(+), 40 deletions(-)
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1000,9 +1000,10 @@ static void sctp_assoc_bh_rcv(struct wor
struct sctp_endpoint *ep;
struct sctp_chunk *chunk;
struct sctp_inq *inqueue;
- int state;
sctp_subtype_t subtype;
+ int first_time = 1; /* is this the first time through the loop */
int error = 0;
+ int state;
/* The association should be held so we should be safe. */
ep = asoc->ep;
@@ -1013,6 +1014,30 @@ static void sctp_assoc_bh_rcv(struct wor
state = asoc->state;
subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
+ /* If the first chunk in the packet is AUTH, do special
+ * processing specified in Section 6.3 of SCTP-AUTH spec
+ */
+ if (first_time && subtype.chunk == SCTP_CID_AUTH) {
+ struct sctp_chunkhdr *next_hdr;
+
+ next_hdr = sctp_inq_peek(inqueue);
+ if (!next_hdr)
+ goto normal;
+
+ /* If the next chunk is COOKIE-ECHO, skip the AUTH
+ * chunk while saving a pointer to it so we can do
+ * Authentication later (during cookie-echo
+ * processing).
+ */
+ if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
+ chunk->auth_chunk = skb_clone(chunk->skb,
+ GFP_ATOMIC);
+ chunk->auth = 1;
+ continue;
+ }
+ }
+
+normal:
/* SCTP-AUTH, Section 6.3:
* The receiver has a list of chunk types which it expects
* to be received only after an AUTH-chunk. This list has
@@ -1051,6 +1076,9 @@ static void sctp_assoc_bh_rcv(struct wor
/* If there is an error on chunk, discard this packet. */
if (error && chunk)
chunk->pdiscard = 1;
+
+ if (first_time)
+ first_time = 0;
}
sctp_association_put(asoc);
}
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -144,10 +144,8 @@ static sctp_disposition_t sctp_sf_violat
void *arg,
sctp_cmd_seq_t *commands);
-static sctp_ierror_t sctp_sf_authenticate(struct net *net,
- const struct sctp_endpoint *ep,
+static sctp_ierror_t sctp_sf_authenticate(
const struct sctp_association *asoc,
- const sctp_subtype_t type,
struct sctp_chunk *chunk);
static sctp_disposition_t __sctp_sf_do_9_1_abort(struct net *net,
@@ -615,6 +613,38 @@ sctp_disposition_t sctp_sf_do_5_1C_ack(s
return SCTP_DISPOSITION_CONSUME;
}
+static bool sctp_auth_chunk_verify(struct net *net, struct sctp_chunk *chunk,
+ const struct sctp_association *asoc)
+{
+ struct sctp_chunk auth;
+
+ if (!chunk->auth_chunk)
+ return true;
+
+ /* SCTP-AUTH: auth_chunk pointer is only set when the cookie-echo
+ * is supposed to be authenticated and we have to do delayed
+ * authentication. We've just recreated the association using
+ * the information in the cookie and now it's much easier to
+ * do the authentication.
+ */
+
+ /* Make sure that we and the peer are AUTH capable */
+ if (!net->sctp.auth_enable || !asoc->peer.auth_capable)
+ return false;
+
+ /* set-up our fake chunk so that we can process it */
+ auth.skb = chunk->auth_chunk;
+ auth.asoc = chunk->asoc;
+ auth.sctp_hdr = chunk->sctp_hdr;
+ auth.chunk_hdr = (struct sctp_chunkhdr *)
+ skb_push(chunk->auth_chunk,
+ sizeof(struct sctp_chunkhdr));
+ skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
+ auth.transport = chunk->transport;
+
+ return sctp_sf_authenticate(asoc, &auth) == SCTP_IERROR_NO_ERROR;
+}
+
/*
* Respond to a normal COOKIE ECHO chunk.
* We are the side that is being asked for an association.
@@ -751,36 +781,9 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(st
if (error)
goto nomem_init;
- /* SCTP-AUTH: auth_chunk pointer is only set when the cookie-echo
- * is supposed to be authenticated and we have to do delayed
- * authentication. We've just recreated the association using
- * the information in the cookie and now it's much easier to
- * do the authentication.
- */
- if (chunk->auth_chunk) {
- struct sctp_chunk auth;
- sctp_ierror_t ret;
-
- /* Make sure that we and the peer are AUTH capable */
- if (!net->sctp.auth_enable || !new_asoc->peer.auth_capable) {
- sctp_association_free(new_asoc);
- return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
- }
-
- /* set-up our fake chunk so that we can process it */
- auth.skb = chunk->auth_chunk;
- auth.asoc = chunk->asoc;
- auth.sctp_hdr = chunk->sctp_hdr;
- auth.chunk_hdr = (sctp_chunkhdr_t *)skb_push(chunk->auth_chunk,
- sizeof(sctp_chunkhdr_t));
- skb_pull(chunk->auth_chunk, sizeof(sctp_chunkhdr_t));
- auth.transport = chunk->transport;
-
- ret = sctp_sf_authenticate(net, ep, new_asoc, type, &auth);
- if (ret != SCTP_IERROR_NO_ERROR) {
- sctp_association_free(new_asoc);
- return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
- }
+ if (!sctp_auth_chunk_verify(net, chunk, new_asoc)) {
+ sctp_association_free(new_asoc);
+ return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
}
repl = sctp_make_cookie_ack(new_asoc, chunk);
@@ -1717,13 +1720,15 @@ static sctp_disposition_t sctp_sf_do_dup
GFP_ATOMIC))
goto nomem;
+ if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
+ return SCTP_DISPOSITION_DISCARD;
+
/* Make sure no new addresses are being added during the
* restart. Though this is a pretty complicated attack
* since you'd have to get inside the cookie.
*/
- if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands)) {
+ if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands))
return SCTP_DISPOSITION_CONSUME;
- }
/* If the endpoint is in the SHUTDOWN-ACK-SENT state and recognizes
* the peer has restarted (Action A), it MUST NOT setup a new
@@ -1828,6 +1833,9 @@ static sctp_disposition_t sctp_sf_do_dup
GFP_ATOMIC))
goto nomem;
+ if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
+ return SCTP_DISPOSITION_DISCARD;
+
/* Update the content of current association. */
sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc));
sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
@@ -1920,6 +1928,9 @@ static sctp_disposition_t sctp_sf_do_dup
* a COOKIE ACK.
*/
+ if (!sctp_auth_chunk_verify(net, chunk, asoc))
+ return SCTP_DISPOSITION_DISCARD;
+
/* Don't accidentally move back into established state. */
if (asoc->state < SCTP_STATE_ESTABLISHED) {
sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_STOP,
@@ -3985,10 +3996,8 @@ gen_shutdown:
*
* The return value is the disposition of the chunk.
*/
-static sctp_ierror_t sctp_sf_authenticate(struct net *net,
- const struct sctp_endpoint *ep,
+static sctp_ierror_t sctp_sf_authenticate(
const struct sctp_association *asoc,
- const sctp_subtype_t type,
struct sctp_chunk *chunk)
{
struct sctp_authhdr *auth_hdr;
@@ -4087,7 +4096,7 @@ sctp_disposition_t sctp_sf_eat_auth(stru
commands);
auth_hdr = (struct sctp_authhdr *)chunk->skb->data;
- error = sctp_sf_authenticate(net, ep, asoc, type, chunk);
+ error = sctp_sf_authenticate(asoc, chunk);
switch (error) {
case SCTP_IERROR_AUTH_BAD_HMAC:
/* Generate the ERROR chunk and discard the rest
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman <[email protected]>
This reverts commit 9de3a3bfed892608dc30a6bc3fd8bdbeae5b51a5 which was
commit 79935915300c5eb88a0e94fa9148a7505c14a02a upstream.
As Ben points out:
This depends on:
commit 570c70a60f53ca737ead4e5966c446bf0d39fac9
Author: Fabio Estevam <[email protected]>
Date: Wed Apr 5 11:32:34 2017 -0300
ASoC: sgtl5000: Allow LRCLK pad drive strength to be changed
which did not show up until 4.13, so this makes no sense to have in this
stable branch.
Reported-by: Ben Hutchings <[email protected]>
Cc: Fabio Estevam <[email protected]>
Cc: Shawn Guo <[email protected]>
Cc: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/boot/dts/imx6qdl-wandboard.dtsi | 1 -
1 file changed, 1 deletion(-)
--- a/arch/arm/boot/dts/imx6qdl-wandboard.dtsi
+++ b/arch/arm/boot/dts/imx6qdl-wandboard.dtsi
@@ -88,7 +88,6 @@
clocks = <&clks 201>;
VDDA-supply = <®_2p5v>;
VDDIO-supply = <®_3p3v>;
- lrclk-strength = <3>;
};
};
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai <[email protected]>
commit f65e0d299807d8a11812845c972493c3f9a18e10 upstream.
snd_timer_notify1() is called outside the spinlock and it retakes the
lock after the unlock. This is rather racy, and it's safer to move
snd_timer_notify() call inside the main spinlock.
The patch also contains a slight refactoring / cleanup of the code.
Now all start/stop/continue/pause look more symmetric and a bit better
readable.
Signed-off-by: Takashi Iwai <[email protected]>
Cc: Ben Hutchings <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
sound/core/timer.c | 220 ++++++++++++++++++++++++-----------------------------
1 file changed, 102 insertions(+), 118 deletions(-)
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -318,8 +318,6 @@ int snd_timer_open(struct snd_timer_inst
return 0;
}
-static int _snd_timer_stop(struct snd_timer_instance *timeri, int event);
-
/*
* close a timer instance
*/
@@ -408,7 +406,6 @@ unsigned long snd_timer_resolution(struc
static void snd_timer_notify1(struct snd_timer_instance *ti, int event)
{
struct snd_timer *timer;
- unsigned long flags;
unsigned long resolution = 0;
struct snd_timer_instance *ts;
struct timespec tstamp;
@@ -432,34 +429,66 @@ static void snd_timer_notify1(struct snd
return;
if (timer->hw.flags & SNDRV_TIMER_HW_SLAVE)
return;
- spin_lock_irqsave(&timer->lock, flags);
list_for_each_entry(ts, &ti->slave_active_head, active_list)
if (ts->ccallback)
ts->ccallback(ts, event + 100, &tstamp, resolution);
- spin_unlock_irqrestore(&timer->lock, flags);
}
-static int snd_timer_start1(struct snd_timer *timer, struct snd_timer_instance *timeri,
- unsigned long sticks)
+/* start/continue a master timer */
+static int snd_timer_start1(struct snd_timer_instance *timeri,
+ bool start, unsigned long ticks)
{
+ struct snd_timer *timer;
+ int result;
+ unsigned long flags;
+
+ timer = timeri->timer;
+ if (!timer)
+ return -EINVAL;
+
+ spin_lock_irqsave(&timer->lock, flags);
+ if (timer->card && timer->card->shutdown) {
+ result = -ENODEV;
+ goto unlock;
+ }
+ if (timeri->flags & (SNDRV_TIMER_IFLG_RUNNING |
+ SNDRV_TIMER_IFLG_START)) {
+ result = -EBUSY;
+ goto unlock;
+ }
+
+ if (start)
+ timeri->ticks = timeri->cticks = ticks;
+ else if (!timeri->cticks)
+ timeri->cticks = 1;
+ timeri->pticks = 0;
+
list_move_tail(&timeri->active_list, &timer->active_list_head);
if (timer->running) {
if (timer->hw.flags & SNDRV_TIMER_HW_SLAVE)
goto __start_now;
timer->flags |= SNDRV_TIMER_FLG_RESCHED;
timeri->flags |= SNDRV_TIMER_IFLG_START;
- return 1; /* delayed start */
+ result = 1; /* delayed start */
} else {
- timer->sticks = sticks;
+ if (start)
+ timer->sticks = ticks;
timer->hw.start(timer);
__start_now:
timer->running++;
timeri->flags |= SNDRV_TIMER_IFLG_RUNNING;
- return 0;
+ result = 0;
}
+ snd_timer_notify1(timeri, start ? SNDRV_TIMER_EVENT_START :
+ SNDRV_TIMER_EVENT_CONTINUE);
+ unlock:
+ spin_unlock_irqrestore(&timer->lock, flags);
+ return result;
}
-static int snd_timer_start_slave(struct snd_timer_instance *timeri)
+/* start/continue a slave timer */
+static int snd_timer_start_slave(struct snd_timer_instance *timeri,
+ bool start)
{
unsigned long flags;
@@ -473,88 +502,37 @@ static int snd_timer_start_slave(struct
spin_lock(&timeri->timer->lock);
list_add_tail(&timeri->active_list,
&timeri->master->slave_active_head);
+ snd_timer_notify1(timeri, start ? SNDRV_TIMER_EVENT_START :
+ SNDRV_TIMER_EVENT_CONTINUE);
spin_unlock(&timeri->timer->lock);
}
spin_unlock_irqrestore(&slave_active_lock, flags);
return 1; /* delayed start */
}
-/*
- * start the timer instance
- */
-int snd_timer_start(struct snd_timer_instance *timeri, unsigned int ticks)
-{
- struct snd_timer *timer;
- int result = -EINVAL;
- unsigned long flags;
-
- if (timeri == NULL || ticks < 1)
- return -EINVAL;
- if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE) {
- result = snd_timer_start_slave(timeri);
- if (result >= 0)
- snd_timer_notify1(timeri, SNDRV_TIMER_EVENT_START);
- return result;
- }
- timer = timeri->timer;
- if (timer == NULL)
- return -EINVAL;
- if (timer->card && timer->card->shutdown)
- return -ENODEV;
- spin_lock_irqsave(&timer->lock, flags);
- if (timeri->flags & (SNDRV_TIMER_IFLG_RUNNING |
- SNDRV_TIMER_IFLG_START)) {
- result = -EBUSY;
- goto unlock;
- }
- timeri->ticks = timeri->cticks = ticks;
- timeri->pticks = 0;
- result = snd_timer_start1(timer, timeri, ticks);
- unlock:
- spin_unlock_irqrestore(&timer->lock, flags);
- if (result >= 0)
- snd_timer_notify1(timeri, SNDRV_TIMER_EVENT_START);
- return result;
-}
-
-static int _snd_timer_stop(struct snd_timer_instance *timeri, int event)
+/* stop/pause a master timer */
+static int snd_timer_stop1(struct snd_timer_instance *timeri, bool stop)
{
struct snd_timer *timer;
+ int result = 0;
unsigned long flags;
- if (snd_BUG_ON(!timeri))
- return -ENXIO;
-
- if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE) {
- spin_lock_irqsave(&slave_active_lock, flags);
- if (!(timeri->flags & SNDRV_TIMER_IFLG_RUNNING)) {
- spin_unlock_irqrestore(&slave_active_lock, flags);
- return -EBUSY;
- }
- if (timeri->timer)
- spin_lock(&timeri->timer->lock);
- timeri->flags &= ~SNDRV_TIMER_IFLG_RUNNING;
- list_del_init(&timeri->ack_list);
- list_del_init(&timeri->active_list);
- if (timeri->timer)
- spin_unlock(&timeri->timer->lock);
- spin_unlock_irqrestore(&slave_active_lock, flags);
- goto __end;
- }
timer = timeri->timer;
if (!timer)
return -EINVAL;
spin_lock_irqsave(&timer->lock, flags);
if (!(timeri->flags & (SNDRV_TIMER_IFLG_RUNNING |
SNDRV_TIMER_IFLG_START))) {
- spin_unlock_irqrestore(&timer->lock, flags);
- return -EBUSY;
+ result = -EBUSY;
+ goto unlock;
}
list_del_init(&timeri->ack_list);
list_del_init(&timeri->active_list);
- if (timer->card && timer->card->shutdown) {
- spin_unlock_irqrestore(&timer->lock, flags);
- return 0;
+ if (timer->card && timer->card->shutdown)
+ goto unlock;
+ if (stop) {
+ timeri->cticks = timeri->ticks;
+ timeri->pticks = 0;
}
if ((timeri->flags & SNDRV_TIMER_IFLG_RUNNING) &&
!(--timer->running)) {
@@ -569,35 +547,60 @@ static int _snd_timer_stop(struct snd_ti
}
}
timeri->flags &= ~(SNDRV_TIMER_IFLG_RUNNING | SNDRV_TIMER_IFLG_START);
+ snd_timer_notify1(timeri, stop ? SNDRV_TIMER_EVENT_STOP :
+ SNDRV_TIMER_EVENT_CONTINUE);
+ unlock:
spin_unlock_irqrestore(&timer->lock, flags);
- __end:
- if (event != SNDRV_TIMER_EVENT_RESOLUTION)
- snd_timer_notify1(timeri, event);
+ return result;
+}
+
+/* stop/pause a slave timer */
+static int snd_timer_stop_slave(struct snd_timer_instance *timeri, bool stop)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&slave_active_lock, flags);
+ if (!(timeri->flags & SNDRV_TIMER_IFLG_RUNNING)) {
+ spin_unlock_irqrestore(&slave_active_lock, flags);
+ return -EBUSY;
+ }
+ timeri->flags &= ~SNDRV_TIMER_IFLG_RUNNING;
+ if (timeri->timer) {
+ spin_lock(&timeri->timer->lock);
+ list_del_init(&timeri->ack_list);
+ list_del_init(&timeri->active_list);
+ snd_timer_notify1(timeri, stop ? SNDRV_TIMER_EVENT_STOP :
+ SNDRV_TIMER_EVENT_CONTINUE);
+ spin_unlock(&timeri->timer->lock);
+ }
+ spin_unlock_irqrestore(&slave_active_lock, flags);
return 0;
}
/*
+ * start the timer instance
+ */
+int snd_timer_start(struct snd_timer_instance *timeri, unsigned int ticks)
+{
+ if (timeri == NULL || ticks < 1)
+ return -EINVAL;
+ if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE)
+ return snd_timer_start_slave(timeri, true);
+ else
+ return snd_timer_start1(timeri, true, ticks);
+}
+
+/*
* stop the timer instance.
*
* do not call this from the timer callback!
*/
int snd_timer_stop(struct snd_timer_instance *timeri)
{
- struct snd_timer *timer;
- unsigned long flags;
- int err;
-
- err = _snd_timer_stop(timeri, SNDRV_TIMER_EVENT_STOP);
- if (err < 0)
- return err;
- timer = timeri->timer;
- if (!timer)
- return -EINVAL;
- spin_lock_irqsave(&timer->lock, flags);
- timeri->cticks = timeri->ticks;
- timeri->pticks = 0;
- spin_unlock_irqrestore(&timer->lock, flags);
- return 0;
+ if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE)
+ return snd_timer_stop_slave(timeri, true);
+ else
+ return snd_timer_stop1(timeri, true);
}
/*
@@ -605,32 +608,10 @@ int snd_timer_stop(struct snd_timer_inst
*/
int snd_timer_continue(struct snd_timer_instance *timeri)
{
- struct snd_timer *timer;
- int result = -EINVAL;
- unsigned long flags;
-
- if (timeri == NULL)
- return result;
if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE)
- return snd_timer_start_slave(timeri);
- timer = timeri->timer;
- if (! timer)
- return -EINVAL;
- if (timer->card && timer->card->shutdown)
- return -ENODEV;
- spin_lock_irqsave(&timer->lock, flags);
- if (timeri->flags & SNDRV_TIMER_IFLG_RUNNING) {
- result = -EBUSY;
- goto unlock;
- }
- if (!timeri->cticks)
- timeri->cticks = 1;
- timeri->pticks = 0;
- result = snd_timer_start1(timer, timeri, timer->sticks);
- unlock:
- spin_unlock_irqrestore(&timer->lock, flags);
- snd_timer_notify1(timeri, SNDRV_TIMER_EVENT_CONTINUE);
- return result;
+ return snd_timer_start_slave(timeri, false);
+ else
+ return snd_timer_start1(timeri, false, 0);
}
/*
@@ -638,7 +619,10 @@ int snd_timer_continue(struct snd_timer_
*/
int snd_timer_pause(struct snd_timer_instance * timeri)
{
- return _snd_timer_stop(timeri, SNDRV_TIMER_EVENT_PAUSE);
+ if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE)
+ return snd_timer_stop_slave(timeri, false);
+ else
+ return snd_timer_stop1(timeri, false);
}
/*
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Taglang <[email protected]>
[ Upstream commit 14224923c3600bae2ac4dcae3bf0c3d4dc2812be ]
Currently, skb->len and skb->data_len are set to the page size, not
the packet size. This causes the frame check sequence to not be
located at the "end" of the packet resulting in ethernet frame check
errors. The driver does work currently, but stricter kernel facing
networking solutions like OpenVSwitch will drop these packets as
invalid.
These changes set the packet size correctly so that these errors no
longer occur. The length does not include the frame check sequence, so
that subtraction was removed.
Tested on Oracle/SUN Multithreaded 10-Gigabit Ethernet Network
Controller [108e:abcd] and validated in wireshark.
Signed-off-by: Rob Taglang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/sun/niu.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/sun/niu.c
+++ b/drivers/net/ethernet/sun/niu.c
@@ -3442,7 +3442,7 @@ static int niu_process_rx_pkt(struct nap
len = (val & RCR_ENTRY_L2_LEN) >>
RCR_ENTRY_L2_LEN_SHIFT;
- len -= ETH_FCS_LEN;
+ append_size = len + ETH_HLEN + ETH_FCS_LEN;
addr = (val & RCR_ENTRY_PKT_BUF_ADDR) <<
RCR_ENTRY_PKT_BUF_ADDR_SHIFT;
@@ -3452,7 +3452,6 @@ static int niu_process_rx_pkt(struct nap
RCR_ENTRY_PKTBUFSZ_SHIFT];
off = addr & ~PAGE_MASK;
- append_size = rcr_size;
if (num_rcr == 1) {
int ptype;
@@ -3465,7 +3464,7 @@ static int niu_process_rx_pkt(struct nap
else
skb_checksum_none_assert(skb);
} else if (!(val & RCR_ENTRY_MULTI))
- append_size = len - skb->len;
+ append_size = append_size - skb->len;
niu_rx_skb_append(skb, page, off, append_size, rcr_size);
if ((page->index + rp->rbr_block_size) - rcr_size == addr) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Moshe Shemesh <[email protected]>
[ Upstream commit 6ad4e91c6d796b38a7f0e724db1de28eeb122bad ]
Add check of coalescing parameters received through ethtool are within
range of values supported by the HW.
Driver gets the coalescing rx/tx-usecs and rx/tx-frames as set by the
users through ethtool. The ethtool support up to 32 bit value for each.
However, mlx4 modify cq limits the coalescing time parameter and
coalescing frames parameters to 16 bits.
Return out of range error if user tries to set these parameters to
higher values.
Change type of sample-interval and adaptive_rx_coal parameters in mlx4
driver to u32 as the ethtool holds them as u32 and these parameters are
not limited due to mlx4 HW.
Fixes: c27a02cd94d6 ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC')
Signed-off-by: Moshe Shemesh <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 16 ++++++++++++++++
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 7 +++++--
2 files changed, 21 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -967,6 +967,22 @@ static int mlx4_en_set_coalesce(struct n
if (!coal->tx_max_coalesced_frames_irq)
return -EINVAL;
+ if (coal->tx_coalesce_usecs > MLX4_EN_MAX_COAL_TIME ||
+ coal->rx_coalesce_usecs > MLX4_EN_MAX_COAL_TIME ||
+ coal->rx_coalesce_usecs_low > MLX4_EN_MAX_COAL_TIME ||
+ coal->rx_coalesce_usecs_high > MLX4_EN_MAX_COAL_TIME) {
+ netdev_info(dev, "%s: maximum coalesce time supported is %d usecs\n",
+ __func__, MLX4_EN_MAX_COAL_TIME);
+ return -ERANGE;
+ }
+
+ if (coal->tx_max_coalesced_frames > MLX4_EN_MAX_COAL_PKTS ||
+ coal->rx_max_coalesced_frames > MLX4_EN_MAX_COAL_PKTS) {
+ netdev_info(dev, "%s: maximum coalesced frames supported is %d\n",
+ __func__, MLX4_EN_MAX_COAL_PKTS);
+ return -ERANGE;
+ }
+
priv->rx_frames = (coal->rx_max_coalesced_frames ==
MLX4_EN_AUTO_CONF) ?
MLX4_EN_RX_COAL_TARGET :
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -140,6 +140,9 @@ enum {
#define MLX4_EN_TX_COAL_PKTS 16
#define MLX4_EN_TX_COAL_TIME 0x10
+#define MLX4_EN_MAX_COAL_PKTS U16_MAX
+#define MLX4_EN_MAX_COAL_TIME U16_MAX
+
#define MLX4_EN_RX_RATE_LOW 400000
#define MLX4_EN_RX_COAL_TIME_LOW 0
#define MLX4_EN_RX_RATE_HIGH 450000
@@ -518,8 +521,8 @@ struct mlx4_en_priv {
u16 rx_usecs_low;
u32 pkt_rate_high;
u16 rx_usecs_high;
- u16 sample_interval;
- u16 adaptive_rx_coal;
+ u32 sample_interval;
+ u32 adaptive_rx_coal;
u32 msg_enable;
u32 loopback_ok;
u32 validate_loopback;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long <[email protected]>
[ Upstream commit ce402f044e4e432c296f90eaabb8dbe8f3624391 ]
When auth is enabled for cookie-ack chunk, in sctp_inq_pop, sctp
processes auth chunk first, then continues to the next chunk in
this packet if chunk_end + chunk_hdr size < skb_tail_pointer().
Otherwise, it will go to the next packet or discard this chunk.
However, it missed the fact that cookie-ack chunk's size is equal
to chunk_hdr size, which couldn't match that check, and thus this
chunk would not get processed.
This patch fixes it by changing the check to chunk_end + chunk_hdr
size <= skb_tail_pointer().
Fixes: 26b87c788100 ("net: sctp: fix remote memory pressure from excessive queueing")
Signed-off-by: Xin Long <[email protected]>
Acked-by: Neil Horman <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sctp/inqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -178,7 +178,7 @@ struct sctp_chunk *sctp_inq_pop(struct s
skb_pull(chunk->skb, sizeof(sctp_chunkhdr_t));
chunk->subh.v = NULL; /* Subheader is no longer valid. */
- if (chunk->chunk_end + sizeof(sctp_chunkhdr_t) <
+ if (chunk->chunk_end + sizeof(sctp_chunkhdr_t) <=
skb_tail_pointer(chunk->skb)) {
/* This is not a singleton */
chunk->singleton = 0;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Chan <[email protected]>
[ Upstream commit d89a2adb8bfe6f8949ff389acdb9fa298b6e8e12 ]
tg3_free_consistent() calls dma_free_coherent() to free tp->hw_stats
under spinlock and can trigger BUG_ON() in vunmap() because vunmap()
may sleep. Fix it by removing the spinlock and relying on the
TG3_FLAG_INIT_COMPLETE flag to prevent race conditions between
tg3_get_stats64() and tg3_free_consistent(). TG3_FLAG_INIT_COMPLETE
is always cleared under tp->lock before tg3_free_consistent()
and therefore tg3_get_stats64() can safely access tp->hw_stats
under tp->lock if TG3_FLAG_INIT_COMPLETE is set.
Fixes: f5992b72ebe0 ("tg3: Fix race condition in tg3_get_stats64().")
Reported-by: Zumeng Chen <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/tg3.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -8722,14 +8722,15 @@ static void tg3_free_consistent(struct t
tg3_mem_rx_release(tp);
tg3_mem_tx_release(tp);
- /* Protect tg3_get_stats64() from reading freed tp->hw_stats. */
- tg3_full_lock(tp, 0);
+ /* tp->hw_stats can be referenced safely:
+ * 1. under rtnl_lock
+ * 2. or under tp->lock if TG3_FLAG_INIT_COMPLETE is set.
+ */
if (tp->hw_stats) {
dma_free_coherent(&tp->pdev->dev, sizeof(struct tg3_hw_stats),
tp->hw_stats, tp->stats_mapping);
tp->hw_stats = NULL;
}
- tg3_full_unlock(tp);
}
/*
@@ -14163,7 +14164,7 @@ static struct rtnl_link_stats64 *tg3_get
struct tg3 *tp = netdev_priv(dev);
spin_lock_bh(&tp->lock);
- if (!tp->hw_stats) {
+ if (!tp->hw_stats || !tg3_flag(tp, INIT_COMPLETE)) {
*stats = tp->net_stats_prev;
spin_unlock_bh(&tp->lock);
return stats;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng <[email protected]>
[ Upstream commit 16ae6aa1705299789f71fdea59bfb119c1fbd9c0 ]
The TCP repair sequence of operation is to first set the socket in
repair mode, then inject the TCP stats into the socket with repair
socket options, then call connect() to re-activate the socket. The
connect syscall simply returns and set state to ESTABLISHED
mode. As a result Fast Open is meaningless for TCP repair.
However allowing sendto() system call with MSG_FASTOPEN flag half-way
during the repair operation could unexpectedly cause data to be
sent, before the operation finishes changing the internal TCP stats
(e.g. MSS). This in turn triggers TCP warnings on inconsistent
packet accounting.
The fix is to simply disallow Fast Open operation once the socket
is in the repair mode.
Reported-by: syzbot <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Reviewed-by: Neal Cardwell <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1108,7 +1108,7 @@ int tcp_sendmsg(struct sock *sk, struct
lock_sock(sk);
flags = msg->msg_flags;
- if (flags & MSG_FASTOPEN) {
+ if ((flags & MSG_FASTOPEN) && !tp->repair) {
err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size);
if (err == -EINPROGRESS && copied_syn > 0)
goto out;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Debabrata Banerjee <[email protected]>
[ Upstream commit 4fa8667ca3989ce14cf66301fa251544fbddbdd0 ]
Make sure multicast, broadcast, and zero mac's cannot be the output of rlb
updates, which should all be directed arps. Receive load balancing will be
collapsed if any of these happen, as the switch will broadcast.
Signed-off-by: Debabrata Banerjee <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/bonding/bond_alb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -453,7 +453,7 @@ static void rlb_update_client(struct rlb
{
int i;
- if (!client_info->slave)
+ if (!client_info->slave || !is_valid_ether_addr(client_info->mac_dst))
return;
for (i = 0; i < RLB_ARP_BURST_SIZE; i++) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <[email protected]>
[ Upstream commit a8d7aa17bbc970971ccdf71988ea19230ab368b1 ]
syzbot reported a crash in tasklet_action_common() caused by dccp.
dccp needs to make sure socket wont disappear before tasklet handler
has completed.
This patch takes a reference on the socket when arming the tasklet,
and moves the sock_put() from dccp_write_xmit_timer() to dccp_write_xmitlet()
kernel BUG at kernel/softirq.c:514!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 17 Comm: ksoftirqd/1 Not tainted 4.17.0-rc3+ #30
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515
RSP: 0018:ffff8801d9b3faf8 EFLAGS: 00010246
dccp_close: ABORT with 65423 bytes unread
RAX: 1ffff1003b367f6b RBX: ffff8801daf1f3f0 RCX: 0000000000000000
RDX: ffff8801cf895498 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffff8801d9b3fc40 R08: ffffed0039f12a95 R09: ffffed0039f12a94
dccp_close: ABORT with 65423 bytes unread
R10: ffffed0039f12a94 R11: ffff8801cf8954a3 R12: 0000000000000000
R13: ffff8801d9b3fc18 R14: dffffc0000000000 R15: ffff8801cf895490
FS: 0000000000000000(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2bc28000 CR3: 00000001a08a9000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tasklet_action+0x1d/0x20 kernel/softirq.c:533
__do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
dccp_close: ABORT with 65423 bytes unread
run_ksoftirqd+0x86/0x100 kernel/softirq.c:646
smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
kthread+0x345/0x410 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Code: 48 8b 85 e8 fe ff ff 48 8b 95 f0 fe ff ff e9 94 fb ff ff 48 89 95 f0 fe ff ff e8 81 53 6e 00 48 8b 95 f0 fe ff ff e9 62 fb ff ff <0f> 0b 48 89 cf 48 89 8d e8 fe ff ff e8 64 53 6e 00 48 8b 8d e8
RIP: tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515 RSP: ffff8801d9b3faf8
Fixes: dc841e30eaea ("dccp: Extend CCID packet dequeueing interface")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Gerrit Renker <[email protected]>
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dccp/ccids/ccid2.c | 14 ++++++++++++--
net/dccp/timer.c | 2 +-
2 files changed, 13 insertions(+), 3 deletions(-)
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -126,6 +126,16 @@ static void ccid2_change_l_seq_window(st
DCCPF_SEQ_WMAX));
}
+static void dccp_tasklet_schedule(struct sock *sk)
+{
+ struct tasklet_struct *t = &dccp_sk(sk)->dccps_xmitlet;
+
+ if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
+ sock_hold(sk);
+ __tasklet_schedule(t);
+ }
+}
+
static void ccid2_hc_tx_rto_expire(unsigned long data)
{
struct sock *sk = (struct sock *)data;
@@ -166,7 +176,7 @@ static void ccid2_hc_tx_rto_expire(unsig
/* if we were blocked before, we may now send cwnd=1 packet */
if (sender_was_blocked)
- tasklet_schedule(&dccp_sk(sk)->dccps_xmitlet);
+ dccp_tasklet_schedule(sk);
/* restart backed-off timer */
sk_reset_timer(sk, &hc->tx_rtotimer, jiffies + hc->tx_rto);
out:
@@ -706,7 +716,7 @@ static void ccid2_hc_tx_packet_recv(stru
done:
/* check if incoming Acks allow pending packets to be sent */
if (sender_was_blocked && !ccid2_cwnd_network_limited(hc))
- tasklet_schedule(&dccp_sk(sk)->dccps_xmitlet);
+ dccp_tasklet_schedule(sk);
dccp_ackvec_parsed_cleanup(&hc->tx_av_chunks);
}
--- a/net/dccp/timer.c
+++ b/net/dccp/timer.c
@@ -230,12 +230,12 @@ static void dccp_write_xmitlet(unsigned
else
dccp_write_xmit(sk);
bh_unlock_sock(sk);
+ sock_put(sk);
}
static void dccp_write_xmit_timer(unsigned long data)
{
dccp_write_xmitlet(data);
- sock_put((struct sock *)data);
}
void dccp_init_xmit_timers(struct sock *sk)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ingo Molnar <[email protected]>
[ Upstream commit af3e0fcf78879f718c5f73df0814951bd7057d34 ]
Use disable_irq_nosync() instead of disable_irq() as this might be
called in atomic context with netpoll.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/realtek/8139too.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/realtek/8139too.c
+++ b/drivers/net/ethernet/realtek/8139too.c
@@ -2229,7 +2229,7 @@ static void rtl8139_poll_controller(stru
struct rtl8139_private *tp = netdev_priv(dev);
const int irq = tp->pci_dev->irq;
- disable_irq(irq);
+ disable_irq_nosync(irq);
rtl8139_interrupt(irq, dev);
enable_irq(irq);
}
On 05/24/2018 02:37 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.133 release.
> There are 92 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat May 26 09:31:28 UTC 2018.
> Anything received after that time might be too late.
>
Building s390:allmodconfig ... failed
arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1':
(.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2): undefined reference to `_LC_BR_R1'
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2
Complete report follows later.
Guenter
On Thu, May 24, 2018 at 02:02:37PM +0200, Jan Kara wrote:
> > This is in the SLES kernel for a reason, and again, it's in the section
> > that says "this should be pushed to stable". So if it's good enough for
> > the SLES kernel, why isn't it good enough for all users of this kernel
> > tree?
>
> Heh, fair enough. I guess Mel in the end didn't find patches worthy enough
> to be pushed to stable tree. But at least now I know they are well tested
> with 4.4 base so they should do no harm in the stable tree so my stance is
> closer to neutral.
>
Early on, I backported a number of performance patches to -stable with the
view to having a good baseline to compare a new mainline release with.
However, after a while some of them required unrelated backports that
would be excessive for -stable. While I could have continued backporting
some patches, I stopped as the time required to run all of the performance
tests is excessive and I was already tracking too many kernels.
--
Mel Gorman
SUSE Labs
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.133 release.
> There are 92 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat May 26 09:31:28 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.133-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
Merged, compiled with -Werror, and installed onto my Pixel 2 XL and
OnePlus 5.
No initial issues noticed in dmesg or general usage.
Thanks!
Nathan
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.133 release.
> There are 92 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat May 26 09:31:28 UTC 2018.
> Anything received after that time might be too late.
>
Build results:
total: 146 pass: 145 fail: 1
Failed builds:
s390:allmodconfig
Qemu test results:
total: 127 pass: 127 fail: 0
Build error (s390:allmodconfig):
arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1':
(.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2):
undefined reference to `_LC_BR_R1'
Details are available at http://kerneltests.org/builders/.
Guenter
On Thu, May 24, 2018 at 4:28 AM Greg KH <[email protected]> wrote:
> On Thu, May 24, 2018 at 04:17:12AM -0700, Hugh Dickins wrote:
> > Thu, May 24, 2018 at 4:06 AM Greg Kroah-Hartman
> > <[email protected]>
> > wrote:
> >
> > > On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
> > > > On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
> > > > > 4.4-stable review patch. If anyone has any objections, please
let me
> > know.
> > > >
> > > > Just one objection: Why does stable care about this (and the
previous
> > > > patch)? I've checked the stable queue and I don't see anything that
> > would
> > > > have these patches as a prerequisite. And on their own, they are
only
> > > > cleanups without substantial gains.
> >
> > > There's a small gain here:
> >
> > > > > paralleldd
> > > > > 4.4.0 4.4.0
> > > > > vanilla avoidlock
> > > > > Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%)
> > > > > Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%)
> > > > > Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%)
> > > > > Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%)
> > > > > Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%)
> > > > > Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%)
> > > > > Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%)
> > > > > Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%)
> > > > > Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%)
> > > > > Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%)
> > > > >
> > > > > The impact is small but intuitively, it makes sense to avoid
> > unnecessary
> > > > > calls to lock_page.
> >
> > > Yes, it's small, but it's marked in the SLES kernel as "needs to be
> > > merged into stable", so obviously it matters to someone :)
> >
> > Hmm. I had the same reaction to these two as Jan, but assumed that they
> > made applying later patches easier, and didn't take the trouble he did
to
> > find that's not so.
> >
> > I've no wish to be disputatious, but it does seem that the definition of
> > "stable" has changed, and not necessarily for the better, if it's now a
> > home for small gains: I thought we left those to upstream.
> This is in the SLES kernel for a reason, and again, it's in the section
> that says "this should be pushed to stable". So if it's good enough for
> the SLES kernel, why isn't it good enough for all users of this kernel
> tree?
> If you all think it should be dropped in both places, that's fine with
> me :)
I think they are perfectly fine in SLES: folding in good work is a part of
what distros are about.
But I cannot find anything in stable-kernel-rules.rst that would admit them
- perhaps that's just out of date?
If -stable is to be a compendium of "this looks nice, you might like to
include it", so be it: but the rules should then be updated.
Hugh
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.133 release.
> There are 92 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat May 26 09:31:28 UTC 2018.
> Anything received after that time might be too late.
Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.
Summary
------------------------------------------------------------------------
kernel: 4.4.133-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
git describe: v4.4.132-93-g915a3d7cdea9
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9
No regressions (compared to build v4.4.132-71-g180635995c36)
Ran 6863 total tests in the following environments and test suites.
Environments
--------------
- juno-r2 - arm64
- qemu_arm
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
-----------
* boot
* kselftest
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* libhugetlbfs
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
Summary
------------------------------------------------------------------------
kernel: 4.4.133-rc1
git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
git branch: 4.4.133-rc1-hikey-20180524-200
git commit: 9a1fc06621f38daaaa47feeaa1eeaf38db532433
git describe: 4.4.133-rc1-hikey-20180524-200
Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.133-rc1-hikey-20180524-200
No regressions (compared to build 4.4.133-rc1-hikey-20180521-198)
Ran 2610 total tests in the following environments and test suites.
Environments
--------------
- hi6220-hikey - arm64
- qemu_arm64
Test Suites
-----------
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
--
Linaro LKFT
https://lkft.linaro.org
On Thu, May 24, 2018 at 01:06:52PM -0500, Dan Rue wrote:
> On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.133 release.
> > There are 92 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sat May 26 09:31:28 UTC 2018.
> > Anything received after that time might be too late.
>
> Results from Linaro’s test farm.
> No regressions on arm64, arm and x86_64.
>
> Summary
> ------------------------------------------------------------------------
>
> kernel: 4.4.133-rc1
> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> git branch: linux-4.4.y
> git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
> git describe: v4.4.132-93-g915a3d7cdea9
> Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9
>
>
> No regressions (compared to build v4.4.132-71-g180635995c36)
>
Shouldn't this compare against v4.4.132 ?
I looked into kselftest/rtnetlink.sh test results.
The test history is a bit confusing.
- v4.4.131-57-ge33795f7a573 and earlier passed
- v4.4.132 failed
- v4.4.132-30-ga102378c6551 passed
...
- v4.4.132-70-gaa7ab28e9c5e passed
- v4.4.132-71-g180635995c36 and later failed
Does that mean that this specific test is unreliable ?
Thanks,
Guenter
>
> Ran 6863 total tests in the following environments and test suites.
>
> Environments
> --------------
> - juno-r2 - arm64
> - qemu_arm
> - qemu_x86_64
> - x15 - arm
> - x86_64
>
> Test Suites
> -----------
> * boot
> * kselftest
> * ltp-cap_bounds-tests
> * ltp-containers-tests
> * ltp-fcntl-locktests-tests
> * ltp-filecaps-tests
> * ltp-fs-tests
> * ltp-fs_bind-tests
> * ltp-fs_perms_simple-tests
> * ltp-fsx-tests
> * ltp-hugetlb-tests
> * ltp-io-tests
> * ltp-ipc-tests
> * ltp-math-tests
> * ltp-nptl-tests
> * ltp-pty-tests
> * ltp-sched-tests
> * ltp-securebits-tests
> * ltp-syscalls-tests
> * ltp-timers-tests
> * libhugetlbfs
> * kselftest-vsyscall-mode-native
> * kselftest-vsyscall-mode-none
>
> Summary
> ------------------------------------------------------------------------
>
> kernel: 4.4.133-rc1
> git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
> git branch: 4.4.133-rc1-hikey-20180524-200
> git commit: 9a1fc06621f38daaaa47feeaa1eeaf38db532433
> git describe: 4.4.133-rc1-hikey-20180524-200
> Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.133-rc1-hikey-20180524-200
>
>
> No regressions (compared to build 4.4.133-rc1-hikey-20180521-198)
>
>
> Ran 2610 total tests in the following environments and test suites.
>
> Environments
> --------------
> - hi6220-hikey - arm64
> - qemu_arm64
>
> Test Suites
> -----------
> * boot
> * kselftest
> * libhugetlbfs
> * ltp-cap_bounds-tests
> * ltp-containers-tests
> * ltp-fcntl-locktests-tests
> * ltp-filecaps-tests
> * ltp-fs-tests
> * ltp-fs_bind-tests
> * ltp-fs_perms_simple-tests
> * ltp-fsx-tests
> * ltp-hugetlb-tests
> * ltp-io-tests
> * ltp-ipc-tests
> * ltp-math-tests
> * ltp-nptl-tests
> * ltp-pty-tests
> * ltp-sched-tests
> * ltp-securebits-tests
> * ltp-syscalls-tests
> * ltp-timers-tests
>
> --
> Linaro LKFT
> https://lkft.linaro.org
On Thu, May 24, 2018 at 10:27:59AM -0700, Hugh Dickins wrote:
> On Thu, May 24, 2018 at 4:28 AM Greg KH <[email protected]> wrote:
> > On Thu, May 24, 2018 at 04:17:12AM -0700, Hugh Dickins wrote:
> > > Thu, May 24, 2018 at 4:06 AM Greg Kroah-Hartman
> > > <[email protected]>
> > > wrote:
> > >
> > > > On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
> > > > > On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
> > > > > > 4.4-stable review patch. If anyone has any objections, please
> let me
> > > know.
> > > > >
> > > > > Just one objection: Why does stable care about this (and the
> previous
> > > > > patch)? I've checked the stable queue and I don't see anything that
> > > would
> > > > > have these patches as a prerequisite. And on their own, they are
> only
> > > > > cleanups without substantial gains.
> > >
> > > > There's a small gain here:
> > >
> > > > > > paralleldd
> > > > > > 4.4.0 4.4.0
> > > > > > vanilla avoidlock
> > > > > > Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%)
> > > > > > Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%)
> > > > > > Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%)
> > > > > > Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%)
> > > > > > Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%)
> > > > > > Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%)
> > > > > > Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%)
> > > > > > Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%)
> > > > > > Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%)
> > > > > > Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%)
> > > > > >
> > > > > > The impact is small but intuitively, it makes sense to avoid
> > > unnecessary
> > > > > > calls to lock_page.
> > >
> > > > Yes, it's small, but it's marked in the SLES kernel as "needs to be
> > > > merged into stable", so obviously it matters to someone :)
> > >
> > > Hmm. I had the same reaction to these two as Jan, but assumed that they
> > > made applying later patches easier, and didn't take the trouble he did
> to
> > > find that's not so.
> > >
> > > I've no wish to be disputatious, but it does seem that the definition of
> > > "stable" has changed, and not necessarily for the better, if it's now a
> > > home for small gains: I thought we left those to upstream.
>
> > This is in the SLES kernel for a reason, and again, it's in the section
> > that says "this should be pushed to stable". So if it's good enough for
> > the SLES kernel, why isn't it good enough for all users of this kernel
> > tree?
>
> > If you all think it should be dropped in both places, that's fine with
> > me :)
>
> I think they are perfectly fine in SLES: folding in good work is a part of
> what distros are about.
And it's also what stable is for. We have had backports of performance
improvements in the past, along with lots of other things over the
years. This is a performance improvement. A tiny one, yes, but getting
rid of a lock is a good thing, and I picked it up as part of my review
of what a distro decided was worth adding for their users, as that's a
huge signal that might be of value to others.
> But I cannot find anything in stable-kernel-rules.rst that would admit them
> - perhaps that's just out of date?
Nope, that's the list I use to say "no" to. You can't describe
everything in that file, it's a judgement call.
> If -stable is to be a compendium of "this looks nice, you might like to
> include it", so be it: but the rules should then be updated.
This is a "a bunch of people I trust took it in their kernel, and it's
been running on zillion of machines for a while and causes no harm and a
slight benefit, so let's add it!" type of thing. It's not the only
patch in this series that was like that, but for some reason this one is
the one the triggered the debate, which is funny to me as this does have
numbers in it showing that it is an improvement :)
thanks,
greg k-h
On Thu, May 24, 2018 at 01:06:52PM -0500, Dan Rue wrote:
> On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.133 release.
> > There are 92 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sat May 26 09:31:28 UTC 2018.
> > Anything received after that time might be too late.
>
> Results from Linaro’s test farm.
> No regressions on arm64, arm and x86_64.
>
> Summary
> ------------------------------------------------------------------------
>
> kernel: 4.4.133-rc1
> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> git branch: linux-4.4.y
> git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
> git describe: v4.4.132-93-g915a3d7cdea9
> Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9
>
>
> No regressions (compared to build v4.4.132-71-g180635995c36)
It should have gotten better, as there was a fix in here for at least 2
LTP tests that we previously were not passing. I don't know why you all
were not reporting that in the past, it took someone else randomly
deciding to run LTP to report it to me...
Why did an improvement in results not show up?
thanks,
greg k-h
On 05/24/2018 03:37 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.133 release.
> There are 92 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat May 26 09:31:28 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.133-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
Compiled and booted on my test system. No dmesg regressions.
thanks,
-- Shuah
On Thu, May 24, 2018 at 12:06 PM Greg KH <[email protected]> wrote:
> On Thu, May 24, 2018 at 10:27:59AM -0700, Hugh Dickins wrote:
> > On Thu, May 24, 2018 at 4:28 AM Greg KH <[email protected]>
wrote:
> > > On Thu, May 24, 2018 at 04:17:12AM -0700, Hugh Dickins wrote:
> > > > Thu, May 24, 2018 at 4:06 AM Greg Kroah-Hartman
> > > > <[email protected]>
> > > > wrote:
> > > >
> > > > > On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
> > > > > > On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
> > > > > > > 4.4-stable review patch. If anyone has any objections, please
> > let me
> > > > know.
> > > > > >
> > > > > > Just one objection: Why does stable care about this (and the
> > previous
> > > > > > patch)? I've checked the stable queue and I don't see anything
that
> > > > would
> > > > > > have these patches as a prerequisite. And on their own, they are
> > only
> > > > > > cleanups without substantial gains.
> > > >
> > > > > There's a small gain here:
> > > >
> > > > > > > paralleldd
> > > > > > > 4.4.0
4.4.0
> > > > > > > vanilla
avoidlock
> > > > > > > Amean Elapsd-1 5.28 ( 0.00%) 5.15 (
2.50%)
> > > > > > > Amean Elapsd-4 5.29 ( 0.00%) 5.17 (
2.12%)
> > > > > > > Amean Elapsd-7 5.28 ( 0.00%) 5.18 (
1.78%)
> > > > > > > Amean Elapsd-12 5.20 ( 0.00%) 5.33 (
-2.50%)
> > > > > > > Amean Elapsd-21 5.14 ( 0.00%) 5.21 (
-1.41%)
> > > > > > > Amean Elapsd-30 5.30 ( 0.00%) 5.12 (
3.38%)
> > > > > > > Amean Elapsd-48 5.78 ( 0.00%) 5.42 (
6.21%)
> > > > > > > Amean Elapsd-79 6.78 ( 0.00%) 6.62 (
2.46%)
> > > > > > > Amean Elapsd-110 9.09 ( 0.00%) 8.99 (
1.15%)
> > > > > > > Amean Elapsd-128 10.60 ( 0.00%) 10.43 (
1.66%)
> > > > > > >
> > > > > > > The impact is small but intuitively, it makes sense to avoid
> > > > unnecessary
> > > > > > > calls to lock_page.
> > > >
> > > > > Yes, it's small, but it's marked in the SLES kernel as "needs to
be
> > > > > merged into stable", so obviously it matters to someone :)
> > > >
> > > > Hmm. I had the same reaction to these two as Jan, but assumed that
they
> > > > made applying later patches easier, and didn't take the trouble he
did
> > to
> > > > find that's not so.
> > > >
> > > > I've no wish to be disputatious, but it does seem that the
definition of
> > > > "stable" has changed, and not necessarily for the better, if it's
now a
> > > > home for small gains: I thought we left those to upstream.
> >
> > > This is in the SLES kernel for a reason, and again, it's in the
section
> > > that says "this should be pushed to stable". So if it's good enough
for
> > > the SLES kernel, why isn't it good enough for all users of this kernel
> > > tree?
> >
> > > If you all think it should be dropped in both places, that's fine with
> > > me :)
> >
> > I think they are perfectly fine in SLES: folding in good work is a part
of
> > what distros are about.
> And it's also what stable is for. We have had backports of performance
> improvements in the past, along with lots of other things over the
> years. This is a performance improvement. A tiny one, yes, but getting
> rid of a lock is a good thing, and I picked it up as part of my review
> of what a distro decided was worth adding for their users, as that's a
> huge signal that might be of value to others.
> > But I cannot find anything in stable-kernel-rules.rst that would admit
them
> > - perhaps that's just out of date?
> Nope, that's the list I use to say "no" to. You can't describe
> everything in that file, it's a judgement call.
> > If -stable is to be a compendium of "this looks nice, you might like to
> > include it", so be it: but the rules should then be updated.
> This is a "a bunch of people I trust took it in their kernel, and it's
> been running on zillion of machines for a while and causes no harm and a
> slight benefit, so let's add it!" type of thing. It's not the only
> patch in this series that was like that, but for some reason this one is
> the one the triggered the debate, which is funny to me as this does have
> numbers in it showing that it is an improvement :)
Thank you for looking after the -stable trees: please let me not waste your
time any further. I have no specific objection to the two patches, which
are certainly not egregious offenders. But I do still find the disconnect
between stable-kernel-rules.rst and reality confusing - or perhaps I just
find reality confusing :)
Hugh
On Thu, May 24, 2018 at 10:32:08AM -0700, Guenter Roeck wrote:
> On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.133 release.
> > There are 92 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sat May 26 09:31:28 UTC 2018.
> > Anything received after that time might be too late.
> >
> Build results:
> total: 146 pass: 145 fail: 1
> Failed builds:
> s390:allmodconfig
> Qemu test results:
> total: 127 pass: 127 fail: 0
>
> Build error (s390:allmodconfig):
>
> arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1':
> (.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2):
> undefined reference to `_LC_BR_R1'
I'll look into the s390 stuff in the morning, I think I know what I
messed up there...
thanks,
greg k-h
> > kernel: 4.4.133-rc1
> > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> > git branch: linux-4.4.y
> > git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
> > git describe: v4.4.132-93-g915a3d7cdea9
> > Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9
> >
> >
> > No regressions (compared to build v4.4.132-71-g180635995c36)
>
> It should have gotten better, as there was a fix in here for at least 2
> LTP tests that we previously were not passing. I don't know why you all
> were not reporting that in the past, it took someone else randomly
> deciding to run LTP to report it to me...
>
> Why did an improvement in results not show up?
Are you referring to the CLOCK_MONOTONIC_RAW fix for the arm64 vDSO ?
I think that CLOCK_MONOTONIC_RAW in VDSO wasn't backported to 4.4.y
(commit 49eea433b326 in mainline) so this "fix" is changing the
timekeeping sauce (that would fix MONOTONIC RAW) but not for 4.4.y in
ARM64. Needs review though :\
On 24 May 2018 at 23:47, Guenter Roeck <[email protected]> wrote:
> On Thu, May 24, 2018 at 01:06:52PM -0500, Dan Rue wrote:
>> On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
>> > This is the start of the stable review cycle for the 4.4.133 release.
>> > There are 92 patches in this series, all will be posted as a response
>> > to this one. If anyone has any issues with these being applied, please
>> > let me know.
>> >
>> > Responses should be made by Sat May 26 09:31:28 UTC 2018.
>> > Anything received after that time might be too late.
>>
>> Results from Linaro’s test farm.
>> No regressions on arm64, arm and x86_64.
>>
>> Summary
>> ------------------------------------------------------------------------
>>
>> kernel: 4.4.133-rc1
>> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>> git branch: linux-4.4.y
>> git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
>> git describe: v4.4.132-93-g915a3d7cdea9
>> Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9
>>
>>
>> No regressions (compared to build v4.4.132-71-g180635995c36)
>>
>
> Shouldn't this compare against v4.4.132 ?
>
> I looked into kselftest/rtnetlink.sh test results.
> The test history is a bit confusing.
>
> - v4.4.131-57-ge33795f7a573 and earlier passed
> - v4.4.132 failed
> - v4.4.132-30-ga102378c6551 passed
> ...
> - v4.4.132-70-gaa7ab28e9c5e passed
> - v4.4.132-71-g180635995c36 and later failed
>
> Does that mean that this specific test is unreliable ?
kselftest rtnetlink.sh test case failure is not a regression on 4.16,
4.14, 4.9 and 4.4 builds.
Because it used to skip due to missing tc tool.
Now the ''tc' tool added to Open Embedded build and test running and
reported failed.
This is not a regression in the kernel.
It is a change in the user space.
Old output
============
SKIP: Could not run test without the tc tool
selftests: rtnetlink.sh [PASS]
https://lkft.validation.linaro.org/scheduler/job/225015#L2255
New output
=============
RTNETLINK answers: Operation not supported
Cannot find device \"test-dummy0\"
FAIL: cannot add dummy interface
selftests: rtnetlink.sh [FAIL]
https://lkft.validation.linaro.org/scheduler/job/226352#L3375
Ref bug link:
https://bugs.linaro.org/show_bug.cgi?id=3834
- Naresh
On 05/24/2018 03:34 PM, Naresh Kamboju wrote:
> On 24 May 2018 at 23:47, Guenter Roeck <[email protected]> wrote:
>> On Thu, May 24, 2018 at 01:06:52PM -0500, Dan Rue wrote:
>>> On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
>>>> This is the start of the stable review cycle for the 4.4.133 release.
>>>> There are 92 patches in this series, all will be posted as a response
>>>> to this one. If anyone has any issues with these being applied, please
>>>> let me know.
>>>>
>>>> Responses should be made by Sat May 26 09:31:28 UTC 2018.
>>>> Anything received after that time might be too late.
>>>
>>> Results from Linaro’s test farm.
>>> No regressions on arm64, arm and x86_64.
>>>
>>> Summary
>>> ------------------------------------------------------------------------
>>>
>>> kernel: 4.4.133-rc1
>>> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>>> git branch: linux-4.4.y
>>> git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
>>> git describe: v4.4.132-93-g915a3d7cdea9
>>> Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9
>>>
>>>
>>> No regressions (compared to build v4.4.132-71-g180635995c36)
>>>
>>
>> Shouldn't this compare against v4.4.132 ?
>>
>> I looked into kselftest/rtnetlink.sh test results.
>> The test history is a bit confusing.
>>
>> - v4.4.131-57-ge33795f7a573 and earlier passed
>> - v4.4.132 failed
>> - v4.4.132-30-ga102378c6551 passed
>> ...
>> - v4.4.132-70-gaa7ab28e9c5e passed
>> - v4.4.132-71-g180635995c36 and later failed
>>
>> Does that mean that this specific test is unreliable ?
>
> kselftest rtnetlink.sh test case failure is not a regression on 4.16,
> 4.14, 4.9 and 4.4 builds.
> Because it used to skip due to missing tc tool.
>
> Now the ''tc' tool added to Open Embedded build and test running and
> reported failed.
> This is not a regression in the kernel.
> It is a change in the user space.
>
> Old output
> ============
> SKIP: Could not run test without the tc tool
> selftests: rtnetlink.sh [PASS]
> https://lkft.validation.linaro.org/scheduler/job/225015#L2255
>
>
> New output
> =============
> RTNETLINK answers: Operation not supported
> Cannot find device \"test-dummy0\"
> FAIL: cannot add dummy interface
> selftests: rtnetlink.sh [FAIL]
> https://lkft.validation.linaro.org/scheduler/job/226352#L3375
>
> Ref bug link:
> https://bugs.linaro.org/show_bug.cgi?id=3834
>
> - Naresh
>
Which kselftest versdion do you run? Is this from linux-next?
thanks,
-- Shuah
On Thu, May 24, 2018 at 03:52:49PM -0600, Shuah Khan wrote:
> On 05/24/2018 03:34 PM, Naresh Kamboju wrote:
> > On 24 May 2018 at 23:47, Guenter Roeck <[email protected]> wrote:
> >> On Thu, May 24, 2018 at 01:06:52PM -0500, Dan Rue wrote:
> >>> On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
> >>>> This is the start of the stable review cycle for the 4.4.133 release.
> >>>> There are 92 patches in this series, all will be posted as a response
> >>>> to this one. If anyone has any issues with these being applied, please
> >>>> let me know.
> >>>>
> >>>> Responses should be made by Sat May 26 09:31:28 UTC 2018.
> >>>> Anything received after that time might be too late.
> >>>
> >>> Results from Linaro’s test farm.
> >>> No regressions on arm64, arm and x86_64.
> >>>
> >>> Summary
> >>> ------------------------------------------------------------------------
> >>>
> >>> kernel: 4.4.133-rc1
> >>> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> >>> git branch: linux-4.4.y
> >>> git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
> >>> git describe: v4.4.132-93-g915a3d7cdea9
> >>> Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9
> >>>
> >>>
> >>> No regressions (compared to build v4.4.132-71-g180635995c36)
> >>>
> >>
> >> Shouldn't this compare against v4.4.132 ?
> >>
> >> I looked into kselftest/rtnetlink.sh test results.
> >> The test history is a bit confusing.
> >>
> >> - v4.4.131-57-ge33795f7a573 and earlier passed
> >> - v4.4.132 failed
> >> - v4.4.132-30-ga102378c6551 passed
> >> ...
> >> - v4.4.132-70-gaa7ab28e9c5e passed
> >> - v4.4.132-71-g180635995c36 and later failed
> >>
> >> Does that mean that this specific test is unreliable ?
> >
> > kselftest rtnetlink.sh test case failure is not a regression on 4.16,
> > 4.14, 4.9 and 4.4 builds.
> > Because it used to skip due to missing tc tool.
> >
> > Now the ''tc' tool added to Open Embedded build and test running and
> > reported failed.
> > This is not a regression in the kernel.
> > It is a change in the user space.
> >
> > Old output
> > ============
> > SKIP: Could not run test without the tc tool
> > selftests: rtnetlink.sh [PASS]
> > https://lkft.validation.linaro.org/scheduler/job/225015#L2255
> >
> >
> > New output
> > =============
> > RTNETLINK answers: Operation not supported
> > Cannot find device \"test-dummy0\"
> > FAIL: cannot add dummy interface
> > selftests: rtnetlink.sh [FAIL]
> > https://lkft.validation.linaro.org/scheduler/job/226352#L3375
> >
> > Ref bug link:
> > https://bugs.linaro.org/show_bug.cgi?id=3834
> >
> > - Naresh
> >
>
> Which kselftest versdion do you run? Is this from linux-next?
It's using kselftest from 4.16 (latest stable, as a rule). I think this
patch will fix the 'Operation not supported' issue:
https://patchwork.kernel.org/patch/10424807/
Dan
On Thu, May 24, 2018 at 09:08:06PM +0200, Greg Kroah-Hartman wrote:
> On Thu, May 24, 2018 at 01:06:52PM -0500, Dan Rue wrote:
> > On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.4.133 release.
> > > There are 92 patches in this series, all will be posted as a response
> > > to this one. If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Sat May 26 09:31:28 UTC 2018.
> > > Anything received after that time might be too late.
> >
> > Results from Linaro’s test farm.
> > No regressions on arm64, arm and x86_64.
> >
> > Summary
> > ------------------------------------------------------------------------
> >
> > kernel: 4.4.133-rc1
> > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> > git branch: linux-4.4.y
> > git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
> > git describe: v4.4.132-93-g915a3d7cdea9
> > Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9
> >
> >
> > No regressions (compared to build v4.4.132-71-g180635995c36)
>
> It should have gotten better, as there was a fix in here for at least 2
> LTP tests that we previously were not passing. I don't know why you all
> were not reporting that in the past, it took someone else randomly
> deciding to run LTP to report it to me...
>
> Why did an improvement in results not show up?
Our normal process for failing tests is to skip them and then
fix/report/etc separately until they're fixed, at which time we
re-enable them. Otherwise, we would have to wade through too many
failures in every result set, and we could miss actual regressions.
We've never really reported things as they've been fixed - just
immediate regressions.
However! We are working on 'known issue' support in qa-reports (SQUAD),
so that we can run failing tests and the system will mark them as a
known failure. This will allow us to keep our baseline 'green', and also
let us run the failing tests so that we can see in realtime as issues
get fixed. Once we have that, the only tests that we'll carry on our
skiplists are those that cause boards to crash or reboot.
If you have the test cases in mind, I can check them. Otherwise, I can
run our full set of tests in staging (without a skiplist) tomorrow and
see if any are now passing.
Dan
Hello Rafael,
The tests fcntl35 and fcntl35_64 should have go from FAIL to PASS.
https://www.spinics.net/lists/stable/msg239475.html
Looking at
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9/testrun/228569/suite/ltp-syscalls-tests/tests/
I see that these two tests (and other important tests as well) are being SKIPPED.
By the way, I see that select04 FAILS in your case. But in my setup, select04 was working fine (x86_64) in 4.4.132. I will confirm that it still works in 4.4.133
Thanks,
Daniel Sangorrin
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Rafael Tinoco
> Sent: Friday, May 25, 2018 5:32 AM
> To: Greg Kroah-Hartman <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 4.4 00/92] 4.4.133-stable review
>
> > > kernel: 4.4.133-rc1
> > > git repo:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> > > git branch: linux-4.4.y
> > > git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
> > > git describe: v4.4.132-93-g915a3d7cdea9
> > > Test details:
> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a
> 3d7cdea9
> > >
> > >
> > > No regressions (compared to build v4.4.132-71-g180635995c36)
> >
> > It should have gotten better, as there was a fix in here for at least 2
> > LTP tests that we previously were not passing. I don't know why you all
> > were not reporting that in the past, it took someone else randomly
> > deciding to run LTP to report it to me...
> >
> > Why did an improvement in results not show up?
>
> Are you referring to the CLOCK_MONOTONIC_RAW fix for the arm64 vDSO ?
> I think that CLOCK_MONOTONIC_RAW in VDSO wasn't backported to 4.4.y
> (commit 49eea433b326 in mainline) so this "fix" is changing the
> timekeeping sauce (that would fix MONOTONIC RAW) but not for 4.4.y in
> ARM64. Needs review though :\
Thank you Daniel! Will investigate those.
Meanwhile, Greg, I referred to:
time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
Since we're not using this type of clock on arm64's 4.4 kernel vdso
functions. This commit's description calls attention for it to be
responsible for fixing kselftest flacking tests, we wouldn't get that
on 4.4 according to bellow:
stable-rc-linux-4.14.y
dbb236c1ceb6 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW
stable-rc-linux-4.16.y
dbb236c1ceb6 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW
stable-rc-linux-4.4.y
<none>
stable-rc-linux-4.9.y
99f66b5182a4 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW
Yet, the second fix was backported to all (including 4.4.y):
stable-rc-linux-4.14.y
3d88d56c5873 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.16.y
3d88d56c5873 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.4.y
7c8bd6e07430 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.9.y
a53bfdda06ac time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
Not sure you want to keep it in 4.4, thought it was worth mentioning it.
Cheers.
On 24 May 2018 at 22:34, Daniel Sangorrin
<[email protected]> wrote:
> Hello Rafael,
>
> The tests fcntl35 and fcntl35_64 should have go from FAIL to PASS.
> https://www.spinics.net/lists/stable/msg239475.html
>
> Looking at
> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9/testrun/228569/suite/ltp-syscalls-tests/tests/
> I see that these two tests (and other important tests as well) are being SKIPPED.
>
> By the way, I see that select04 FAILS in your case. But in my setup, select04 was working fine (x86_64) in 4.4.132. I will confirm that it still works in 4.4.133
>
> Thanks,
> Daniel Sangorrin
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On
>> Behalf Of Rafael Tinoco
>> Sent: Friday, May 25, 2018 5:32 AM
>> To: Greg Kroah-Hartman <[email protected]>
>> Cc: [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]
>> Subject: Re: [PATCH 4.4 00/92] 4.4.133-stable review
>>
>> > > kernel: 4.4.133-rc1
>> > > git repo:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>> > > git branch: linux-4.4.y
>> > > git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
>> > > git describe: v4.4.132-93-g915a3d7cdea9
>> > > Test details:
>> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a
>> 3d7cdea9
>> > >
>> > >
>> > > No regressions (compared to build v4.4.132-71-g180635995c36)
>> >
>> > It should have gotten better, as there was a fix in here for at least 2
>> > LTP tests that we previously were not passing. I don't know why you all
>> > were not reporting that in the past, it took someone else randomly
>> > deciding to run LTP to report it to me...
>> >
>> > Why did an improvement in results not show up?
>>
>> Are you referring to the CLOCK_MONOTONIC_RAW fix for the arm64 vDSO ?
>> I think that CLOCK_MONOTONIC_RAW in VDSO wasn't backported to 4.4.y
>> (commit 49eea433b326 in mainline) so this "fix" is changing the
>> timekeeping sauce (that would fix MONOTONIC RAW) but not for 4.4.y in
>> ARM64. Needs review though :\
>
>
>
> -----Original Message-----
> From: Rafael Tinoco [mailto:[email protected]]
>
> Thank you Daniel! Will investigate those.
OK, thank you :).
Notice that I did those tests on x86_64. It seems you are testing on arm, so there may be some differences.
I just checked these tests on 4.4.133 (on x86_64):
fcntl35: PASS
fcntl35_64: PASS
select04: PASS
I am currently investigating other tests that are failing as well. They are not regressions, just some patches have not been backported yet.
Thanks,
Daniel
>
> Meanwhile, Greg, I referred to:
>
> time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
>
> Since we're not using this type of clock on arm64's 4.4 kernel vdso
> functions. This commit's description calls attention for it to be
> responsible for fixing kselftest flacking tests, we wouldn't get that
> on 4.4 according to bellow:
>
> stable-rc-linux-4.14.y
> dbb236c1ceb6 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
> 49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in
> clock_gettime() vDSO
> 82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
> 9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW
>
> stable-rc-linux-4.16.y
> dbb236c1ceb6 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
> 49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in
> clock_gettime() vDSO
> 82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
> 9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW
>
> stable-rc-linux-4.4.y
> <none>
>
> stable-rc-linux-4.9.y
> 99f66b5182a4 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
> 49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in
> clock_gettime() vDSO
> 82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
> 9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW
>
> Yet, the second fix was backported to all (including 4.4.y):
>
> stable-rc-linux-4.14.y
> 3d88d56c5873 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
> stable-rc-linux-4.16.y
> 3d88d56c5873 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
> stable-rc-linux-4.4.y
> 7c8bd6e07430 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
> stable-rc-linux-4.9.y
> a53bfdda06ac time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
>
> Not sure you want to keep it in 4.4, thought it was worth mentioning it.
>
> Cheers.
>
> On 24 May 2018 at 22:34, Daniel Sangorrin
> <[email protected]> wrote:
> > Hello Rafael,
> >
> > The tests fcntl35 and fcntl35_64 should have go from FAIL to PASS.
> > https://www.spinics.net/lists/stable/msg239475.html
> >
> > Looking at
> >
> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a
> 3d7cdea9/testrun/228569/suite/ltp-syscalls-tests/tests/
> > I see that these two tests (and other important tests as well) are being SKIPPED.
> >
> > By the way, I see that select04 FAILS in your case. But in my setup, select04 was
> working fine (x86_64) in 4.4.132. I will confirm that it still works in 4.4.133
> >
> > Thanks,
> > Daniel Sangorrin
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:[email protected]]
> On
> >> Behalf Of Rafael Tinoco
> >> Sent: Friday, May 25, 2018 5:32 AM
> >> To: Greg Kroah-Hartman <[email protected]>
> >> Cc: [email protected]; [email protected]; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected]
> >> Subject: Re: [PATCH 4.4 00/92] 4.4.133-stable review
> >>
> >> > > kernel: 4.4.133-rc1
> >> > > git repo:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> >> > > git branch: linux-4.4.y
> >> > > git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
> >> > > git describe: v4.4.132-93-g915a3d7cdea9
> >> > > Test details:
> >>
> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a
> >> 3d7cdea9
> >> > >
> >> > >
> >> > > No regressions (compared to build v4.4.132-71-g180635995c36)
> >> >
> >> > It should have gotten better, as there was a fix in here for at least 2
> >> > LTP tests that we previously were not passing. I don't know why you all
> >> > were not reporting that in the past, it took someone else randomly
> >> > deciding to run LTP to report it to me...
> >> >
> >> > Why did an improvement in results not show up?
> >>
> >> Are you referring to the CLOCK_MONOTONIC_RAW fix for the arm64 vDSO ?
> >> I think that CLOCK_MONOTONIC_RAW in VDSO wasn't backported to 4.4.y
> >> (commit 49eea433b326 in mainline) so this "fix" is changing the
> >> timekeeping sauce (that would fix MONOTONIC RAW) but not for 4.4.y in
> >> ARM64. Needs review though :\
> >
> >
> >
Hi Daniel,
On 25 May 2018 at 07:04, Daniel Sangorrin
<[email protected]> wrote:
> Hello Rafael,
>
> The tests fcntl35 and fcntl35_64 should have go from FAIL to PASS.
> https://www.spinics.net/lists/stable/msg239475.html
Thanks for the patch.
Now i have manually tested LTP syscalls and confirms,
fcntl35 and fcntl35_64 pass on qemu_x86_64, (arm64) Hikey, Juno and (arm32) x15.
Linux version 4.4.133-rc1 (buildslave@x86-64-07) (gcc version 6.2.1 20161016
(Linaro GCC 6.2-2016.11) ) #1 SMP Thu May 24 10:24:11 UTC 2018
[ 131.873912] LTP: starting fcntl35
tst_test.c:980: INFO: Timeout per run is 0h 05m 00s
fcntl35.c:101: PASS: an unprivileged user init the capacity of a pipe
to 4096 successfully
fcntl35.c:101: PASS: a privileged user init the capacity of a pipe to
65536 successfully
Summary:
passed 2
failed 0
skipped 0
[ 132.090096] LTP: starting fcntl35_64
incrementing stop
tst_test.c:980: INFO: Timeout per run is 0h 05m 00s
fcntl35.c:101: PASS: an unprivileged user init the capacity of a pipe
to 4096 successfully
fcntl35.c:101: PASS: a privileged user init the capacity of a pipe to
65536 successfully
Summary:
passed 2
failed 0
skipped 0
warnings 0
>
> Looking at
> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9/testrun/228569/suite/ltp-syscalls-tests/tests/
> I see that these two tests (and other important tests as well) are being SKIPPED.
fcntl35 and fcntl35_64 will be unskipped from now on.
>
> By the way, I see that select04 FAILS in your case. But in my setup, select04 was working fine (x86_64) in 4.4.132. I will confirm that it still works in 4.4.133
select04 failed on (slow) qemu_arm only and PASS on real hardware of
arm32 x15, arm64 Juno and x86_64.
Test case verdict comparison across all boards and qemu,
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/tests/ltp-syscalls-tests/select04
LTP select04 failed log on slow qemu_arm32,
tst_test.c:980: INFO: Timeout per run is 0h 15m 00s
tst_timer_test.c:356: INFO: CLOCK_MONOTONIC resolution 1ns
tst_timer_test.c:368: INFO: prctl(PR_GET_TIMERSLACK) = 50us
tst_timer_test.c:275: INFO: select() sleeping for 1000us 500
iterations, threshold 450.01us
tst_timer_test.c:318: INFO: min 1336us, max 2282us, median 1522us,
trunc mean 1542.82us (discarded 25)
tst_timer_test.c:321: FAIL: select() slept for too long
Full log: https://lkft.validation.linaro.org/scheduler/job/228569#L8792
Best regards
Naresh Kamboju
On Thu, May 24, 2018 at 09:47:42PM +0200, Greg Kroah-Hartman wrote:
> On Thu, May 24, 2018 at 10:32:08AM -0700, Guenter Roeck wrote:
> > On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.4.133 release.
> > > There are 92 patches in this series, all will be posted as a response
> > > to this one. If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Sat May 26 09:31:28 UTC 2018.
> > > Anything received after that time might be too late.
> > >
> > Build results:
> > total: 146 pass: 145 fail: 1
> > Failed builds:
> > s390:allmodconfig
> > Qemu test results:
> > total: 127 pass: 127 fail: 0
> >
> > Build error (s390:allmodconfig):
> >
> > arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1':
> > (.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2):
> > undefined reference to `_LC_BR_R1'
>
> I'll look into the s390 stuff in the morning, I think I know what I
> messed up there...
Nope, I don't think this was my fault :)
Oddly 'defconfig' worked for s390 with the offending patch, but not
allmodconfig. I've now dropped the patch I think was causing the
problem and pushed out a -rc2.
thanks,
greg k-h
On Fri, May 25, 2018 at 04:11:45PM +0200, Greg Kroah-Hartman wrote:
> On Thu, May 24, 2018 at 09:47:42PM +0200, Greg Kroah-Hartman wrote:
> > On Thu, May 24, 2018 at 10:32:08AM -0700, Guenter Roeck wrote:
> > > On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
> > > > This is the start of the stable review cycle for the 4.4.133 release.
> > > > There are 92 patches in this series, all will be posted as a response
> > > > to this one. If anyone has any issues with these being applied, please
> > > > let me know.
> > > >
> > > > Responses should be made by Sat May 26 09:31:28 UTC 2018.
> > > > Anything received after that time might be too late.
> > > >
> > > Build results:
> > > total: 146 pass: 145 fail: 1
> > > Failed builds:
> > > s390:allmodconfig
> > > Qemu test results:
> > > total: 127 pass: 127 fail: 0
> > >
> > > Build error (s390:allmodconfig):
> > >
> > > arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1':
> > > (.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2):
> > > undefined reference to `_LC_BR_R1'
> >
> > I'll look into the s390 stuff in the morning, I think I know what I
> > messed up there...
>
> Nope, I don't think this was my fault :)
>
> Oddly 'defconfig' worked for s390 with the offending patch, but not
> allmodconfig. I've now dropped the patch I think was causing the
> problem and pushed out a -rc2.
>
Confirmed to build with v4.4.132-92-g8330f2b.
Guenter
On Fri, May 25, 2018 at 09:39:19AM -0700, Guenter Roeck wrote:
> On Fri, May 25, 2018 at 04:11:45PM +0200, Greg Kroah-Hartman wrote:
> > On Thu, May 24, 2018 at 09:47:42PM +0200, Greg Kroah-Hartman wrote:
> > > On Thu, May 24, 2018 at 10:32:08AM -0700, Guenter Roeck wrote:
> > > > On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
> > > > > This is the start of the stable review cycle for the 4.4.133 release.
> > > > > There are 92 patches in this series, all will be posted as a response
> > > > > to this one. If anyone has any issues with these being applied, please
> > > > > let me know.
> > > > >
> > > > > Responses should be made by Sat May 26 09:31:28 UTC 2018.
> > > > > Anything received after that time might be too late.
> > > > >
> > > > Build results:
> > > > total: 146 pass: 145 fail: 1
> > > > Failed builds:
> > > > s390:allmodconfig
> > > > Qemu test results:
> > > > total: 127 pass: 127 fail: 0
> > > >
> > > > Build error (s390:allmodconfig):
> > > >
> > > > arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1':
> > > > (.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2):
> > > > undefined reference to `_LC_BR_R1'
> > >
> > > I'll look into the s390 stuff in the morning, I think I know what I
> > > messed up there...
> >
> > Nope, I don't think this was my fault :)
> >
> > Oddly 'defconfig' worked for s390 with the offending patch, but not
> > allmodconfig. I've now dropped the patch I think was causing the
> > problem and pushed out a -rc2.
> >
> Confirmed to build with v4.4.132-92-g8330f2b.
Wonderful, thanks for testing and letting me know.
greg k-h
On Thu, 2018-05-24 at 11:37 +0200, Greg Kroah-Hartman wrote:
> 4.4-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Xin Long <[email protected]>
>
> [ Upstream commit 59d8d4434f429b4fa8a346fd889058bda427a837 ]
>
> Now sctp only delays the authentication for the normal cookie-echo
> chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But
> for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does
> authentication first based on the old asoc, which will definitely
> fail due to the different auth info in the old asoc.
[...]
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -1000,9 +1000,10 @@ static void sctp_assoc_bh_rcv(struct wor
> struct sctp_endpoint *ep;
> struct sctp_chunk *chunk;
> struct sctp_inq *inqueue;
> - int state;
> sctp_subtype_t subtype;
> + int first_time = 1; /* is this the first time through the loop */
> int error = 0;
> + int state;
>
> /* The association should be held so we should be safe. */
> ep = asoc->ep;
> @@ -1013,6 +1014,30 @@ static void sctp_assoc_bh_rcv(struct wor
> state = asoc->state;
> subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
>
> + /* If the first chunk in the packet is AUTH, do special
> + * processing specified in Section 6.3 of SCTP-AUTH spec
> + */
> + if (first_time && subtype.chunk == SCTP_CID_AUTH) {
> + struct sctp_chunkhdr *next_hdr;
> +
> + next_hdr = sctp_inq_peek(inqueue);
> + if (!next_hdr)
> + goto normal;
> +
> + /* If the next chunk is COOKIE-ECHO, skip the AUTH
> + * chunk while saving a pointer to it so we can do
> + * Authentication later (during cookie-echo
> + * processing).
> + */
> + if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
> + chunk->auth_chunk = skb_clone(chunk->skb,
> + GFP_ATOMIC);
> + chunk->auth = 1;
Doesn't the first_time flag need to be cleared here (and before the
other continue statement in this loop)?
Ben.
> + continue;
> + }
> + }
> +
> +normal:
[...]
--
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
On Wed, Jun 06, 2018 at 11:31:47PM +0100, Ben Hutchings wrote:
> On Thu, 2018-05-24 at 11:37 +0200, Greg Kroah-Hartman wrote:
> > 4.4-stable review patch.??If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Xin Long <[email protected]>
> >
> > [ Upstream commit 59d8d4434f429b4fa8a346fd889058bda427a837 ]
> >
> > Now sctp only delays the authentication for the normal cookie-echo
> > chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But
> > for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does
> > authentication first based on the old asoc, which will definitely
> > fail due to the different auth info in the old asoc.
> [...]
> > --- a/net/sctp/associola.c
> > +++ b/net/sctp/associola.c
> > @@ -1000,9 +1000,10 @@ static void sctp_assoc_bh_rcv(struct wor
> > ? struct sctp_endpoint *ep;
> > ? struct sctp_chunk *chunk;
> > ? struct sctp_inq *inqueue;
> > - int state;
> > ? sctp_subtype_t subtype;
> > + int first_time = 1; /* is this the first time through the loop */
> > ? int error = 0;
> > + int state;
> > ?
> > ? /* The association should be held so we should be safe. */
> > ? ep = asoc->ep;
> > @@ -1013,6 +1014,30 @@ static void sctp_assoc_bh_rcv(struct wor
> > ? state = asoc->state;
> > ? subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
> > ?
> > + /* If the first chunk in the packet is AUTH, do special
> > + ?* processing specified in Section 6.3 of SCTP-AUTH spec
> > + ?*/
> > + if (first_time && subtype.chunk == SCTP_CID_AUTH) {
> > + struct sctp_chunkhdr *next_hdr;
> > +
> > + next_hdr = sctp_inq_peek(inqueue);
> > + if (!next_hdr)
> > + goto normal;
> > +
> > + /* If the next chunk is COOKIE-ECHO, skip the AUTH
> > + ?* chunk while saving a pointer to it so we can do
> > + ?* Authentication later (during cookie-echo
> > + ?* processing).
> > + ?*/
> > + if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
> > + chunk->auth_chunk = skb_clone(chunk->skb,
> > + ??????GFP_ATOMIC);
> > + chunk->auth = 1;
>
> Doesn't the first_time flag need to be cleared here (and before the
> other continue statement in this loop)?
Seems the description is not matching the code closely. As is,
first_time is about the first time an AUTH chunk is handled followed
by a COOKIE-ECHO chunk (which is what we wanted, in the end), and not
strictly enforcing 'first chunk in the packet', as the description
says.
We should rename this first_time into a chunk counter instead. It
may even help with debugging on crashes.
Thanks for reviewing this, btw.
Marcelo
>
> Ben.
>
> > + continue;
> > + }
> > + }
> > +
> > +normal:
> [...]
>
> --
> Ben Hutchings, Software Developer ? Codethink Ltd
> https://www.codethink.co.uk/ Dale House, 35 Dale Street
> Manchester, M1 2HF, United Kingdom
Hi!
> > > > I've no wish to be disputatious, but it does seem that the definition of
> > > > "stable" has changed, and not necessarily for the better, if it's now a
> > > > home for small gains: I thought we left those to upstream.
> >
> > > This is in the SLES kernel for a reason, and again, it's in the section
> > > that says "this should be pushed to stable". So if it's good enough for
> > > the SLES kernel, why isn't it good enough for all users of this kernel
> > > tree?
> >
> > > If you all think it should be dropped in both places, that's fine with
> > > me :)
> >
> > I think they are perfectly fine in SLES: folding in good work is a part of
> > what distros are about.
>
> And it's also what stable is for. We have had backports of performance
> improvements in the past, along with lots of other things over the
> years. This is a performance improvement. A tiny one, yes, but getting
> rid of a lock is a good thing, and I picked it up as part of my review
> of what a distro decided was worth adding for their users, as that's a
> huge signal that might be of value to others.
>
> > But I cannot find anything in stable-kernel-rules.rst that would admit them
> > - perhaps that's just out of date?
>
> Nope, that's the list I use to say "no" to. You can't describe
> everything in that file, it's a judgement call.
Well, it would be good if the documentation matched reality, because other people
use the documentation to decide, too.
For example, documentation says bug has to be fixed in mainline, but in actual practice
you try to have exactly the same patch.
Pavel