LinuxLists.cc - [PATCH 4.4 00/64] 4.4.244-rc1 review

2020-11-17 13:12:13

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 00/64] 4.4.244-rc1 review

This is the start of the stable review cycle for the 4.4.244 release.
There are 64 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Thu, 19 Nov 2020 12:20:51 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.244-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.4.244-rc1

Boris Protopopov <[email protected]>
Convert trailing spaces and periods in path components

Eric Biggers <[email protected]>
ext4: fix leaking sysfs kobject after failed mount

Matteo Croce <[email protected]>
reboot: fix overflow parsing reboot cpu number

Matteo Croce <[email protected]>
Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"

Jiri Olsa <[email protected]>
perf/core: Fix race in the perf_mmap_close() function

Juergen Gross <[email protected]>
xen/events: block rogue events for some time

Juergen Gross <[email protected]>
xen/events: defer eoi in case of excessive number of events

Juergen Gross <[email protected]>
xen/events: use a common cpu hotplug hook for event channels

Juergen Gross <[email protected]>
xen/events: switch user event channels to lateeoi model

Juergen Gross <[email protected]>
xen/pciback: use lateeoi irq binding

Juergen Gross <[email protected]>
xen/scsiback: use lateeoi irq binding

Juergen Gross <[email protected]>
xen/netback: use lateeoi irq binding

Juergen Gross <[email protected]>
xen/blkback: use lateeoi irq binding

Juergen Gross <[email protected]>
xen/events: add a new "late EOI" evtchn framework

Juergen Gross <[email protected]>
xen/events: fix race in evtchn_fifo_unmask()

Juergen Gross <[email protected]>
xen/events: add a proper barrier to 2-level uevent unmasking

Juergen Gross <[email protected]>
xen/events: avoid removing an event channel while handling it

Anand K Mistry <[email protected]>
x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP

George Spelvin <[email protected]>
random32: make prandom_u32() output unpredictable

Mao Wenan <[email protected]>
net: Update window_clamp if SOCK_RCVBUF is set

Martin Schiller <[email protected]>
net/x25: Fix null-ptr-deref in x25_connect

Ursula Braun <[email protected]>
net/af_iucv: fix null pointer dereference on shutdown

Oliver Herms <[email protected]>
IPv6: Set SIT tunnel hard_header_len to zero

Stefano Stabellini <[email protected]>
swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"

Coiby Xu <[email protected]>
pinctrl: amd: fix incorrect way to disable debounce filter

Coiby Xu <[email protected]>
pinctrl: amd: use higher precision for 512 RtcClk

Thomas Zimmermann <[email protected]>
drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[]

Al Viro <[email protected]>
don't dump the threads that had been already exiting when zapped.

Wengang Wang <[email protected]>
ocfs2: initialize ip_next_orphan

Alexander Usyskin <[email protected]>
mei: protect mei_cl_mtu from null dereference

Chris Brandt <[email protected]>
usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode

Joseph Qi <[email protected]>
ext4: unlock xattr_sem properly in ext4_inline_data_truncate()

Kaixu Xia <[email protected]>
ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA

Peter Zijlstra <[email protected]>
perf: Fix get_recursion_context()

Wang Hai <[email protected]>
cosa: Add missing kfree in error path of cosa_write

Evan Nimmo <[email protected]>
of/address: Fix of_node memory leak in of_dma_is_coherent

Christoph Hellwig <[email protected]>
xfs: fix a missing unlock on error in xfs_fs_map_blocks

Suravee Suthikulpanit <[email protected]>
iommu/amd: Increase interrupt remapping table limit to 512 entries

Ye Bin <[email protected]>
cfg80211: regulatory: Fix inconsistent format argument

Johannes Berg <[email protected]>
mac80211: always wind down STA state

Johannes Berg <[email protected]>
mac80211: fix use of skb payload instead of header

Evan Quan <[email protected]>
drm/amdgpu: perform srbm soft reset always on SDMA resume

Bob Peterson <[email protected]>
gfs2: check for live vs. read-only file system in gfs2_fitrim

Bob Peterson <[email protected]>
gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free

Evgeny Novikov <[email protected]>
usb: gadget: goku_udc: fix potential crashes in probe

Masashi Honma <[email protected]>
ath9k_htc: Use appropriate rs_datalen type

Mark Gray <[email protected]>
geneve: add transport ports in route lookup for geneve

Martyna Szapar <[email protected]>
i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c

Grzegorz Siwik <[email protected]>
i40e: Wrong truncation from u16 to u8

Will Deacon <[email protected]>
pinctrl: devicetree: Avoid taking direct reference to device name string

Filipe Manana <[email protected]>
Btrfs: fix missing error return if writeback for extent buffer never started

Stephane Grosjean <[email protected]>
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping

Dan Carpenter <[email protected]>
can: peak_usb: add range checking in decode operations

Oleksij Rempel <[email protected]>
can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()

Oliver Hartkopp <[email protected]>
can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames

Vincent Mailhol <[email protected]>
can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context

Dan Carpenter <[email protected]>
ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()

Jiri Olsa <[email protected]>
perf tools: Add missing swap for ino_generation

zhuoliang zhang <[email protected]>
net: xfrm: fix a race condition during allocing spi

Johannes Thumshirn <[email protected]>
btrfs: reschedule when cloning lots of extents

Zeng Tao <[email protected]>
time: Prevent undefined behaviour in timespec64_to_ns()

Shijie Luo <[email protected]>
mm: mempolicy: fix potential pte_unmap_unlock pte error

Alexander Aring <[email protected]>
gfs2: Wake up when sd_glock_disposal becomes zero

Steven Rostedt (VMware) <[email protected]>
ring-buffer: Fix recursion protection transitions between interrupt context

-------------

Diffstat:

Documentation/kernel-parameters.txt | 8 +
Makefile | 4 +-
arch/x86/kernel/cpu/bugs.c | 52 ++-
drivers/block/xen-blkback/blkback.c | 22 +-
drivers/block/xen-blkback/xenbus.c | 5 +-
drivers/char/random.c | 2 -
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 27 +-
drivers/gpu/drm/gma500/psb_irq.c | 34 +-
drivers/iommu/amd_iommu_types.h | 6 +-
drivers/misc/mei/client.h | 4 +-
drivers/net/can/dev.c | 14 +-
drivers/net/can/usb/peak_usb/pcan_usb_core.c | 51 ++-
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 48 ++-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4 +-
drivers/net/geneve.c | 36 +-
drivers/net/wan/cosa.c | 1 +
drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 2 +-
drivers/net/xen-netback/common.h | 39 ++
drivers/net/xen-netback/interface.c | 59 ++-
drivers/net/xen-netback/netback.c | 17 +-
drivers/of/address.c | 4 +-
drivers/pinctrl/devicetree.c | 26 +-
drivers/pinctrl/pinctrl-amd.c | 6 +-
drivers/usb/class/cdc-acm.c | 9 +
drivers/usb/gadget/udc/goku_udc.c | 2 +-
drivers/xen/events/events_2l.c | 9 +-
drivers/xen/events/events_base.c | 444 ++++++++++++++++++--
drivers/xen/events/events_fifo.c | 102 ++---
drivers/xen/events/events_internal.h | 20 +-
drivers/xen/evtchn.c | 7 +-
drivers/xen/xen-pciback/pci_stub.c | 14 +-
drivers/xen/xen-pciback/pciback.h | 12 +-
drivers/xen/xen-pciback/pciback_ops.c | 48 ++-
drivers/xen/xen-pciback/xenbus.c | 2 +-
drivers/xen/xen-scsiback.c | 23 +-
fs/btrfs/extent_io.c | 4 +
fs/btrfs/ioctl.c | 2 +
fs/cifs/cifs_unicode.c | 8 +-
fs/ext4/inline.c | 1 +
fs/ext4/super.c | 5 +-
fs/gfs2/glock.c | 3 +-
fs/gfs2/rgrp.c | 5 +-
fs/ocfs2/super.c | 1 +
fs/xfs/xfs_pnfs.c | 2 +-
include/linux/can/skb.h | 20 +-
include/linux/prandom.h | 36 +-
include/linux/time64.h | 4 +
include/xen/events.h | 29 +-
kernel/events/core.c | 7 +-
kernel/events/internal.h | 2 +-
kernel/exit.c | 5 +-
kernel/reboot.c | 28 +-
kernel/time/timer.c | 7 -
kernel/trace/ring_buffer.c | 54 ++-
lib/random32.c | 463 +++++++++++++--------
lib/swiotlb.c | 6 +-
mm/mempolicy.c | 6 +-
net/ipv4/syncookies.c | 9 +-
net/ipv6/sit.c | 2 -
net/ipv6/syncookies.c | 10 +-
net/iucv/af_iucv.c | 3 +-
net/mac80211/sta_info.c | 18 +
net/mac80211/tx.c | 35 +-
net/wireless/reg.c | 2 +-
net/x25/af_x25.c | 2 +-
net/xfrm/xfrm_state.c | 8 +-
sound/hda/ext/hdac_ext_controller.c | 2 +
tools/perf/util/session.c | 1 +
68 files changed, 1431 insertions(+), 522 deletions(-)

2020-11-17 13:12:13

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 37/64] dont dump the threads that had been already exiting when zapped.

From: Al Viro <[email protected]>

commit 77f6ab8b7768cf5e6bdd0e72499270a0671506ee upstream.

Coredump logics needs to report not only the registers of the dumping
thread, but (since 2.5.43) those of other threads getting killed.

Doing that might require extra state saved on the stack in asm glue at
kernel entry; signal delivery logics does that (we need to be able to
save sigcontext there, at the very least) and so does seccomp.

That covers all callers of do_coredump(). Secondary threads get hit with
SIGKILL and caught as soon as they reach exit_mm(), which normally happens
in signal delivery, so those are also fine most of the time. Unfortunately,
it is possible to end up with secondary zapped when it has already entered
exit(2) (or, worse yet, is oopsing). In those cases we reach exit_mm()
when mm->core_state is already set, but the stack contents is not what
we would have in signal delivery.

At least on two architectures (alpha and m68k) it leads to infoleaks - we
end up with a chunk of kernel stack written into coredump, with the contents
consisting of normal C stack frames of the call chain leading to exit_mm()
instead of the expected copy of userland registers. In case of alpha we
leak 312 bytes of stack. Other architectures (including the regset-using
ones) might have similar problems - the normal user of regsets is ptrace
and the state of tracee at the time of such calls is special in the same
way signal delivery is.

Note that had the zapper gotten to the exiting thread slightly later,
it wouldn't have been included into coredump anyway - we skip the threads
that have already cleared their ->mm. So let's pretend that zapper always
loses the race. IOW, have exit_mm() only insert into the dumper list if
we'd gotten there from handling a fatal signal[*]

As the result, the callers of do_exit() that have *not* gone through get_signal()
are not seen by coredump logics as secondary threads. Which excludes voluntary
exit()/oopsen/traps/etc. The dumper thread itself is unaffected by that,
so seccomp is fine.

[*] originally I intended to add a new flag in tsk->flags, but ebiederman pointed
out that PF_SIGNALED is already doing just what we need.

Cc: [email protected]
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Acked-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/exit.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -408,7 +408,10 @@ static void exit_mm(struct task_struct *
up_read(&mm->mmap_sem);

self.task = tsk;
- self.next = xchg(&core_state->dumper.next, &self);
+ if (self.task->flags & PF_SIGNALED)
+ self.next = xchg(&core_state->dumper.next, &self);
+ else
+ self.task = NULL;
/*
* Implies mb(), the result of xchg() must be visible
* to core_state->dumper.

2020-11-17 13:12:14

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 39/64] pinctrl: amd: use higher precision for 512 RtcClk

From: Coiby Xu <[email protected]>

commit c64a6a0d4a928c63e5bc3b485552a8903a506c36 upstream.

RTC is 32.768kHz thus 512 RtcClk equals 15625 usec. The documentation
likely has dropped precision and that's why the driver mistakenly took
the slightly deviated value.

Cc: [email protected]
Reported-by: Andy Shevchenko <[email protected]>
Suggested-by: Andy Shevchenko <[email protected]>
Suggested-by: Hans de Goede <[email protected]>
Signed-off-by: Coiby Xu <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/linux-gpio/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/pinctrl/pinctrl-amd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -144,7 +144,7 @@ static int amd_gpio_set_debounce(struct
pin_reg |= BIT(DB_TMR_OUT_UNIT_OFF);
pin_reg &= ~BIT(DB_TMR_LARGE_OFF);
} else if (debounce < 250000) {
- time = debounce / 15600;
+ time = debounce / 15625;
pin_reg |= time & DB_TMR_OUT_MASK;
pin_reg &= ~BIT(DB_TMR_OUT_UNIT_OFF);
pin_reg |= BIT(DB_TMR_LARGE_OFF);

2020-11-17 13:12:16

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 40/64] pinctrl: amd: fix incorrect way to disable debounce filter

From: Coiby Xu <[email protected]>

commit 06abe8291bc31839950f7d0362d9979edc88a666 upstream.

The correct way to disable debounce filter is to clear bit 5 and 6
of the register.

Cc: [email protected]
Signed-off-by: Coiby Xu <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Cc: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/linux-gpio/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/pinctrl/pinctrl-amd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -154,14 +154,14 @@ static int amd_gpio_set_debounce(struct
pin_reg |= BIT(DB_TMR_OUT_UNIT_OFF);
pin_reg |= BIT(DB_TMR_LARGE_OFF);
} else {
- pin_reg &= ~DB_CNTRl_MASK;
+ pin_reg &= ~(DB_CNTRl_MASK << DB_CNTRL_OFF);
ret = -EINVAL;
}
} else {
pin_reg &= ~BIT(DB_TMR_OUT_UNIT_OFF);
pin_reg &= ~BIT(DB_TMR_LARGE_OFF);
pin_reg &= ~DB_TMR_OUT_MASK;
- pin_reg &= ~DB_CNTRl_MASK;
+ pin_reg &= ~(DB_CNTRl_MASK << DB_CNTRL_OFF);
}
writel(pin_reg, gpio_dev->base + offset * 4);
spin_unlock_irqrestore(&gpio_dev->lock, flags);

2020-11-17 13:12:26

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 43/64] net/af_iucv: fix null pointer dereference on shutdown

From: Ursula Braun <[email protected]>

[ Upstream commit 4031eeafa71eaf22ae40a15606a134ae86345daf ]

syzbot reported the following KASAN finding:

BUG: KASAN: nullptr-dereference in iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385
Read of size 2 at addr 000000000000021e by task syz-executor907/519

CPU: 0 PID: 519 Comm: syz-executor907 Not tainted 5.9.0-syzkaller-07043-gbcf9877ad213 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
[<00000000c576af60>] unwind_start arch/s390/include/asm/unwind.h:65 [inline]
[<00000000c576af60>] show_stack+0x180/0x228 arch/s390/kernel/dumpstack.c:135
[<00000000c9dcd1f8>] __dump_stack lib/dump_stack.c:77 [inline]
[<00000000c9dcd1f8>] dump_stack+0x268/0x2f0 lib/dump_stack.c:118
[<00000000c5fed016>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:383
[<00000000c5fec82a>] __kasan_report mm/kasan/report.c:517 [inline]
[<00000000c5fec82a>] kasan_report+0x11a/0x168 mm/kasan/report.c:534
[<00000000c98b5b60>] iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385
[<00000000c98b6262>] iucv_sock_shutdown+0x44a/0x4c0 net/iucv/af_iucv.c:1457
[<00000000c89d3a54>] __sys_shutdown+0x12c/0x1c8 net/socket.c:2204
[<00000000c89d3b70>] __do_sys_shutdown net/socket.c:2212 [inline]
[<00000000c89d3b70>] __s390x_sys_shutdown+0x38/0x48 net/socket.c:2210
[<00000000c9e36eac>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415

There is nothing to shutdown if a connection has never been established.
Besides that iucv->hs_dev is not yet initialized if a socket is in
IUCV_OPEN state and iucv->path is not yet initialized if socket is in
IUCV_BOUND state.
So, just skip the shutdown calls for a socket in these states.

Fixes: eac3731bd04c ("[S390]: Add AF_IUCV socket support")
Fixes: 82492a355fac ("af_iucv: add shutdown for HS transport")
Reviewed-by: Vasily Gorbik <[email protected]>
Signed-off-by: Ursula Braun <[email protected]>
[jwi: correct one Fixes tag]
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/iucv/af_iucv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1513,7 +1513,8 @@ static int iucv_sock_shutdown(struct soc
break;
}

- if (how == SEND_SHUTDOWN || how == SHUTDOWN_MASK) {
+ if ((how == SEND_SHUTDOWN || how == SHUTDOWN_MASK) &&
+ sk->sk_state == IUCV_CONNECTED) {
if (iucv->transport == AF_IUCV_TRANS_IUCV) {
txmsg.class = 0;
txmsg.tag = 0;

2020-11-17 13:12:30

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 54/64] xen/scsiback: use lateeoi irq binding

From: Juergen Gross <[email protected]>

commit 86991b6e7ea6c613b7692f65106076943449b6b7 upstream.

In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving scsifront use the lateeoi
irq binding for scsiback and unmask the event channel only just before
leaving the event handling function.

In case of a ring protocol error don't issue an EOI in order to avoid
the possibility to use that for producing an event storm. This at once
will result in no further call of scsiback_irq_fn(), so the ring_error
struct member can be dropped and scsiback_do_cmd_fn() can signal the
protocol error via a negative return value.

This is part of XSA-332.

Cc: [email protected]
Reported-by: Julien Grall <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/xen/xen-scsiback.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -91,7 +91,6 @@ struct vscsibk_info {
unsigned int irq;

struct vscsiif_back_ring ring;
- int ring_error;

spinlock_t ring_lock;
atomic_t nr_unreplied_reqs;
@@ -698,7 +697,8 @@ static int prepare_pending_reqs(struct v
return 0;
}

-static int scsiback_do_cmd_fn(struct vscsibk_info *info)
+static int scsiback_do_cmd_fn(struct vscsibk_info *info,
+ unsigned int *eoi_flags)
{
struct vscsiif_back_ring *ring = &info->ring;
struct vscsiif_request ring_req;
@@ -715,11 +715,12 @@ static int scsiback_do_cmd_fn(struct vsc
rc = ring->rsp_prod_pvt;
pr_warn("Dom%d provided bogus ring requests (%#x - %#x = %u). Halting ring processing\n",
info->domid, rp, rc, rp - rc);
- info->ring_error = 1;
- return 0;
+ return -EINVAL;
}

while ((rc != rp)) {
+ *eoi_flags &= ~XEN_EOI_FLAG_SPURIOUS;
+
if (RING_REQUEST_CONS_OVERFLOW(ring, rc))
break;
pending_req = kmem_cache_alloc(scsiback_cachep, GFP_KERNEL);
@@ -782,13 +783,16 @@ static int scsiback_do_cmd_fn(struct vsc
static irqreturn_t scsiback_irq_fn(int irq, void *dev_id)
{
struct vscsibk_info *info = dev_id;
+ int rc;
+ unsigned int eoi_flags = XEN_EOI_FLAG_SPURIOUS;

- if (info->ring_error)
- return IRQ_HANDLED;
-
- while (scsiback_do_cmd_fn(info))
+ while ((rc = scsiback_do_cmd_fn(info, &eoi_flags)) > 0)
cond_resched();

+ /* In case of a ring error we keep the event channel masked. */
+ if (!rc)
+ xen_irq_lateeoi(irq, eoi_flags);
+
return IRQ_HANDLED;
}

@@ -809,7 +813,7 @@ static int scsiback_init_sring(struct vs
sring = (struct vscsiif_sring *)area;
BACK_RING_INIT(&info->ring, sring, PAGE_SIZE);

- err = bind_interdomain_evtchn_to_irq(info->domid, evtchn);
+ err = bind_interdomain_evtchn_to_irq_lateeoi(info->domid, evtchn);
if (err < 0)
goto unmap_page;

@@ -1210,7 +1214,6 @@ static int scsiback_probe(struct xenbus_

info->domid = dev->otherend_id;
spin_lock_init(&info->ring_lock);
- info->ring_error = 0;
atomic_set(&info->nr_unreplied_reqs, 0);
init_waitqueue_head(&info->waiting_to_free);
info->dev = dev;

2020-11-17 13:12:36

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 57/64] xen/events: use a common cpu hotplug hook for event channels

From: Juergen Gross <[email protected]>

commit 7beb290caa2adb0a399e735a1e175db9aae0523a upstream.

Today only fifo event channels have a cpu hotplug callback. In order
to prepare for more percpu (de)init work move that callback into
events_base.c and add percpu_init() and percpu_deinit() hooks to
struct evtchn_ops.

This is part of XSA-332.

Cc: [email protected]
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/xen/events/events_base.c | 47 +++++++++++++++++++++++++++
drivers/xen/events/events_fifo.c | 59 ++++++++++++++---------------------
drivers/xen/events/events_internal.h | 3 +
3 files changed, 74 insertions(+), 35 deletions(-)

--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -33,6 +33,7 @@
#include <linux/irqnr.h>
#include <linux/pci.h>
#include <linux/spinlock.h>
+#include <linux/cpu.h>

#ifdef CONFIG_X86
#include <asm/desc.h>
@@ -1832,6 +1833,50 @@ void xen_callback_vector(void) {}
static bool fifo_events = true;
module_param(fifo_events, bool, 0);

+static int xen_evtchn_cpu_prepare(unsigned int cpu)
+{
+ int ret = 0;
+
+ if (evtchn_ops->percpu_init)
+ ret = evtchn_ops->percpu_init(cpu);
+
+ return ret;
+}
+
+static int xen_evtchn_cpu_dead(unsigned int cpu)
+{
+ int ret = 0;
+
+ if (evtchn_ops->percpu_deinit)
+ ret = evtchn_ops->percpu_deinit(cpu);
+
+ return ret;
+}
+
+static int evtchn_cpu_notification(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ int cpu = (long)hcpu;
+ int ret = 0;
+
+ switch (action) {
+ case CPU_UP_PREPARE:
+ ret = xen_evtchn_cpu_prepare(cpu);
+ break;
+ case CPU_DEAD:
+ ret = xen_evtchn_cpu_dead(cpu);
+ break;
+ default:
+ break;
+ }
+
+ return ret < 0 ? NOTIFY_BAD : NOTIFY_OK;
+}
+
+static struct notifier_block evtchn_cpu_notifier = {
+ .notifier_call = evtchn_cpu_notification,
+};
+
void __init xen_init_IRQ(void)
{
int ret = -EINVAL;
@@ -1841,6 +1886,8 @@ void __init xen_init_IRQ(void)
if (ret < 0)
xen_evtchn_2l_init();

+ register_cpu_notifier(&evtchn_cpu_notifier);
+
evtchn_to_irq = kcalloc(EVTCHN_ROW(xen_evtchn_max_channels()),
sizeof(*evtchn_to_irq), GFP_KERNEL);
BUG_ON(!evtchn_to_irq);
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -386,21 +386,6 @@ static void evtchn_fifo_resume(void)
event_array_pages = 0;
}

-static const struct evtchn_ops evtchn_ops_fifo = {
- .max_channels = evtchn_fifo_max_channels,
- .nr_channels = evtchn_fifo_nr_channels,
- .setup = evtchn_fifo_setup,
- .bind_to_cpu = evtchn_fifo_bind_to_cpu,
- .clear_pending = evtchn_fifo_clear_pending,
- .set_pending = evtchn_fifo_set_pending,
- .is_pending = evtchn_fifo_is_pending,
- .test_and_set_mask = evtchn_fifo_test_and_set_mask,
- .mask = evtchn_fifo_mask,
- .unmask = evtchn_fifo_unmask,
- .handle_events = evtchn_fifo_handle_events,
- .resume = evtchn_fifo_resume,
-};
-
static int evtchn_fifo_alloc_control_block(unsigned cpu)
{
void *control_block = NULL;
@@ -423,29 +408,34 @@ static int evtchn_fifo_alloc_control_blo
return ret;
}

-static int evtchn_fifo_cpu_notification(struct notifier_block *self,
- unsigned long action,
- void *hcpu)
+static int evtchn_fifo_percpu_init(unsigned int cpu)
{
- int cpu = (long)hcpu;
- int ret = 0;
+ if (!per_cpu(cpu_control_block, cpu))
+ return evtchn_fifo_alloc_control_block(cpu);
+ return 0;
+}

- switch (action) {
- case CPU_UP_PREPARE:
- if (!per_cpu(cpu_control_block, cpu))
- ret = evtchn_fifo_alloc_control_block(cpu);
- break;
- case CPU_DEAD:
- __evtchn_fifo_handle_events(cpu, true);
- break;
- default:
- break;
- }
- return ret < 0 ? NOTIFY_BAD : NOTIFY_OK;
+static int evtchn_fifo_percpu_deinit(unsigned int cpu)
+{
+ __evtchn_fifo_handle_events(cpu, true);
+ return 0;
}

-static struct notifier_block evtchn_fifo_cpu_notifier = {
- .notifier_call = evtchn_fifo_cpu_notification,
+static const struct evtchn_ops evtchn_ops_fifo = {
+ .max_channels = evtchn_fifo_max_channels,
+ .nr_channels = evtchn_fifo_nr_channels,
+ .setup = evtchn_fifo_setup,
+ .bind_to_cpu = evtchn_fifo_bind_to_cpu,
+ .clear_pending = evtchn_fifo_clear_pending,
+ .set_pending = evtchn_fifo_set_pending,
+ .is_pending = evtchn_fifo_is_pending,
+ .test_and_set_mask = evtchn_fifo_test_and_set_mask,
+ .mask = evtchn_fifo_mask,
+ .unmask = evtchn_fifo_unmask,
+ .handle_events = evtchn_fifo_handle_events,
+ .resume = evtchn_fifo_resume,
+ .percpu_init = evtchn_fifo_percpu_init,
+ .percpu_deinit = evtchn_fifo_percpu_deinit,
};

int __init xen_evtchn_fifo_init(void)
@@ -461,7 +451,6 @@ int __init xen_evtchn_fifo_init(void)

evtchn_ops = &evtchn_ops_fifo;

- register_cpu_notifier(&evtchn_fifo_cpu_notifier);
out:
put_cpu();
return ret;
--- a/drivers/xen/events/events_internal.h
+++ b/drivers/xen/events/events_internal.h
@@ -71,6 +71,9 @@ struct evtchn_ops {

void (*handle_events)(unsigned cpu);
void (*resume)(void);
+
+ int (*percpu_init)(unsigned int cpu);
+ int (*percpu_deinit)(unsigned int cpu);
};

extern const struct evtchn_ops *evtchn_ops;

2020-11-17 13:12:40

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 45/64] net: Update window_clamp if SOCK_RCVBUF is set

From: Mao Wenan <[email protected]>

[ Upstream commit 909172a149749242990a6e64cb55d55460d4e417 ]

When net.ipv4.tcp_syncookies=1 and syn flood is happened,
cookie_v4_check or cookie_v6_check tries to redo what
tcp_v4_send_synack or tcp_v6_send_synack did,
rsk_window_clamp will be changed if SOCK_RCVBUF is set,
which will make rcv_wscale is different, the client
still operates with initial window scale and can overshot
granted window, the client use the initial scale but local
server use new scale to advertise window value, and session
work abnormally.

Fixes: e88c64f0a425 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt")
Signed-off-by: Mao Wenan <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/syncookies.c | 9 +++++++--
net/ipv6/syncookies.c | 10 ++++++++--
2 files changed, 15 insertions(+), 4 deletions(-)

--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -307,7 +307,7 @@ struct sock *cookie_v4_check(struct sock
__u32 cookie = ntohl(th->ack_seq) - 1;
struct sock *ret = sk;
struct request_sock *req;
- int mss;
+ int full_space, mss;
struct rtable *rt;
__u8 rcv_wscale;
struct flowi4 fl4;
@@ -391,8 +391,13 @@ struct sock *cookie_v4_check(struct sock

/* Try to redo what tcp_v4_send_synack did. */
req->rsk_window_clamp = tp->window_clamp ? :dst_metric(&rt->dst, RTAX_WINDOW);
+ /* limit the window selection if the user enforce a smaller rx buffer */
+ full_space = tcp_full_space(sk);
+ if (sk->sk_userlocks & SOCK_RCVBUF_LOCK &&
+ (req->rsk_window_clamp > full_space || req->rsk_window_clamp == 0))
+ req->rsk_window_clamp = full_space;

- tcp_select_initial_window(tcp_full_space(sk), req->mss,
+ tcp_select_initial_window(full_space, req->mss,
&req->rsk_rcv_wnd, &req->rsk_window_clamp,
ireq->wscale_ok, &rcv_wscale,
dst_metric(&rt->dst, RTAX_INITRWND));
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -144,7 +144,7 @@ struct sock *cookie_v6_check(struct sock
__u32 cookie = ntohl(th->ack_seq) - 1;
struct sock *ret = sk;
struct request_sock *req;
- int mss;
+ int full_space, mss;
struct dst_entry *dst;
__u8 rcv_wscale;

@@ -237,7 +237,13 @@ struct sock *cookie_v6_check(struct sock
}

req->rsk_window_clamp = tp->window_clamp ? :dst_metric(dst, RTAX_WINDOW);
- tcp_select_initial_window(tcp_full_space(sk), req->mss,
+ /* limit the window selection if the user enforce a smaller rx buffer */
+ full_space = tcp_full_space(sk);
+ if (sk->sk_userlocks & SOCK_RCVBUF_LOCK &&
+ (req->rsk_window_clamp > full_space || req->rsk_window_clamp == 0))
+ req->rsk_window_clamp = full_space;
+
+ tcp_select_initial_window(full_space, req->mss,
&req->rsk_rcv_wnd, &req->rsk_window_clamp,
ireq->wscale_ok, &rcv_wscale,
dst_metric(dst, RTAX_INITRWND));

2020-11-17 13:12:55

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 31/64] perf: Fix get_recursion_context()

From: Peter Zijlstra <[email protected]>

[ Upstream commit ce0f17fc93f63ee91428af10b7b2ddef38cd19e5 ]

One should use in_serving_softirq() to detect SoftIRQ context.

Fixes: 96f6d4444302 ("perf_counter: avoid recursion")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/events/internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2bbad9c1274c3..8baa3121e7a6b 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -193,7 +193,7 @@ static inline int get_recursion_context(int *recursion)
rctx = 3;
else if (in_irq())
rctx = 2;
- else if (in_softirq())
+ else if (in_serving_softirq())
rctx = 1;
else
rctx = 0;
--
2.27.0

2020-11-17 13:13:12

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 51/64] xen/events: add a new "late EOI" evtchn framework

From: Juergen Gross <[email protected]>

commit 54c9de89895e0a36047fcc4ae754ea5b8655fb9d upstream.

In order to avoid tight event channel related IRQ loops add a new
framework of "late EOI" handling: the IRQ the event channel is bound
to will be masked until the event has been handled and the related
driver is capable to handle another event. The driver is responsible
for unmasking the event channel via the new function xen_irq_lateeoi().

This is similar to binding an event channel to a threaded IRQ, but
without having to structure the driver accordingly.

In order to support a future special handling in case a rogue guest
is sending lots of unsolicited events, add a flag to xen_irq_lateeoi()
which can be set by the caller to indicate the event was a spurious
one.

This is part of XSA-332.

Cc: [email protected]
Reported-by: Julien Grall <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/xen/events/events_base.c | 151 ++++++++++++++++++++++++++++++++++-----
include/xen/events.h | 29 ++++++-
2 files changed, 159 insertions(+), 21 deletions(-)

--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -112,6 +112,7 @@ static bool (*pirq_needs_eoi)(unsigned i
static struct irq_info *legacy_info_ptrs[NR_IRQS_LEGACY];

static struct irq_chip xen_dynamic_chip;
+static struct irq_chip xen_lateeoi_chip;
static struct irq_chip xen_percpu_chip;
static struct irq_chip xen_pirq_chip;
static void enable_dynirq(struct irq_data *data);
@@ -404,6 +405,33 @@ void notify_remote_via_irq(int irq)
}
EXPORT_SYMBOL_GPL(notify_remote_via_irq);

+static void xen_irq_lateeoi_locked(struct irq_info *info)
+{
+ evtchn_port_t evtchn;
+
+ evtchn = info->evtchn;
+ if (!VALID_EVTCHN(evtchn))
+ return;
+
+ unmask_evtchn(evtchn);
+}
+
+void xen_irq_lateeoi(unsigned int irq, unsigned int eoi_flags)
+{
+ struct irq_info *info;
+ unsigned long flags;
+
+ read_lock_irqsave(&evtchn_rwlock, flags);
+
+ info = info_for_irq(irq);
+
+ if (info)
+ xen_irq_lateeoi_locked(info);
+
+ read_unlock_irqrestore(&evtchn_rwlock, flags);
+}
+EXPORT_SYMBOL_GPL(xen_irq_lateeoi);
+
static void xen_irq_init(unsigned irq)
{
struct irq_info *info;
@@ -875,7 +903,7 @@ int xen_pirq_from_irq(unsigned irq)
}
EXPORT_SYMBOL_GPL(xen_pirq_from_irq);

-int bind_evtchn_to_irq(unsigned int evtchn)
+static int bind_evtchn_to_irq_chip(evtchn_port_t evtchn, struct irq_chip *chip)
{
int irq;
int ret;
@@ -892,7 +920,7 @@ int bind_evtchn_to_irq(unsigned int evtc
if (irq < 0)
goto out;

- irq_set_chip_and_handler_name(irq, &xen_dynamic_chip,
+ irq_set_chip_and_handler_name(irq, chip,
handle_edge_irq, "event");

ret = xen_irq_info_evtchn_setup(irq, evtchn);
@@ -913,8 +941,19 @@ out:

return irq;
}
+
+int bind_evtchn_to_irq(evtchn_port_t evtchn)
+{
+ return bind_evtchn_to_irq_chip(evtchn, &xen_dynamic_chip);
+}
EXPORT_SYMBOL_GPL(bind_evtchn_to_irq);

+int bind_evtchn_to_irq_lateeoi(evtchn_port_t evtchn)
+{
+ return bind_evtchn_to_irq_chip(evtchn, &xen_lateeoi_chip);
+}
+EXPORT_SYMBOL_GPL(bind_evtchn_to_irq_lateeoi);
+
static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu)
{
struct evtchn_bind_ipi bind_ipi;
@@ -956,8 +995,9 @@ static int bind_ipi_to_irq(unsigned int
return irq;
}

-int bind_interdomain_evtchn_to_irq(unsigned int remote_domain,
- unsigned int remote_port)
+static int bind_interdomain_evtchn_to_irq_chip(unsigned int remote_domain,
+ evtchn_port_t remote_port,
+ struct irq_chip *chip)
{
struct evtchn_bind_interdomain bind_interdomain;
int err;
@@ -968,10 +1008,26 @@ int bind_interdomain_evtchn_to_irq(unsig
err = HYPERVISOR_event_channel_op(EVTCHNOP_bind_interdomain,
&bind_interdomain);

- return err ? : bind_evtchn_to_irq(bind_interdomain.local_port);
+ return err ? : bind_evtchn_to_irq_chip(bind_interdomain.local_port,
+ chip);
+}
+
+int bind_interdomain_evtchn_to_irq(unsigned int remote_domain,
+ evtchn_port_t remote_port)
+{
+ return bind_interdomain_evtchn_to_irq_chip(remote_domain, remote_port,
+ &xen_dynamic_chip);
}
EXPORT_SYMBOL_GPL(bind_interdomain_evtchn_to_irq);

+int bind_interdomain_evtchn_to_irq_lateeoi(unsigned int remote_domain,
+ evtchn_port_t remote_port)
+{
+ return bind_interdomain_evtchn_to_irq_chip(remote_domain, remote_port,
+ &xen_lateeoi_chip);
+}
+EXPORT_SYMBOL_GPL(bind_interdomain_evtchn_to_irq_lateeoi);
+
static int find_virq(unsigned int virq, unsigned int cpu)
{
struct evtchn_status status;
@@ -1067,14 +1123,15 @@ static void unbind_from_irq(unsigned int
mutex_unlock(&irq_mapping_update_lock);
}

-int bind_evtchn_to_irqhandler(unsigned int evtchn,
- irq_handler_t handler,
- unsigned long irqflags,
- const char *devname, void *dev_id)
+static int bind_evtchn_to_irqhandler_chip(evtchn_port_t evtchn,
+ irq_handler_t handler,
+ unsigned long irqflags,
+ const char *devname, void *dev_id,
+ struct irq_chip *chip)
{
int irq, retval;

- irq = bind_evtchn_to_irq(evtchn);
+ irq = bind_evtchn_to_irq_chip(evtchn, chip);
if (irq < 0)
return irq;
retval = request_irq(irq, handler, irqflags, devname, dev_id);
@@ -1085,18 +1142,38 @@ int bind_evtchn_to_irqhandler(unsigned i

return irq;
}
+
+int bind_evtchn_to_irqhandler(evtchn_port_t evtchn,
+ irq_handler_t handler,
+ unsigned long irqflags,
+ const char *devname, void *dev_id)
+{
+ return bind_evtchn_to_irqhandler_chip(evtchn, handler, irqflags,
+ devname, dev_id,
+ &xen_dynamic_chip);
+}
EXPORT_SYMBOL_GPL(bind_evtchn_to_irqhandler);

-int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain,
- unsigned int remote_port,
- irq_handler_t handler,
- unsigned long irqflags,
- const char *devname,
- void *dev_id)
+int bind_evtchn_to_irqhandler_lateeoi(evtchn_port_t evtchn,
+ irq_handler_t handler,
+ unsigned long irqflags,
+ const char *devname, void *dev_id)
+{
+ return bind_evtchn_to_irqhandler_chip(evtchn, handler, irqflags,
+ devname, dev_id,
+ &xen_lateeoi_chip);
+}
+EXPORT_SYMBOL_GPL(bind_evtchn_to_irqhandler_lateeoi);
+
+static int bind_interdomain_evtchn_to_irqhandler_chip(
+ unsigned int remote_domain, evtchn_port_t remote_port,
+ irq_handler_t handler, unsigned long irqflags,
+ const char *devname, void *dev_id, struct irq_chip *chip)
{
int irq, retval;

- irq = bind_interdomain_evtchn_to_irq(remote_domain, remote_port);
+ irq = bind_interdomain_evtchn_to_irq_chip(remote_domain, remote_port,
+ chip);
if (irq < 0)
return irq;

@@ -1108,8 +1185,33 @@ int bind_interdomain_evtchn_to_irqhandle

return irq;
}
+
+int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain,
+ evtchn_port_t remote_port,
+ irq_handler_t handler,
+ unsigned long irqflags,
+ const char *devname,
+ void *dev_id)
+{
+ return bind_interdomain_evtchn_to_irqhandler_chip(remote_domain,
+ remote_port, handler, irqflags, devname,
+ dev_id, &xen_dynamic_chip);
+}
EXPORT_SYMBOL_GPL(bind_interdomain_evtchn_to_irqhandler);

+int bind_interdomain_evtchn_to_irqhandler_lateeoi(unsigned int remote_domain,
+ evtchn_port_t remote_port,
+ irq_handler_t handler,
+ unsigned long irqflags,
+ const char *devname,
+ void *dev_id)
+{
+ return bind_interdomain_evtchn_to_irqhandler_chip(remote_domain,
+ remote_port, handler, irqflags, devname,
+ dev_id, &xen_lateeoi_chip);
+}
+EXPORT_SYMBOL_GPL(bind_interdomain_evtchn_to_irqhandler_lateeoi);
+
int bind_virq_to_irqhandler(unsigned int virq, unsigned int cpu,
irq_handler_t handler,
unsigned long irqflags, const char *devname, void *dev_id)
@@ -1639,6 +1741,21 @@ static struct irq_chip xen_dynamic_chip
.irq_mask_ack = mask_ack_dynirq,

.irq_set_affinity = set_affinity_irq,
+ .irq_retrigger = retrigger_dynirq,
+};
+
+static struct irq_chip xen_lateeoi_chip __read_mostly = {
+ /* The chip name needs to contain "xen-dyn" for irqbalance to work. */
+ .name = "xen-dyn-lateeoi",
+
+ .irq_disable = disable_dynirq,
+ .irq_mask = disable_dynirq,
+ .irq_unmask = enable_dynirq,
+
+ .irq_ack = mask_ack_dynirq,
+ .irq_mask_ack = mask_ack_dynirq,
+
+ .irq_set_affinity = set_affinity_irq,
.irq_retrigger = retrigger_dynirq,
};

--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -12,11 +12,16 @@

unsigned xen_evtchn_nr_channels(void);

-int bind_evtchn_to_irq(unsigned int evtchn);
-int bind_evtchn_to_irqhandler(unsigned int evtchn,
+int bind_evtchn_to_irq(evtchn_port_t evtchn);
+int bind_evtchn_to_irq_lateeoi(evtchn_port_t evtchn);
+int bind_evtchn_to_irqhandler(evtchn_port_t evtchn,
irq_handler_t handler,
unsigned long irqflags, const char *devname,
void *dev_id);
+int bind_evtchn_to_irqhandler_lateeoi(evtchn_port_t evtchn,
+ irq_handler_t handler,
+ unsigned long irqflags, const char *devname,
+ void *dev_id);
int bind_virq_to_irq(unsigned int virq, unsigned int cpu, bool percpu);
int bind_virq_to_irqhandler(unsigned int virq, unsigned int cpu,
irq_handler_t handler,
@@ -29,13 +34,21 @@ int bind_ipi_to_irqhandler(enum ipi_vect
const char *devname,
void *dev_id);
int bind_interdomain_evtchn_to_irq(unsigned int remote_domain,
- unsigned int remote_port);
+ evtchn_port_t remote_port);
+int bind_interdomain_evtchn_to_irq_lateeoi(unsigned int remote_domain,
+ evtchn_port_t remote_port);
int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain,
- unsigned int remote_port,
+ evtchn_port_t remote_port,
irq_handler_t handler,
unsigned long irqflags,
const char *devname,
void *dev_id);
+int bind_interdomain_evtchn_to_irqhandler_lateeoi(unsigned int remote_domain,
+ evtchn_port_t remote_port,
+ irq_handler_t handler,
+ unsigned long irqflags,
+ const char *devname,
+ void *dev_id);

/*
* Common unbind function for all event sources. Takes IRQ to unbind from.
@@ -44,6 +57,14 @@ int bind_interdomain_evtchn_to_irqhandle
*/
void unbind_from_irqhandler(unsigned int irq, void *dev_id);

+/*
+ * Send late EOI for an IRQ bound to an event channel via one of the *_lateeoi
+ * functions above.
+ */
+void xen_irq_lateeoi(unsigned int irq, unsigned int eoi_flags);
+/* Signal an event was spurious, i.e. there was no action resulting from it. */
+#define XEN_EOI_FLAG_SPURIOUS 0x00000001
+
#define XEN_IRQ_PRIORITY_MAX EVTCHN_FIFO_PRIORITY_MAX
#define XEN_IRQ_PRIORITY_DEFAULT EVTCHN_FIFO_PRIORITY_DEFAULT
#define XEN_IRQ_PRIORITY_MIN EVTCHN_FIFO_PRIORITY_MIN

2020-11-17 13:13:18

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 35/64] mei: protect mei_cl_mtu from null dereference

From: Alexander Usyskin <[email protected]>

commit bcbc0b2e275f0a797de11a10eff495b4571863fc upstream.

A receive callback is queued while the client is still connected
but can still be called after the client was disconnected. Upon
disconnect cl->me_cl is set to NULL, hence we need to check
that ME client is not-NULL in mei_cl_mtu to avoid
null dereference.

Cc: <[email protected]>
Signed-off-by: Alexander Usyskin <[email protected]>
Signed-off-by: Tomas Winkler <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/misc/mei/client.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/misc/mei/client.h
+++ b/drivers/misc/mei/client.h
@@ -156,11 +156,11 @@ static inline u8 mei_cl_me_id(const stru
*
* @cl: host client
*
- * Return: mtu
+ * Return: mtu or 0 if client is not connected
*/
static inline size_t mei_cl_mtu(const struct mei_cl *cl)
{
- return cl->me_cl->props.max_msg_length;
+ return cl->me_cl ? cl->me_cl->props.max_msg_length : 0;
}

/**

2020-11-17 13:13:18

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 53/64] xen/netback: use lateeoi irq binding

From: Juergen Gross <[email protected]>

commit 23025393dbeb3b8b3b60ebfa724cdae384992e27 upstream.

In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving netfront use the lateeoi
irq binding for netback and unmask the event channel only just before
going to sleep waiting for new events.

Make sure not to issue an EOI when none is pending by introducing an
eoi_pending element to struct xenvif_queue.

When no request has been consumed set the spurious flag when sending
the EOI for an interrupt.

This is part of XSA-332.

Cc: [email protected]
Reported-by: Julien Grall <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/xen-netback/common.h | 39 +++++++++++++++++++++++
drivers/net/xen-netback/interface.c | 59 +++++++++++++++++++++++++++++++-----
drivers/net/xen-netback/netback.c | 17 +++++++---
3 files changed, 103 insertions(+), 12 deletions(-)

--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -137,6 +137,20 @@ struct xenvif_queue { /* Per-queue data
char name[QUEUE_NAME_SIZE]; /* DEVNAME-qN */
struct xenvif *vif; /* Parent VIF */

+ /*
+ * TX/RX common EOI handling.
+ * When feature-split-event-channels = 0, interrupt handler sets
+ * NETBK_COMMON_EOI, otherwise NETBK_RX_EOI and NETBK_TX_EOI are set
+ * by the RX and TX interrupt handlers.
+ * RX and TX handler threads will issue an EOI when either
+ * NETBK_COMMON_EOI or their specific bits (NETBK_RX_EOI or
+ * NETBK_TX_EOI) are set and they will reset those bits.
+ */
+ atomic_t eoi_pending;
+#define NETBK_RX_EOI 0x01
+#define NETBK_TX_EOI 0x02
+#define NETBK_COMMON_EOI 0x04
+
/* Use NAPI for guest TX */
struct napi_struct napi;
/* When feature-split-event-channels = 0, tx_irq = rx_irq. */
@@ -317,6 +331,7 @@ void xenvif_kick_thread(struct xenvif_qu

int xenvif_dealloc_kthread(void *data);

+bool xenvif_have_rx_work(struct xenvif_queue *queue, bool test_kthread);
void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);

void xenvif_carrier_on(struct xenvif *vif);
@@ -353,4 +368,28 @@ void xenvif_skb_zerocopy_complete(struct
bool xenvif_mcast_match(struct xenvif *vif, const u8 *addr);
void xenvif_mcast_addr_list_free(struct xenvif *vif);

+#include <linux/atomic.h>
+
+static inline int xenvif_atomic_fetch_or(int i, atomic_t *v)
+{
+ int c, old;
+
+ c = v->counter;
+ while ((old = cmpxchg(&v->counter, c, c | i)) != c)
+ c = old;
+
+ return c;
+}
+
+static inline int xenvif_atomic_fetch_andnot(int i, atomic_t *v)
+{
+ int c, old;
+
+ c = v->counter;
+ while ((old = cmpxchg(&v->counter, c, c & ~i)) != c)
+ c = old;
+
+ return c;
+}
+
#endif /* __XEN_NETBACK__COMMON_H__ */
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -76,12 +76,28 @@ int xenvif_schedulable(struct xenvif *vi
!vif->disabled;
}

+static bool xenvif_handle_tx_interrupt(struct xenvif_queue *queue)
+{
+ bool rc;
+
+ rc = RING_HAS_UNCONSUMED_REQUESTS(&queue->tx);
+ if (rc)
+ napi_schedule(&queue->napi);
+ return rc;
+}
+
static irqreturn_t xenvif_tx_interrupt(int irq, void *dev_id)
{
struct xenvif_queue *queue = dev_id;
+ int old;

- if (RING_HAS_UNCONSUMED_REQUESTS(&queue->tx))
- napi_schedule(&queue->napi);
+ old = xenvif_atomic_fetch_or(NETBK_TX_EOI, &queue->eoi_pending);
+ WARN(old & NETBK_TX_EOI, "Interrupt while EOI pending\n");
+
+ if (!xenvif_handle_tx_interrupt(queue)) {
+ atomic_andnot(NETBK_TX_EOI, &queue->eoi_pending);
+ xen_irq_lateeoi(irq, XEN_EOI_FLAG_SPURIOUS);
+ }

return IRQ_HANDLED;
}
@@ -115,19 +131,46 @@ static int xenvif_poll(struct napi_struc
return work_done;
}

+static bool xenvif_handle_rx_interrupt(struct xenvif_queue *queue)
+{
+ bool rc;
+
+ rc = xenvif_have_rx_work(queue, false);
+ if (rc)
+ xenvif_kick_thread(queue);
+ return rc;
+}
+
static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
{
struct xenvif_queue *queue = dev_id;
+ int old;

- xenvif_kick_thread(queue);
+ old = xenvif_atomic_fetch_or(NETBK_RX_EOI, &queue->eoi_pending);
+ WARN(old & NETBK_RX_EOI, "Interrupt while EOI pending\n");
+
+ if (!xenvif_handle_rx_interrupt(queue)) {
+ atomic_andnot(NETBK_RX_EOI, &queue->eoi_pending);
+ xen_irq_lateeoi(irq, XEN_EOI_FLAG_SPURIOUS);
+ }

return IRQ_HANDLED;
}

irqreturn_t xenvif_interrupt(int irq, void *dev_id)
{
- xenvif_tx_interrupt(irq, dev_id);
- xenvif_rx_interrupt(irq, dev_id);
+ struct xenvif_queue *queue = dev_id;
+ int old;
+
+ old = xenvif_atomic_fetch_or(NETBK_COMMON_EOI, &queue->eoi_pending);
+ WARN(old, "Interrupt while EOI pending\n");
+
+ /* Use bitwise or as we need to call both functions. */
+ if ((!xenvif_handle_tx_interrupt(queue) |
+ !xenvif_handle_rx_interrupt(queue))) {
+ atomic_andnot(NETBK_COMMON_EOI, &queue->eoi_pending);
+ xen_irq_lateeoi(irq, XEN_EOI_FLAG_SPURIOUS);
+ }

return IRQ_HANDLED;
}
@@ -555,7 +598,7 @@ int xenvif_connect(struct xenvif_queue *

if (tx_evtchn == rx_evtchn) {
/* feature-split-event-channels == 0 */
- err = bind_interdomain_evtchn_to_irqhandler(
+ err = bind_interdomain_evtchn_to_irqhandler_lateeoi(
queue->vif->domid, tx_evtchn, xenvif_interrupt, 0,
queue->name, queue);
if (err < 0)
@@ -566,7 +609,7 @@ int xenvif_connect(struct xenvif_queue *
/* feature-split-event-channels == 1 */
snprintf(queue->tx_irq_name, sizeof(queue->tx_irq_name),
"%s-tx", queue->name);
- err = bind_interdomain_evtchn_to_irqhandler(
+ err = bind_interdomain_evtchn_to_irqhandler_lateeoi(
queue->vif->domid, tx_evtchn, xenvif_tx_interrupt, 0,
queue->tx_irq_name, queue);
if (err < 0)
@@ -576,7 +619,7 @@ int xenvif_connect(struct xenvif_queue *

snprintf(queue->rx_irq_name, sizeof(queue->rx_irq_name),
"%s-rx", queue->name);
- err = bind_interdomain_evtchn_to_irqhandler(
+ err = bind_interdomain_evtchn_to_irqhandler_lateeoi(
queue->vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
queue->rx_irq_name, queue);
if (err < 0)
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -670,6 +670,10 @@ void xenvif_napi_schedule_or_enable_even

if (more_to_do)
napi_schedule(&queue->napi);
+ else if (xenvif_atomic_fetch_andnot(NETBK_TX_EOI | NETBK_COMMON_EOI,
+ &queue->eoi_pending) &
+ (NETBK_TX_EOI | NETBK_COMMON_EOI))
+ xen_irq_lateeoi(queue->tx_irq, 0);
}

static void tx_add_credit(struct xenvif_queue *queue)
@@ -2010,14 +2014,14 @@ static bool xenvif_rx_queue_ready(struct
return queue->stalled && prod - cons >= 1;
}

-static bool xenvif_have_rx_work(struct xenvif_queue *queue)
+bool xenvif_have_rx_work(struct xenvif_queue *queue, bool test_kthread)
{
return (!skb_queue_empty(&queue->rx_queue)
&& xenvif_rx_ring_slots_available(queue))
|| (queue->vif->stall_timeout &&
(xenvif_rx_queue_stalled(queue)
|| xenvif_rx_queue_ready(queue)))
- || kthread_should_stop()
+ || (test_kthread && kthread_should_stop())
|| queue->vif->disabled;
}

@@ -2048,15 +2052,20 @@ static void xenvif_wait_for_rx_work(stru
{
DEFINE_WAIT(wait);

- if (xenvif_have_rx_work(queue))
+ if (xenvif_have_rx_work(queue, true))
return;

for (;;) {
long ret;

prepare_to_wait(&queue->wq, &wait, TASK_INTERRUPTIBLE);
- if (xenvif_have_rx_work(queue))
+ if (xenvif_have_rx_work(queue, true))
break;
+ if (xenvif_atomic_fetch_andnot(NETBK_RX_EOI | NETBK_COMMON_EOI,
+ &queue->eoi_pending) &
+ (NETBK_RX_EOI | NETBK_COMMON_EOI))
+ xen_irq_lateeoi(queue->rx_irq, 0);
+
ret = schedule_timeout(xenvif_rx_queue_timeout(queue));
if (!ret)
break;

2020-11-17 13:13:39

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 58/64] xen/events: defer eoi in case of excessive number of events

From: Juergen Gross <[email protected]>

commit e99502f76271d6bc4e374fe368c50c67a1fd3070 upstream.

In case rogue guests are sending events at high frequency it might
happen that xen_evtchn_do_upcall() won't stop processing events in
dom0. As this is done in irq handling a crash might be the result.

In order to avoid that, delay further inter-domain events after some
time in xen_evtchn_do_upcall() by forcing eoi processing into a
worker on the same cpu, thus inhibiting new events coming in.

The time after which eoi processing is to be delayed is configurable
via a new module parameter "event_loop_timeout" which specifies the
maximum event loop time in jiffies (default: 2, the value was chosen
after some tests showing that a value of 2 was the lowest with an
only slight drop of dom0 network throughput while multiple guests
performed an event storm).

How long eoi processing will be delayed can be specified via another
parameter "event_eoi_delay" (again in jiffies, default 10, again the
value was chosen after testing with different delay values).

This is part of XSA-332.

Cc: [email protected]
Reported-by: Julien Grall <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/kernel-parameters.txt | 8 +
drivers/xen/events/events_2l.c | 7 -
drivers/xen/events/events_base.c | 189 ++++++++++++++++++++++++++++++++++-
drivers/xen/events/events_fifo.c | 30 ++---
drivers/xen/events/events_internal.h | 14 ++
5 files changed, 216 insertions(+), 32 deletions(-)

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -4488,6 +4488,14 @@ bytes respectively. Such letter suffixes
Disables the PV optimizations forcing the HVM guest to
run as generic HVM guest with no PV drivers.

+ xen.event_eoi_delay= [XEN]
+ How long to delay EOI handling in case of event
+ storms (jiffies). Default is 10.
+
+ xen.event_loop_timeout= [XEN]
+ After which time (jiffies) the event handling loop
+ should start to delay EOI handling. Default is 2.
+
xirc2ps_cs= [NET,PCMCIA]
Format:
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
--- a/drivers/xen/events/events_2l.c
+++ b/drivers/xen/events/events_2l.c
@@ -160,7 +160,7 @@ static inline xen_ulong_t active_evtchns
* a bitset of words which contain pending event bits. The second
* level is a bitset of pending events themselves.
*/
-static void evtchn_2l_handle_events(unsigned cpu)
+static void evtchn_2l_handle_events(unsigned cpu, struct evtchn_loop_ctrl *ctrl)
{
int irq;
xen_ulong_t pending_words;
@@ -241,10 +241,7 @@ static void evtchn_2l_handle_events(unsi

/* Process port. */
port = (word_idx * BITS_PER_EVTCHN_WORD) + bit_idx;
- irq = get_evtchn_to_irq(port);
-
- if (irq != -1)
- generic_handle_irq(irq);
+ handle_irq_for_port(port, ctrl);

bit_idx = (bit_idx + 1) % BITS_PER_EVTCHN_WORD;

--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -34,6 +34,8 @@
#include <linux/pci.h>
#include <linux/spinlock.h>
#include <linux/cpu.h>
+#include <linux/atomic.h>
+#include <linux/ktime.h>

#ifdef CONFIG_X86
#include <asm/desc.h>
@@ -64,6 +66,15 @@

#include "events_internal.h"

+#undef MODULE_PARAM_PREFIX
+#define MODULE_PARAM_PREFIX "xen."
+
+static uint __read_mostly event_loop_timeout = 2;
+module_param(event_loop_timeout, uint, 0644);
+
+static uint __read_mostly event_eoi_delay = 10;
+module_param(event_eoi_delay, uint, 0644);
+
const struct evtchn_ops *evtchn_ops;

/*
@@ -87,6 +98,7 @@ static DEFINE_RWLOCK(evtchn_rwlock);
* irq_mapping_update_lock
* evtchn_rwlock
* IRQ-desc lock
+ * percpu eoi_list_lock
*/

static LIST_HEAD(xen_irq_list_head);
@@ -119,6 +131,8 @@ static struct irq_chip xen_pirq_chip;
static void enable_dynirq(struct irq_data *data);
static void disable_dynirq(struct irq_data *data);

+static DEFINE_PER_CPU(unsigned int, irq_epoch);
+
static void clear_evtchn_to_irq_row(unsigned row)
{
unsigned col;
@@ -406,17 +420,120 @@ void notify_remote_via_irq(int irq)
}
EXPORT_SYMBOL_GPL(notify_remote_via_irq);

+struct lateeoi_work {
+ struct delayed_work delayed;
+ spinlock_t eoi_list_lock;
+ struct list_head eoi_list;
+};
+
+static DEFINE_PER_CPU(struct lateeoi_work, lateeoi);
+
+static void lateeoi_list_del(struct irq_info *info)
+{
+ struct lateeoi_work *eoi = &per_cpu(lateeoi, info->eoi_cpu);
+ unsigned long flags;
+
+ spin_lock_irqsave(&eoi->eoi_list_lock, flags);
+ list_del_init(&info->eoi_list);
+ spin_unlock_irqrestore(&eoi->eoi_list_lock, flags);
+}
+
+static void lateeoi_list_add(struct irq_info *info)
+{
+ struct lateeoi_work *eoi = &per_cpu(lateeoi, info->eoi_cpu);
+ struct irq_info *elem;
+ u64 now = get_jiffies_64();
+ unsigned long delay;
+ unsigned long flags;
+
+ if (now < info->eoi_time)
+ delay = info->eoi_time - now;
+ else
+ delay = 1;
+
+ spin_lock_irqsave(&eoi->eoi_list_lock, flags);
+
+ if (list_empty(&eoi->eoi_list)) {
+ list_add(&info->eoi_list, &eoi->eoi_list);
+ mod_delayed_work_on(info->eoi_cpu, system_wq,
+ &eoi->delayed, delay);
+ } else {
+ list_for_each_entry_reverse(elem, &eoi->eoi_list, eoi_list) {
+ if (elem->eoi_time <= info->eoi_time)
+ break;
+ }
+ list_add(&info->eoi_list, &elem->eoi_list);
+ }
+
+ spin_unlock_irqrestore(&eoi->eoi_list_lock, flags);
+}
+
static void xen_irq_lateeoi_locked(struct irq_info *info)
{
evtchn_port_t evtchn;
+ unsigned int cpu;

evtchn = info->evtchn;
- if (!VALID_EVTCHN(evtchn))
+ if (!VALID_EVTCHN(evtchn) || !list_empty(&info->eoi_list))
+ return;
+
+ cpu = info->eoi_cpu;
+ if (info->eoi_time && info->irq_epoch == per_cpu(irq_epoch, cpu)) {
+ lateeoi_list_add(info);
return;
+ }

+ info->eoi_time = 0;
unmask_evtchn(evtchn);
}

+static void xen_irq_lateeoi_worker(struct work_struct *work)
+{
+ struct lateeoi_work *eoi;
+ struct irq_info *info;
+ u64 now = get_jiffies_64();
+ unsigned long flags;
+
+ eoi = container_of(to_delayed_work(work), struct lateeoi_work, delayed);
+
+ read_lock_irqsave(&evtchn_rwlock, flags);
+
+ while (true) {
+ spin_lock(&eoi->eoi_list_lock);
+
+ info = list_first_entry_or_null(&eoi->eoi_list, struct irq_info,
+ eoi_list);
+
+ if (info == NULL || now < info->eoi_time) {
+ spin_unlock(&eoi->eoi_list_lock);
+ break;
+ }
+
+ list_del_init(&info->eoi_list);
+
+ spin_unlock(&eoi->eoi_list_lock);
+
+ info->eoi_time = 0;
+
+ xen_irq_lateeoi_locked(info);
+ }
+
+ if (info)
+ mod_delayed_work_on(info->eoi_cpu, system_wq,
+ &eoi->delayed, info->eoi_time - now);
+
+ read_unlock_irqrestore(&evtchn_rwlock, flags);
+}
+
+static void xen_cpu_init_eoi(unsigned int cpu)
+{
+ struct lateeoi_work *eoi = &per_cpu(lateeoi, cpu);
+
+ INIT_DELAYED_WORK(&eoi->delayed, xen_irq_lateeoi_worker);
+ spin_lock_init(&eoi->eoi_list_lock);
+ INIT_LIST_HEAD(&eoi->eoi_list);
+}
+
void xen_irq_lateeoi(unsigned int irq, unsigned int eoi_flags)
{
struct irq_info *info;
@@ -436,6 +553,7 @@ EXPORT_SYMBOL_GPL(xen_irq_lateeoi);
static void xen_irq_init(unsigned irq)
{
struct irq_info *info;
+
#ifdef CONFIG_SMP
/* By default all event channels notify CPU#0. */
cpumask_copy(irq_get_affinity_mask(irq), cpumask_of(0));
@@ -450,6 +568,7 @@ static void xen_irq_init(unsigned irq)

set_info_for_irq(irq, info);

+ INIT_LIST_HEAD(&info->eoi_list);
list_add_tail(&info->list, &xen_irq_list_head);
}

@@ -505,6 +624,9 @@ static void xen_free_irq(unsigned irq)

write_lock_irqsave(&evtchn_rwlock, flags);

+ if (!list_empty(&info->eoi_list))
+ lateeoi_list_del(info);
+
list_del(&info->list);

set_info_for_irq(irq, NULL);
@@ -1363,6 +1485,54 @@ void xen_send_IPI_one(unsigned int cpu,
notify_remote_via_irq(irq);
}

+struct evtchn_loop_ctrl {
+ ktime_t timeout;
+ unsigned count;
+ bool defer_eoi;
+};
+
+void handle_irq_for_port(evtchn_port_t port, struct evtchn_loop_ctrl *ctrl)
+{
+ int irq;
+ struct irq_info *info;
+
+ irq = get_evtchn_to_irq(port);
+ if (irq == -1)
+ return;
+
+ /*
+ * Check for timeout every 256 events.
+ * We are setting the timeout value only after the first 256
+ * events in order to not hurt the common case of few loop
+ * iterations. The 256 is basically an arbitrary value.
+ *
+ * In case we are hitting the timeout we need to defer all further
+ * EOIs in order to ensure to leave the event handling loop rather
+ * sooner than later.
+ */
+ if (!ctrl->defer_eoi && !(++ctrl->count & 0xff)) {
+ ktime_t kt = ktime_get();
+
+ if (!ctrl->timeout.tv64) {
+ kt = ktime_add_ms(kt,
+ jiffies_to_msecs(event_loop_timeout));
+ ctrl->timeout = kt;
+ } else if (kt.tv64 > ctrl->timeout.tv64) {
+ ctrl->defer_eoi = true;
+ }
+ }
+
+ info = info_for_irq(irq);
+
+ if (ctrl->defer_eoi) {
+ info->eoi_cpu = smp_processor_id();
+ info->irq_epoch = __this_cpu_read(irq_epoch);
+ info->eoi_time = get_jiffies_64() + event_eoi_delay;
+ }
+
+ generic_handle_irq(irq);
+}
+
static DEFINE_PER_CPU(unsigned, xed_nesting_count);

static void __xen_evtchn_do_upcall(void)
@@ -1370,6 +1540,7 @@ static void __xen_evtchn_do_upcall(void)
struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu);
int cpu = get_cpu();
unsigned count;
+ struct evtchn_loop_ctrl ctrl = { 0 };

read_lock(&evtchn_rwlock);

@@ -1379,7 +1550,7 @@ static void __xen_evtchn_do_upcall(void)
if (__this_cpu_inc_return(xed_nesting_count) - 1)
goto out;

- xen_evtchn_handle_events(cpu);
+ xen_evtchn_handle_events(cpu, &ctrl);

BUG_ON(!irqs_disabled());

@@ -1390,6 +1561,13 @@ static void __xen_evtchn_do_upcall(void)
out:
read_unlock(&evtchn_rwlock);

+ /*
+ * Increment irq_epoch only now to defer EOIs only for
+ * xen_irq_lateeoi() invocations occurring from inside the loop
+ * above.
+ */
+ __this_cpu_inc(irq_epoch);
+
put_cpu();
}

@@ -1827,9 +2005,6 @@ void xen_callback_vector(void)
void xen_callback_vector(void) {}
#endif

-#undef MODULE_PARAM_PREFIX
-#define MODULE_PARAM_PREFIX "xen."
-
static bool fifo_events = true;
module_param(fifo_events, bool, 0);

@@ -1837,6 +2012,8 @@ static int xen_evtchn_cpu_prepare(unsign
{
int ret = 0;

+ xen_cpu_init_eoi(cpu);
+
if (evtchn_ops->percpu_init)
ret = evtchn_ops->percpu_init(cpu);

@@ -1886,6 +2063,8 @@ void __init xen_init_IRQ(void)
if (ret < 0)
xen_evtchn_2l_init();

+ xen_cpu_init_eoi(smp_processor_id());
+
register_cpu_notifier(&evtchn_cpu_notifier);

evtchn_to_irq = kcalloc(EVTCHN_ROW(xen_evtchn_max_channels()),
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -275,19 +275,9 @@ static uint32_t clear_linked(volatile ev
return w & EVTCHN_FIFO_LINK_MASK;
}

-static void handle_irq_for_port(unsigned port)
-{
- int irq;
-
- irq = get_evtchn_to_irq(port);
- if (irq != -1)
- generic_handle_irq(irq);
-}
-
-static void consume_one_event(unsigned cpu,
+static void consume_one_event(unsigned cpu, struct evtchn_loop_ctrl *ctrl,
struct evtchn_fifo_control_block *control_block,
- unsigned priority, unsigned long *ready,
- bool drop)
+ unsigned priority, unsigned long *ready)
{
struct evtchn_fifo_queue *q = &per_cpu(cpu_queue, cpu);
uint32_t head;
@@ -320,16 +310,17 @@ static void consume_one_event(unsigned c
clear_bit(priority, ready);

if (evtchn_fifo_is_pending(port) && !evtchn_fifo_is_masked(port)) {
- if (unlikely(drop))
+ if (unlikely(!ctrl))
pr_warn("Dropping pending event for port %u\n", port);
else
- handle_irq_for_port(port);
+ handle_irq_for_port(port, ctrl);
}

q->head[priority] = head;
}

-static void __evtchn_fifo_handle_events(unsigned cpu, bool drop)
+static void __evtchn_fifo_handle_events(unsigned cpu,
+ struct evtchn_loop_ctrl *ctrl)
{
struct evtchn_fifo_control_block *control_block;
unsigned long ready;
@@ -341,14 +332,15 @@ static void __evtchn_fifo_handle_events(

while (ready) {
q = find_first_bit(&ready, EVTCHN_FIFO_MAX_QUEUES);
- consume_one_event(cpu, control_block, q, &ready, drop);
+ consume_one_event(cpu, ctrl, control_block, q, &ready);
ready |= xchg(&control_block->ready, 0);
}
}

-static void evtchn_fifo_handle_events(unsigned cpu)
+static void evtchn_fifo_handle_events(unsigned cpu,
+ struct evtchn_loop_ctrl *ctrl)
{
- __evtchn_fifo_handle_events(cpu, false);
+ __evtchn_fifo_handle_events(cpu, ctrl);
}

static void evtchn_fifo_resume(void)
@@ -417,7 +409,7 @@ static int evtchn_fifo_percpu_init(unsig

static int evtchn_fifo_percpu_deinit(unsigned int cpu)
{
- __evtchn_fifo_handle_events(cpu, true);
+ __evtchn_fifo_handle_events(cpu, NULL);
return 0;
}

--- a/drivers/xen/events/events_internal.h
+++ b/drivers/xen/events/events_internal.h
@@ -32,11 +32,15 @@ enum xen_irq_type {
*/
struct irq_info {
struct list_head list;
+ struct list_head eoi_list;
int refcnt;
enum xen_irq_type type; /* type */
unsigned irq;
unsigned int evtchn; /* event channel */
unsigned short cpu; /* cpu bound */
+ unsigned short eoi_cpu; /* EOI must happen on this cpu */
+ unsigned int irq_epoch; /* If eoi_cpu valid: irq_epoch of event */
+ u64 eoi_time; /* Time in jiffies when to EOI. */

union {
unsigned short virq;
@@ -55,6 +59,8 @@ struct irq_info {
#define PIRQ_SHAREABLE (1 << 1)
#define PIRQ_MSI_GROUP (1 << 2)

+struct evtchn_loop_ctrl;
+
struct evtchn_ops {
unsigned (*max_channels)(void);
unsigned (*nr_channels)(void);
@@ -69,7 +75,7 @@ struct evtchn_ops {
void (*mask)(unsigned port);
void (*unmask)(unsigned port);

- void (*handle_events)(unsigned cpu);
+ void (*handle_events)(unsigned cpu, struct evtchn_loop_ctrl *ctrl);
void (*resume)(void);

int (*percpu_init)(unsigned int cpu);
@@ -80,6 +86,7 @@ extern const struct evtchn_ops *evtchn_o

extern int **evtchn_to_irq;
int get_evtchn_to_irq(unsigned int evtchn);
+void handle_irq_for_port(evtchn_port_t port, struct evtchn_loop_ctrl *ctrl);

struct irq_info *info_for_irq(unsigned irq);
unsigned cpu_from_irq(unsigned irq);
@@ -137,9 +144,10 @@ static inline void unmask_evtchn(unsigne
return evtchn_ops->unmask(port);
}

-static inline void xen_evtchn_handle_events(unsigned cpu)
+static inline void xen_evtchn_handle_events(unsigned cpu,
+ struct evtchn_loop_ctrl *ctrl)
{
- return evtchn_ops->handle_events(cpu);
+ return evtchn_ops->handle_events(cpu, ctrl);
}

static inline void xen_evtchn_resume(void)

2020-11-17 14:08:07

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 60/64] perf/core: Fix race in the perf_mmap_close() function

From: Jiri Olsa <[email protected]>

commit f91072ed1b7283b13ca57fcfbece5a3b92726143 upstream.

There's a possible race in perf_mmap_close() when checking ring buffer's
mmap_count refcount value. The problem is that the mmap_count check is
not atomic because we call atomic_dec() and atomic_read() separately.

perf_mmap_close:
...
atomic_dec(&rb->mmap_count);
...
if (atomic_read(&rb->mmap_count))
goto out_put;

<ring buffer detach>
free_uid

out_put:
ring_buffer_put(rb); /* could be last */

The race can happen when we have two (or more) events sharing same ring
buffer and they go through atomic_dec() and then they both see 0 as refcount
value later in atomic_read(). Then both will go on and execute code which
is meant to be run just once.

The code that detaches ring buffer is probably fine to be executed more
than once, but the problem is in calling free_uid(), which will later on
demonstrate in related crashes and refcount warnings, like:

refcount_t: addition on 0; use-after-free.
...
RIP: 0010:refcount_warn_saturate+0x6d/0xf
...
Call Trace:
prepare_creds+0x190/0x1e0
copy_creds+0x35/0x172
copy_process+0x471/0x1a80
_do_fork+0x83/0x3a0
__do_sys_wait4+0x83/0x90
__do_sys_clone+0x85/0xa0
do_syscall_64+0x5b/0x1e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Using atomic decrease and check instead of separated calls.

Tested-by: Michael Petlan <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Acked-by: Wade Mealing <[email protected]>
Fixes: 9bb5d40cd93c ("perf: Fix mmap() accounting hole");
Link: https://lore.kernel.org/r/20200916115311.GE2301783@krava
[sudip: backport to v4.9.y by using ring_buffer]
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/events/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4664,11 +4664,11 @@ static void perf_mmap_open(struct vm_are
static void perf_mmap_close(struct vm_area_struct *vma)
{
struct perf_event *event = vma->vm_file->private_data;
-
struct ring_buffer *rb = ring_buffer_get(event);
struct user_struct *mmap_user = rb->mmap_user;
int mmap_locked = rb->mmap_locked;
unsigned long size = perf_data_size(rb);
+ bool detach_rest = false;

if (event->pmu->event_unmapped)
event->pmu->event_unmapped(event);
@@ -4687,7 +4687,8 @@ static void perf_mmap_close(struct vm_ar
mutex_unlock(&event->mmap_mutex);
}

- atomic_dec(&rb->mmap_count);
+ if (atomic_dec_and_test(&rb->mmap_count))
+ detach_rest = true;

if (!atomic_dec_and_mutex_lock(&event->mmap_count, &event->mmap_mutex))
goto out_put;
@@ -4696,7 +4697,7 @@ static void perf_mmap_close(struct vm_ar
mutex_unlock(&event->mmap_mutex);

/* If there's still other mmap()s of this buffer, we're done. */
- if (atomic_read(&rb->mmap_count))
+ if (!detach_rest)
goto out_put;

/*

2020-11-17 14:08:24

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 62/64] reboot: fix overflow parsing reboot cpu number

From: Matteo Croce <[email protected]>

commit df5b0ab3e08a156701b537809914b339b0daa526 upstream.

Limit the CPU number to num_possible_cpus(), because setting it to a
value lower than INT_MAX but higher than NR_CPUS produces the following
error on reboot and shutdown:

BUG: unable to handle page fault for address: ffffffff90ab1bb0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1c09067 P4D 1c09067 PUD 1c0a063 PMD 0
Oops: 0000 [#1] SMP
CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 5.9.0-rc8-kvm #110
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
RIP: 0010:migrate_to_reboot_cpu+0xe/0x60
Code: ea ea 00 48 89 fa 48 c7 c7 30 57 f1 81 e9 fa ef ff ff 66 2e 0f 1f 84 00 00 00 00 00 53 8b 1d d5 ea ea 00 e8 14 33 fe ff 89 da <48> 0f a3 15 ea fc bd 00 48 89 d0 73 29 89 c2 c1 e8 06 65 48 8b 3c
RSP: 0018:ffffc90000013e08 EFLAGS: 00010246
RAX: ffff88801f0a0000 RBX: 0000000077359400 RCX: 0000000000000000
RDX: 0000000077359400 RSI: 0000000000000002 RDI: ffffffff81c199e0
RBP: ffffffff81c1e3c0 R08: ffff88801f41f000 R09: ffffffff81c1e348
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 00007f32bedf8830 R14: 00000000fee1dead R15: 0000000000000000
FS: 00007f32bedf8980(0000) GS:ffff88801f480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff90ab1bb0 CR3: 000000001d057000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__do_sys_reboot.cold+0x34/0x5b
do_syscall_64+0x2d/0x40

Fixes: 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel")
Signed-off-by: Matteo Croce <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Fabian Frederick <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
[sudip: use reboot_mode instead of mode]
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/reboot.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -519,6 +519,13 @@ static int __init reboot_setup(char *str
reboot_cpu = simple_strtoul(str+3, NULL, 0);
else
reboot_mode = REBOOT_SOFT;
+ if (reboot_cpu >= num_possible_cpus()) {
+ pr_err("Ignoring the CPU number in reboot= option. "
+ "CPU %d exceeds possible cpu number %d\n",
+ reboot_cpu, num_possible_cpus());
+ reboot_cpu = 0;
+ break;
+ }
break;

case 'g':

2020-11-17 14:08:37

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 64/64] Convert trailing spaces and periods in path components

From: Boris Protopopov <[email protected]>

commit 57c176074057531b249cf522d90c22313fa74b0b upstream.

When converting trailing spaces and periods in paths, do so
for every component of the path, not just the last component.
If the conversion is not done for every path component, then
subsequent operations in directories with trailing spaces or
periods (e.g. create(), mkdir()) will fail with ENOENT. This
is because on the server, the directory will have a special
symbol in its name, and the client needs to provide the same.

Signed-off-by: Boris Protopopov <[email protected]>
Acked-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/cifs/cifs_unicode.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

--- a/fs/cifs/cifs_unicode.c
+++ b/fs/cifs/cifs_unicode.c
@@ -493,7 +493,13 @@ cifsConvertToUTF16(__le16 *target, const
else if (map_chars == SFM_MAP_UNI_RSVD) {
bool end_of_string;

- if (i == srclen - 1)
+ /**
+ * Remap spaces and periods found at the end of every
+ * component of the path. The special cases of '.' and
+ * '..' do not need to be dealt with explicitly because
+ * they are addressed in namei.c:link_path_walk().
+ **/
+ if ((i == srclen - 1) || (source[i+1] == '\\'))
end_of_string = true;
else
end_of_string = false;

2020-11-17 14:08:38

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 33/64] ext4: unlock xattr_sem properly in ext4_inline_data_truncate()

From: Joseph Qi <[email protected]>

commit 7067b2619017d51e71686ca9756b454de0e5826a upstream.

It takes xattr_sem to check inline data again but without unlock it
in case not have. So unlock it before return.

Fixes: aef1c8513c1f ("ext4: let ext4_truncate handle inline data correctly")
Reported-by: Dan Carpenter <[email protected]>
Cc: Tao Ma <[email protected]>
Signed-off-by: Joseph Qi <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/inline.c | 1 +
1 file changed, 1 insertion(+)

--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1892,6 +1892,7 @@ void ext4_inline_data_truncate(struct in

ext4_write_lock_xattr(inode, &no_expand);
if (!ext4_has_inline_data(inode)) {
+ ext4_write_unlock_xattr(inode, &no_expand);
*has_inline = 0;
ext4_journal_stop(handle);
return;

2020-11-17 14:08:38

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 34/64] usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode

From: Chris Brandt <[email protected]>

commit 6d853c9e4104b4fc8d55dc9cd3b99712aa347174 upstream.

Renesas R-Car and RZ/G SoCs have a firmware download mode over USB.
However, on reset a banner string is transmitted out which is not expected
to be echoed back and will corrupt the protocol.

Cc: stable <[email protected]>
Acked-by: Oliver Neukum <[email protected]>
Signed-off-by: Chris Brandt <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/class/cdc-acm.c | 9 +++++++++
1 file changed, 9 insertions(+)

--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -1693,6 +1693,15 @@ static const struct usb_device_id acm_id
{ USB_DEVICE(0x0870, 0x0001), /* Metricom GS Modem */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
},
+ { USB_DEVICE(0x045b, 0x023c), /* Renesas USB Download mode */
+ .driver_info = DISABLE_ECHO, /* Don't echo banner */
+ },
+ { USB_DEVICE(0x045b, 0x0248), /* Renesas USB Download mode */
+ .driver_info = DISABLE_ECHO, /* Don't echo banner */
+ },
+ { USB_DEVICE(0x045b, 0x024D), /* Renesas USB Download mode */
+ .driver_info = DISABLE_ECHO, /* Don't echo banner */
+ },
{ USB_DEVICE(0x0e8d, 0x0003), /* FIREFLY, MediaTek Inc; [email protected] */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
},

2020-11-17 14:08:44

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 36/64] ocfs2: initialize ip_next_orphan

From: Wengang Wang <[email protected]>

commit f5785283dd64867a711ca1fb1f5bb172f252ecdf upstream.

Though problem if found on a lower 4.1.12 kernel, I think upstream has
same issue.

In one node in the cluster, there is the following callback trace:

# cat /proc/21473/stack
__ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
ocfs2_evict_inode+0x152/0x820 [ocfs2]
evict+0xae/0x1a0
iput+0x1c6/0x230
ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
ocfs2_dir_foreach+0x29/0x30 [ocfs2]
ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
process_one_work+0x169/0x4a0
worker_thread+0x5b/0x560
kthread+0xcb/0xf0
ret_from_fork+0x61/0x90

The above stack is not reasonable, the final iput shouldn't happen in
ocfs2_orphan_filldir() function. Looking at the code,

2067 /* Skip inodes which are already added to recover list, since dio may
2068 * happen concurrently with unlink/rename */
2069 if (OCFS2_I(iter)->ip_next_orphan) {
2070 iput(iter);
2071 return 0;
2072 }
2073

The logic thinks the inode is already in recover list on seeing
ip_next_orphan is non-NULL, so it skip this inode after dropping a
reference which incremented in ocfs2_iget().

While, if the inode is already in recover list, it should have another
reference and the iput() at line 2070 should not be the final iput
(dropping the last reference). So I don't think the inode is really in
the recover list (no vmcore to confirm).

Note that ocfs2_queue_orphans(), though not shown up in the call back
trace, is holding cluster lock on the orphan directory when looking up
for unlinked inodes. The on disk inode eviction could involve a lot of
IOs which may need long time to finish. That means this node could hold
the cluster lock for very long time, that can lead to the lock requests
(from other nodes) to the orhpan directory hang for long time.

Looking at more on ip_next_orphan, I found it's not initialized when
allocating a new ocfs2_inode_info structure.

This causes te reflink operations from some nodes hang for very long
time waiting for the cluster lock on the orphan directory.

Fix: initialize ip_next_orphan as NULL.

Signed-off-by: Wengang Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ocfs2/super.c | 1 +
1 file changed, 1 insertion(+)

--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1751,6 +1751,7 @@ static void ocfs2_inode_init_once(void *

oi->ip_blkno = 0ULL;
oi->ip_clusters = 0;
+ oi->ip_next_orphan = NULL;

ocfs2_resv_init_once(&oi->ip_la_data_resv);

2020-11-17 14:08:51

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 63/64] ext4: fix leaking sysfs kobject after failed mount

From: Eric Biggers <[email protected]>

commit cb8d53d2c97369029cc638c9274ac7be0a316c75 upstream.

ext4_unregister_sysfs() only deletes the kobject. The reference to it
needs to be put separately, like ext4_put_super() does.

This addresses the syzbot report
"memory leak in kobject_set_name_vargs (3)"
(https://syzkaller.appspot.com/bug?extid=9f864abad79fae7c17e1).

Reported-by: [email protected]
Fixes: 72ba74508b28 ("ext4: release sysfs kobject when failing to enable quotas on mount")
Cc: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/ext4/super.c | 1 +
1 file changed, 1 insertion(+)

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4168,6 +4168,7 @@ cantfind_ext4:
#ifdef CONFIG_QUOTA
failed_mount8:
ext4_unregister_sysfs(sb);
+ kobject_put(&sbi->s_kobj);
#endif
failed_mount7:
ext4_unregister_li_request(sb);

2020-11-17 14:08:52

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 56/64] xen/events: switch user event channels to lateeoi model

From: Juergen Gross <[email protected]>

commit c44b849cee8c3ac587da3b0980e01f77500d158c upstream.

Instead of disabling the irq when an event is received and enabling
it again when handled by the user process use the lateeoi model.

This is part of XSA-332.

Cc: [email protected]
Reported-by: Julien Grall <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/xen/evtchn.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -173,7 +173,6 @@ static irqreturn_t evtchn_interrupt(int
"Interrupt for port %d, but apparently not enabled; per-user %p\n",
evtchn->port, u);

- disable_irq_nosync(irq);
evtchn->enabled = false;

spin_lock(&u->ring_prod_lock);
@@ -299,7 +298,7 @@ static ssize_t evtchn_write(struct file
evtchn = find_evtchn(u, port);
if (evtchn && !evtchn->enabled) {
evtchn->enabled = true;
- enable_irq(irq_from_evtchn(port));
+ xen_irq_lateeoi(irq_from_evtchn(port), 0);
}
}

@@ -399,8 +398,8 @@ static int evtchn_bind_to_user(struct pe
if (rc < 0)
goto err;

- rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, 0,
- u->name, evtchn);
+ rc = bind_evtchn_to_irqhandler_lateeoi(port, evtchn_interrupt, 0,
+ u->name, evtchn);
if (rc < 0)
goto err;

2020-11-17 14:08:59

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 55/64] xen/pciback: use lateeoi irq binding

From: Juergen Gross <[email protected]>

commit c2711441bc961b37bba0615dd7135857d189035f upstream.

In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving pcifront use the lateeoi irq
binding for pciback and unmask the event channel only just before
leaving the event handling function.

Restructure the handling to support that scheme. Basically an event can
come in for two reasons: either a normal request for a pciback action,
which is handled in a worker, or in case the guest has finished an AER
request which was requested by pciback.

When an AER request is issued to the guest and a normal pciback action
is currently active issue an EOI early in order to be able to receive
another event when the AER request has been finished by the guest.

Let the worker processing the normal requests run until no further
request is pending, instead of starting a new worker ion that case.
Issue the EOI only just before leaving the worker.

This scheme allows to drop calling the generic function
xen_pcibk_test_and_schedule_op() after processing of any request as
the handling of both request types is now separated more cleanly.

This is part of XSA-332.

Cc: [email protected]
Reported-by: Julien Grall <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/xen/xen-pciback/pci_stub.c | 14 ++++-----
drivers/xen/xen-pciback/pciback.h | 12 +++++++-
drivers/xen/xen-pciback/pciback_ops.c | 48 ++++++++++++++++++++++++++--------
drivers/xen/xen-pciback/xenbus.c | 2 -
4 files changed, 56 insertions(+), 20 deletions(-)

--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -681,10 +681,17 @@ static pci_ers_result_t common_process(s
wmb();
notify_remote_via_irq(pdev->evtchn_irq);

+ /* Enable IRQ to signal "request done". */
+ xen_pcibk_lateeoi(pdev, 0);
+
ret = wait_event_timeout(xen_pcibk_aer_wait_queue,
!(test_bit(_XEN_PCIB_active, (unsigned long *)
&sh_info->flags)), 300*HZ);

+ /* Enable IRQ for pcifront request if not already active. */
+ if (!test_bit(_PDEVF_op_active, &pdev->flags))
+ xen_pcibk_lateeoi(pdev, 0);
+
if (!ret) {
if (test_bit(_XEN_PCIB_active,
(unsigned long *)&sh_info->flags)) {
@@ -698,13 +705,6 @@ static pci_ers_result_t common_process(s
}
clear_bit(_PCIB_op_pending, (unsigned long *)&pdev->flags);

- if (test_bit(_XEN_PCIF_active,
- (unsigned long *)&sh_info->flags)) {
- dev_dbg(&psdev->dev->dev,
- "schedule pci_conf service in " DRV_NAME "\n");
- xen_pcibk_test_and_schedule_op(psdev->pdev);
- }
-
res = (pci_ers_result_t)aer_op->err;
return res;
}
--- a/drivers/xen/xen-pciback/pciback.h
+++ b/drivers/xen/xen-pciback/pciback.h
@@ -13,6 +13,7 @@
#include <linux/spinlock.h>
#include <linux/workqueue.h>
#include <linux/atomic.h>
+#include <xen/events.h>
#include <xen/interface/io/pciif.h>

#define DRV_NAME "xen-pciback"
@@ -26,6 +27,8 @@ struct pci_dev_entry {
#define PDEVF_op_active (1<<(_PDEVF_op_active))
#define _PCIB_op_pending (1)
#define PCIB_op_pending (1<<(_PCIB_op_pending))
+#define _EOI_pending (2)
+#define EOI_pending (1<<(_EOI_pending))

struct xen_pcibk_device {
void *pci_dev_data;
@@ -182,12 +185,17 @@ static inline void xen_pcibk_release_dev
irqreturn_t xen_pcibk_handle_event(int irq, void *dev_id);
void xen_pcibk_do_op(struct work_struct *data);

+static inline void xen_pcibk_lateeoi(struct xen_pcibk_device *pdev,
+ unsigned int eoi_flag)
+{
+ if (test_and_clear_bit(_EOI_pending, &pdev->flags))
+ xen_irq_lateeoi(pdev->evtchn_irq, eoi_flag);
+}
+
int xen_pcibk_xenbus_register(void);
void xen_pcibk_xenbus_unregister(void);

extern int verbose_request;
-
-void xen_pcibk_test_and_schedule_op(struct xen_pcibk_device *pdev);
#endif

/* Handles shared IRQs that can to device domain and control domain. */
--- a/drivers/xen/xen-pciback/pciback_ops.c
+++ b/drivers/xen/xen-pciback/pciback_ops.c
@@ -296,26 +296,41 @@ int xen_pcibk_disable_msix(struct xen_pc
return 0;
}
#endif
+
+static inline bool xen_pcibk_test_op_pending(struct xen_pcibk_device *pdev)
+{
+ return test_bit(_XEN_PCIF_active,
+ (unsigned long *)&pdev->sh_info->flags) &&
+ !test_and_set_bit(_PDEVF_op_active, &pdev->flags);
+}
+
/*
* Now the same evtchn is used for both pcifront conf_read_write request
* as well as pcie aer front end ack. We use a new work_queue to schedule
* xen_pcibk conf_read_write service for avoiding confict with aer_core
* do_recovery job which also use the system default work_queue
*/
-void xen_pcibk_test_and_schedule_op(struct xen_pcibk_device *pdev)
+static void xen_pcibk_test_and_schedule_op(struct xen_pcibk_device *pdev)
{
+ bool eoi = true;
+
/* Check that frontend is requesting an operation and that we are not
* already processing a request */
- if (test_bit(_XEN_PCIF_active, (unsigned long *)&pdev->sh_info->flags)
- && !test_and_set_bit(_PDEVF_op_active, &pdev->flags)) {
+ if (xen_pcibk_test_op_pending(pdev)) {
queue_work(xen_pcibk_wq, &pdev->op_work);
+ eoi = false;
}
/*_XEN_PCIB_active should have been cleared by pcifront. And also make
sure xen_pcibk is waiting for ack by checking _PCIB_op_pending*/
if (!test_bit(_XEN_PCIB_active, (unsigned long *)&pdev->sh_info->flags)
&& test_bit(_PCIB_op_pending, &pdev->flags)) {
wake_up(&xen_pcibk_aer_wait_queue);
+ eoi = false;
}
+
+ /* EOI if there was nothing to do. */
+ if (eoi)
+ xen_pcibk_lateeoi(pdev, XEN_EOI_FLAG_SPURIOUS);
}

/* Performing the configuration space reads/writes must not be done in atomic
@@ -323,10 +338,8 @@ void xen_pcibk_test_and_schedule_op(stru
* use of semaphores). This function is intended to be called from a work
* queue in process context taking a struct xen_pcibk_device as a parameter */

-void xen_pcibk_do_op(struct work_struct *data)
+static void xen_pcibk_do_one_op(struct xen_pcibk_device *pdev)
{
- struct xen_pcibk_device *pdev =
- container_of(data, struct xen_pcibk_device, op_work);
struct pci_dev *dev;
struct xen_pcibk_dev_data *dev_data = NULL;
struct xen_pci_op *op = &pdev->op;
@@ -399,16 +412,31 @@ void xen_pcibk_do_op(struct work_struct
smp_mb__before_atomic(); /* /after/ clearing PCIF_active */
clear_bit(_PDEVF_op_active, &pdev->flags);
smp_mb__after_atomic(); /* /before/ final check for work */
+}

- /* Check to see if the driver domain tried to start another request in
- * between clearing _XEN_PCIF_active and clearing _PDEVF_op_active.
- */
- xen_pcibk_test_and_schedule_op(pdev);
+void xen_pcibk_do_op(struct work_struct *data)
+{
+ struct xen_pcibk_device *pdev =
+ container_of(data, struct xen_pcibk_device, op_work);
+
+ do {
+ xen_pcibk_do_one_op(pdev);
+ } while (xen_pcibk_test_op_pending(pdev));
+
+ xen_pcibk_lateeoi(pdev, 0);
}

irqreturn_t xen_pcibk_handle_event(int irq, void *dev_id)
{
struct xen_pcibk_device *pdev = dev_id;
+ bool eoi;
+
+ /* IRQs might come in before pdev->evtchn_irq is written. */
+ if (unlikely(pdev->evtchn_irq != irq))
+ pdev->evtchn_irq = irq;
+
+ eoi = test_and_set_bit(_EOI_pending, &pdev->flags);
+ WARN(eoi, "IRQ while EOI pending\n");

xen_pcibk_test_and_schedule_op(pdev);

--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -124,7 +124,7 @@ static int xen_pcibk_do_attach(struct xe

pdev->sh_info = vaddr;

- err = bind_interdomain_evtchn_to_irqhandler(
+ err = bind_interdomain_evtchn_to_irqhandler_lateeoi(
pdev->xdev->otherend_id, remote_evtchn, xen_pcibk_handle_event,
0, DRV_NAME, pdev);
if (err < 0) {

2020-11-17 14:09:14

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 09/64] can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context

From: Vincent Mailhol <[email protected]>

[ Upstream commit 2283f79b22684d2812e5c76fc2280aae00390365 ]

If a driver calls can_get_echo_skb() during a hardware IRQ (which is often, but
not always, the case), the 'WARN_ON(in_irq)' in
net/core/skbuff.c#skb_release_head_state() might be triggered, under network
congestion circumstances, together with the potential risk of a NULL pointer
dereference.

The root cause of this issue is the call to kfree_skb() instead of
dev_kfree_skb_irq() in net/core/dev.c#enqueue_to_backlog().

This patch prevents the skb to be freed within the call to netif_rx() by
incrementing its reference count with skb_get(). The skb is finally freed by
one of the in-irq-context safe functions: dev_consume_skb_any() or
dev_kfree_skb_any(). The "any" version is used because some drivers might call
can_get_echo_skb() in a normal context.

The reason for this issue to occur is that initially, in the core network
stack, loopback skb were not supposed to be received in hardware IRQ context.
The CAN stack is an exeption.

This bug was previously reported back in 2017 in [1] but the proposed patch
never got accepted.

While [1] directly modifies net/core/dev.c, we try to propose here a
smoother modification local to CAN network stack (the assumption
behind is that only CAN devices are affected by this issue).

[1] http://lore.kernel.org/r/[email protected]

Signed-off-by: Vincent Mailhol <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: 39549eef3587 ("can: CAN Network device driver and Netlink interface")
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/can/dev.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 9dd968ee792e0..3c0f141262ad5 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -466,7 +466,11 @@ unsigned int can_get_echo_skb(struct net_device *dev, unsigned int idx)
if (!skb)
return 0;

- netif_rx(skb);
+ skb_get(skb);
+ if (netif_rx(skb) == NET_RX_SUCCESS)
+ dev_consume_skb_any(skb);
+ else
+ dev_kfree_skb_any(skb);

return len;
}
--
2.27.0

2020-11-17 14:09:20

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 24/64] mac80211: fix use of skb payload instead of header

From: Johannes Berg <[email protected]>

[ Upstream commit 14f46c1e5108696ec1e5a129e838ecedf108c7bf ]

When ieee80211_skb_resize() is called from ieee80211_build_hdr()
the skb has no 802.11 header yet, in fact it consist only of the
payload as the ethernet frame is removed. As such, we're using
the payload data for ieee80211_is_mgmt(), which is of course
completely wrong. This didn't really hurt us because these are
always data frames, so we could only have added more tailroom
than we needed if we determined it was a management frame and
sdata->crypto_tx_tailroom_needed_cnt was false.

However, syzbot found that of course there need not be any payload,
so we're using at best uninitialized memory for the check.

Fix this to pass explicitly the kind of frame that we have instead
of checking there, by replacing the "bool may_encrypt" argument
with an argument that can carry the three possible states - it's
not going to be encrypted, it's a management frame, or it's a data
frame (and then we check sdata->crypto_tx_tailroom_needed_cnt).

Reported-by: [email protected]
Signed-off-by: Johannes Berg <[email protected]>
Link: https://lore.kernel.org/r/20201009132538.e1fd7f802947.I799b288466ea2815f9d4c84349fae697dca2f189@changeid
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/mac80211/tx.c | 35 +++++++++++++++++++++++------------
1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 98c34c3adf392..4466413c5eecc 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1594,19 +1594,24 @@ static bool ieee80211_tx(struct ieee80211_sub_if_data *sdata,

/* device xmit handlers */

+enum ieee80211_encrypt {
+ ENCRYPT_NO,
+ ENCRYPT_MGMT,
+ ENCRYPT_DATA,
+};
+
static int ieee80211_skb_resize(struct ieee80211_sub_if_data *sdata,
struct sk_buff *skb,
- int head_need, bool may_encrypt)
+ int head_need,
+ enum ieee80211_encrypt encrypt)
{
struct ieee80211_local *local = sdata->local;
- struct ieee80211_hdr *hdr;
bool enc_tailroom;
int tail_need = 0;

- hdr = (struct ieee80211_hdr *) skb->data;
- enc_tailroom = may_encrypt &&
- (sdata->crypto_tx_tailroom_needed_cnt ||
- ieee80211_is_mgmt(hdr->frame_control));
+ enc_tailroom = encrypt == ENCRYPT_MGMT ||
+ (encrypt == ENCRYPT_DATA &&
+ sdata->crypto_tx_tailroom_needed_cnt);

if (enc_tailroom) {
tail_need = IEEE80211_ENCRYPT_TAILROOM;
@@ -1639,21 +1644,27 @@ void ieee80211_xmit(struct ieee80211_sub_if_data *sdata,
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
int headroom;
- bool may_encrypt;
+ enum ieee80211_encrypt encrypt;

- may_encrypt = !(info->flags & IEEE80211_TX_INTFL_DONT_ENCRYPT);
+ if (info->flags & IEEE80211_TX_INTFL_DONT_ENCRYPT)
+ encrypt = ENCRYPT_NO;
+ else if (ieee80211_is_mgmt(hdr->frame_control))
+ encrypt = ENCRYPT_MGMT;
+ else
+ encrypt = ENCRYPT_DATA;

headroom = local->tx_headroom;
- if (may_encrypt)
+ if (encrypt != ENCRYPT_NO)
headroom += sdata->encrypt_headroom;
headroom -= skb_headroom(skb);
headroom = max_t(int, 0, headroom);

- if (ieee80211_skb_resize(sdata, skb, headroom, may_encrypt)) {
+ if (ieee80211_skb_resize(sdata, skb, headroom, encrypt)) {
ieee80211_free_txskb(&local->hw, skb);
return;
}

+ /* reload after potential resize */
hdr = (struct ieee80211_hdr *) skb->data;
info->control.vif = &sdata->vif;

@@ -2346,7 +2357,7 @@ static struct sk_buff *ieee80211_build_hdr(struct ieee80211_sub_if_data *sdata,
head_need += sdata->encrypt_headroom;
head_need += local->tx_headroom;
head_need = max_t(int, 0, head_need);
- if (ieee80211_skb_resize(sdata, skb, head_need, true)) {
+ if (ieee80211_skb_resize(sdata, skb, head_need, ENCRYPT_DATA)) {
ieee80211_free_txskb(&local->hw, skb);
skb = NULL;
return ERR_PTR(-ENOMEM);
@@ -2756,7 +2767,7 @@ static bool ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata,
if (unlikely(ieee80211_skb_resize(sdata, skb,
max_t(int, extra_head + hw_headroom -
skb_headroom(skb), 0),
- false))) {
+ ENCRYPT_NO))) {
kfree_skb(skb);
return true;
}
--
2.27.0

2020-11-17 14:09:41

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 32/64] ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA

From: Kaixu Xia <[email protected]>

commit 174fe5ba2d1ea0d6c5ab2a7d4aa058d6d497ae4d upstream.

The macro MOPT_Q is used to indicates the mount option is related to
quota stuff and is defined to be MOPT_NOSUPPORT when CONFIG_QUOTA is
disabled. Normally the quota options are handled explicitly, so it
didn't matter that the MOPT_STRING flag was missing, even though the
usrjquota and grpjquota mount options take a string argument. It's
important that's present in the !CONFIG_QUOTA case, since without
MOPT_STRING, the mount option matcher will match usrjquota= followed
by an integer, and will otherwise skip the table entry, and so "mount
option not supported" error message is never reported.

[ Fixed up the commit description to better explain why the fix
works. --TYT ]

Fixes: 26092bf52478 ("ext4: use a table-driven handler for mount options")
Signed-off-by: Kaixu Xia <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/super.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1452,8 +1452,8 @@ static const struct mount_opts {
MOPT_SET | MOPT_Q},
{Opt_noquota, (EXT4_MOUNT_QUOTA | EXT4_MOUNT_USRQUOTA |
EXT4_MOUNT_GRPQUOTA), MOPT_CLEAR | MOPT_Q},
- {Opt_usrjquota, 0, MOPT_Q},
- {Opt_grpjquota, 0, MOPT_Q},
+ {Opt_usrjquota, 0, MOPT_Q | MOPT_STRING},
+ {Opt_grpjquota, 0, MOPT_Q | MOPT_STRING},
{Opt_offusrjquota, 0, MOPT_Q},
{Opt_offgrpjquota, 0, MOPT_Q},
{Opt_jqfmt_vfsold, QFMT_VFS_OLD, MOPT_QFMT},

2020-11-17 14:10:05

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 50/64] xen/events: fix race in evtchn_fifo_unmask()

From: Juergen Gross <[email protected]>

commit f01337197419b7e8a492e83089552b77d3b5fb90 upstream.

Unmasking a fifo event channel can result in unmasking it twice, once
directly in the kernel and once via a hypercall in case the event was
pending.

Fix that by doing the local unmask only if the event is not pending.

This is part of XSA-332.

Cc: [email protected]
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/xen/events/events_fifo.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -227,19 +227,25 @@ static bool evtchn_fifo_is_masked(unsign
return sync_test_bit(EVTCHN_FIFO_BIT(MASKED, word), BM(word));
}
/*
- * Clear MASKED, spinning if BUSY is set.
+ * Clear MASKED if not PENDING, spinning if BUSY is set.
+ * Return true if mask was cleared.
*/
-static void clear_masked(volatile event_word_t *word)
+static bool clear_masked_cond(volatile event_word_t *word)
{
event_word_t new, old, w;

w = *word;

do {
+ if (w & (1 << EVTCHN_FIFO_PENDING))
+ return false;
+
old = w & ~(1 << EVTCHN_FIFO_BUSY);
new = old & ~(1 << EVTCHN_FIFO_MASKED);
w = sync_cmpxchg(word, old, new);
} while (w != old);
+
+ return true;
}

static void evtchn_fifo_unmask(unsigned port)
@@ -248,8 +254,7 @@ static void evtchn_fifo_unmask(unsigned

BUG_ON(!irqs_disabled());

- clear_masked(word);
- if (evtchn_fifo_is_pending(port)) {
+ if (!clear_masked_cond(word)) {
struct evtchn_unmask unmask = { .port = port };
(void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask);
}

2020-11-17 14:10:32

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 18/64] geneve: add transport ports in route lookup for geneve

From: Mark Gray <[email protected]>

commit 34beb21594519ce64a55a498c2fe7d567bc1ca20 upstream.

This patch adds transport ports information for route lookup so that
IPsec can select Geneve tunnel traffic to do encryption. This is
needed for OVS/OVN IPsec with encrypted Geneve tunnels.

This can be tested by configuring a host-host VPN using an IKE
daemon and specifying port numbers. For example, for an
Openswan-type configuration, the following parameters should be
configured on both hosts and IPsec set up as-per normal:

$ cat /etc/ipsec.conf

conn in
...
left=$IP1
right=$IP2
...
leftprotoport=udp/6081
rightprotoport=udp
...
conn out
...
left=$IP1
right=$IP2
...
leftprotoport=udp
rightprotoport=udp/6081
...

The tunnel can then be setup using "ip" on both hosts (but
changing the relevant IP addresses):

$ ip link add tun type geneve id 1000 remote $IP2
$ ip addr add 192.168.0.1/24 dev tun
$ ip link set tun up

This can then be tested by pinging from $IP1:

$ ping 192.168.0.2

Without this patch the traffic is unencrypted on the wire.

Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Signed-off-by: Qiuyu Xiao <[email protected]>
Signed-off-by: Mark Gray <[email protected]>
Reviewed-by: Greg Rose <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[bwh: Backported to 4.4:
- Use geneve->dst_port instead of geneve->cfg.info.key.tp_dst
- Adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/geneve.c | 36 ++++++++++++++++++++++++++----------
1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index ec13e2ae6d16e..ee38299f9c578 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -711,7 +711,8 @@ static int geneve6_build_skb(struct dst_entry *dst, struct sk_buff *skb,
static struct rtable *geneve_get_v4_rt(struct sk_buff *skb,
struct net_device *dev,
struct flowi4 *fl4,
- struct ip_tunnel_info *info)
+ struct ip_tunnel_info *info,
+ __be16 dport, __be16 sport)
{
struct geneve_dev *geneve = netdev_priv(dev);
struct rtable *rt = NULL;
@@ -720,6 +721,8 @@ static struct rtable *geneve_get_v4_rt(struct sk_buff *skb,
memset(fl4, 0, sizeof(*fl4));
fl4->flowi4_mark = skb->mark;
fl4->flowi4_proto = IPPROTO_UDP;
+ fl4->fl4_dport = dport;
+ fl4->fl4_sport = sport;

if (info) {
fl4->daddr = info->key.u.ipv4.dst;
@@ -754,7 +757,8 @@ static struct rtable *geneve_get_v4_rt(struct sk_buff *skb,
static struct dst_entry *geneve_get_v6_dst(struct sk_buff *skb,
struct net_device *dev,
struct flowi6 *fl6,
- struct ip_tunnel_info *info)
+ struct ip_tunnel_info *info,
+ __be16 dport, __be16 sport)
{
struct geneve_dev *geneve = netdev_priv(dev);
struct geneve_sock *gs6 = geneve->sock6;
@@ -764,6 +768,8 @@ static struct dst_entry *geneve_get_v6_dst(struct sk_buff *skb,
memset(fl6, 0, sizeof(*fl6));
fl6->flowi6_mark = skb->mark;
fl6->flowi6_proto = IPPROTO_UDP;
+ fl6->fl6_dport = dport;
+ fl6->fl6_sport = sport;

if (info) {
fl6->daddr = info->key.u.ipv6.dst;
@@ -834,13 +840,14 @@ static netdev_tx_t geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
goto tx_error;
}

- rt = geneve_get_v4_rt(skb, dev, &fl4, info);
+ sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
+ rt = geneve_get_v4_rt(skb, dev, &fl4, info,
+ geneve->dst_port, sport);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
goto tx_error;
}

- sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
skb_reset_mac_header(skb);

if (info) {
@@ -916,13 +923,14 @@ static netdev_tx_t geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
}
}

- dst = geneve_get_v6_dst(skb, dev, &fl6, info);
+ sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
+ dst = geneve_get_v6_dst(skb, dev, &fl6, info,
+ geneve->dst_port, sport);
if (IS_ERR(dst)) {
err = PTR_ERR(dst);
goto tx_error;
}

- sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
skb_reset_mac_header(skb);

if (info) {
@@ -1011,9 +1019,14 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
struct dst_entry *dst;
struct flowi6 fl6;
#endif
+ __be16 sport;

if (ip_tunnel_info_af(info) == AF_INET) {
- rt = geneve_get_v4_rt(skb, dev, &fl4, info);
+ sport = udp_flow_src_port(geneve->net, skb,
+ 1, USHRT_MAX, true);
+
+ rt = geneve_get_v4_rt(skb, dev, &fl4, info,
+ geneve->dst_port, sport);
if (IS_ERR(rt))
return PTR_ERR(rt);

@@ -1021,7 +1034,11 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
info->key.u.ipv4.src = fl4.saddr;
#if IS_ENABLED(CONFIG_IPV6)
} else if (ip_tunnel_info_af(info) == AF_INET6) {
- dst = geneve_get_v6_dst(skb, dev, &fl6, info);
+ sport = udp_flow_src_port(geneve->net, skb,
+ 1, USHRT_MAX, true);
+
+ dst = geneve_get_v6_dst(skb, dev, &fl6, info,
+ geneve->dst_port, sport);
if (IS_ERR(dst))
return PTR_ERR(dst);

@@ -1032,8 +1049,7 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
return -EINVAL;
}

- info->key.tp_src = udp_flow_src_port(geneve->net, skb,
- 1, USHRT_MAX, true);
+ info->key.tp_src = sport;
info->key.tp_dst = geneve->dst_port;
return 0;
}
--
2.27.0

2020-11-17 14:10:53

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 30/64] cosa: Add missing kfree in error path of cosa_write

From: Wang Hai <[email protected]>

[ Upstream commit 52755b66ddcef2e897778fac5656df18817b59ab ]

If memory allocation for 'kbuf' succeed, cosa_write() doesn't have a
corresponding kfree() in exception handling. Thus add kfree() for this
function implementation.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wang Hai <[email protected]>
Acked-by: Jan "Yenya" Kasprzak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/wan/cosa.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/wan/cosa.c b/drivers/net/wan/cosa.c
index 848ea6a399f23..cbda69e58e084 100644
--- a/drivers/net/wan/cosa.c
+++ b/drivers/net/wan/cosa.c
@@ -903,6 +903,7 @@ static ssize_t cosa_write(struct file *file,
chan->tx_status = 1;
spin_unlock_irqrestore(&cosa->lock, flags);
up(&chan->wsem);
+ kfree(kbuf);
return -ERESTARTSYS;
}
}
--
2.27.0

2020-11-17 14:11:03

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 05/64] btrfs: reschedule when cloning lots of extents

From: Johannes Thumshirn <[email protected]>

[ Upstream commit 6b613cc97f0ace77f92f7bc112b8f6ad3f52baf8 ]

We have several occurrences of a soft lockup from fstest's generic/175
testcase, which look more or less like this one:

watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [xfs_io:10030]
Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 10030 Comm: xfs_io Tainted: G L 5.9.0-rc5+ #768
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x77/0xa0
panic+0xfa/0x2cb
watchdog_timer_fn.cold+0x85/0xa5
? lockup_detector_update_enable+0x50/0x50
__hrtimer_run_queues+0x99/0x4c0
? recalibrate_cpu_khz+0x10/0x10
hrtimer_run_queues+0x9f/0xb0
update_process_times+0x28/0x80
tick_handle_periodic+0x1b/0x60
__sysvec_apic_timer_interrupt+0x76/0x210
asm_call_on_stack+0x12/0x20
</IRQ>
sysvec_apic_timer_interrupt+0x7f/0x90
asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0010:btrfs_tree_unlock+0x91/0x1a0 [btrfs]
RSP: 0018:ffffc90007123a58 EFLAGS: 00000282
RAX: ffff8881cea2fbe0 RBX: ffff8881cea2fbe0 RCX: 0000000000000000
RDX: ffff8881d23fd200 RSI: ffffffff82045220 RDI: ffff8881cea2fba0
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000032
R10: 0000160000000000 R11: 0000000000001000 R12: 0000000000001000
R13: ffff8882357fd5b0 R14: ffff88816fa76e70 R15: ffff8881cea2fad0
? btrfs_tree_unlock+0x15b/0x1a0 [btrfs]
btrfs_release_path+0x67/0x80 [btrfs]
btrfs_insert_replace_extent+0x177/0x2c0 [btrfs]
btrfs_replace_file_extents+0x472/0x7c0 [btrfs]
btrfs_clone+0x9ba/0xbd0 [btrfs]
btrfs_clone_files.isra.0+0xeb/0x140 [btrfs]
? file_update_time+0xcd/0x120
btrfs_remap_file_range+0x322/0x3b0 [btrfs]
do_clone_file_range+0xb7/0x1e0
vfs_clone_file_range+0x30/0xa0
ioctl_file_clone+0x8a/0xc0
do_vfs_ioctl+0x5b2/0x6f0
__x64_sys_ioctl+0x37/0xa0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f87977fc247
RSP: 002b:00007ffd51a2f6d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87977fc247
RDX: 00007ffd51a2f710 RSI: 000000004020940d RDI: 0000000000000003
RBP: 0000000000000004 R08: 00007ffd51a79080 R09: 0000000000000000
R10: 00005621f11352f2 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000000 R14: 00005621f128b958 R15: 0000000080000000
Kernel Offset: disabled
---[ end Kernel panic - not syncing: softlockup: hung tasks ]---

All of these lockup reports have the call chain btrfs_clone_files() ->
btrfs_clone() in common. btrfs_clone_files() calls btrfs_clone() with
both source and destination extents locked and loops over the source
extent to create the clones.

Conditionally reschedule in the btrfs_clone() loop, to give some time back
to other processes.

CC: [email protected] # 4.4+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/btrfs/ioctl.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 67366515a29d2..f35e18e76f160 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3856,6 +3856,8 @@ process_slot:
ret = -EINTR;
goto out;
}
+
+ cond_resched();
}
ret = 0;

--
2.27.0

2020-11-17 14:11:10

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 26/64] cfg80211: regulatory: Fix inconsistent format argument

From: Ye Bin <[email protected]>

[ Upstream commit db18d20d1cb0fde16d518fb5ccd38679f174bc04 ]

Fix follow warning:
[net/wireless/reg.c:3619]: (warning) %d in format string (no. 2)
requires 'int' but the argument type is 'unsigned int'.

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Ye Bin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/wireless/reg.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index 474923175b108..dcbf5cd44bb37 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -2775,7 +2775,7 @@ static void print_rd_rules(const struct ieee80211_regdomain *rd)
power_rule = &reg_rule->power_rule;

if (reg_rule->flags & NL80211_RRF_AUTO_BW)
- snprintf(bw, sizeof(bw), "%d KHz, %d KHz AUTO",
+ snprintf(bw, sizeof(bw), "%d KHz, %u KHz AUTO",
freq_range->max_bandwidth_khz,
reg_get_max_bandwidth(rd, reg_rule));
else
--
2.27.0

2020-11-17 14:11:11

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 46/64] random32: make prandom_u32() output unpredictable

From: George Spelvin <[email protected]>

commit c51f8f88d705e06bd696d7510aff22b33eb8e638 upstream.

Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output. An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack. Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key. (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.) Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution. This patch replaces
it.

Reported-by: Amit Klein <[email protected]>
Cc: Willy Tarreau <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: [email protected]
Cc: Florian Westphal <[email protected]>
Cc: Marc Plumb <[email protected]>
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin <[email protected]>
Link: https://lore.kernel.org/netdev/[email protected]/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
to prandom.h for later use; merged George's prandom_seed() proposal;
inlined siprand_u32(); replaced the net_rand_state[] array with 4
members to fix a build issue; cosmetic cleanups to make checkpatch
happy; fixed RANDOM32_SELFTEST build ]
[wt: backported to 4.4 -- no latent_entropy, drop prandom_reseed_late]
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/char/random.c | 2
include/linux/prandom.h | 36 +++
kernel/time/timer.c | 7
lib/random32.c | 463 +++++++++++++++++++++++++++++-------------------
4 files changed, 317 insertions(+), 191 deletions(-)

--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -678,7 +678,6 @@ retry:
r->initialized = 1;
r->entropy_total = 0;
if (r == &nonblocking_pool) {
- prandom_reseed_late();
process_random_ready_list();
wake_up_all(&urandom_init_wait);
pr_notice("random: %s pool is initialized\n", r->name);
@@ -923,7 +922,6 @@ void add_interrupt_randomness(int irq, i

fast_mix(fast_pool);
add_interrupt_bench(cycles);
- this_cpu_add(net_rand_state.s1, fast_pool->pool[cycles & 3]);

if ((fast_pool->count < 64) &&
!time_after(now, fast_pool->last + HZ))
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -16,12 +16,44 @@ void prandom_bytes(void *buf, size_t nby
void prandom_seed(u32 seed);
void prandom_reseed_late(void);

+#if BITS_PER_LONG == 64
+/*
+ * The core SipHash round function. Each line can be executed in
+ * parallel given enough CPU resources.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+ v0 += v1, v1 = rol64(v1, 13), v2 += v3, v3 = rol64(v3, 16), \
+ v1 ^= v0, v0 = rol64(v0, 32), v3 ^= v2, \
+ v0 += v3, v3 = rol64(v3, 21), v2 += v1, v1 = rol64(v1, 17), \
+ v3 ^= v0, v1 ^= v2, v2 = rol64(v2, 32) \
+)
+
+#define PRND_K0 (0x736f6d6570736575 ^ 0x6c7967656e657261)
+#define PRND_K1 (0x646f72616e646f6d ^ 0x7465646279746573)
+
+#elif BITS_PER_LONG == 32
+/*
+ * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash.
+ * This is weaker, but 32-bit machines are not used for high-traffic
+ * applications, so there is less output for an attacker to analyze.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+ v0 += v1, v1 = rol32(v1, 5), v2 += v3, v3 = rol32(v3, 8), \
+ v1 ^= v0, v0 = rol32(v0, 16), v3 ^= v2, \
+ v0 += v3, v3 = rol32(v3, 7), v2 += v1, v1 = rol32(v1, 13), \
+ v3 ^= v0, v1 ^= v2, v2 = rol32(v2, 16) \
+)
+#define PRND_K0 0x6c796765
+#define PRND_K1 0x74656462
+
+#else
+#error Unsupported BITS_PER_LONG
+#endif
+
struct rnd_state {
__u32 s1, s2, s3, s4;
};

-DECLARE_PER_CPU(struct rnd_state, net_rand_state);
-
u32 prandom_u32_state(struct rnd_state *state);
void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1432,13 +1432,6 @@ void update_process_times(int user_tick)
#endif
scheduler_tick();
run_posix_cpu_timers(p);
-
- /* The current CPU might make use of net randoms without receiving IRQs
- * to renew them often enough. Let's update the net_rand_state from a
- * non-constant value that's not affine to the number of calls to make
- * sure it's updated when there's some activity (we don't care in idle).
- */
- this_cpu_add(net_rand_state.s1, rol32(jiffies, 24) + user_tick);
}

/*
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -39,16 +39,6 @@
#include <linux/sched.h>
#include <asm/unaligned.h>

-#ifdef CONFIG_RANDOM32_SELFTEST
-static void __init prandom_state_selftest(void);
-#else
-static inline void prandom_state_selftest(void)
-{
-}
-#endif
-
-DEFINE_PER_CPU(struct rnd_state, net_rand_state);
-
/**
* prandom_u32_state - seeded pseudo-random number generator.
* @state: pointer to state structure holding seeded state.
@@ -69,25 +59,6 @@ u32 prandom_u32_state(struct rnd_state *
EXPORT_SYMBOL(prandom_u32_state);

/**
- * prandom_u32 - pseudo random number generator
- *
- * A 32 bit pseudo-random number is generated using a fast
- * algorithm suitable for simulation. This algorithm is NOT
- * considered safe for cryptographic use.
- */
-u32 prandom_u32(void)
-{
- struct rnd_state *state = &get_cpu_var(net_rand_state);
- u32 res;
-
- res = prandom_u32_state(state);
- put_cpu_var(state);
-
- return res;
-}
-EXPORT_SYMBOL(prandom_u32);
-
-/**
* prandom_bytes_state - get the requested number of pseudo-random bytes
*
* @state: pointer to state structure holding seeded state.
@@ -118,20 +89,6 @@ void prandom_bytes_state(struct rnd_stat
}
EXPORT_SYMBOL(prandom_bytes_state);

-/**
- * prandom_bytes - get the requested number of pseudo-random bytes
- * @buf: where to copy the pseudo-random bytes to
- * @bytes: the requested number of bytes
- */
-void prandom_bytes(void *buf, size_t bytes)
-{
- struct rnd_state *state = &get_cpu_var(net_rand_state);
-
- prandom_bytes_state(state, buf, bytes);
- put_cpu_var(state);
-}
-EXPORT_SYMBOL(prandom_bytes);
-
static void prandom_warmup(struct rnd_state *state)
{
/* Calling RNG ten times to satisfy recurrence condition */
@@ -147,97 +104,6 @@ static void prandom_warmup(struct rnd_st
prandom_u32_state(state);
}

-static u32 __extract_hwseed(void)
-{
- unsigned int val = 0;
-
- (void)(arch_get_random_seed_int(&val) ||
- arch_get_random_int(&val));
-
- return val;
-}
-
-static void prandom_seed_early(struct rnd_state *state, u32 seed,
- bool mix_with_hwseed)
-{
-#define LCG(x) ((x) * 69069U) /* super-duper LCG */
-#define HWSEED() (mix_with_hwseed ? __extract_hwseed() : 0)
- state->s1 = __seed(HWSEED() ^ LCG(seed), 2U);
- state->s2 = __seed(HWSEED() ^ LCG(state->s1), 8U);
- state->s3 = __seed(HWSEED() ^ LCG(state->s2), 16U);
- state->s4 = __seed(HWSEED() ^ LCG(state->s3), 128U);
-}
-
-/**
- * prandom_seed - add entropy to pseudo random number generator
- * @seed: seed value
- *
- * Add some additional seeding to the prandom pool.
- */
-void prandom_seed(u32 entropy)
-{
- int i;
- /*
- * No locking on the CPUs, but then somewhat random results are, well,
- * expected.
- */
- for_each_possible_cpu(i) {
- struct rnd_state *state = &per_cpu(net_rand_state, i);
-
- state->s1 = __seed(state->s1 ^ entropy, 2U);
- prandom_warmup(state);
- }
-}
-EXPORT_SYMBOL(prandom_seed);
-
-/*
- * Generate some initially weak seeding values to allow
- * to start the prandom_u32() engine.
- */
-static int __init prandom_init(void)
-{
- int i;
-
- prandom_state_selftest();
-
- for_each_possible_cpu(i) {
- struct rnd_state *state = &per_cpu(net_rand_state, i);
- u32 weak_seed = (i + jiffies) ^ random_get_entropy();
-
- prandom_seed_early(state, weak_seed, true);
- prandom_warmup(state);
- }
-
- return 0;
-}
-core_initcall(prandom_init);
-
-static void __prandom_timer(unsigned long dontcare);
-
-static DEFINE_TIMER(seed_timer, __prandom_timer, 0, 0);
-
-static void __prandom_timer(unsigned long dontcare)
-{
- u32 entropy;
- unsigned long expires;
-
- get_random_bytes(&entropy, sizeof(entropy));
- prandom_seed(entropy);
-
- /* reseed every ~60 seconds, in [40 .. 80) interval with slack */
- expires = 40 + prandom_u32_max(40);
- seed_timer.expires = jiffies + msecs_to_jiffies(expires * MSEC_PER_SEC);
-
- add_timer(&seed_timer);
-}
-
-static void __init __prandom_start_seed_timer(void)
-{
- set_timer_slack(&seed_timer, HZ);
- seed_timer.expires = jiffies + msecs_to_jiffies(40 * MSEC_PER_SEC);
- add_timer(&seed_timer);
-}
-
void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state)
{
int i;
@@ -256,51 +122,6 @@ void prandom_seed_full_state(struct rnd_
}
}

-/*
- * Generate better values after random number generator
- * is fully initialized.
- */
-static void __prandom_reseed(bool late)
-{
- unsigned long flags;
- static bool latch = false;
- static DEFINE_SPINLOCK(lock);
-
- /* Asking for random bytes might result in bytes getting
- * moved into the nonblocking pool and thus marking it
- * as initialized. In this case we would double back into
- * this function and attempt to do a late reseed.
- * Ignore the pointless attempt to reseed again if we're
- * already waiting for bytes when the nonblocking pool
- * got initialized.
- */
-
- /* only allow initial seeding (late == false) once */
- if (!spin_trylock_irqsave(&lock, flags))
- return;
-
- if (latch && !late)
- goto out;
-
- latch = true;
- prandom_seed_full_state(&net_rand_state);
-out:
- spin_unlock_irqrestore(&lock, flags);
-}
-
-void prandom_reseed_late(void)
-{
- __prandom_reseed(true);
-}
-
-static int __init prandom_reseed(void)
-{
- __prandom_reseed(false);
- __prandom_start_seed_timer();
- return 0;
-}
-late_initcall(prandom_reseed);
-
#ifdef CONFIG_RANDOM32_SELFTEST
static struct prandom_test1 {
u32 seed;
@@ -420,7 +241,28 @@ static struct prandom_test2 {
{ 407983964U, 921U, 728767059U },
};

-static void __init prandom_state_selftest(void)
+static u32 __extract_hwseed(void)
+{
+ unsigned int val = 0;
+
+ (void)(arch_get_random_seed_int(&val) ||
+ arch_get_random_int(&val));
+
+ return val;
+}
+
+static void prandom_seed_early(struct rnd_state *state, u32 seed,
+ bool mix_with_hwseed)
+{
+#define LCG(x) ((x) * 69069U) /* super-duper LCG */
+#define HWSEED() (mix_with_hwseed ? __extract_hwseed() : 0)
+ state->s1 = __seed(HWSEED() ^ LCG(seed), 2U);
+ state->s2 = __seed(HWSEED() ^ LCG(state->s1), 8U);
+ state->s3 = __seed(HWSEED() ^ LCG(state->s2), 16U);
+ state->s4 = __seed(HWSEED() ^ LCG(state->s3), 128U);
+}
+
+static int __init prandom_state_selftest(void)
{
int i, j, errors = 0, runs = 0;
bool error = false;
@@ -460,5 +302,266 @@ static void __init prandom_state_selftes
pr_warn("prandom: %d/%d self tests failed\n", errors, runs);
else
pr_info("prandom: %d self tests passed\n", runs);
+ return 0;
+}
+core_initcall(prandom_state_selftest);
+#endif
+
+/*
+ * The prandom_u32() implementation is now completely separate from the
+ * prandom_state() functions, which are retained (for now) for compatibility.
+ *
+ * Because of (ab)use in the networking code for choosing random TCP/UDP port
+ * numbers, which open DoS possibilities if guessable, we want something
+ * stronger than a standard PRNG. But the performance requirements of
+ * the network code do not allow robust crypto for this application.
+ *
+ * So this is a homebrew Junior Spaceman implementation, based on the
+ * lowest-latency trustworthy crypto primitive available, SipHash.
+ * (The authors of SipHash have not been consulted about this abuse of
+ * their work.)
+ *
+ * Standard SipHash-2-4 uses 2n+4 rounds to hash n words of input to
+ * one word of output. This abbreviated version uses 2 rounds per word
+ * of output.
+ */
+
+struct siprand_state {
+ unsigned long v0;
+ unsigned long v1;
+ unsigned long v2;
+ unsigned long v3;
+};
+
+static DEFINE_PER_CPU(struct siprand_state, net_rand_state);
+
+/*
+ * This is the core CPRNG function. As "pseudorandom", this is not used
+ * for truly valuable things, just intended to be a PITA to guess.
+ * For maximum speed, we do just two SipHash rounds per word. This is
+ * the same rate as 4 rounds per 64 bits that SipHash normally uses,
+ * so hopefully it's reasonably secure.
+ *
+ * There are two changes from the official SipHash finalization:
+ * - We omit some constants XORed with v2 in the SipHash spec as irrelevant;
+ * they are there only to make the output rounds distinct from the input
+ * rounds, and this application has no input rounds.
+ * - Rather than returning v0^v1^v2^v3, return v1+v3.
+ * If you look at the SipHash round, the last operation on v3 is
+ * "v3 ^= v0", so "v0 ^ v3" just undoes that, a waste of time.
+ * Likewise "v1 ^= v2". (The rotate of v2 makes a difference, but
+ * it still cancels out half of the bits in v2 for no benefit.)
+ * Second, since the last combining operation was xor, continue the
+ * pattern of alternating xor/add for a tiny bit of extra non-linearity.
+ */
+static inline u32 siprand_u32(struct siprand_state *s)
+{
+ unsigned long v0 = s->v0, v1 = s->v1, v2 = s->v2, v3 = s->v3;
+
+ PRND_SIPROUND(v0, v1, v2, v3);
+ PRND_SIPROUND(v0, v1, v2, v3);
+ s->v0 = v0; s->v1 = v1; s->v2 = v2; s->v3 = v3;
+ return v1 + v3;
+}
+
+
+/**
+ * prandom_u32 - pseudo random number generator
+ *
+ * A 32 bit pseudo-random number is generated using a fast
+ * algorithm suitable for simulation. This algorithm is NOT
+ * considered safe for cryptographic use.
+ */
+u32 prandom_u32(void)
+{
+ struct siprand_state *state = get_cpu_ptr(&net_rand_state);
+ u32 res = siprand_u32(state);
+
+ put_cpu_ptr(&net_rand_state);
+ return res;
+}
+EXPORT_SYMBOL(prandom_u32);
+
+/**
+ * prandom_bytes - get the requested number of pseudo-random bytes
+ * @buf: where to copy the pseudo-random bytes to
+ * @bytes: the requested number of bytes
+ */
+void prandom_bytes(void *buf, size_t bytes)
+{
+ struct siprand_state *state = get_cpu_ptr(&net_rand_state);
+ u8 *ptr = buf;
+
+ while (bytes >= sizeof(u32)) {
+ put_unaligned(siprand_u32(state), (u32 *)ptr);
+ ptr += sizeof(u32);
+ bytes -= sizeof(u32);
+ }
+
+ if (bytes > 0) {
+ u32 rem = siprand_u32(state);
+
+ do {
+ *ptr++ = (u8)rem;
+ rem >>= BITS_PER_BYTE;
+ } while (--bytes > 0);
+ }
+ put_cpu_ptr(&net_rand_state);
}
+EXPORT_SYMBOL(prandom_bytes);
+
+/**
+ * prandom_seed - add entropy to pseudo random number generator
+ * @entropy: entropy value
+ *
+ * Add some additional seed material to the prandom pool.
+ * The "entropy" is actually our IP address (the only caller is
+ * the network code), not for unpredictability, but to ensure that
+ * different machines are initialized differently.
+ */
+void prandom_seed(u32 entropy)
+{
+ int i;
+
+ add_device_randomness(&entropy, sizeof(entropy));
+
+ for_each_possible_cpu(i) {
+ struct siprand_state *state = per_cpu_ptr(&net_rand_state, i);
+ unsigned long v0 = state->v0, v1 = state->v1;
+ unsigned long v2 = state->v2, v3 = state->v3;
+
+ do {
+ v3 ^= entropy;
+ PRND_SIPROUND(v0, v1, v2, v3);
+ PRND_SIPROUND(v0, v1, v2, v3);
+ v0 ^= entropy;
+ } while (unlikely(!v0 || !v1 || !v2 || !v3));
+
+ WRITE_ONCE(state->v0, v0);
+ WRITE_ONCE(state->v1, v1);
+ WRITE_ONCE(state->v2, v2);
+ WRITE_ONCE(state->v3, v3);
+ }
+}
+EXPORT_SYMBOL(prandom_seed);
+
+/*
+ * Generate some initially weak seeding values to allow
+ * the prandom_u32() engine to be started.
+ */
+static int __init prandom_init_early(void)
+{
+ int i;
+ unsigned long v0, v1, v2, v3;
+
+ if (!arch_get_random_long(&v0))
+ v0 = jiffies;
+ if (!arch_get_random_long(&v1))
+ v1 = random_get_entropy();
+ v2 = v0 ^ PRND_K0;
+ v3 = v1 ^ PRND_K1;
+
+ for_each_possible_cpu(i) {
+ struct siprand_state *state;
+
+ v3 ^= i;
+ PRND_SIPROUND(v0, v1, v2, v3);
+ PRND_SIPROUND(v0, v1, v2, v3);
+ v0 ^= i;
+
+ state = per_cpu_ptr(&net_rand_state, i);
+ state->v0 = v0; state->v1 = v1;
+ state->v2 = v2; state->v3 = v3;
+ }
+
+ return 0;
+}
+core_initcall(prandom_init_early);
+
+
+/* Stronger reseeding when available, and periodically thereafter. */
+static void prandom_reseed(unsigned long dontcare);
+
+static DEFINE_TIMER(seed_timer, prandom_reseed, 0, 0);
+
+static void prandom_reseed(unsigned long dontcare)
+{
+ unsigned long expires;
+ int i;
+
+ /*
+ * Reinitialize each CPU's PRNG with 128 bits of key.
+ * No locking on the CPUs, but then somewhat random results are,
+ * well, expected.
+ */
+ for_each_possible_cpu(i) {
+ struct siprand_state *state;
+ unsigned long v0 = get_random_long(), v2 = v0 ^ PRND_K0;
+ unsigned long v1 = get_random_long(), v3 = v1 ^ PRND_K1;
+#if BITS_PER_LONG == 32
+ int j;
+
+ /*
+ * On 32-bit machines, hash in two extra words to
+ * approximate 128-bit key length. Not that the hash
+ * has that much security, but this prevents a trivial
+ * 64-bit brute force.
+ */
+ for (j = 0; j < 2; j++) {
+ unsigned long m = get_random_long();
+
+ v3 ^= m;
+ PRND_SIPROUND(v0, v1, v2, v3);
+ PRND_SIPROUND(v0, v1, v2, v3);
+ v0 ^= m;
+ }
#endif
+ /*
+ * Probably impossible in practice, but there is a
+ * theoretical risk that a race between this reseeding
+ * and the target CPU writing its state back could
+ * create the all-zero SipHash fixed point.
+ *
+ * To ensure that never happens, ensure the state
+ * we write contains no zero words.
+ */
+ state = per_cpu_ptr(&net_rand_state, i);
+ WRITE_ONCE(state->v0, v0 ? v0 : -1ul);
+ WRITE_ONCE(state->v1, v1 ? v1 : -1ul);
+ WRITE_ONCE(state->v2, v2 ? v2 : -1ul);
+ WRITE_ONCE(state->v3, v3 ? v3 : -1ul);
+ }
+
+ /* reseed every ~60 seconds, in [40 .. 80) interval with slack */
+ expires = round_jiffies(jiffies + 40 * HZ + prandom_u32_max(40 * HZ));
+ mod_timer(&seed_timer, expires);
+}
+
+/*
+ * The random ready callback can be called from almost any interrupt.
+ * To avoid worrying about whether it's safe to delay that interrupt
+ * long enough to seed all CPUs, just schedule an immediate timer event.
+ */
+static void prandom_timer_start(struct random_ready_callback *unused)
+{
+ mod_timer(&seed_timer, jiffies);
+}
+
+/*
+ * Start periodic full reseeding as soon as strong
+ * random numbers are available.
+ */
+static int __init prandom_init_late(void)
+{
+ static struct random_ready_callback random_ready = {
+ .func = prandom_timer_start
+ };
+ int ret = add_random_ready_callback(&random_ready);
+
+ if (ret == -EALREADY) {
+ prandom_timer_start(&random_ready);
+ ret = 0;
+ }
+ return ret;
+}
+late_initcall(prandom_init_late);

2020-11-17 14:11:40

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 41/64] swiotlb: fix "x86: Dont panic if can not alloc buffer for swiotlb"

From: Stefano Stabellini <[email protected]>

commit e9696d259d0fb5d239e8c28ca41089838ea76d13 upstream.

kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to
allocate a buffer for the swiotlb. It does so by calling

memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);

If the allocation must fail, no_iotlb_memory is set.

Later during initialization swiotlb-xen comes in
(drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start
is != 0, it thinks the memory is ready to use when actually it is not.

When the swiotlb is actually needed, swiotlb_tbl_map_single gets called
and since no_iotlb_memory is set the kernel panics.

Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been
initialized, it would do the initialization itself, which might still
succeed.

Fix the panic by setting io_tlb_start to 0 on swiotlb initialization
failure, and also by setting no_iotlb_memory to false on swiotlb
initialization success.

Fixes: ac2cbab21f31 ("x86: Don't panic if can not alloc buffer for swiotlb")

Reported-by: Elliott Mitchell <[email protected]>
Tested-by: Elliott Mitchell <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: [email protected]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
lib/swiotlb.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -195,6 +195,7 @@ int __init swiotlb_init_with_tbl(char *t
io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
io_tlb_index = 0;
+ no_iotlb_memory = false;

if (verbose)
swiotlb_print_info();
@@ -225,9 +226,11 @@ swiotlb_init(int verbose)
if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
return;

- if (io_tlb_start)
+ if (io_tlb_start) {
memblock_free_early(io_tlb_start,
PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+ io_tlb_start = 0;
+ }
pr_warn("Cannot allocate buffer");
no_iotlb_memory = true;
}
@@ -326,6 +329,7 @@ swiotlb_late_init_with_tbl(char *tlb, un
io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
io_tlb_index = 0;
+ no_iotlb_memory = false;

swiotlb_print_info();

2020-11-17 14:17:34

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 47/64] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP

From: Anand K Mistry <[email protected]>

commit 1978b3a53a74e3230cd46932b149c6e62e832e9a upstream.

On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON,
STIBP is set to on and

spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED

At the same time, IBPB can be set to conditional.

However, this leads to the case where it's impossible to turn on IBPB
for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the

spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED

condition leads to a return before the task flag is set. Similarly,
ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to
conditional.

More generally, the following cases are possible:

1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
2. STIBP = on && IBPB = conditional for AMD CPUs with
X86_FEATURE_AMD_STIBP_ALWAYS_ON

The first case functions correctly today, but only because
spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.

At a high level, this change does one thing. If either STIBP or IBPB
is set to conditional, allow the prctl to change the task flag.
Also, reflect that capability when querying the state. This isn't
perfect since it doesn't take into account if only STIBP or IBPB is
unconditionally on. But it allows the conditional feature to work as
expected, without affecting the unconditional one.

[ bp: Massage commit message and comment; space out statements for
better readability. ]

Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
Signed-off-by: Anand K Mistry <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Tom Lendacky <[email protected]>
Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/cpu/bugs.c | 52 ++++++++++++++++++++++++++++-----------------
1 file changed, 33 insertions(+), 19 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1223,6 +1223,14 @@ static int ssb_prctl_set(struct task_str
return 0;
}

+static bool is_spec_ib_user_controlled(void)
+{
+ return spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL ||
+ spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP ||
+ spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL ||
+ spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP;
+}
+
static int ib_prctl_set(struct task_struct *task, unsigned long ctrl)
{
switch (ctrl) {
@@ -1230,17 +1238,26 @@ static int ib_prctl_set(struct task_stru
if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
return 0;
- /*
- * Indirect branch speculation is always disabled in strict
- * mode. It can neither be enabled if it was force-disabled
- * by a previous prctl call.

+ /*
+ * With strict mode for both IBPB and STIBP, the instruction
+ * code paths avoid checking this task flag and instead,
+ * unconditionally run the instruction. However, STIBP and IBPB
+ * are independent and either can be set to conditionally
+ * enabled regardless of the mode of the other.
+ *
+ * If either is set to conditional, allow the task flag to be
+ * updated, unless it was force-disabled by a previous prctl
+ * call. Currently, this is possible on an AMD CPU which has the
+ * feature X86_FEATURE_AMD_STIBP_ALWAYS_ON. In this case, if the
+ * kernel is booted with 'spectre_v2_user=seccomp', then
+ * spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP and
+ * spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED.
*/
- if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
- spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
- spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED ||
+ if (!is_spec_ib_user_controlled() ||
task_spec_ib_force_disable(task))
return -EPERM;
+
task_clear_spec_ib_disable(task);
task_update_spec_tif(task);
break;
@@ -1253,10 +1270,10 @@ static int ib_prctl_set(struct task_stru
if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
return -EPERM;
- if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
- spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
- spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED)
+
+ if (!is_spec_ib_user_controlled())
return 0;
+
task_set_spec_ib_disable(task);
if (ctrl == PR_SPEC_FORCE_DISABLE)
task_set_spec_ib_force_disable(task);
@@ -1319,20 +1336,17 @@ static int ib_prctl_get(struct task_stru
if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
return PR_SPEC_ENABLE;
- else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
- spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
- spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED)
- return PR_SPEC_DISABLE;
- else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL ||
- spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP ||
- spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL ||
- spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP) {
+ else if (is_spec_ib_user_controlled()) {
if (task_spec_ib_force_disable(task))
return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE;
if (task_spec_ib_disable(task))
return PR_SPEC_PRCTL | PR_SPEC_DISABLE;
return PR_SPEC_PRCTL | PR_SPEC_ENABLE;
- } else
+ } else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
+ spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
+ spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED)
+ return PR_SPEC_DISABLE;
+ else
return PR_SPEC_NOT_AFFECTED;
}

2020-11-17 16:12:14

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 4.4 38/64] drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[]

From: Thomas Zimmermann <[email protected]>

commit 06ad8d339524bf94b89859047822c31df6ace239 upstream.

The gma500 driver expects 3 pipelines in several it's IRQ functions.
Accessing struct drm_device.vblank[], this fails with devices that only
have 2 pipelines. An example KASAN report is shown below.

[ 62.267688] ==================================================================
[ 62.268856] BUG: KASAN: slab-out-of-bounds in psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.269450] Read of size 1 at addr ffff8880012bc6d0 by task systemd-udevd/285
[ 62.269949]
[ 62.270192] CPU: 0 PID: 285 Comm: systemd-udevd Tainted: G E 5.10.0-rc1-1-default+ #572
[ 62.270807] Hardware name: /DN2800MT, BIOS MTCDT10N.86A.0164.2012.1213.1024 12/13/2012
[ 62.271366] Call Trace:
[ 62.271705] dump_stack+0xae/0xe5
[ 62.272180] print_address_description.constprop.0+0x17/0xf0
[ 62.272987] ? psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.273474] __kasan_report.cold+0x20/0x38
[ 62.273989] ? psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.274460] kasan_report+0x3a/0x50
[ 62.274891] psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.275380] drm_irq_install+0x131/0x1f0
<...>
[ 62.300751] Allocated by task 285:
[ 62.301223] kasan_save_stack+0x1b/0x40
[ 62.301731] __kasan_kmalloc.constprop.0+0xbf/0xd0
[ 62.302293] drmm_kmalloc+0x55/0x100
[ 62.302773] drm_vblank_init+0x77/0x210

Resolve the issue by only handling vblank entries up to the number of
CRTCs.

I'm adding a Fixes tag for reference, although the bug has been present
since the driver's initial commit.

Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Fixes: 5c49fd3aa0ab ("gma500: Add the core DRM files and headers")
Cc: Alan Cox <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Patrik Jakobsson <[email protected]>
Cc: [email protected]
Cc: [email protected]#v3.3+
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/gma500/psb_irq.c | 34 ++++++++++++----------------------
1 file changed, 12 insertions(+), 22 deletions(-)

--- a/drivers/gpu/drm/gma500/psb_irq.c
+++ b/drivers/gpu/drm/gma500/psb_irq.c
@@ -350,6 +350,7 @@ int psb_irq_postinstall(struct drm_devic
{
struct drm_psb_private *dev_priv = dev->dev_private;
unsigned long irqflags;
+ unsigned int i;

spin_lock_irqsave(&dev_priv->irqmask_lock, irqflags);

@@ -362,20 +363,12 @@ int psb_irq_postinstall(struct drm_devic
PSB_WVDC32(dev_priv->vdc_irq_mask, PSB_INT_ENABLE_R);
PSB_WVDC32(0xFFFFFFFF, PSB_HWSTAM);

- if (dev->vblank[0].enabled)
- psb_enable_pipestat(dev_priv, 0, PIPE_VBLANK_INTERRUPT_ENABLE);
- else
- psb_disable_pipestat(dev_priv, 0, PIPE_VBLANK_INTERRUPT_ENABLE);
-
- if (dev->vblank[1].enabled)
- psb_enable_pipestat(dev_priv, 1, PIPE_VBLANK_INTERRUPT_ENABLE);
- else
- psb_disable_pipestat(dev_priv, 1, PIPE_VBLANK_INTERRUPT_ENABLE);
-
- if (dev->vblank[2].enabled)
- psb_enable_pipestat(dev_priv, 2, PIPE_VBLANK_INTERRUPT_ENABLE);
- else
- psb_disable_pipestat(dev_priv, 2, PIPE_VBLANK_INTERRUPT_ENABLE);
+ for (i = 0; i < dev->num_crtcs; ++i) {
+ if (dev->vblank[i].enabled)
+ psb_enable_pipestat(dev_priv, i, PIPE_VBLANK_INTERRUPT_ENABLE);
+ else
+ psb_disable_pipestat(dev_priv, i, PIPE_VBLANK_INTERRUPT_ENABLE);
+ }

if (dev_priv->ops->hotplug_enable)
dev_priv->ops->hotplug_enable(dev, true);
@@ -388,6 +381,7 @@ void psb_irq_uninstall(struct drm_device
{
struct drm_psb_private *dev_priv = dev->dev_private;
unsigned long irqflags;
+ unsigned int i;

spin_lock_irqsave(&dev_priv->irqmask_lock, irqflags);

@@ -396,14 +390,10 @@ void psb_irq_uninstall(struct drm_device

PSB_WVDC32(0xFFFFFFFF, PSB_HWSTAM);

- if (dev->vblank[0].enabled)
- psb_disable_pipestat(dev_priv, 0, PIPE_VBLANK_INTERRUPT_ENABLE);
-
- if (dev->vblank[1].enabled)
- psb_disable_pipestat(dev_priv, 1, PIPE_VBLANK_INTERRUPT_ENABLE);
-
- if (dev->vblank[2].enabled)
- psb_disable_pipestat(dev_priv, 2, PIPE_VBLANK_INTERRUPT_ENABLE);
+ for (i = 0; i < dev->num_crtcs; ++i) {
+ if (dev->vblank[i].enabled)
+ psb_disable_pipestat(dev_priv, i, PIPE_VBLANK_INTERRUPT_ENABLE);
+ }

dev_priv->vdc_irq_mask &= _PSB_IRQ_SGX_FLAG |
_PSB_IRQ_MSVDX_FLAG |

2020-11-17 18:29:34

by Pavel Machek

[permalink] [raw]

Subject: Re: [PATCH 4.4 00/64] 4.4.244-rc1 review

Hi!

> This is the start of the stable review cycle for the 4.4.244 release.
> There are 64 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.

CIP testing did not find any problems here:

https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-4.4.y

Tested-by: Pavel Machek (CIP) <[email protected]>

Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

Attachments:

(No filename) (603.00 B)
signature.asc (201.00 B)
Download all attachments

2020-11-18 11:30:00

by Naresh Kamboju

[permalink] [raw]

Subject: Re: [PATCH 4.4 00/64] 4.4.244-rc1 review

On Tue, 17 Nov 2020 at 18:37, Greg Kroah-Hartman
<[email protected]> wrote:
>
> This is the start of the stable review cycle for the 4.4.244 release.
> There are 64 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu, 19 Nov 2020 12:20:51 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.244-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing <[email protected]>

Summary
------------------------------------------------------------------------

kernel: 4.4.244-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: 5c64a4febafe0af1834cf497df8985d917a94b05
git describe: v4.4.243-65-g5c64a4febafe
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.4.y/build/v4.4.243-65-g5c64a4febafe

No regressions (compared to build v4.4.243)

No fixes (compared to build v4.4.243)

Ran 32841 total tests in the following environments and test suites.

Environments
--------------
- i386
- juno-r2 - arm64
- juno-r2-compat
- juno-r2-kasan
- qemu-arm64-clang
- qemu-arm64-kasan
- qemu-x86_64-clang
- qemu-x86_64-kasan
- qemu_arm
- qemu_arm64
- qemu_arm64-compat
- qemu_i386
- qemu_x86_64
- qemu_x86_64-compat
- x15 - arm
- x86_64
- x86-kasan

Test Suites
-----------
* build
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* perf
* v4l2-compliance
* kvm-unit-tests
* install-android-platform-tools-r2600

Summary
------------------------------------------------------------------------

kernel: 4.4.244-rc1
git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
git branch: 4.4.244-rc1-hikey-20201117-859
git commit: cdeaa8f4dfeb6e5920b76b6ee2d57b77b2810576
git describe: 4.4.244-rc1-hikey-20201117-859
Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.244-rc1-hikey-20201117-859

No regressions (compared to build 4.4.244-rc1-hikey-20201114-857)

No fixes (compared to build 4.4.244-rc1-hikey-20201114-857)

Ran 1760 total tests in the following environments and test suites.

Environments
--------------
- hi6220-hikey - arm64

Test Suites
-----------
* build
* install-android-platform-tools-r2600
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance

--
Linaro LKFT
https://lkft.linaro.org

2020-11-18 15:24:24

by Guenter Roeck

[permalink] [raw]

Subject: Re: [PATCH 4.4 00/64] 4.4.244-rc1 review

On Tue, Nov 17, 2020 at 02:04:23PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.244 release.
> There are 64 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu, 19 Nov 2020 12:20:51 +0000.
> Anything received after that time might be too late.
>

Build results:
total: 165 pass: 165 fail: 0
Qemu test results:
total: 328 pass: 328 fail: 0

Reviewed-by: Guenter Roeck <[email protected]>

Guenter