2021-12-06 14:59:22

by Greg Kroah-Hartman

Subject: [PATCH 4.4 00/52] 4.4.294-rc1 review

This is the start of the stable review cycle for the 4.4.294 release.
There are 52 patches in this series, all of which will be posted as a
response to this one. If anyone has any issues with these being applied,
please let me know.

Responses should be made by Wed, 08 Dec 2021 14:55:37 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.294-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.4.294-rc1

Pierre Gondois <[email protected]>
serial: pl011: Add ACPI SBSA UART match id

Sven Eckelmann <[email protected]>
tty: serial: msm_serial: Deactivate RX DMA for polling support

Maciej W. Rozycki <[email protected]>
vgacon: Propagate console boot parameters before calling `vc_resize'

Helge Deller <[email protected]>
parisc: Fix "make install" on newer debian releases

Arnd Bergmann <[email protected]>
siphash: use _unaligned version by default

Zhou Qingyang <[email protected]>
net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()

Randy Dunlap <[email protected]>
natsemi: xtensa: fix section mismatch warnings

Linus Torvalds <[email protected]>
fget: check that the fd still exists after getting a ref to it

Jens Axboe <[email protected]>
fs: add fget_many() and fput_many()

Baokun Li <[email protected]>
sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl

Baokun Li <[email protected]>
sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl

Masami Hiramatsu <[email protected]>
kprobes: Limit max data_size of the kretprobe instances

Teng Qi <[email protected]>
net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()

zhangyue <[email protected]>
net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound

Mike Christie <[email protected]>
scsi: iscsi: Unblock session then wake up error handler

Vasily Gorbik <[email protected]>
s390/setup: avoid using memblock_enforce_memory_limit

Slark Xiao <[email protected]>
platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep

liuguoqiang <[email protected]>
net: return correct error code

Mike Kravetz <[email protected]>
hugetlb: take PMD sharing into account when flushing tlb/caches

Juergen Gross <[email protected]>
tty: hvc: replace BUG_ON() with negative return value

Juergen Gross <[email protected]>
xen/netfront: don't trust the backend response data blindly

Juergen Gross <[email protected]>
xen/netfront: disentangle tx_skb_freelist

Juergen Gross <[email protected]>
xen/netfront: don't read data from request on the ring page

Juergen Gross <[email protected]>
xen/netfront: read response from backend only once

Juergen Gross <[email protected]>
xen/blkfront: don't trust the backend response data blindly

Juergen Gross <[email protected]>
xen/blkfront: don't take local copy of a request from the ring page

Juergen Gross <[email protected]>
xen/blkfront: read response from backend only once

Juergen Gross <[email protected]>
xen: sync include/xen/interface/io/ring.h with Xen's newest version

Alexander Mikhalitsyn <[email protected]>
shm: extend forced shm destroy to support objects from several IPC nses

Miklos Szeredi <[email protected]>
fuse: release pipe buf after last use

Miklos Szeredi <[email protected]>
fuse: fix page stealing

Lin Ma <[email protected]>
NFC: add NCI_UNREG flag to eliminate the race

David Hildenbrand <[email protected]>
proc/vmcore: fix clearing user buffer by properly using clear_user()

Nadav Amit <[email protected]>
hugetlbfs: flush TLBs correctly after huge_pmd_unshare

Steven Rostedt (VMware) <[email protected]>
tracing: Check pid filtering when creating events

Eric Dumazet <[email protected]>
tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows

Sreekanth Reddy <[email protected]>
scsi: mpt3sas: Fix kernel panic during drive powercycle test

Takashi Iwai <[email protected]>
ARM: socfpga: Fix crash with CONFIG_FORTIFY_SOURCE

Trond Myklebust <[email protected]>
NFSv42: Don't fail clone() unless the OP_CLONE operation failed

Alexander Aring <[email protected]>
net: ieee802154: handle iftypes as u32

Takashi Iwai <[email protected]>
ASoC: topology: Add missing rwsem around snd_ctl_remove() calls

Florian Fainelli <[email protected]>
ARM: dts: BCM5301X: Add interrupt properties to GPIO node

Stefano Stabellini <[email protected]>
xen: detect uninitialized xenbus in xenbus_init

Stefano Stabellini <[email protected]>
xen: don't continue xenstore initialization in case of errors

Dan Carpenter <[email protected]>
staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect()

Takashi Iwai <[email protected]>
ALSA: ctxfi: Fix out-of-range access

Todd Kjos <[email protected]>
binder: fix test regression due to sender_euid change

Mathias Nyman <[email protected]>
usb: hub: Fix locking issues with address0_mutex

Mathias Nyman <[email protected]>
usb: hub: Fix usb enumeration issue due to address0 race

Mingjie Zhang <[email protected]>
USB: serial: option: add Fibocom FM101-GL variants

Daniele Palmas <[email protected]>
USB: serial: option: add Telit LE910S1 0x9200 composition

Lee Jones <[email protected]>
staging: ion: Prevent incorrect reference counting behaviour


-------------

Diffstat:

Makefile | 4 +-
arch/arm/boot/dts/bcm5301x.dtsi | 2 +
arch/arm/include/asm/tlb.h | 8 +
arch/arm/mach-socfpga/core.h | 2 +-
arch/arm/mach-socfpga/platsmp.c | 8 +-
arch/ia64/include/asm/tlb.h | 10 +
arch/parisc/install.sh | 1 +
arch/s390/include/asm/tlb.h | 13 ++
arch/s390/kernel/setup.c | 3 -
arch/sh/include/asm/tlb.h | 10 +
arch/um/include/asm/tlb.h | 12 +
drivers/android/binder.c | 2 +-
drivers/ata/sata_fsl.c | 20 +-
drivers/block/xen-blkfront.c | 126 +++++++---
drivers/net/ethernet/dec/tulip/de4x5.c | 34 +--
drivers/net/ethernet/natsemi/xtsonic.c | 2 +-
.../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 10 +-
drivers/net/xen-netfront.c | 257 +++++++++++++--------
drivers/platform/x86/thinkpad_acpi.c | 12 -
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +-
drivers/scsi/scsi_transport_iscsi.c | 6 +-
drivers/staging/android/ion/ion.c | 6 +
drivers/staging/rtl8192e/rtl8192e/rtl_core.c | 3 +-
drivers/tty/hvc/hvc_xen.c | 17 +-
drivers/tty/serial/amba-pl011.c | 1 +
drivers/tty/serial/msm_serial.c | 3 +
drivers/usb/core/hub.c | 23 +-
drivers/usb/serial/option.c | 5 +
drivers/video/console/vgacon.c | 14 +-
drivers/xen/xenbus/xenbus_probe.c | 27 ++-
fs/file.c | 19 +-
fs/file_table.c | 9 +-
fs/fuse/dev.c | 10 +-
fs/nfs/nfs42xdr.c | 3 +-
fs/proc/vmcore.c | 15 +-
include/asm-generic/tlb.h | 7 +
include/linux/file.h | 2 +
include/linux/fs.h | 4 +-
include/linux/ipc_namespace.h | 15 ++
include/linux/kprobes.h | 2 +
include/linux/sched.h | 2 +-
include/linux/shm.h | 13 +-
include/linux/siphash.h | 14 +-
include/net/nfc/nci_core.h | 1 +
include/net/nl802154.h | 7 +-
include/xen/interface/io/ring.h | 257 ++++++++++-----------
ipc/shm.c | 176 ++++++++++----
kernel/kprobes.c | 3 +
kernel/trace/trace_events.c | 7 +
lib/siphash.c | 12 +-
mm/hugetlb.c | 58 ++++-
net/ipv4/devinet.c | 2 +-
net/ipv4/tcp_cubic.c | 5 +-
net/nfc/nci/core.c | 19 +-
sound/pci/ctxfi/ctamixer.c | 14 +-
sound/pci/ctxfi/ctdaio.c | 16 +-
sound/pci/ctxfi/ctresource.c | 7 +-
sound/pci/ctxfi/ctresource.h | 4 +-
sound/pci/ctxfi/ctsrc.c | 7 +-
sound/soc/soc-topology.c | 3 +
60 files changed, 894 insertions(+), 462 deletions(-)




2021-12-06 14:59:25

by Greg Kroah-Hartman

Subject: [PATCH 4.4 12/52] ASoC: topology: Add missing rwsem around snd_ctl_remove() calls

From: Takashi Iwai <[email protected]>

[ Upstream commit 7e567b5ae06315ef2d70666b149962e2bb4b97af ]

snd_ctl_remove() has to be called with card->controls_rwsem held (when
called after the card has been instantiated). This patch adds the
missing rwsem calls around it.
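
In short, the locking rule being enforced looks like this (illustrative
sketch only, with a hypothetical control 'kctl'; the actual hunks follow
below):

  down_write(&card->controls_rwsem);  /* mandatory after card instantiation */
  snd_ctl_remove(card, kctl);
  up_write(&card->controls_rwsem);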

Fixes: 8a9782346dcc ("ASoC: topology: Add topology core")
Signed-off-by: Takashi Iwai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
sound/soc/soc-topology.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/sound/soc/soc-topology.c b/sound/soc/soc-topology.c
index 0675ab3fec6c7..ff12f2aa4e51d 100644
--- a/sound/soc/soc-topology.c
+++ b/sound/soc/soc-topology.c
@@ -1831,6 +1831,7 @@ EXPORT_SYMBOL_GPL(snd_soc_tplg_widget_remove_all);
/* remove dynamic controls from the component driver */
int snd_soc_tplg_component_remove(struct snd_soc_component *comp, u32 index)
{
+ struct snd_card *card = comp->card->snd_card;
struct snd_soc_dobj *dobj, *next_dobj;
int pass = SOC_TPLG_PASS_END;

@@ -1838,6 +1839,7 @@ int snd_soc_tplg_component_remove(struct snd_soc_component *comp, u32 index)
while (pass >= SOC_TPLG_PASS_START) {

/* remove mixer controls */
+ down_write(&card->controls_rwsem);
list_for_each_entry_safe(dobj, next_dobj, &comp->dobj_list,
list) {

@@ -1870,6 +1872,7 @@ int snd_soc_tplg_component_remove(struct snd_soc_component *comp, u32 index)
break;
}
}
+ up_write(&card->controls_rwsem);
pass--;
}

--
2.33.0




2021-12-06 14:59:28

by Greg Kroah-Hartman

Subject: [PATCH 4.4 21/52] NFC: add NCI_UNREG flag to eliminate the race

From: Lin Ma <[email protected]>

commit 48b71a9e66c2eab60564b1b1c85f4928ed04e406 upstream.

There are two sites that call queue_work() after the
destroy_workqueue() and lead to a possible UAF (use-after-free).

The first site is nci_send_cmd(), which can run after
nci_close_device() as below:

nfcmrvl_nci_unregister_dev  | nfc_genl_dev_up
  nci_close_device          |
    flush_workqueue         |
    del_timer_sync          |
  nci_unregister_device     | nfc_get_device
    destroy_workqueue       | nfc_dev_up
  nfc_unregister_device     |   nci_dev_up
    device_del              |     nci_open_device
                            |       __nci_request
                            |         nci_send_cmd
                            |           queue_work !!!

The other site is nci_cmd_timer(), woken by nci_cmd_work() from
nci_send_cmd():

...                         | ...
nci_unregister_device       | queue_work
  destroy_workqueue         |
nfc_unregister_device       | ...
  device_del                |   nci_cmd_work
                            |     mod_timer
                            |   ...
                            |   nci_cmd_timer
                            |     queue_work !!!

For the above two UAFs, the root cause is that nfc_dev_up() can race
with the nci_unregister_device() routine. Therefore, this patch
introduces an NCI_UNREG flag to eliminate the possible race. In
addition, the mutex_lock in nci_close_device() can act as a barrier.
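
Assembled from the hunks below, the resulting pattern is roughly:

  /* nci_unregister_device(): announce unregistration first */
  set_bit(NCI_UNREG, &ndev->flags);
  nci_close_device(ndev);  /* its mutex_lock() acts as the barrier */

  /* nci_open_device(): bail out once the flag is visible */
  mutex_lock(&ndev->req_lock);
  if (test_bit(NCI_UNREG, &ndev->flags)) {
          rc = -ENODEV;
          goto done;
  }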

Signed-off-by: Lin Ma <[email protected]>
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/nfc/nci_core.h | 1 +
net/nfc/nci/core.c | 19 +++++++++++++++++--
2 files changed, 18 insertions(+), 2 deletions(-)

--- a/include/net/nfc/nci_core.h
+++ b/include/net/nfc/nci_core.h
@@ -42,6 +42,7 @@ enum nci_flag {
NCI_UP,
NCI_DATA_EXCHANGE,
NCI_DATA_EXCHANGE_TO,
+ NCI_UNREG,
};

/* NCI device states */
--- a/net/nfc/nci/core.c
+++ b/net/nfc/nci/core.c
@@ -401,6 +401,11 @@ static int nci_open_device(struct nci_de

mutex_lock(&ndev->req_lock);

+ if (test_bit(NCI_UNREG, &ndev->flags)) {
+ rc = -ENODEV;
+ goto done;
+ }
+
if (test_bit(NCI_UP, &ndev->flags)) {
rc = -EALREADY;
goto done;
@@ -464,6 +469,10 @@ done:
static int nci_close_device(struct nci_dev *ndev)
{
nci_req_cancel(ndev, ENODEV);
+
+ /* This mutex needs to be held as a barrier for
+ * caller nci_unregister_device
+ */
mutex_lock(&ndev->req_lock);

if (!test_and_clear_bit(NCI_UP, &ndev->flags)) {
@@ -501,8 +510,8 @@ static int nci_close_device(struct nci_d
/* Flush cmd wq */
flush_workqueue(ndev->cmd_wq);

- /* Clear flags */
- ndev->flags = 0;
+ /* Clear flags except NCI_UNREG */
+ ndev->flags &= BIT(NCI_UNREG);

mutex_unlock(&ndev->req_lock);

@@ -1182,6 +1191,12 @@ void nci_unregister_device(struct nci_de
{
struct nci_conn_info *conn_info, *n;

+ /* This set_bit is not protected with specialized barrier,
+ * However, it is fine because the mutex_lock(&ndev->req_lock);
+ * in nci_close_device() will help to emit one.
+ */
+ set_bit(NCI_UNREG, &ndev->flags);
+
nci_close_device(ndev);

destroy_workqueue(ndev->cmd_wq);



2021-12-06 14:59:30

by Greg Kroah-Hartman

Subject: [PATCH 4.4 22/52] fuse: fix page stealing

From: Miklos Szeredi <[email protected]>

commit 712a951025c0667ff00b25afc360f74e639dfabe upstream.

It is possible to trigger a crash by splicing anon pipe bufs to the fuse
device.

The reason for this is that anon_pipe_buf_release() will reuse buf->page if
the refcount is 1, but that page might have already been stolen and its
flags modified (e.g. PG_lru added).

This happens in the unlikely case of fuse_dev_splice_write() getting around
to calling pipe_buf_release() after a page has been stolen, added to the
page cache and removed from the page cache.

Fix by calling pipe_buf_release() right after the page was inserted into
the page cache. In this case the page has an elevated refcount so any
release function will know that the page isn't reusable.
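
The new ordering in fuse_try_move_page() is roughly (sketch of the hunk
below):

  /* newpage already holds an extra reference here, so the pipe buf's
   * release function cannot mistake the stolen page for a reusable one */
  buf->ops->release(cs->pipe, buf);
  buf->ops = NULL;  /* lets the splice error path skip a second release */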

Reported-by: Frank Dinoff <[email protected]>
Link: https://lore.kernel.org/r/CAAmZXrsGg2xsP1CK+cbuEMumtrqdvD-NKnWzhNcvn71RV3c1yw@mail.gmail.com/
Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device")
Cc: <[email protected]> # v2.6.35
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/fuse/dev.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -922,6 +922,13 @@ static int fuse_try_move_page(struct fus
return err;
}

+ /*
+ * Release while we have extra ref on stolen page. Otherwise
+ * anon_pipe_buf_release() might think the page can be reused.
+ */
+ buf->ops->release(cs->pipe, buf);
+ buf->ops = NULL;
+
page_cache_get(newpage);

if (!(buf->flags & PIPE_BUF_FLAG_LRU))
@@ -2090,7 +2097,8 @@ static ssize_t fuse_dev_splice_write(str
out_free:
for (idx = 0; idx < nbuf; idx++) {
struct pipe_buffer *buf = &bufs[idx];
- buf->ops->release(pipe, buf);
+ if (buf->ops)
+ buf->ops->release(pipe, buf);
}
pipe_unlock(pipe);




2021-12-06 14:59:39

by Greg Kroah-Hartman

Subject: [PATCH 4.4 23/52] fuse: release pipe buf after last use

From: Miklos Szeredi <[email protected]>

commit 473441720c8616dfaf4451f9c7ea14f0eb5e5d65 upstream.

Checking buf->flags should be done before the pipe_buf_release() is called
on the pipe buffer, since releasing the buffer might modify the flags.

This is exactly what page_cache_pipe_buf_release() does, and which results
in the same VM_BUG_ON_PAGE(PageLRU(page)) that the original patch was
trying to fix.

Reported-by: Justin Forbes <[email protected]>
Fixes: 712a951025c0 ("fuse: fix page stealing")
Cc: <[email protected]> # v2.6.35
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/fuse/dev.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -922,6 +922,11 @@ static int fuse_try_move_page(struct fus
return err;
}

+ page_cache_get(newpage);
+
+ if (!(buf->flags & PIPE_BUF_FLAG_LRU))
+ lru_cache_add_file(newpage);
+
/*
* Release while we have extra ref on stolen page. Otherwise
* anon_pipe_buf_release() might think the page can be reused.
@@ -929,11 +934,6 @@ static int fuse_try_move_page(struct fus
buf->ops->release(cs->pipe, buf);
buf->ops = NULL;

- page_cache_get(newpage);
-
- if (!(buf->flags & PIPE_BUF_FLAG_LRU))
- lru_cache_add_file(newpage);
-
err = 0;
spin_lock(&cs->req->waitq.lock);
if (test_bit(FR_ABORTED, &cs->req->flags))



2021-12-06 14:59:51

by Greg Kroah-Hartman

Subject: [PATCH 4.4 24/52] shm: extend forced shm destroy to support objects from several IPC nses

From: Alexander Mikhalitsyn <[email protected]>

commit 85b6d24646e4125c591639841169baa98a2da503 upstream.

Currently, the exit_shm() function is not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.

This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).

This is an attempt to fix the problem by extending the exit_shm
mechanism to handle destroying shm objects from several IPC namespaces.

To achieve that we do several things:

1. add a namespace (non-refcounted) pointer to struct shmid_kernel

2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer with the current task's IPC ns

3. exit_shm() is fully reworked such that it traverses all shp's in
task->sysvshm.shm_clist and gets the IPC namespace not from the current
task as before, but from the shp object itself, then calls
shm_destroy(shp, ns).

Note: we need to be really careful here because, as said in (1) above,
our pointer to the IPC ns is non-refcounted. To be on the safe side we
use the special helper get_ipc_ns_not_zero(), which takes an IPC ns
reference only if the IPC ns is not already in the state of destruction.

Q/A

Q: Why can we access shp->ns memory using a non-refcounted pointer?
A: Because the shp object lifetime is always shorter than the IPC
namespace lifetime, so if we get the shp object from
task->sysvshm.shm_clist while holding task_lock(task), nobody can steal
our namespace.

Q: Does this patch change the semantics of the unshare/setns/clone syscalls?
A: No. It just fixes a previously uncovered case where a process may
leave an IPC namespace without getting its task->sysvshm.shm_clist
cleaned up.
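
The core of the reworked exit_shm() loop body can be sketched as
follows (simplified; the real code below also handles the
shm_rmid_forced shortcut and takes a reference on shp itself):

  task_lock(task);  /* pins shp, see Q/A above */
  shp = list_first_entry(&task->sysvshm.shm_clist,
                         struct shmid_kernel, shm_clist);
  ns = get_ipc_ns_not_zero(shp->ns);
  if (!ns) {        /* namespace already being destroyed */
          list_del_init(&shp->shm_clist);
          task_unlock(task);
          continue;
  }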

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <[email protected]>
Signed-off-by: Manfred Spraul <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Pavel Tikhomirov <[email protected]>
Cc: Vasily Averin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/ipc_namespace.h | 15 +++
include/linux/sched.h | 2
include/linux/shm.h | 13 ++-
ipc/shm.c | 176 +++++++++++++++++++++++++++++++-----------
4 files changed, 159 insertions(+), 47 deletions(-)

--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -123,6 +123,16 @@ static inline struct ipc_namespace *get_
return ns;
}

+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ if (ns) {
+ if (atomic_inc_not_zero(&ns->count))
+ return ns;
+ }
+
+ return NULL;
+}
+
extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
@@ -138,6 +148,11 @@ static inline struct ipc_namespace *get_
{
return ns;
}
+
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ return ns;
+}

static inline void put_ipc_ns(struct ipc_namespace *ns)
{
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2792,7 +2792,7 @@ static inline int thread_group_empty(str
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs. Also
* pins the final release of task.io_context. Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. And ->vfork_done. And ->sysvshm.shm_clist.
*
* Nests both inside and outside of read_lock(&tasklist_lock).
* It must not be nested with write_lock_irq(&tasklist_lock),
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -19,9 +19,18 @@ struct shmid_kernel /* private to the ke
pid_t shm_lprid;
struct user_struct *mlock_user;

- /* The task created the shm object. NULL if the task is dead. */
+ /*
+ * The task created the shm object, for
+ * task_lock(shp->shm_creator)
+ */
struct task_struct *shm_creator;
- struct list_head shm_clist; /* list by creator */
+
+ /*
+ * List by creator. task_lock(->shm_creator) required for read/write.
+ * If list_empty(), then the creator is dead already.
+ */
+ struct list_head shm_clist;
+ struct ipc_namespace *ns;
};

/* shm_mode upper byte flags */
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -90,6 +90,7 @@ static void do_shm_rmid(struct ipc_names
{
struct shmid_kernel *shp;
shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+ WARN_ON(ns != shp->ns);

if (shp->shm_nattch) {
shp->shm_perm.mode |= SHM_DEST;
@@ -180,10 +181,43 @@ static void shm_rcu_free(struct rcu_head
ipc_rcu_free(head);
}

-static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
+/*
+ * It has to be called with shp locked.
+ * It must be called before ipc_rmid()
+ */
+static inline void shm_clist_rm(struct shmid_kernel *shp)
+{
+ struct task_struct *creator;
+
+ /* ensure that shm_creator does not disappear */
+ rcu_read_lock();
+
+ /*
+ * A concurrent exit_shm may do a list_del_init() as well.
+ * Just do nothing if exit_shm already did the work
+ */
+ if (!list_empty(&shp->shm_clist)) {
+ /*
+ * shp->shm_creator is guaranteed to be valid *only*
+ * if shp->shm_clist is not empty.
+ */
+ creator = shp->shm_creator;
+
+ task_lock(creator);
+ /*
+ * list_del_init() is a nop if the entry was already removed
+ * from the list.
+ */
+ list_del_init(&shp->shm_clist);
+ task_unlock(creator);
+ }
+ rcu_read_unlock();
+}
+
+static inline void shm_rmid(struct shmid_kernel *s)
{
- list_del(&s->shm_clist);
- ipc_rmid(&shm_ids(ns), &s->shm_perm);
+ shm_clist_rm(s);
+ ipc_rmid(&shm_ids(s->ns), &s->shm_perm);
}


@@ -238,7 +272,7 @@ static void shm_destroy(struct ipc_names
shm_file = shp->shm_file;
shp->shm_file = NULL;
ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- shm_rmid(ns, shp);
+ shm_rmid(shp);
shm_unlock(shp);
if (!is_file_hugepages(shm_file))
shmem_lock(shm_file, 0, shp->mlock_user);
@@ -259,10 +293,10 @@ static void shm_destroy(struct ipc_names
*
* 2) sysctl kernel.shm_rmid_forced is set to 1.
*/
-static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+static bool shm_may_destroy(struct shmid_kernel *shp)
{
return (shp->shm_nattch == 0) &&
- (ns->shm_rmid_forced ||
+ (shp->ns->shm_rmid_forced ||
(shp->shm_perm.mode & SHM_DEST));
}

@@ -293,7 +327,7 @@ static void shm_close(struct vm_area_str
shp->shm_lprid = task_tgid_vnr(current);
shp->shm_dtim = get_seconds();
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
@@ -314,10 +348,10 @@ static int shm_try_destroy_orphaned(int
*
* As shp->* are changed under rwsem, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL)
+ if (!list_empty(&shp->shm_clist))
return 0;

- if (shm_may_destroy(ns, shp)) {
+ if (shm_may_destroy(shp)) {
shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
}
@@ -335,48 +369,97 @@ void shm_destroy_orphaned(struct ipc_nam
/* Locking assumes this will only be called with task == current */
void exit_shm(struct task_struct *task)
{
- struct ipc_namespace *ns = task->nsproxy->ipc_ns;
- struct shmid_kernel *shp, *n;
+ for (;;) {
+ struct shmid_kernel *shp;
+ struct ipc_namespace *ns;

- if (list_empty(&task->sysvshm.shm_clist))
- return;
+ task_lock(task);
+
+ if (list_empty(&task->sysvshm.shm_clist)) {
+ task_unlock(task);
+ break;
+ }
+
+ shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel,
+ shm_clist);

- /*
- * If kernel.shm_rmid_forced is not set then only keep track of
- * which shmids are orphaned, so that a later set of the sysctl
- * can clean them up.
- */
- if (!ns->shm_rmid_forced) {
- down_read(&shm_ids(ns).rwsem);
- list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
- shp->shm_creator = NULL;
/*
- * Only under read lock but we are only called on current
- * so no entry on the list will be shared.
+ * 1) Get pointer to the ipc namespace. It is worth to say
+ * that this pointer is guaranteed to be valid because
+ * shp lifetime is always shorter than namespace lifetime
+ * in which shp lives.
+ * We taken task_lock it means that shp won't be freed.
*/
- list_del(&task->sysvshm.shm_clist);
- up_read(&shm_ids(ns).rwsem);
- return;
- }
+ ns = shp->ns;

- /*
- * Destroy all already created segments, that were not yet mapped,
- * and mark any mapped as orphan to cover the sysctl toggling.
- * Destroy is skipped if shm_may_destroy() returns false.
- */
- down_write(&shm_ids(ns).rwsem);
- list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
- shp->shm_creator = NULL;
+ /*
+ * 2) If kernel.shm_rmid_forced is not set then only keep track of
+ * which shmids are orphaned, so that a later set of the sysctl
+ * can clean them up.
+ */
+ if (!ns->shm_rmid_forced)
+ goto unlink_continue;

- if (shm_may_destroy(ns, shp)) {
- shm_lock_by_ptr(shp);
- shm_destroy(ns, shp);
+ /*
+ * 3) get a reference to the namespace.
+ * The refcount could be already 0. If it is 0, then
+ * the shm objects will be free by free_ipc_work().
+ */
+ ns = get_ipc_ns_not_zero(ns);
+ if (!ns) {
+unlink_continue:
+ list_del_init(&shp->shm_clist);
+ task_unlock(task);
+ continue;
}
- }

- /* Remove the list head from any segments still attached. */
- list_del(&task->sysvshm.shm_clist);
- up_write(&shm_ids(ns).rwsem);
+ /*
+ * 4) get a reference to shp.
+ * This cannot fail: shm_clist_rm() is called before
+ * ipc_rmid(), thus the refcount cannot be 0.
+ */
+ WARN_ON(!ipc_rcu_getref(&shp->shm_perm));
+
+ /*
+ * 5) unlink the shm segment from the list of segments
+ * created by current.
+ * This must be done last. After unlinking,
+ * only the refcounts obtained above prevent IPC_RMID
+ * from destroying the segment or the namespace.
+ */
+ list_del_init(&shp->shm_clist);
+
+ task_unlock(task);
+
+ /*
+ * 6) we have all references
+ * Thus lock & if needed destroy shp.
+ */
+ down_write(&shm_ids(ns).rwsem);
+ shm_lock_by_ptr(shp);
+ /*
+ * rcu_read_lock was implicitly taken in shm_lock_by_ptr, it's
+ * safe to call ipc_rcu_putref here
+ */
+ ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
+
+ if (ipc_valid_object(&shp->shm_perm)) {
+ if (shm_may_destroy(shp))
+ shm_destroy(ns, shp);
+ else
+ shm_unlock(shp);
+ } else {
+ /*
+ * Someone else deleted the shp from namespace
+ * idr/kht while we have waited.
+ * Just unlock and continue.
+ */
+ shm_unlock(shp);
+ }
+
+ up_write(&shm_ids(ns).rwsem);
+ put_ipc_ns(ns); /* paired with get_ipc_ns_not_zero */
+ }
}

static int shm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
@@ -607,7 +690,11 @@ static int newseg(struct ipc_namespace *
goto no_id;
}

+ shp->ns = ns;
+
+ task_lock(current);
list_add(&shp->shm_clist, &current->sysvshm.shm_clist);
+ task_unlock(current);

/*
* shmid gets reported as "inode#" in /proc/pid/maps.
@@ -1252,7 +1339,8 @@ out_nattch:
down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);



2021-12-06 15:00:16

by Greg Kroah-Hartman

Subject: [PATCH 4.4 25/52] xen: sync include/xen/interface/io/ring.h with Xen's newest version

From: Juergen Gross <[email protected]>

commit 629a5d87e26fe96bcaab44cbb81f5866af6f7008 upstream.

Sync include/xen/interface/io/ring.h with Xen's newest version in
order to get the RING_COPY_RESPONSE() and RING_RESPONSE_PROD_OVERFLOW()
macros.
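
Typical use of the two new macros (illustrative sketch; the names
follow the blkfront patches later in this series):

  struct blkif_response bret;

  if (RING_RESPONSE_PROD_OVERFLOW(&info->ring, rp))
          goto err;  /* backend advertises too many responses */
  RING_COPY_RESPONSE(&info->ring, i, &bret);  /* one-shot local copy */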

Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/xen/interface/io/ring.h | 271 +++++++++++++++++++---------------------
1 file changed, 131 insertions(+), 140 deletions(-)

--- a/include/xen/interface/io/ring.h
+++ b/include/xen/interface/io/ring.h
@@ -24,82 +24,79 @@ typedef unsigned int RING_IDX;
* A ring contains as many entries as will fit, rounded down to the nearest
* power of two (so we can mask with (size-1) to loop around).
*/
-#define __CONST_RING_SIZE(_s, _sz) \
- (__RD32(((_sz) - offsetof(struct _s##_sring, ring)) / \
- sizeof(((struct _s##_sring *)0)->ring[0])))
-
+#define __CONST_RING_SIZE(_s, _sz) \
+ (__RD32(((_sz) - offsetof(struct _s##_sring, ring)) / \
+ sizeof(((struct _s##_sring *)0)->ring[0])))
/*
* The same for passing in an actual pointer instead of a name tag.
*/
-#define __RING_SIZE(_s, _sz) \
- (__RD32(((_sz) - (long)&(_s)->ring + (long)(_s)) / sizeof((_s)->ring[0])))
+#define __RING_SIZE(_s, _sz) \
+ (__RD32(((_sz) - (long)(_s)->ring + (long)(_s)) / sizeof((_s)->ring[0])))

/*
* Macros to make the correct C datatypes for a new kind of ring.
*
* To make a new ring datatype, you need to have two message structures,
- * let's say struct request, and struct response already defined.
+ * let's say request_t, and response_t already defined.
*
* In a header where you want the ring datatype declared, you then do:
*
- * DEFINE_RING_TYPES(mytag, struct request, struct response);
+ * DEFINE_RING_TYPES(mytag, request_t, response_t);
*
* These expand out to give you a set of types, as you can see below.
* The most important of these are:
*
- * struct mytag_sring - The shared ring.
- * struct mytag_front_ring - The 'front' half of the ring.
- * struct mytag_back_ring - The 'back' half of the ring.
+ * mytag_sring_t - The shared ring.
+ * mytag_front_ring_t - The 'front' half of the ring.
+ * mytag_back_ring_t - The 'back' half of the ring.
*
* To initialize a ring in your code you need to know the location and size
* of the shared memory area (PAGE_SIZE, for instance). To initialise
* the front half:
*
- * struct mytag_front_ring front_ring;
- * SHARED_RING_INIT((struct mytag_sring *)shared_page);
- * FRONT_RING_INIT(&front_ring, (struct mytag_sring *)shared_page,
- * PAGE_SIZE);
+ * mytag_front_ring_t front_ring;
+ * SHARED_RING_INIT((mytag_sring_t *)shared_page);
+ * FRONT_RING_INIT(&front_ring, (mytag_sring_t *)shared_page, PAGE_SIZE);
*
* Initializing the back follows similarly (note that only the front
* initializes the shared ring):
*
- * struct mytag_back_ring back_ring;
- * BACK_RING_INIT(&back_ring, (struct mytag_sring *)shared_page,
- * PAGE_SIZE);
+ * mytag_back_ring_t back_ring;
+ * BACK_RING_INIT(&back_ring, (mytag_sring_t *)shared_page, PAGE_SIZE);
*/

-#define DEFINE_RING_TYPES(__name, __req_t, __rsp_t) \
- \
-/* Shared ring entry */ \
-union __name##_sring_entry { \
- __req_t req; \
- __rsp_t rsp; \
-}; \
- \
-/* Shared ring page */ \
-struct __name##_sring { \
- RING_IDX req_prod, req_event; \
- RING_IDX rsp_prod, rsp_event; \
- uint8_t pad[48]; \
- union __name##_sring_entry ring[1]; /* variable-length */ \
-}; \
- \
-/* "Front" end's private variables */ \
-struct __name##_front_ring { \
- RING_IDX req_prod_pvt; \
- RING_IDX rsp_cons; \
- unsigned int nr_ents; \
- struct __name##_sring *sring; \
-}; \
- \
-/* "Back" end's private variables */ \
-struct __name##_back_ring { \
- RING_IDX rsp_prod_pvt; \
- RING_IDX req_cons; \
- unsigned int nr_ents; \
- struct __name##_sring *sring; \
-};
-
+#define DEFINE_RING_TYPES(__name, __req_t, __rsp_t) \
+ \
+/* Shared ring entry */ \
+union __name##_sring_entry { \
+ __req_t req; \
+ __rsp_t rsp; \
+}; \
+ \
+/* Shared ring page */ \
+struct __name##_sring { \
+ RING_IDX req_prod, req_event; \
+ RING_IDX rsp_prod, rsp_event; \
+ uint8_t __pad[48]; \
+ union __name##_sring_entry ring[1]; /* variable-length */ \
+}; \
+ \
+/* "Front" end's private variables */ \
+struct __name##_front_ring { \
+ RING_IDX req_prod_pvt; \
+ RING_IDX rsp_cons; \
+ unsigned int nr_ents; \
+ struct __name##_sring *sring; \
+}; \
+ \
+/* "Back" end's private variables */ \
+struct __name##_back_ring { \
+ RING_IDX rsp_prod_pvt; \
+ RING_IDX req_cons; \
+ unsigned int nr_ents; \
+ struct __name##_sring *sring; \
+}; \
+ \
/*
* Macros for manipulating rings.
*
@@ -116,105 +113,99 @@ struct __name##_back_ring { \
*/

/* Initialising empty rings */
-#define SHARED_RING_INIT(_s) do { \
- (_s)->req_prod = (_s)->rsp_prod = 0; \
- (_s)->req_event = (_s)->rsp_event = 1; \
- memset((_s)->pad, 0, sizeof((_s)->pad)); \
+#define SHARED_RING_INIT(_s) do { \
+ (_s)->req_prod = (_s)->rsp_prod = 0; \
+ (_s)->req_event = (_s)->rsp_event = 1; \
+ (void)memset((_s)->__pad, 0, sizeof((_s)->__pad)); \
} while(0)

-#define FRONT_RING_INIT(_r, _s, __size) do { \
- (_r)->req_prod_pvt = 0; \
- (_r)->rsp_cons = 0; \
- (_r)->nr_ents = __RING_SIZE(_s, __size); \
- (_r)->sring = (_s); \
+#define FRONT_RING_ATTACH(_r, _s, _i, __size) do { \
+ (_r)->req_prod_pvt = (_i); \
+ (_r)->rsp_cons = (_i); \
+ (_r)->nr_ents = __RING_SIZE(_s, __size); \
+ (_r)->sring = (_s); \
} while (0)

-#define BACK_RING_INIT(_r, _s, __size) do { \
- (_r)->rsp_prod_pvt = 0; \
- (_r)->req_cons = 0; \
- (_r)->nr_ents = __RING_SIZE(_s, __size); \
- (_r)->sring = (_s); \
-} while (0)
+#define FRONT_RING_INIT(_r, _s, __size) FRONT_RING_ATTACH(_r, _s, 0, __size)

-/* Initialize to existing shared indexes -- for recovery */
-#define FRONT_RING_ATTACH(_r, _s, __size) do { \
- (_r)->sring = (_s); \
- (_r)->req_prod_pvt = (_s)->req_prod; \
- (_r)->rsp_cons = (_s)->rsp_prod; \
- (_r)->nr_ents = __RING_SIZE(_s, __size); \
+#define BACK_RING_ATTACH(_r, _s, _i, __size) do { \
+ (_r)->rsp_prod_pvt = (_i); \
+ (_r)->req_cons = (_i); \
+ (_r)->nr_ents = __RING_SIZE(_s, __size); \
+ (_r)->sring = (_s); \
} while (0)

-#define BACK_RING_ATTACH(_r, _s, __size) do { \
- (_r)->sring = (_s); \
- (_r)->rsp_prod_pvt = (_s)->rsp_prod; \
- (_r)->req_cons = (_s)->req_prod; \
- (_r)->nr_ents = __RING_SIZE(_s, __size); \
-} while (0)
+#define BACK_RING_INIT(_r, _s, __size) BACK_RING_ATTACH(_r, _s, 0, __size)

/* How big is this ring? */
-#define RING_SIZE(_r) \
+#define RING_SIZE(_r) \
((_r)->nr_ents)

/* Number of free requests (for use on front side only). */
-#define RING_FREE_REQUESTS(_r) \
+#define RING_FREE_REQUESTS(_r) \
(RING_SIZE(_r) - ((_r)->req_prod_pvt - (_r)->rsp_cons))

/* Test if there is an empty slot available on the front ring.
* (This is only meaningful from the front. )
*/
-#define RING_FULL(_r) \
+#define RING_FULL(_r) \
(RING_FREE_REQUESTS(_r) == 0)

/* Test if there are outstanding messages to be processed on a ring. */
-#define RING_HAS_UNCONSUMED_RESPONSES(_r) \
+#define RING_HAS_UNCONSUMED_RESPONSES(_r) \
((_r)->sring->rsp_prod - (_r)->rsp_cons)

-#define RING_HAS_UNCONSUMED_REQUESTS(_r) \
- ({ \
- unsigned int req = (_r)->sring->req_prod - (_r)->req_cons; \
- unsigned int rsp = RING_SIZE(_r) - \
- ((_r)->req_cons - (_r)->rsp_prod_pvt); \
- req < rsp ? req : rsp; \
- })
+#define RING_HAS_UNCONSUMED_REQUESTS(_r) ({ \
+ unsigned int req = (_r)->sring->req_prod - (_r)->req_cons; \
+ unsigned int rsp = RING_SIZE(_r) - \
+ ((_r)->req_cons - (_r)->rsp_prod_pvt); \
+ req < rsp ? req : rsp; \
+})

/* Direct access to individual ring elements, by index. */
-#define RING_GET_REQUEST(_r, _idx) \
+#define RING_GET_REQUEST(_r, _idx) \
(&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].req))

+#define RING_GET_RESPONSE(_r, _idx) \
+ (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].rsp))
+
/*
- * Get a local copy of a request.
+ * Get a local copy of a request/response.
*
- * Use this in preference to RING_GET_REQUEST() so all processing is
+ * Use this in preference to RING_GET_{REQUEST,RESPONSE}() so all processing is
* done on a local copy that cannot be modified by the other end.
*
* Note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 may cause this
- * to be ineffective where _req is a struct which consists of only bitfields.
+ * to be ineffective where dest is a struct which consists of only bitfields.
*/
-#define RING_COPY_REQUEST(_r, _idx, _req) do { \
- /* Use volatile to force the copy into _req. */ \
- *(_req) = *(volatile typeof(_req))RING_GET_REQUEST(_r, _idx); \
+#define RING_COPY_(type, r, idx, dest) do { \
+ /* Use volatile to force the copy into dest. */ \
+ *(dest) = *(volatile typeof(dest))RING_GET_##type(r, idx); \
} while (0)

-#define RING_GET_RESPONSE(_r, _idx) \
- (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].rsp))
+#define RING_COPY_REQUEST(r, idx, req) RING_COPY_(REQUEST, r, idx, req)
+#define RING_COPY_RESPONSE(r, idx, rsp) RING_COPY_(RESPONSE, r, idx, rsp)

/* Loop termination condition: Would the specified index overflow the ring? */
-#define RING_REQUEST_CONS_OVERFLOW(_r, _cons) \
+#define RING_REQUEST_CONS_OVERFLOW(_r, _cons) \
(((_cons) - (_r)->rsp_prod_pvt) >= RING_SIZE(_r))

/* Ill-behaved frontend determination: Can there be this many requests? */
-#define RING_REQUEST_PROD_OVERFLOW(_r, _prod) \
+#define RING_REQUEST_PROD_OVERFLOW(_r, _prod) \
(((_prod) - (_r)->rsp_prod_pvt) > RING_SIZE(_r))

-
-#define RING_PUSH_REQUESTS(_r) do { \
- wmb(); /* back sees requests /before/ updated producer index */ \
- (_r)->sring->req_prod = (_r)->req_prod_pvt; \
+/* Ill-behaved backend determination: Can there be this many responses? */
+#define RING_RESPONSE_PROD_OVERFLOW(_r, _prod) \
+ (((_prod) - (_r)->rsp_cons) > RING_SIZE(_r))
+
+#define RING_PUSH_REQUESTS(_r) do { \
+ wmb(); /* back sees requests /before/ updated producer index */ \
+ (_r)->sring->req_prod = (_r)->req_prod_pvt; \
} while (0)

-#define RING_PUSH_RESPONSES(_r) do { \
- wmb(); /* front sees responses /before/ updated producer index */ \
- (_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \
+#define RING_PUSH_RESPONSES(_r) do { \
+ wmb(); /* front sees resps /before/ updated producer index */ \
+ (_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \
} while (0)

/*
@@ -247,40 +238,40 @@ struct __name##_back_ring { \
* field appropriately.
*/

-#define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \
- RING_IDX __old = (_r)->sring->req_prod; \
- RING_IDX __new = (_r)->req_prod_pvt; \
- wmb(); /* back sees requests /before/ updated producer index */ \
- (_r)->sring->req_prod = __new; \
- mb(); /* back sees new requests /before/ we check req_event */ \
- (_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \
- (RING_IDX)(__new - __old)); \
-} while (0)
-
-#define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \
- RING_IDX __old = (_r)->sring->rsp_prod; \
- RING_IDX __new = (_r)->rsp_prod_pvt; \
- wmb(); /* front sees responses /before/ updated producer index */ \
- (_r)->sring->rsp_prod = __new; \
- mb(); /* front sees new responses /before/ we check rsp_event */ \
- (_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \
- (RING_IDX)(__new - __old)); \
-} while (0)
-
-#define RING_FINAL_CHECK_FOR_REQUESTS(_r, _work_to_do) do { \
- (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
- if (_work_to_do) break; \
- (_r)->sring->req_event = (_r)->req_cons + 1; \
- mb(); \
- (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
-} while (0)
-
-#define RING_FINAL_CHECK_FOR_RESPONSES(_r, _work_to_do) do { \
- (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
- if (_work_to_do) break; \
- (_r)->sring->rsp_event = (_r)->rsp_cons + 1; \
- mb(); \
- (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
+#define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \
+ RING_IDX __old = (_r)->sring->req_prod; \
+ RING_IDX __new = (_r)->req_prod_pvt; \
+ wmb(); /* back sees requests /before/ updated producer index */ \
+ (_r)->sring->req_prod = __new; \
+ mb(); /* back sees new requests /before/ we check req_event */ \
+ (_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \
+ (RING_IDX)(__new - __old)); \
+} while (0)
+
+#define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \
+ RING_IDX __old = (_r)->sring->rsp_prod; \
+ RING_IDX __new = (_r)->rsp_prod_pvt; \
+ wmb(); /* front sees resps /before/ updated producer index */ \
+ (_r)->sring->rsp_prod = __new; \
+ mb(); /* front sees new resps /before/ we check rsp_event */ \
+ (_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \
+ (RING_IDX)(__new - __old)); \
+} while (0)
+
+#define RING_FINAL_CHECK_FOR_REQUESTS(_r, _work_to_do) do { \
+ (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
+ if (_work_to_do) break; \
+ (_r)->sring->req_event = (_r)->req_cons + 1; \
+ mb(); \
+ (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
+} while (0)
+
+#define RING_FINAL_CHECK_FOR_RESPONSES(_r, _work_to_do) do { \
+ (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
+ if (_work_to_do) break; \
+ (_r)->sring->rsp_event = (_r)->rsp_cons + 1; \
+ mb(); \
+ (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
} while (0)

#endif /* __XEN_PUBLIC_IO_RING_H__ */



2021-12-06 15:00:22

by Greg Kroah-Hartman

Subject: [PATCH 4.4 26/52] xen/blkfront: read response from backend only once

From: Juergen Gross <[email protected]>

commit 71b66243f9898d0e54296b4e7035fb33cdcb0707 upstream.

In order to avoid problems in case the backend is modifying a response
on the ring page while the frontend has already seen it, just read the
response into a local buffer in one go and then operate on that buffer
only.
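
The pattern, in short (sketch of the hunks below):

  struct blkif_response bret;  /* on-stack copy, not a ring pointer */

  RING_COPY_RESPONSE(&info->ring, i, &bret);  /* single read from the ring */
  id = bret.id;  /* every later access hits the snapshot only */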

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Acked-by: Roger Pau Monné <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/block/xen-blkfront.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)

--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1296,7 +1296,7 @@ static void blkif_completion(struct blk_
static irqreturn_t blkif_interrupt(int irq, void *dev_id)
{
struct request *req;
- struct blkif_response *bret;
+ struct blkif_response bret;
RING_IDX i, rp;
unsigned long flags;
struct blkfront_info *info = (struct blkfront_info *)dev_id;
@@ -1316,8 +1316,9 @@ static irqreturn_t blkif_interrupt(int i
for (i = info->ring.rsp_cons; i != rp; i++) {
unsigned long id;

- bret = RING_GET_RESPONSE(&info->ring, i);
- id = bret->id;
+ RING_COPY_RESPONSE(&info->ring, i, &bret);
+ id = bret.id;
+
/*
* The backend has messed up and given us an id that we would
* never have given to it (we stamp it up to BLK_RING_SIZE -
@@ -1325,29 +1326,29 @@ static irqreturn_t blkif_interrupt(int i
*/
if (id >= BLK_RING_SIZE(info)) {
WARN(1, "%s: response to %s has incorrect id (%ld)\n",
- info->gd->disk_name, op_name(bret->operation), id);
+ info->gd->disk_name, op_name(bret.operation), id);
/* We can't safely get the 'struct request' as
* the id is busted. */
continue;
}
req = info->shadow[id].request;

- if (bret->operation != BLKIF_OP_DISCARD)
- blkif_completion(&info->shadow[id], info, bret);
+ if (bret.operation != BLKIF_OP_DISCARD)
+ blkif_completion(&info->shadow[id], info, &bret);

if (add_id_to_freelist(info, id)) {
WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
- info->gd->disk_name, op_name(bret->operation), id);
+ info->gd->disk_name, op_name(bret.operation), id);
continue;
}

- error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
- switch (bret->operation) {
+ error = (bret.status == BLKIF_RSP_OKAY) ? 0 : -EIO;
+ switch (bret.operation) {
case BLKIF_OP_DISCARD:
- if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+ if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
struct request_queue *rq = info->rq;
printk(KERN_WARNING "blkfront: %s: %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
error = -EOPNOTSUPP;
info->feature_discard = 0;
info->feature_secdiscard = 0;
@@ -1358,15 +1359,15 @@ static irqreturn_t blkif_interrupt(int i
break;
case BLKIF_OP_FLUSH_DISKCACHE:
case BLKIF_OP_WRITE_BARRIER:
- if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+ if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
printk(KERN_WARNING "blkfront: %s: %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
error = -EOPNOTSUPP;
}
- if (unlikely(bret->status == BLKIF_RSP_ERROR &&
+ if (unlikely(bret.status == BLKIF_RSP_ERROR &&
info->shadow[id].req.u.rw.nr_segments == 0)) {
printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
error = -EOPNOTSUPP;
}
if (unlikely(error)) {
@@ -1378,9 +1379,9 @@ static irqreturn_t blkif_interrupt(int i
/* fall through */
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
- if (unlikely(bret->status != BLKIF_RSP_OKAY))
+ if (unlikely(bret.status != BLKIF_RSP_OKAY))
dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
- "request: %x\n", bret->status);
+ "request: %x\n", bret.status);

blk_mq_complete_request(req, error);
break;



2021-12-06 15:00:24

by Greg Kroah-Hartman

Subject: [PATCH 4.4 27/52] xen/blkfront: don't take local copy of a request from the ring page

From: Juergen Gross <[email protected]>

commit 8f5a695d99000fc3aa73934d7ced33cfc64dcdab upstream.

In order to avoid a malicious backend being able to influence the local
copy of a request, build the request locally first and then copy it to
the ring page, instead of doing it the other way round as today.
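
In short, the submit path is inverted (sketch of the hunks below):

  id = blkif_ring_get_request(info, req, &final_ring_req);
  ring_req = &info->shadow[id].req;  /* build in private memory */
  /* ... fill in *ring_req ... */
  *final_ring_req = *ring_req;       /* publish to the shared page last */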

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Acked-by: Roger Pau Monné <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/block/xen-blkfront.c | 38 ++++++++++++++++++++++++++------------
1 file changed, 26 insertions(+), 12 deletions(-)

--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -456,16 +456,31 @@ static int blkif_ioctl(struct block_devi
return 0;
}

+static unsigned long blkif_ring_get_request(struct blkfront_info *info,
+ struct request *req,
+ struct blkif_request **ring_req)
+{
+ unsigned long id;
+
+ *ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+ info->ring.req_prod_pvt++;
+
+ id = get_id_from_freelist(info);
+ info->shadow[id].request = req;
+ info->shadow[id].req.u.rw.id = id;
+
+ return id;
+}
+
static int blkif_queue_discard_req(struct request *req)
{
struct blkfront_info *info = req->rq_disk->private_data;
- struct blkif_request *ring_req;
+ struct blkif_request *ring_req, *final_ring_req;
unsigned long id;

/* Fill out a communications ring structure. */
- ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
- id = get_id_from_freelist(info);
- info->shadow[id].request = req;
+ id = blkif_ring_get_request(info, req, &final_ring_req);
+ ring_req = &info->shadow[id].req;

ring_req->operation = BLKIF_OP_DISCARD;
ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -478,8 +493,8 @@ static int blkif_queue_discard_req(struc

info->ring.req_prod_pvt++;

- /* Keep a private copy so we can reissue requests when recovering. */
- info->shadow[id].req = *ring_req;
+ /* Copy the request to the ring page. */
+ *final_ring_req = *ring_req;

return 0;
}
@@ -569,7 +584,7 @@ static void blkif_setup_rw_req_grant(uns
static int blkif_queue_rw_req(struct request *req)
{
struct blkfront_info *info = req->rq_disk->private_data;
- struct blkif_request *ring_req;
+ struct blkif_request *ring_req, *final_ring_req;
unsigned long id;
int i;
struct setup_rw_req setup = {
@@ -613,9 +628,8 @@ static int blkif_queue_rw_req(struct req
new_persistent_gnts = 0;

/* Fill out a communications ring structure. */
- ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
- id = get_id_from_freelist(info);
- info->shadow[id].request = req;
+ id = blkif_ring_get_request(info, req, &final_ring_req);
+ ring_req = &info->shadow[id].req;

BUG_ON(info->max_indirect_segments == 0 &&
GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
@@ -696,8 +710,8 @@ static int blkif_queue_rw_req(struct req

info->ring.req_prod_pvt++;

- /* Keep a private copy so we can reissue requests when recovering. */
- info->shadow[id].req = *ring_req;
+ /* Copy request(s) to the ring page. */
+ *final_ring_req = *ring_req;

if (new_persistent_gnts)
gnttab_free_grant_references(setup.gref_head);



2021-12-06 15:00:26

by Greg Kroah-Hartman

Subject: [PATCH 4.4 28/52] xen/blkfront: don't trust the backend response data blindly

From: Juergen Gross <[email protected]>

commit b94e4b147fd1992ad450e1fea1fdaa3738753373 upstream.

Today blkfront will trust the backend to send only sane response data.
In order to avoid privilege escalations or crashes in case of malicious
backends, verify that the data is within expected limits. Especially make
sure that the response always references an outstanding request.

Introduce a new ring state, BLKIF_STATE_ERROR, which will be switched to
in case an inconsistency is detected. Recovering from this state is
possible only by removing and re-adding the virtual device (e.g. via a
suspend/resume cycle).

Rate-limit all warning messages issued due to valid error responses, in
order to avoid message floods being triggered by a malicious backend.
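
The added checks boil down to (sketch; the real patch also unwraps
BLKIF_OP_INDIRECT before the operation comparison):

  if (id >= BLK_RING_SIZE(info) ||       /* an id we never handed out */
      !info->shadow[id].inflight ||      /* nothing pending under that id */
      bret.operation != info->shadow[id].req.operation)
          goto err;  /* marks the ring BLKIF_STATE_ERROR for good */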

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Acked-by: Roger Pau Monné <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/block/xen-blkfront.c | 57 ++++++++++++++++++++++++++++++++++---------
1 file changed, 46 insertions(+), 11 deletions(-)

--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -64,6 +64,7 @@ enum blkif_state {
BLKIF_STATE_DISCONNECTED,
BLKIF_STATE_CONNECTED,
BLKIF_STATE_SUSPENDED,
+ BLKIF_STATE_ERROR,
};

struct grant {
@@ -79,6 +80,7 @@ struct blk_shadow {
struct grant **indirect_grants;
struct scatterlist *sg;
unsigned int num_sg;
+ bool inflight;
};

struct split_bio {
@@ -495,6 +497,7 @@ static int blkif_queue_discard_req(struc

/* Copy the request to the ring page. */
*final_ring_req = *ring_req;
+ info->shadow[id].inflight = true;

return 0;
}
@@ -712,6 +715,7 @@ static int blkif_queue_rw_req(struct req

/* Copy request(s) to the ring page. */
*final_ring_req = *ring_req;
+ info->shadow[id].inflight = true;

if (new_persistent_gnts)
gnttab_free_grant_references(setup.gref_head);
@@ -1324,11 +1328,17 @@ static irqreturn_t blkif_interrupt(int i
}

again:
- rp = info->ring.sring->rsp_prod;
+ rp = READ_ONCE(info->ring.sring->rsp_prod);
rmb(); /* Ensure we see queued responses up to 'rp'. */
+ if (RING_RESPONSE_PROD_OVERFLOW(&info->ring, rp)) {
+ pr_alert("%s: illegal number of responses %u\n",
+ info->gd->disk_name, rp - info->ring.rsp_cons);
+ goto err;
+ }

for (i = info->ring.rsp_cons; i != rp; i++) {
unsigned long id;
+ unsigned int op;

RING_COPY_RESPONSE(&info->ring, i, &bret);
id = bret.id;
@@ -1339,14 +1349,28 @@ static irqreturn_t blkif_interrupt(int i
* look in get_id_from_freelist.
*/
if (id >= BLK_RING_SIZE(info)) {
- WARN(1, "%s: response to %s has incorrect id (%ld)\n",
- info->gd->disk_name, op_name(bret.operation), id);
- /* We can't safely get the 'struct request' as
- * the id is busted. */
- continue;
+ pr_alert("%s: response has incorrect id (%ld)\n",
+ info->gd->disk_name, id);
+ goto err;
+ }
+ if (!info->shadow[id].inflight) {
+ pr_alert("%s: response references no pending request\n",
+ info->gd->disk_name);
+ goto err;
}
+
+ info->shadow[id].inflight = false;
req = info->shadow[id].request;

+ op = info->shadow[id].req.operation;
+ if (op == BLKIF_OP_INDIRECT)
+ op = info->shadow[id].req.u.indirect.indirect_op;
+ if (bret.operation != op) {
+ pr_alert("%s: response has wrong operation (%u instead of %u)\n",
+ info->gd->disk_name, bret.operation, op);
+ goto err;
+ }
+
if (bret.operation != BLKIF_OP_DISCARD)
blkif_completion(&info->shadow[id], info, &bret);

@@ -1361,7 +1385,8 @@ static irqreturn_t blkif_interrupt(int i
case BLKIF_OP_DISCARD:
if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
struct request_queue *rq = info->rq;
- printk(KERN_WARNING "blkfront: %s: %s op failed\n",
+
+ pr_warn_ratelimited("blkfront: %s: %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
error = -EOPNOTSUPP;
info->feature_discard = 0;
@@ -1374,13 +1399,13 @@ static irqreturn_t blkif_interrupt(int i
case BLKIF_OP_FLUSH_DISKCACHE:
case BLKIF_OP_WRITE_BARRIER:
if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
- printk(KERN_WARNING "blkfront: %s: %s op failed\n",
+ pr_warn_ratelimited("blkfront: %s: %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
error = -EOPNOTSUPP;
}
if (unlikely(bret.status == BLKIF_RSP_ERROR &&
info->shadow[id].req.u.rw.nr_segments == 0)) {
- printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
+ pr_warn_ratelimited("blkfront: %s: empty %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
error = -EOPNOTSUPP;
}
@@ -1394,8 +1419,9 @@ static irqreturn_t blkif_interrupt(int i
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
if (unlikely(bret.status != BLKIF_RSP_OKAY))
- dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
- "request: %x\n", bret.status);
+ dev_dbg_ratelimited(&info->xbdev->dev,
+ "Bad return from blkdev data request: %x\n",
+ bret.status);

blk_mq_complete_request(req, error);
break;
@@ -1419,6 +1445,14 @@ static irqreturn_t blkif_interrupt(int i
spin_unlock_irqrestore(&info->io_lock, flags);

return IRQ_HANDLED;
+
+ err:
+ info->connected = BLKIF_STATE_ERROR;
+
+ spin_unlock_irqrestore(&info->io_lock, flags);
+
+ pr_alert("%s disabled for further use\n", info->gd->disk_name);
+ return IRQ_HANDLED;
}


@@ -1928,6 +1962,7 @@ out_of_memory:
info->shadow[i].sg = NULL;
kfree(info->shadow[i].indirect_grants);
info->shadow[i].indirect_grants = NULL;
+ info->shadow[i].inflight = false;
}
if (!list_empty(&info->indirect_pages)) {
struct page *indirect_page, *n;



2021-12-06 15:00:29

by Greg Kroah-Hartman

Subject: [PATCH 4.4 29/52] xen/netfront: read response from backend only once

From: Juergen Gross <[email protected]>

commit 8446066bf8c1f9f7b7412c43fbea0fb87464d75b upstream.

In order to avoid problems in case the backend is modifying a response
on the ring page while the frontend has already seen it, just read the
response into a local buffer in one go and then operate on that buffer
only.
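
As in the blkfront counterpart, responses are snapshotted before use
(sketch of the hunks below):

  struct xen_netif_tx_response txrsp;

  RING_COPY_RESPONSE(&queue->tx, cons, &txrsp);  /* single read */
  if (txrsp.status == XEN_NETIF_RSP_NULL)
          continue;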

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/xen-netfront.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)

--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -387,13 +387,13 @@ static void xennet_tx_buf_gc(struct netf
rmb(); /* Ensure we see responses up to 'rp'. */

for (cons = queue->tx.rsp_cons; cons != prod; cons++) {
- struct xen_netif_tx_response *txrsp;
+ struct xen_netif_tx_response txrsp;

- txrsp = RING_GET_RESPONSE(&queue->tx, cons);
- if (txrsp->status == XEN_NETIF_RSP_NULL)
+ RING_COPY_RESPONSE(&queue->tx, cons, &txrsp);
+ if (txrsp.status == XEN_NETIF_RSP_NULL)
continue;

- id = txrsp->id;
+ id = txrsp.id;
skb = queue->tx_skbs[id].skb;
if (unlikely(gnttab_query_foreign_access(
queue->grant_tx_ref[id]) != 0)) {
@@ -736,7 +736,7 @@ static int xennet_get_extras(struct netf
RING_IDX rp)

{
- struct xen_netif_extra_info *extra;
+ struct xen_netif_extra_info extra;
struct device *dev = &queue->info->netdev->dev;
RING_IDX cons = queue->rx.rsp_cons;
int err = 0;
@@ -752,24 +752,22 @@ static int xennet_get_extras(struct netf
break;
}

- extra = (struct xen_netif_extra_info *)
- RING_GET_RESPONSE(&queue->rx, ++cons);
+ RING_COPY_RESPONSE(&queue->rx, ++cons, &extra);

- if (unlikely(!extra->type ||
- extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
+ if (unlikely(!extra.type ||
+ extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
if (net_ratelimit())
dev_warn(dev, "Invalid extra type: %d\n",
- extra->type);
+ extra.type);
err = -EINVAL;
} else {
- memcpy(&extras[extra->type - 1], extra,
- sizeof(*extra));
+ extras[extra.type - 1] = extra;
}

skb = xennet_get_rx_skb(queue, cons);
ref = xennet_get_rx_ref(queue, cons);
xennet_move_rx_slot(queue, skb, ref);
- } while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE);
+ } while (extra.flags & XEN_NETIF_EXTRA_FLAG_MORE);

queue->rx.rsp_cons = cons;
return err;
@@ -779,7 +777,7 @@ static int xennet_get_responses(struct n
struct netfront_rx_info *rinfo, RING_IDX rp,
struct sk_buff_head *list)
{
- struct xen_netif_rx_response *rx = &rinfo->rx;
+ struct xen_netif_rx_response *rx = &rinfo->rx, rx_local;
struct xen_netif_extra_info *extras = rinfo->extras;
struct device *dev = &queue->info->netdev->dev;
RING_IDX cons = queue->rx.rsp_cons;
@@ -837,7 +835,8 @@ next:
break;
}

- rx = RING_GET_RESPONSE(&queue->rx, cons + slots);
+ RING_COPY_RESPONSE(&queue->rx, cons + slots, &rx_local);
+ rx = &rx_local;
skb = xennet_get_rx_skb(queue, cons + slots);
ref = xennet_get_rx_ref(queue, cons + slots);
slots++;
@@ -892,10 +891,11 @@ static int xennet_fill_frags(struct netf
struct sk_buff *nskb;

while ((nskb = __skb_dequeue(list))) {
- struct xen_netif_rx_response *rx =
- RING_GET_RESPONSE(&queue->rx, ++cons);
+ struct xen_netif_rx_response rx;
skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];

+ RING_COPY_RESPONSE(&queue->rx, ++cons, &rx);
+
if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS) {
unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;

@@ -910,7 +910,7 @@ static int xennet_fill_frags(struct netf

skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
skb_frag_page(nfrag),
- rx->offset, rx->status, PAGE_SIZE);
+ rx.offset, rx.status, PAGE_SIZE);

skb_shinfo(nskb)->nr_frags = 0;
kfree_skb(nskb);
@@ -1008,7 +1008,7 @@ static int xennet_poll(struct napi_struc
i = queue->rx.rsp_cons;
work_done = 0;
while ((i != rp) && (work_done < budget)) {
- memcpy(rx, RING_GET_RESPONSE(&queue->rx, i), sizeof(*rx));
+ RING_COPY_RESPONSE(&queue->rx, i, rx);
memset(extras, 0, sizeof(rinfo.extras));

err = xennet_get_responses(queue, &rinfo, rp, &tmpq);



2021-12-06 15:00:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 13/52] net: ieee802154: handle iftypes as u32

From: Alexander Aring <[email protected]>

[ Upstream commit 451dc48c806a7ce9fbec5e7a24ccf4b2c936e834 ]

This patch fixes an issue where a u32 netlink value is handled as a
signed enum value, which doesn't fit into the range of the u32 netlink
type. If it's handled as a -1 value, some BIT() evaluations end in a
shift-out-of-bounds issue. To solve this, we set the UNSPEC value to u32
max (the s32 "-1" value) to keep backwards compatibility, and let the
enum values that follow start counting at 0. This keeps the compiler
from ever treating the enum as signed, and a check that the value is
above NL802154_IFTYPE_MAX will filter the -1 out.
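
A compact userspace illustration of why the initializer matters (the
enum names here are hypothetical; the out-of-int-range initializer is a
GCC extension, as in the kernel header):

#include <stdio.h>

/* the problem: a -1 initializer makes the enumerator a signed int, so
 * BIT(IFTYPE_UNSPEC) would evaluate 1 << -1, a shift-out-of-bounds */
enum bad_iftype { BAD_UNSPEC = -1, BAD_NODE, BAD_MONITOR };

/* the fix: same 0xffffffff wire value, but unsigned, with the real
 * iftypes explicitly starting at 0 */
enum good_iftype { GOOD_UNSPEC = (~(unsigned int)0), GOOD_NODE = 0,
                   GOOD_MONITOR };

int main(void)
{
        printf("bad:  %u\n", (unsigned int)BAD_UNSPEC);  /* 4294967295 */
        printf("good: %u\n", (unsigned int)GOOD_UNSPEC); /* same ABI value */
        return 0;
}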

Fixes: f3ea5e44231a ("ieee802154: add new interface command")
Signed-off-by: Alexander Aring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stefan Schmidt <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/nl802154.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/nl802154.h b/include/net/nl802154.h
index 32cb3e591e07b..53f140fc1983c 100644
--- a/include/net/nl802154.h
+++ b/include/net/nl802154.h
@@ -19,6 +19,8 @@
*
*/

+#include <linux/types.h>
+
#define NL802154_GENL_NAME "nl802154"

enum nl802154_commands {
@@ -143,10 +145,9 @@ enum nl802154_attrs {
};

enum nl802154_iftype {
- /* for backwards compatibility TODO */
- NL802154_IFTYPE_UNSPEC = -1,
+ NL802154_IFTYPE_UNSPEC = (~(__u32)0),

- NL802154_IFTYPE_NODE,
+ NL802154_IFTYPE_NODE = 0,
NL802154_IFTYPE_MONITOR,
NL802154_IFTYPE_COORD,

--
2.33.0




2021-12-06 15:00:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 35/52] net: return correct error code

From: liuguoqiang <[email protected]>

[ Upstream commit 6def480181f15f6d9ec812bca8cbc62451ba314c ]

When the kmemdup() call fails or register_net_sysctl() returns NULL,
the function should return -ENOMEM instead of -ENOBUFS.

Signed-off-by: liuguoqiang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/devinet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 2cb8612e7821e..35961ae1d120c 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2237,7 +2237,7 @@ static int __devinet_sysctl_register(struct net *net, char *dev_name,
free:
kfree(t);
out:
- return -ENOBUFS;
+ return -ENOMEM;
}

static void __devinet_sysctl_unregister(struct ipv4_devconf *cnf)
--
2.33.0




2021-12-06 15:00:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 36/52] platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep

From: Slark Xiao <[email protected]>

[ Upstream commit 39f53292181081d35174a581a98441de5da22bc9 ]

When the WWAN device wakes from S3 deep sleep on ThinkPad platforms,
it comes back disabled. This disabled status can be checked with the
commands 'nmcli r wwan' or 'rfkill list'.

Issue analysis:
When the host resumes from S3 deep sleep, the thinkpad_acpi driver
calls hotkey_resume(), which eventually uses wan_get_status to check
the current status of the WWAN device. During this resume process,
wan_get_status always returns off, even after the WWAN device has
booted up completely.
In patch V2, Hans said 'sw_state should be unchanged
after a suspend/resume. It's better to drop the
tpacpi_rfk_update_swstate call altogether from the
resume path'.
And Lenovo has confirmed that GWAN is no longer available from the
WHL generation onward, because the design does not match the current
pin control.

Signed-off-by: Slark Xiao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/platform/x86/thinkpad_acpi.c | 12 ------------
1 file changed, 12 deletions(-)

diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
index f3954af14f52f..466a0d0162c3d 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -1168,15 +1168,6 @@ static int tpacpi_rfk_update_swstate(const struct tpacpi_rfk *tp_rfk)
return status;
}

-/* Query FW and update rfkill sw state for all rfkill switches */
-static void tpacpi_rfk_update_swstate_all(void)
-{
- unsigned int i;
-
- for (i = 0; i < TPACPI_RFK_SW_MAX; i++)
- tpacpi_rfk_update_swstate(tpacpi_rfkill_switches[i]);
-}
-
/*
* Sync the HW-blocking state of all rfkill switches,
* do notice it causes the rfkill core to schedule uevents
@@ -3015,9 +3006,6 @@ static void tpacpi_send_radiosw_update(void)
if (wlsw == TPACPI_RFK_RADIO_OFF)
tpacpi_rfk_update_hwblock_state(true);

- /* Sync sw blocking state */
- tpacpi_rfk_update_swstate_all();
-
/* Sync hw blocking state last if it is hw-unblocked */
if (wlsw == TPACPI_RFK_RADIO_ON)
tpacpi_rfk_update_hwblock_state(false);
--
2.33.0




2021-12-06 15:00:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 38/52] scsi: iscsi: Unblock session then wake up error handler

From: Mike Christie <[email protected]>

[ Upstream commit a0c2f8b6709a9a4af175497ca65f93804f57b248 ]

We can race where iscsi_session_recovery_timedout() has woken up the error
handler thread and it's now setting the devices to offline, and
session_recovery_timedout()'s call to scsi_target_unblock() is also trying
to set the device's state to transport-offline. We can then get a mix of
states.

For the case where we can't relogin we want the devices to be in
transport-offline so when we have repaired the connection
__iscsi_unblock_session() can set the state back to running.

Set the device state then call into libiscsi to wake up the error handler.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Lee Duncan <[email protected]>
Signed-off-by: Mike Christie <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/scsi/scsi_transport_iscsi.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 9906a3b562e93..269277c1d9dcc 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1896,12 +1896,12 @@ static void session_recovery_timedout(struct work_struct *work)
}
spin_unlock_irqrestore(&session->lock, flags);

- if (session->transport->session_recovery_timedout)
- session->transport->session_recovery_timedout(session);
-
ISCSI_DBG_TRANS_SESSION(session, "Unblocking SCSI target\n");
scsi_target_unblock(&session->dev, SDEV_TRANSPORT_OFFLINE);
ISCSI_DBG_TRANS_SESSION(session, "Completed unblocking SCSI target\n");
+
+ if (session->transport->session_recovery_timedout)
+ session->transport->session_recovery_timedout(session);
}

static void __iscsi_unblock_session(struct work_struct *work)
--
2.33.0




2021-12-06 15:00:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 30/52] xen/netfront: don't read data from request on the ring page

From: Juergen Gross <[email protected]>

commit 162081ec33c2686afa29d91bf8d302824aa846c7 upstream.

In order to avoid a malicious backend being able to influence the local
processing of a request, build the request locally first and then copy
it to the ring page. Any read of the request that influences the
processing in the frontend needs to be done on the local instance.
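
A minimal sketch of the pattern (userspace C; struct and names are
illustrative):

#include <stdio.h>

struct txreq { unsigned int id, gref, offset, size, flags; };

/* 'slot' stands in for the entry on the shared ring page */
static void publish(struct txreq *slot, unsigned int id, unsigned int size)
{
        struct txreq local = { .id = id, .size = size };

        /* fill in 'local' completely, then copy it out in one go; the
         * frontend keeps reading 'local', never the ring slot */
        *slot = local;
}

int main(void)
{
        struct txreq ring_slot;

        publish(&ring_slot, 1, 64);
        printf("published id %u, size %u\n", ring_slot.id, ring_slot.size);
        return 0;
}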

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/xen-netfront.c | 80 ++++++++++++++++++++-------------------------
1 file changed, 37 insertions(+), 43 deletions(-)

--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -433,7 +433,8 @@ struct xennet_gnttab_make_txreq {
struct netfront_queue *queue;
struct sk_buff *skb;
struct page *page;
- struct xen_netif_tx_request *tx; /* Last request */
+ struct xen_netif_tx_request *tx; /* Last request on ring page */
+ struct xen_netif_tx_request tx_local; /* Last request local copy*/
unsigned int size;
};

@@ -461,30 +462,27 @@ static void xennet_tx_setup_grant(unsign
queue->grant_tx_page[id] = page;
queue->grant_tx_ref[id] = ref;

- tx->id = id;
- tx->gref = ref;
- tx->offset = offset;
- tx->size = len;
- tx->flags = 0;
+ info->tx_local.id = id;
+ info->tx_local.gref = ref;
+ info->tx_local.offset = offset;
+ info->tx_local.size = len;
+ info->tx_local.flags = 0;
+
+ *tx = info->tx_local;

info->tx = tx;
- info->size += tx->size;
+ info->size += info->tx_local.size;
}

static struct xen_netif_tx_request *xennet_make_first_txreq(
- struct netfront_queue *queue, struct sk_buff *skb,
- struct page *page, unsigned int offset, unsigned int len)
+ struct xennet_gnttab_make_txreq *info,
+ unsigned int offset, unsigned int len)
{
- struct xennet_gnttab_make_txreq info = {
- .queue = queue,
- .skb = skb,
- .page = page,
- .size = 0,
- };
+ info->size = 0;

- gnttab_for_one_grant(page, offset, len, xennet_tx_setup_grant, &info);
+ gnttab_for_one_grant(info->page, offset, len, xennet_tx_setup_grant, info);

- return info.tx;
+ return info->tx;
}

static void xennet_make_one_txreq(unsigned long gfn, unsigned int offset,
@@ -497,35 +495,27 @@ static void xennet_make_one_txreq(unsign
xennet_tx_setup_grant(gfn, offset, len, data);
}

-static struct xen_netif_tx_request *xennet_make_txreqs(
- struct netfront_queue *queue, struct xen_netif_tx_request *tx,
- struct sk_buff *skb, struct page *page,
+static void xennet_make_txreqs(
+ struct xennet_gnttab_make_txreq *info,
+ struct page *page,
unsigned int offset, unsigned int len)
{
- struct xennet_gnttab_make_txreq info = {
- .queue = queue,
- .skb = skb,
- .tx = tx,
- };
-
/* Skip unused frames from start of page */
page += offset >> PAGE_SHIFT;
offset &= ~PAGE_MASK;

while (len) {
- info.page = page;
- info.size = 0;
+ info->page = page;
+ info->size = 0;

gnttab_foreach_grant_in_range(page, offset, len,
xennet_make_one_txreq,
- &info);
+ info);

page++;
offset = 0;
- len -= info.size;
+ len -= info->size;
}
-
- return info.tx;
}

/*
@@ -578,7 +568,7 @@ static int xennet_start_xmit(struct sk_b
{
struct netfront_info *np = netdev_priv(dev);
struct netfront_stats *tx_stats = this_cpu_ptr(np->tx_stats);
- struct xen_netif_tx_request *tx, *first_tx;
+ struct xen_netif_tx_request *first_tx;
unsigned int i;
int notify;
int slots;
@@ -587,6 +577,7 @@ static int xennet_start_xmit(struct sk_b
unsigned int len;
unsigned long flags;
struct netfront_queue *queue = NULL;
+ struct xennet_gnttab_make_txreq info = { };
unsigned int num_queues = dev->real_num_tx_queues;
u16 queue_index;

@@ -629,21 +620,24 @@ static int xennet_start_xmit(struct sk_b
}

/* First request for the linear area. */
- first_tx = tx = xennet_make_first_txreq(queue, skb,
- page, offset, len);
- offset += tx->size;
+ info.queue = queue;
+ info.skb = skb;
+ info.page = page;
+ first_tx = xennet_make_first_txreq(&info, offset, len);
+ offset += info.tx_local.size;
if (offset == PAGE_SIZE) {
page++;
offset = 0;
}
- len -= tx->size;
+ len -= info.tx_local.size;

if (skb->ip_summed == CHECKSUM_PARTIAL)
/* local packet? */
- tx->flags |= XEN_NETTXF_csum_blank | XEN_NETTXF_data_validated;
+ first_tx->flags |= XEN_NETTXF_csum_blank |
+ XEN_NETTXF_data_validated;
else if (skb->ip_summed == CHECKSUM_UNNECESSARY)
/* remote but checksummed. */
- tx->flags |= XEN_NETTXF_data_validated;
+ first_tx->flags |= XEN_NETTXF_data_validated;

/* Optional extra info after the first request. */
if (skb_shinfo(skb)->gso_size) {
@@ -652,7 +646,7 @@ static int xennet_start_xmit(struct sk_b
gso = (struct xen_netif_extra_info *)
RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);

- tx->flags |= XEN_NETTXF_extra_info;
+ first_tx->flags |= XEN_NETTXF_extra_info;

gso->u.gso.size = skb_shinfo(skb)->gso_size;
gso->u.gso.type = (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) ?
@@ -666,13 +660,13 @@ static int xennet_start_xmit(struct sk_b
}

/* Requests for the rest of the linear area. */
- tx = xennet_make_txreqs(queue, tx, skb, page, offset, len);
+ xennet_make_txreqs(&info, page, offset, len);

/* Requests for all the frags. */
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
- tx = xennet_make_txreqs(queue, tx, skb,
- skb_frag_page(frag), frag->page_offset,
+ xennet_make_txreqs(&info, skb_frag_page(frag),
+ frag->page_offset,
skb_frag_size(frag));
}




2021-12-06 15:00:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 32/52] xen/netfront: don't trust the backend response data blindly

From: Juergen Gross <[email protected]>

commit a884daa61a7d91650987e855464526aef219590f upstream.

Today netfront will trust the backend to send only sane response data.
In order to avoid privilege escalations or crashes in case of malicious
backends verify the data to be within expected limits. Especially make
sure that the response always references an outstanding request.

Note that only the tx queue needs special id handling, as for the rx
queue the id is equal to the index in the ring page.

Introduce a new per-device indicator of whether it is broken, and let
the device stop working when it is set. Set this indicator in case the
backend sends any weird data.
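
The id validation boils down to two checks; a standalone sketch
(constants and array sizes are illustrative, not the driver's):

#include <stdio.h>

#define RING_SIZE  256
#define TX_PENDING 0xfffe

static unsigned short tx_link[RING_SIZE];

/* 0 if the backend's response id is sane, -1 if it must be treated as
 * hostile: out of range, or naming a request that is not outstanding */
static int check_response_id(unsigned int id)
{
        if (id >= RING_SIZE)
                return -1;
        if (tx_link[id] != TX_PENDING)
                return -1;
        return 0;
}

int main(void)
{
        tx_link[5] = TX_PENDING;
        printf("%d %d %d\n", check_response_id(5),
               check_response_id(6), check_response_id(300)); /* 0 -1 -1 */
        return 0;
}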


Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/xen-netfront.c | 80 ++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 75 insertions(+), 5 deletions(-)

--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -125,10 +125,12 @@ struct netfront_queue {
struct sk_buff *tx_skbs[NET_TX_RING_SIZE];
unsigned short tx_link[NET_TX_RING_SIZE];
#define TX_LINK_NONE 0xffff
+#define TX_PENDING 0xfffe
grant_ref_t gref_tx_head;
grant_ref_t grant_tx_ref[NET_TX_RING_SIZE];
struct page *grant_tx_page[NET_TX_RING_SIZE];
unsigned tx_skb_freelist;
+ unsigned int tx_pend_queue;

spinlock_t rx_lock ____cacheline_aligned_in_smp;
struct xen_netif_rx_front_ring rx;
@@ -154,6 +156,9 @@ struct netfront_info {
struct netfront_stats __percpu *rx_stats;
struct netfront_stats __percpu *tx_stats;

+ /* Is device behaving sane? */
+ bool broken;
+
atomic_t rx_gso_checksum_fixup;
};

@@ -338,7 +343,7 @@ static int xennet_open(struct net_device
unsigned int i = 0;
struct netfront_queue *queue = NULL;

- if (!np->queues)
+ if (!np->queues || np->broken)
return -ENODEV;

for (i = 0; i < num_queues; ++i) {
@@ -365,11 +370,17 @@ static void xennet_tx_buf_gc(struct netf
RING_IDX cons, prod;
unsigned short id;
struct sk_buff *skb;
+ const struct device *dev = &queue->info->netdev->dev;

BUG_ON(!netif_carrier_ok(queue->info->netdev));

do {
prod = queue->tx.sring->rsp_prod;
+ if (RING_RESPONSE_PROD_OVERFLOW(&queue->tx, prod)) {
+ dev_alert(dev, "Illegal number of responses %u\n",
+ prod - queue->tx.rsp_cons);
+ goto err;
+ }
rmb(); /* Ensure we see responses up to 'rp'. */

for (cons = queue->tx.rsp_cons; cons != prod; cons++) {
@@ -379,14 +390,27 @@ static void xennet_tx_buf_gc(struct netf
if (txrsp.status == XEN_NETIF_RSP_NULL)
continue;

- id = txrsp.id;
+ id = txrsp.id;
+ if (id >= RING_SIZE(&queue->tx)) {
+ dev_alert(dev,
+ "Response has incorrect id (%u)\n",
+ id);
+ goto err;
+ }
+ if (queue->tx_link[id] != TX_PENDING) {
+ dev_alert(dev,
+ "Response for inactive request\n");
+ goto err;
+ }
+
+ queue->tx_link[id] = TX_LINK_NONE;
skb = queue->tx_skbs[id];
queue->tx_skbs[id] = NULL;
if (unlikely(gnttab_query_foreign_access(
queue->grant_tx_ref[id]) != 0)) {
- pr_alert("%s: warning -- grant still in use by backend domain\n",
- __func__);
- BUG();
+ dev_alert(dev,
+ "Grant still in use by backend domain\n");
+ goto err;
}
gnttab_end_foreign_access_ref(
queue->grant_tx_ref[id], GNTMAP_readonly);
@@ -414,6 +438,12 @@ static void xennet_tx_buf_gc(struct netf
} while ((cons == prod) && (prod != queue->tx.sring->rsp_prod));

xennet_maybe_wake_tx(queue);
+
+ return;
+
+ err:
+ queue->info->broken = true;
+ dev_alert(dev, "Disabled for further use\n");
}

struct xennet_gnttab_make_txreq {
@@ -457,6 +487,12 @@ static void xennet_tx_setup_grant(unsign

*tx = info->tx_local;

+ /*
+ * Put the request in the pending queue, it will be set to be pending
+ * when the producer index is about to be raised.
+ */
+ add_id_to_list(&queue->tx_pend_queue, queue->tx_link, id);
+
info->tx = tx;
info->size += info->tx_local.size;
}
@@ -549,6 +585,15 @@ static u16 xennet_select_queue(struct ne
return queue_idx;
}

+static void xennet_mark_tx_pending(struct netfront_queue *queue)
+{
+ unsigned int i;
+
+ while ((i = get_id_from_list(&queue->tx_pend_queue, queue->tx_link)) !=
+ TX_LINK_NONE)
+ queue->tx_link[i] = TX_PENDING;
+}
+
#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)

static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -571,6 +616,8 @@ static int xennet_start_xmit(struct sk_b
/* Drop the packet if no queues are set up */
if (num_queues < 1)
goto drop;
+ if (unlikely(np->broken))
+ goto drop;
/* Determine which queue to transmit this SKB on */
queue_index = skb_get_queue_mapping(skb);
queue = &np->queues[queue_index];
@@ -660,6 +707,8 @@ static int xennet_start_xmit(struct sk_b
/* First request has the packet length. */
first_tx->size = skb->len;

+ xennet_mark_tx_pending(queue);
+
RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&queue->tx, notify);
if (notify)
notify_remote_via_irq(queue->tx_irq);
@@ -984,6 +1033,13 @@ static int xennet_poll(struct napi_struc
skb_queue_head_init(&tmpq);

rp = queue->rx.sring->rsp_prod;
+ if (RING_RESPONSE_PROD_OVERFLOW(&queue->rx, rp)) {
+ dev_alert(&dev->dev, "Illegal number of responses %u\n",
+ rp - queue->rx.rsp_cons);
+ queue->info->broken = true;
+ spin_unlock(&queue->rx_lock);
+ return 0;
+ }
rmb(); /* Ensure we see queued responses up to 'rp'. */

i = queue->rx.rsp_cons;
@@ -1224,6 +1280,9 @@ static irqreturn_t xennet_tx_interrupt(i
struct netfront_queue *queue = dev_id;
unsigned long flags;

+ if (queue->info->broken)
+ return IRQ_HANDLED;
+
spin_lock_irqsave(&queue->tx_lock, flags);
xennet_tx_buf_gc(queue);
spin_unlock_irqrestore(&queue->tx_lock, flags);
@@ -1236,6 +1295,9 @@ static irqreturn_t xennet_rx_interrupt(i
struct netfront_queue *queue = dev_id;
struct net_device *dev = queue->info->netdev;

+ if (queue->info->broken)
+ return IRQ_HANDLED;
+
if (likely(netif_carrier_ok(dev) &&
RING_HAS_UNCONSUMED_RESPONSES(&queue->rx)))
napi_schedule(&queue->napi);
@@ -1257,6 +1319,10 @@ static void xennet_poll_controller(struc
struct netfront_info *info = netdev_priv(dev);
unsigned int num_queues = dev->real_num_tx_queues;
unsigned int i;
+
+ if (info->broken)
+ return;
+
for (i = 0; i < num_queues; ++i)
xennet_interrupt(0, &info->queues[i]);
}
@@ -1627,6 +1693,7 @@ static int xennet_init_queue(struct netf

/* Initialise tx_skb_freelist as a free chain containing every entry. */
queue->tx_skb_freelist = 0;
+ queue->tx_pend_queue = TX_LINK_NONE;
for (i = 0; i < NET_TX_RING_SIZE; i++) {
queue->tx_link[i] = i + 1;
queue->grant_tx_ref[i] = GRANT_INVALID_REF;
@@ -1842,6 +1909,9 @@ static int talk_to_netback(struct xenbus
if (info->queues)
xennet_destroy_queues(info);

+ /* For the case of a reconnect reset the "broken" indicator. */
+ info->broken = false;
+
err = xennet_create_queues(info, &num_queues);
if (err < 0) {
xenbus_dev_fatal(dev, err, "creating queues");



2021-12-06 15:00:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 31/52] xen/netfront: disentangle tx_skb_freelist

From: Juergen Gross <[email protected]>

commit 21631d2d741a64a073e167c27769e73bc7844a2f upstream.

The tx_skb_freelist elements are in a single linked list with the
request id used as link reference. The per element link field is in a
union with the skb pointer of an in use request.

Move the link reference out of the union in order to enable a later
reuse of it for requests which need a populated skb pointer.

Rename add_id_to_freelist() and get_id_from_freelist() to
add_id_to_list() and get_id_from_list() in order to prepare for using
them for other lists as well. Define ~0 as the value indicating the
end of a list, and place that value in the link of a request that is
not on any list.

When freeing an skb, zero the skb pointer in the request. Use a NULL
skb pointer instead of skb_entry_is_link() to decide whether a request
has an skb linked to it.

Remove skb_entry_set_link() and open code it instead as it is really
trivial now.
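
The resulting free list is just ids chained through a side array; a
small standalone sketch of the same add/get pair (sizes illustrative):

#include <stdio.h>

#define N         8
#define LINK_NONE 0xffff

static unsigned short link[N];
static unsigned int head = LINK_NONE;

static void add_id(unsigned int id)
{
        link[id] = head;
        head = id;
}

static unsigned int get_id(void)
{
        unsigned int id = head;

        if (id != LINK_NONE) {
                head = link[id];
                link[id] = LINK_NONE;   /* mark "not on any list" */
        }
        return id;
}

int main(void)
{
        unsigned int i;

        /* initialise as a chain containing every entry, as the driver does */
        head = 0;
        for (i = 0; i < N; i++)
                link[i] = i + 1;
        link[N - 1] = LINK_NONE;

        printf("%u %u\n", get_id(), get_id()); /* 0 1 */
        add_id(0);
        printf("%u\n", get_id());              /* 0 again */
        return 0;
}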

Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/xen-netfront.c | 61 ++++++++++++++++++---------------------------
1 file changed, 25 insertions(+), 36 deletions(-)

--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -120,17 +120,11 @@ struct netfront_queue {

/*
* {tx,rx}_skbs store outstanding skbuffs. Free tx_skb entries
- * are linked from tx_skb_freelist through skb_entry.link.
- *
- * NB. Freelist index entries are always going to be less than
- * PAGE_OFFSET, whereas pointers to skbs will always be equal or
- * greater than PAGE_OFFSET: we use this property to distinguish
- * them.
+ * are linked from tx_skb_freelist through tx_link.
*/
- union skb_entry {
- struct sk_buff *skb;
- unsigned long link;
- } tx_skbs[NET_TX_RING_SIZE];
+ struct sk_buff *tx_skbs[NET_TX_RING_SIZE];
+ unsigned short tx_link[NET_TX_RING_SIZE];
+#define TX_LINK_NONE 0xffff
grant_ref_t gref_tx_head;
grant_ref_t grant_tx_ref[NET_TX_RING_SIZE];
struct page *grant_tx_page[NET_TX_RING_SIZE];
@@ -168,33 +162,25 @@ struct netfront_rx_info {
struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX - 1];
};

-static void skb_entry_set_link(union skb_entry *list, unsigned short id)
-{
- list->link = id;
-}
-
-static int skb_entry_is_link(const union skb_entry *list)
-{
- BUILD_BUG_ON(sizeof(list->skb) != sizeof(list->link));
- return (unsigned long)list->skb < PAGE_OFFSET;
-}
-
/*
* Access macros for acquiring freeing slots in tx_skbs[].
*/

-static void add_id_to_freelist(unsigned *head, union skb_entry *list,
- unsigned short id)
+static void add_id_to_list(unsigned *head, unsigned short *list,
+ unsigned short id)
{
- skb_entry_set_link(&list[id], *head);
+ list[id] = *head;
*head = id;
}

-static unsigned short get_id_from_freelist(unsigned *head,
- union skb_entry *list)
+static unsigned short get_id_from_list(unsigned *head, unsigned short *list)
{
unsigned int id = *head;
- *head = list[id].link;
+
+ if (id != TX_LINK_NONE) {
+ *head = list[id];
+ list[id] = TX_LINK_NONE;
+ }
return id;
}

@@ -394,7 +380,8 @@ static void xennet_tx_buf_gc(struct netf
continue;

id = txrsp.id;
- skb = queue->tx_skbs[id].skb;
+ skb = queue->tx_skbs[id];
+ queue->tx_skbs[id] = NULL;
if (unlikely(gnttab_query_foreign_access(
queue->grant_tx_ref[id]) != 0)) {
pr_alert("%s: warning -- grant still in use by backend domain\n",
@@ -407,7 +394,7 @@ static void xennet_tx_buf_gc(struct netf
&queue->gref_tx_head, queue->grant_tx_ref[id]);
queue->grant_tx_ref[id] = GRANT_INVALID_REF;
queue->grant_tx_page[id] = NULL;
- add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, id);
+ add_id_to_list(&queue->tx_skb_freelist, queue->tx_link, id);
dev_kfree_skb_irq(skb);
}

@@ -450,7 +437,7 @@ static void xennet_tx_setup_grant(unsign
struct netfront_queue *queue = info->queue;
struct sk_buff *skb = info->skb;

- id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
+ id = get_id_from_list(&queue->tx_skb_freelist, queue->tx_link);
tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)(int)ref));
@@ -458,7 +445,7 @@ static void xennet_tx_setup_grant(unsign
gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
gfn, GNTMAP_readonly);

- queue->tx_skbs[id].skb = skb;
+ queue->tx_skbs[id] = skb;
queue->grant_tx_page[id] = page;
queue->grant_tx_ref[id] = ref;

@@ -1126,17 +1113,18 @@ static void xennet_release_tx_bufs(struc

for (i = 0; i < NET_TX_RING_SIZE; i++) {
/* Skip over entries which are actually freelist references */
- if (skb_entry_is_link(&queue->tx_skbs[i]))
+ if (!queue->tx_skbs[i])
continue;

- skb = queue->tx_skbs[i].skb;
+ skb = queue->tx_skbs[i];
+ queue->tx_skbs[i] = NULL;
get_page(queue->grant_tx_page[i]);
gnttab_end_foreign_access(queue->grant_tx_ref[i],
GNTMAP_readonly,
(unsigned long)page_address(queue->grant_tx_page[i]));
queue->grant_tx_page[i] = NULL;
queue->grant_tx_ref[i] = GRANT_INVALID_REF;
- add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, i);
+ add_id_to_list(&queue->tx_skb_freelist, queue->tx_link, i);
dev_kfree_skb_irq(skb);
}
}
@@ -1637,13 +1625,14 @@ static int xennet_init_queue(struct netf
snprintf(queue->name, sizeof(queue->name), "vif%s-q%u",
devid, queue->id);

- /* Initialise tx_skbs as a free chain containing every entry. */
+ /* Initialise tx_skb_freelist as a free chain containing every entry. */
queue->tx_skb_freelist = 0;
for (i = 0; i < NET_TX_RING_SIZE; i++) {
- skb_entry_set_link(&queue->tx_skbs[i], i+1);
+ queue->tx_link[i] = i + 1;
queue->grant_tx_ref[i] = GRANT_INVALID_REF;
queue->grant_tx_page[i] = NULL;
}
+ queue->tx_link[NET_TX_RING_SIZE - 1] = TX_LINK_NONE;

/* Clear out rx_skbs */
for (i = 0; i < NET_RX_RING_SIZE; i++) {



2021-12-06 15:01:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 34/52] hugetlb: take PMD sharing into account when flushing tlb/caches

From: Mike Kravetz <[email protected]>

commit dff11abe280b47c21b804a8ace318e0638bb9a49 upstream.

When fixing an issue with PMD sharing and migration, it was discovered via
code inspection that other callers of huge_pmd_unshare potentially have an
issue with cache and tlb flushing.

Use the routine adjust_range_if_pmd_sharing_possible() to calculate worst
case ranges for mmu notifiers. Ensure that this range is flushed if
huge_pmd_unshare succeeds and unmaps a PUD_SIZE area.
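
Very roughly, the adjustment rounds the flush window out to PUD
boundaries; a simplified sketch (the real helper also checks whether
the vma is eligible for PMD sharing at all):

#include <stdio.h>

#define PUD_SIZE (1UL << 30)    /* illustrative: 1 GiB, as on x86-64 */
#define PUD_MASK (~(PUD_SIZE - 1))

static void widen(unsigned long *start, unsigned long *end)
{
        *start &= PUD_MASK;
        *end = (*end + PUD_SIZE - 1) & PUD_MASK;
}

int main(void)
{
        unsigned long s = 0x40201000UL, e = 0x40403000UL;

        widen(&s, &e);
        printf("%#lx-%#lx\n", s, e);    /* 0x40000000-0x80000000 */
        return 0;
}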

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Kravetz <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/hugetlb.c | 53 +++++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 43 insertions(+), 10 deletions(-)

--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3273,14 +3273,19 @@ void __unmap_hugepage_range(struct mmu_g
struct page *page;
struct hstate *h = hstate_vma(vma);
unsigned long sz = huge_page_size(h);
- const unsigned long mmun_start = start; /* For mmu_notifiers */
- const unsigned long mmun_end = end; /* For mmu_notifiers */
+ unsigned long mmun_start = start; /* For mmu_notifiers */
+ unsigned long mmun_end = end; /* For mmu_notifiers */

WARN_ON(!is_vm_hugetlb_page(vma));
BUG_ON(start & ~huge_page_mask(h));
BUG_ON(end & ~huge_page_mask(h));

tlb_start_vma(tlb, vma);
+
+ /*
+ * If sharing possible, alert mmu notifiers of worst case.
+ */
+ adjust_range_if_pmd_sharing_possible(vma, &mmun_start, &mmun_end);
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
address = start;
again:
@@ -3387,12 +3392,23 @@ void unmap_hugepage_range(struct vm_area
{
struct mm_struct *mm;
struct mmu_gather tlb;
+ unsigned long tlb_start = start;
+ unsigned long tlb_end = end;
+
+ /*
+ * If shared PMDs were possibly used within this vma range, adjust
+ * start/end for worst case tlb flushing.
+ * Note that we can not be sure if PMDs are shared until we try to
+ * unmap pages. However, we want to make sure TLB flushing covers
+ * the largest possible range.
+ */
+ adjust_range_if_pmd_sharing_possible(vma, &tlb_start, &tlb_end);

mm = vma->vm_mm;

- tlb_gather_mmu(&tlb, mm, start, end);
+ tlb_gather_mmu(&tlb, mm, tlb_start, tlb_end);
__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
- tlb_finish_mmu(&tlb, start, end);
+ tlb_finish_mmu(&tlb, tlb_start, tlb_end);
}

/*
@@ -4068,11 +4084,21 @@ unsigned long hugetlb_change_protection(
pte_t pte;
struct hstate *h = hstate_vma(vma);
unsigned long pages = 0;
+ unsigned long f_start = start;
+ unsigned long f_end = end;
+ bool shared_pmd = false;
+
+ /*
+ * In the case of shared PMDs, the area to flush could be beyond
+ * start/end. Set f_start/f_end to cover the maximum possible
+ * range if PMD sharing is possible.
+ */
+ adjust_range_if_pmd_sharing_possible(vma, &f_start, &f_end);

BUG_ON(address >= end);
- flush_cache_range(vma, address, end);
+ flush_cache_range(vma, f_start, f_end);

- mmu_notifier_invalidate_range_start(mm, start, end);
+ mmu_notifier_invalidate_range_start(mm, f_start, f_end);
i_mmap_lock_write(vma->vm_file->f_mapping);
for (; address < end; address += huge_page_size(h)) {
spinlock_t *ptl;
@@ -4083,6 +4109,7 @@ unsigned long hugetlb_change_protection(
if (huge_pmd_unshare(mm, &address, ptep)) {
pages++;
spin_unlock(ptl);
+ shared_pmd = true;
continue;
}
pte = huge_ptep_get(ptep);
@@ -4117,12 +4144,18 @@ unsigned long hugetlb_change_protection(
* Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
* may have cleared our pud entry and done put_page on the page table:
* once we release i_mmap_rwsem, another task can do the final put_page
- * and that page table be reused and filled with junk.
+ * and that page table be reused and filled with junk. If we actually
+ * did unshare a page of pmds, flush the range corresponding to the pud.
*/
- flush_tlb_range(vma, start, end);
- mmu_notifier_invalidate_range(mm, start, end);
+ if (shared_pmd) {
+ flush_tlb_range(vma, f_start, f_end);
+ mmu_notifier_invalidate_range(mm, f_start, f_end);
+ } else {
+ flush_tlb_range(vma, start, end);
+ mmu_notifier_invalidate_range(mm, start, end);
+ }
i_mmap_unlock_write(vma->vm_file->f_mapping);
- mmu_notifier_invalidate_range_end(mm, start, end);
+ mmu_notifier_invalidate_range_end(mm, f_start, f_end);

return pages << h->order;
}



2021-12-06 15:01:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 33/52] tty: hvc: replace BUG_ON() with negative return value

From: Juergen Gross <[email protected]>

commit e679004dec37566f658a255157d3aed9d762a2b7 upstream.

Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.
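
The sanity check itself is a one-liner on the ring indices; a
standalone sketch (the buffer size is illustrative):

#include <stdio.h>

#define OUT_SZ 1024     /* stands in for sizeof(intf->out) */

/* the producer may be ahead of the consumer by at most the buffer
 * size; anything else means the untrusted other end corrupted the
 * indices (unsigned arithmetic handles index wraparound) */
static int indices_ok(unsigned int cons, unsigned int prod)
{
        return (prod - cons) <= OUT_SZ;
}

int main(void)
{
        printf("%d %d\n", indices_ok(10, 20), indices_ok(0, 5000)); /* 1 0 */
        return 0;
}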

Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/tty/hvc/hvc_xen.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -98,7 +98,11 @@ static int __write_console(struct xencon
cons = intf->out_cons;
prod = intf->out_prod;
mb(); /* update queue values before going on */
- BUG_ON((prod - cons) > sizeof(intf->out));
+
+ if ((prod - cons) > sizeof(intf->out)) {
+ pr_err_once("xencons: Illegal ring page indices");
+ return -EINVAL;
+ }

while ((sent < len) && ((prod - cons) < sizeof(intf->out)))
intf->out[MASK_XENCONS_IDX(prod++, intf->out)] = data[sent++];
@@ -126,7 +130,10 @@ static int domU_write_console(uint32_t v
*/
while (len) {
int sent = __write_console(cons, data, len);
-
+
+ if (sent < 0)
+ return sent;
+
data += sent;
len -= sent;

@@ -150,7 +157,11 @@ static int domU_read_console(uint32_t vt
cons = intf->in_cons;
prod = intf->in_prod;
mb(); /* get pointers before reading ring */
- BUG_ON((prod - cons) > sizeof(intf->in));
+
+ if ((prod - cons) > sizeof(intf->in)) {
+ pr_err_once("xencons: Illegal ring page indices");
+ return -EINVAL;
+ }

while (cons != prod && recv < len)
buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)];



2021-12-06 15:01:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 42/52] sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl

From: Baokun Li <[email protected]>

commit 6c8ad7e8cf29eb55836e7a0215f967746ab2b504 upstream.

When the `rmmod sata_fsl.ko` command is executed on PPC64 GNU/Linux,
a bug is reported:
==================================================================
BUG: Unable to handle kernel data access on read at 0x80000800805b502c
Oops: Kernel access of bad area, sig: 11 [#1]
NIP [c0000000000388a4] .ioread32+0x4/0x20
LR [80000000000c6034] .sata_fsl_port_stop+0x44/0xe0 [sata_fsl]
Call Trace:
.free_irq+0x1c/0x4e0 (unreliable)
.ata_host_stop+0x74/0xd0 [libata]
.release_nodes+0x330/0x3f0
.device_release_driver_internal+0x178/0x2c0
.driver_detach+0x64/0xd0
.bus_remove_driver+0x70/0xf0
.driver_unregister+0x38/0x80
.platform_driver_unregister+0x14/0x30
.fsl_sata_driver_exit+0x18/0xa20 [sata_fsl]
.__se_sys_delete_module+0x1ec/0x2d0
.system_call_exception+0xfc/0x1f0
system_call_common+0xf8/0x200
==================================================================

The triggering of the BUG is shown in the following stack:

driver_detach
device_release_driver_internal
__device_release_driver
drv->remove(dev) --> platform_drv_remove/platform_remove
drv->remove(dev) --> sata_fsl_remove
iounmap(host_priv->hcr_base); <---- unmap
kfree(host_priv); <---- free
devres_release_all
release_nodes
dr->node.release(dev, dr->data) --> ata_host_stop
ap->ops->port_stop(ap) --> sata_fsl_port_stop
ioread32(hcr_base + HCONTROL) <---- UAF
host->ops->host_stop(host)

The iounmap(host_priv->hcr_base) and kfree(host_priv) calls should
not be executed in drv->remove. They should be executed in host_stop,
after port_stop. Therefore, move these calls into a new function,
sata_fsl_host_stop, and bind it to host_stop.

Fixes: faf0b2e5afe7 ("drivers/ata: add support to Freescale 3.0Gbps SATA Controller")
Cc: [email protected]
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Sergei Shtylyov <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/ata/sata_fsl.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

--- a/drivers/ata/sata_fsl.c
+++ b/drivers/ata/sata_fsl.c
@@ -1406,6 +1406,14 @@ static int sata_fsl_init_controller(stru
return 0;
}

+static void sata_fsl_host_stop(struct ata_host *host)
+{
+ struct sata_fsl_host_priv *host_priv = host->private_data;
+
+ iounmap(host_priv->hcr_base);
+ kfree(host_priv);
+}
+
/*
* scsi mid-layer and libata interface structures
*/
@@ -1438,6 +1446,8 @@ static struct ata_port_operations sata_f
.port_start = sata_fsl_port_start,
.port_stop = sata_fsl_port_stop,

+ .host_stop = sata_fsl_host_stop,
+
.pmp_attach = sata_fsl_pmp_attach,
.pmp_detach = sata_fsl_pmp_detach,
};
@@ -1572,8 +1582,6 @@ static int sata_fsl_remove(struct platfo
ata_host_detach(host);

irq_dispose_mapping(host_priv->irq);
- iounmap(host_priv->hcr_base);
- kfree(host_priv);

return 0;
}



2021-12-06 15:01:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 39/52] net: tulip: de4x5: fix the problem that the array lp->phy[8] may be out of bounds

From: zhangyue <[email protected]>

[ Upstream commit 61217be886b5f7402843677e4be7e7e83de9cb41 ]

At line 5001, if every id in the array 'lp->phy[8]' is non-zero, then
'k' is 8 when the 'for' loop ends.

At that point, accesses to the array 'lp->phy' through index 'k' are
out of bounds.

Signed-off-by: zhangyue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/dec/tulip/de4x5.c | 30 +++++++++++++++-----------
1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/dec/tulip/de4x5.c b/drivers/net/ethernet/dec/tulip/de4x5.c
index 7799cf33cc6e2..7c4150a83a082 100644
--- a/drivers/net/ethernet/dec/tulip/de4x5.c
+++ b/drivers/net/ethernet/dec/tulip/de4x5.c
@@ -4992,19 +4992,23 @@ mii_get_phy(struct net_device *dev)
}
if ((j == limit) && (i < DE4X5_MAX_MII)) {
for (k=0; k < DE4X5_MAX_PHY && lp->phy[k].id; k++);
- lp->phy[k].addr = i;
- lp->phy[k].id = id;
- lp->phy[k].spd.reg = GENERIC_REG; /* ANLPA register */
- lp->phy[k].spd.mask = GENERIC_MASK; /* 100Mb/s technologies */
- lp->phy[k].spd.value = GENERIC_VALUE; /* TX & T4, H/F Duplex */
- lp->mii_cnt++;
- lp->active++;
- printk("%s: Using generic MII device control. If the board doesn't operate,\nplease mail the following dump to the author:\n", dev->name);
- j = de4x5_debug;
- de4x5_debug |= DEBUG_MII;
- de4x5_dbg_mii(dev, k);
- de4x5_debug = j;
- printk("\n");
+ if (k < DE4X5_MAX_PHY) {
+ lp->phy[k].addr = i;
+ lp->phy[k].id = id;
+ lp->phy[k].spd.reg = GENERIC_REG; /* ANLPA register */
+ lp->phy[k].spd.mask = GENERIC_MASK; /* 100Mb/s technologies */
+ lp->phy[k].spd.value = GENERIC_VALUE; /* TX & T4, H/F Duplex */
+ lp->mii_cnt++;
+ lp->active++;
+ printk("%s: Using generic MII device control. If the board doesn't operate,\nplease mail the following dump to the author:\n", dev->name);
+ j = de4x5_debug;
+ de4x5_debug |= DEBUG_MII;
+ de4x5_dbg_mii(dev, k);
+ de4x5_debug = j;
+ printk("\n");
+ } else {
+ goto purgatory;
+ }
}
}
purgatory:
--
2.33.0




2021-12-06 15:01:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 37/52] s390/setup: avoid using memblock_enforce_memory_limit

From: Vasily Gorbik <[email protected]>

[ Upstream commit 5dbc4cb4667457b0c53bcd7bff11500b3c362975 ]

There is a difference in how architectures treat the "mem=" option. For
some it is the amount of online memory; for s390 and x86 it is the
limiting max address. Some memblock APIs, like
memblock_enforce_memory_limit(), take a limit argument and explicitly
treat it as the size of online memory, using __find_max_addr to convert
it to an actual max address. The current s390 usage:

memblock_enforce_memory_limit(memblock_end_of_DRAM());

yields different results depending on presence of memory holes (offline
memory blocks in between online memory). If there are no memory holes
limit == max_addr in memblock_enforce_memory_limit() and it does trim
online memory and reserved memory regions. With memory holes present it
actually does nothing.

Since we already use memblock_remove() explicitly to trim online memory
regions to the potential limit (think mem=, kdump, addressing limits,
etc.), drop the usage of memblock_enforce_memory_limit() altogether.
Trimming reserved regions should not be required, since we now use
memblock_set_current_limit() to limit allocations, and any explicit
memory reservation above the limit is an actual problem we should not
hide.

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/s390/kernel/setup.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index fdc5e76e1f6b0..a765b4936c10c 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -687,9 +687,6 @@ static void __init setup_memory(void)
storage_key_init_range(reg->base, reg->base + reg->size);
}
psw_set_key(PAGE_DEFAULT_KEY);
-
- /* Only cosmetics */
- memblock_enforce_memory_limit(memblock_end_of_DRAM());
}

/*
--
2.33.0




2021-12-06 15:01:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 41/52] kprobes: Limit max data_size of the kretprobe instances

From: Masami Hiramatsu <[email protected]>

commit 6bbfa44116689469267f1a6e3d233b52114139d2 upstream.

The 'kretprobe::data_size' is unsigned, thus it cannot be negative. But
if the user sets it to a big enough number (e.g. (size_t)-8), the result
of 'data_size + sizeof(struct kretprobe_instance)' becomes smaller than
sizeof(struct kretprobe_instance), or even zero. As a result, the
kretprobe_instance is allocated without enough memory, and the
kretprobe accesses memory outside of the allocation.

To avoid this issue, introduce a maximum limit on the
kretprobe::data_size. 4KB per instance should be OK.
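
The wraparound is easy to reproduce in userspace (the instance size
here is a stand-in, not the real sizeof(struct kretprobe_instance)):

#include <stdio.h>
#include <stddef.h>

#define KRETPROBE_MAX_DATA_SIZE 4096
#define INSTANCE_SIZE 64        /* illustrative stand-in */

int main(void)
{
        size_t data_size = (size_t)-8;  /* huge "negative" user value */
        size_t alloc = data_size + INSTANCE_SIZE;       /* wraps to 56 */

        printf("alloc = %zu\n", alloc);
        if (data_size > KRETPROBE_MAX_DATA_SIZE)        /* the added guard */
                printf("rejected with -E2BIG\n");
        return 0;
}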

Link: https://lkml.kernel.org/r/163836995040.432120.10322772773821182925.stgit@devnote2

Cc: [email protected]
Fixes: f47cd9b553aa ("kprobes: kretprobe user entry-handler")
Reported-by: zhangyue <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/kprobes.h | 2 ++
kernel/kprobes.c | 3 +++
2 files changed, 5 insertions(+)

--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -192,6 +192,8 @@ struct kretprobe {
raw_spinlock_t lock;
};

+#define KRETPROBE_MAX_DATA_SIZE 4096
+
struct kretprobe_instance {
struct hlist_node hlist;
struct kretprobe *rp;
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1899,6 +1899,9 @@ int register_kretprobe(struct kretprobe
}
}

+ if (rp->data_size > KRETPROBE_MAX_DATA_SIZE)
+ return -E2BIG;
+
rp->kp.pre_handler = pre_handler_kretprobe;
rp->kp.post_handler = NULL;
rp->kp.fault_handler = NULL;



2021-12-06 15:01:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 44/52] fs: add fget_many() and fput_many()

From: Jens Axboe <[email protected]>

commit 091141a42e15fe47ada737f3996b317072afcefb upstream.

Some use cases repeatedly get and put references to the same file, but
the only exposed interface does these one at a time. As each of these
entails an atomic inc or dec on a shared structure, that cost can
add up.

Add fget_many(), which works just like fget(), except it takes an
argument for how many references to get on the file. Ditto fput_many(),
which can drop an arbitrary number of references to a file.
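
A rough single-threaded sketch of the batching idea (a plain long
stands in for the atomic f_count; names are illustrative):

#include <stdio.h>

static long f_count = 1;        /* the file starts with one reference */

static int get_many(unsigned int refs)
{
        if (f_count == 0)       /* object already going away */
                return 0;
        f_count += refs;        /* one update instead of 'refs' increments */
        return 1;
}

static void put_many(unsigned int refs)
{
        f_count -= refs;
        if (f_count == 0)
                printf("last reference dropped, release the file\n");
}

int main(void)
{
        get_many(8);    /* e.g. queue 8 I/Os against the same file */
        put_many(8);
        put_many(1);    /* drop the original reference */
        return 0;
}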

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/file.c | 15 ++++++++++-----
fs/file_table.c | 9 +++++++--
include/linux/file.h | 2 ++
include/linux/fs.h | 4 +++-
4 files changed, 22 insertions(+), 8 deletions(-)

--- a/fs/file.c
+++ b/fs/file.c
@@ -691,7 +691,7 @@ void do_close_on_exec(struct files_struc
spin_unlock(&files->file_lock);
}

-static struct file *__fget(unsigned int fd, fmode_t mask)
+static struct file *__fget(unsigned int fd, fmode_t mask, unsigned int refs)
{
struct files_struct *files = current->files;
struct file *file;
@@ -706,7 +706,7 @@ loop:
*/
if (file->f_mode & mask)
file = NULL;
- else if (!get_file_rcu(file))
+ else if (!get_file_rcu_many(file, refs))
goto loop;
}
rcu_read_unlock();
@@ -714,15 +714,20 @@ loop:
return file;
}

+struct file *fget_many(unsigned int fd, unsigned int refs)
+{
+ return __fget(fd, FMODE_PATH, refs);
+}
+
struct file *fget(unsigned int fd)
{
- return __fget(fd, FMODE_PATH);
+ return __fget(fd, FMODE_PATH, 1);
}
EXPORT_SYMBOL(fget);

struct file *fget_raw(unsigned int fd)
{
- return __fget(fd, 0);
+ return __fget(fd, 0, 1);
}
EXPORT_SYMBOL(fget_raw);

@@ -753,7 +758,7 @@ static unsigned long __fget_light(unsign
return 0;
return (unsigned long)file;
} else {
- file = __fget(fd, mask);
+ file = __fget(fd, mask, 1);
if (!file)
return 0;
return FDPUT_FPUT | (unsigned long)file;
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -261,9 +261,9 @@ void flush_delayed_fput(void)

static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);

-void fput(struct file *file)
+void fput_many(struct file *file, unsigned int refs)
{
- if (atomic_long_dec_and_test(&file->f_count)) {
+ if (atomic_long_sub_and_test(refs, &file->f_count)) {
struct task_struct *task = current;

if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
@@ -282,6 +282,11 @@ void fput(struct file *file)
}
}

+void fput(struct file *file)
+{
+ fput_many(file, 1);
+}
+
/*
* synchronous analog of fput(); for kernel threads that might be needed
* in some umount() (and thus can't use flush_delayed_fput() without
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -12,6 +12,7 @@
struct file;

extern void fput(struct file *);
+extern void fput_many(struct file *, unsigned int);

struct file_operations;
struct vfsmount;
@@ -40,6 +41,7 @@ static inline void fdput(struct fd fd)
}

extern struct file *fget(unsigned int fd);
+extern struct file *fget_many(unsigned int fd, unsigned int refs);
extern struct file *fget_raw(unsigned int fd);
extern unsigned long __fdget(unsigned int fd);
extern unsigned long __fdget_raw(unsigned int fd);
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -923,7 +923,9 @@ static inline struct file *get_file(stru
atomic_long_inc(&f->f_count);
return f;
}
-#define get_file_rcu(x) atomic_long_inc_not_zero(&(x)->f_count)
+#define get_file_rcu_many(x, cnt) \
+ atomic_long_add_unless(&(x)->f_count, (cnt), 0)
+#define get_file_rcu(x) get_file_rcu_many((x), 1)
#define fput_atomic(x) atomic_long_add_unless(&(x)->f_count, -1, 1)
#define file_count(x) atomic_long_read(&(x)->f_count)




2021-12-06 15:01:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 15/52] ARM: socfpga: Fix crash with CONFIG_FORTIFY_SOURCE

From: Takashi Iwai <[email protected]>

[ Upstream commit 187bea472600dcc8d2eb714335053264dd437172 ]

When CONFIG_FORTIFY_SOURCE is set, memcpy() checks for potential
buffer overflows and panics. The memcpy() calls in the socfpga
bootstrapping code have their object sizes mistakenly computed as
shorter than they really are, hence they trigger a panic as if they
were overflowing.

This patch changes the secondary_trampoline and *_end declarations to
arrays to avoid the false-positive crash above.
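
The mechanism can be seen in userspace with the builtin that FORTIFY
uses (compile with gcc -O2; in the kernel the symbols are defined
elsewhere, in assembly, so only the declaration's apparent size
matters):

#include <stdio.h>

char one_char;          /* mirrors: extern char secondary_trampoline; */
char an_array[64];      /* mirrors: extern char secondary_trampoline[]; */

int main(void)
{
        /* a fortified memcpy() sizes its source/destination objects
         * with __builtin_object_size(); a plain 'char' symbol reports
         * 1 byte, so copying a multi-byte trampoline looks like an
         * overflow */
        printf("%zu\n", __builtin_object_size(&one_char, 0));   /* 1 */
        printf("%zu\n", __builtin_object_size(an_array, 0));    /* 64 */
        return 0;
}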

Fixes: 9c4566a117a6 ("ARM: socfpga: Enable SMP for socfpga")
Suggested-by: Kees Cook <[email protected]>
Buglink: https://bugzilla.suse.com/show_bug.cgi?id=1192473
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/arm/mach-socfpga/core.h | 2 +-
arch/arm/mach-socfpga/platsmp.c | 8 ++++----
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mach-socfpga/core.h b/arch/arm/mach-socfpga/core.h
index 5bc6ea87cdf74..1b41e23db98ef 100644
--- a/arch/arm/mach-socfpga/core.h
+++ b/arch/arm/mach-socfpga/core.h
@@ -44,7 +44,7 @@ extern void __iomem *sdr_ctl_base_addr;
u32 socfpga_sdram_self_refresh(u32 sdr_base);
extern unsigned int socfpga_sdram_self_refresh_sz;

-extern char secondary_trampoline, secondary_trampoline_end;
+extern char secondary_trampoline[], secondary_trampoline_end[];

extern unsigned long socfpga_cpu1start_addr;

diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c
index 15c8ce8965f43..ff1d13d3ef72e 100644
--- a/arch/arm/mach-socfpga/platsmp.c
+++ b/arch/arm/mach-socfpga/platsmp.c
@@ -31,14 +31,14 @@

static int socfpga_boot_secondary(unsigned int cpu, struct task_struct *idle)
{
- int trampoline_size = &secondary_trampoline_end - &secondary_trampoline;
+ int trampoline_size = secondary_trampoline_end - secondary_trampoline;

if (socfpga_cpu1start_addr) {
/* This will put CPU #1 into reset. */
writel(RSTMGR_MPUMODRST_CPU1,
rst_manager_base_addr + SOCFPGA_RSTMGR_MODMPURST);

- memcpy(phys_to_virt(0), &secondary_trampoline, trampoline_size);
+ memcpy(phys_to_virt(0), secondary_trampoline, trampoline_size);

writel(virt_to_phys(secondary_startup),
sys_manager_base_addr + (socfpga_cpu1start_addr & 0x000000ff));
@@ -56,12 +56,12 @@ static int socfpga_boot_secondary(unsigned int cpu, struct task_struct *idle)

static int socfpga_a10_boot_secondary(unsigned int cpu, struct task_struct *idle)
{
- int trampoline_size = &secondary_trampoline_end - &secondary_trampoline;
+ int trampoline_size = secondary_trampoline_end - secondary_trampoline;

if (socfpga_cpu1start_addr) {
writel(RSTMGR_MPUMODRST_CPU1, rst_manager_base_addr +
SOCFPGA_A10_RSTMGR_MODMPURST);
- memcpy(phys_to_virt(0), &secondary_trampoline, trampoline_size);
+ memcpy(phys_to_virt(0), secondary_trampoline, trampoline_size);

writel(virt_to_phys(secondary_startup),
sys_manager_base_addr + (socfpga_cpu1start_addr & 0x00000fff));
--
2.33.0




2021-12-06 15:01:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 43/52] sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl

From: Baokun Li <[email protected]>

commit 6f48394cf1f3e8486591ad98c11cdadb8f1ef2ad upstream.

Trying to remove the fsl-sata module on PPC64 GNU/Linux
leads to the following warning:
------------[ cut here ]------------
remove_proc_entry: removing non-empty directory 'irq/69',
leaking at least 'fsl-sata[ff0221000.sata]'
WARNING: CPU: 3 PID: 1048 at fs/proc/generic.c:722
.remove_proc_entry+0x20c/0x220
IRQMASK: 0
NIP [c00000000033826c] .remove_proc_entry+0x20c/0x220
LR [c000000000338268] .remove_proc_entry+0x208/0x220
Call Trace:
.remove_proc_entry+0x208/0x220 (unreliable)
.unregister_irq_proc+0x104/0x140
.free_desc+0x44/0xb0
.irq_free_descs+0x9c/0xf0
.irq_dispose_mapping+0x64/0xa0
.sata_fsl_remove+0x58/0xa0 [sata_fsl]
.platform_drv_remove+0x40/0x90
.device_release_driver_internal+0x160/0x2c0
.driver_detach+0x64/0xd0
.bus_remove_driver+0x70/0xf0
.driver_unregister+0x38/0x80
.platform_driver_unregister+0x14/0x30
.fsl_sata_driver_exit+0x18/0xa20 [sata_fsl]
---[ end trace 0ea876d4076908f5 ]---

The driver creates the mapping by calling irq_of_parse_and_map(), so
it also has to dispose of the mapping. But the easy way out is to
simply use platform_get_irq() instead of irq_of_parse_and_map(). We
should also adapt the return value checking and propagate error values.

In this case the mapping is not managed by the device but by the OF
core, so the device does not have to dispose of the mapping.

Fixes: faf0b2e5afe7 ("drivers/ata: add support to Freescale 3.0Gbps SATA Controller")
Cc: [email protected]
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Sergei Shtylyov <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/ata/sata_fsl.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

--- a/drivers/ata/sata_fsl.c
+++ b/drivers/ata/sata_fsl.c
@@ -1502,9 +1502,9 @@ static int sata_fsl_probe(struct platfor
host_priv->ssr_base = ssr_base;
host_priv->csr_base = csr_base;

- irq = irq_of_parse_and_map(ofdev->dev.of_node, 0);
- if (!irq) {
- dev_err(&ofdev->dev, "invalid irq from platform\n");
+ irq = platform_get_irq(ofdev, 0);
+ if (irq < 0) {
+ retval = irq;
goto error_exit_with_cleanup;
}
host_priv->irq = irq;
@@ -1581,8 +1581,6 @@ static int sata_fsl_remove(struct platfo

ata_host_detach(host);

- irq_dispose_mapping(host_priv->irq);
-
return 0;
}




2021-12-06 15:01:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 14/52] NFSv42: Don't fail clone() unless the OP_CLONE operation failed

From: Trond Myklebust <[email protected]>

[ Upstream commit d3c45824ad65aebf765fcf51366d317a29538820 ]

The failure to retrieve post-op attributes has no bearing on whether or
not the clone operation itself was successful. We must therefore ignore
the return value of decode_getfattr() when looking at the success or
failure of nfs4_xdr_dec_clone().

Fixes: 36022770de6c ("nfs42: add CLONE xdr functions")
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/nfs/nfs42xdr.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 0ca482a51e532..988d262029580 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -439,8 +439,7 @@ static int nfs4_xdr_dec_clone(struct rpc_rqst *rqstp,
status = decode_clone(xdr);
if (status)
goto out;
- status = decode_getfattr(xdr, res->dst_fattr, res->server);
-
+ decode_getfattr(xdr, res->dst_fattr, res->server);
out:
res->rpc_status = status;
return status;
--
2.33.0




2021-12-06 15:01:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 40/52] net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()

From: Teng Qi <[email protected]>

[ Upstream commit 0fa68da72c3be09e06dd833258ee89c33374195f ]

The definition of macro MOTO_SROM_BUG is:
#define MOTO_SROM_BUG (lp->active == 8 && (get_unaligned_le32(
dev->dev_addr) & 0x00ffffff) == 0x3e0008)

and the if statement
if (MOTO_SROM_BUG) lp->active = 0;

using this macro indicates that lp->active could be 8. If lp->active is
8 and the second comparison of this macro is false, lp->active will
remain 8 in:
lp->phy[lp->active].gep = (*p ? p : NULL); p += (2 * (*p) + 1);
lp->phy[lp->active].rst = (*p ? p : NULL); p += (2 * (*p) + 1);
lp->phy[lp->active].mc = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].ana = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].fdx = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].ttm = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].mci = *p;

However, the length of the array lp->phy is 8, so array overflows can
occur. To fix these possible array overflows, we first check lp->active
and return -EINVAL if it is greater than or equal to
ARRAY_SIZE(lp->phy) (i.e. 8).

Reported-by: TOTE Robot <[email protected]>
Signed-off-by: Teng Qi <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/dec/tulip/de4x5.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/drivers/net/ethernet/dec/tulip/de4x5.c
+++ b/drivers/net/ethernet/dec/tulip/de4x5.c
@@ -4701,6 +4701,10 @@ type3_infoblock(struct net_device *dev,
lp->ibn = 3;
lp->active = *p++;
if (MOTO_SROM_BUG) lp->active = 0;
+ /* if (MOTO_SROM_BUG) statement indicates lp->active could
+ * be 8 (i.e. the size of array lp->phy) */
+ if (WARN_ON(lp->active >= ARRAY_SIZE(lp->phy)))
+ return -EINVAL;
lp->phy[lp->active].gep = (*p ? p : NULL); p += (2 * (*p) + 1);
lp->phy[lp->active].rst = (*p ? p : NULL); p += (2 * (*p) + 1);
lp->phy[lp->active].mc = get_unaligned_le16(p); p += 2;



2021-12-06 15:01:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 16/52] scsi: mpt3sas: Fix kernel panic during drive powercycle test

From: Sreekanth Reddy <[email protected]>

[ Upstream commit 0ee4ba13e09c9d9c1cb6abb59da8295d9952328b ]

While looping over the shost's sdev list, it is possible that one
of the drives is being removed and its sas_target object has been
freed, while its sdev object remains intact.

Consequently, a kernel panic can occur while the driver is trying to
access the sas_address field of the sas_target object without also
checking the sas_target object for NULL.

Link: https://lore.kernel.org/r/[email protected]
Fixes: f92363d12359 ("[SCSI] mpt3sas: add new driver supporting 12GB SAS")
Signed-off-by: Sreekanth Reddy <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 49b751a8f5f3b..0e39bb1489ac7 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -2904,7 +2904,7 @@ _scsih_ublock_io_device(struct MPT3SAS_ADAPTER *ioc, u64 sas_address)

shost_for_each_device(sdev, ioc->shost) {
sas_device_priv_data = sdev->hostdata;
- if (!sas_device_priv_data)
+ if (!sas_device_priv_data || !sas_device_priv_data->sas_target)
continue;
if (sas_device_priv_data->sas_target->sas_address
!= sas_address)
--
2.33.0




2021-12-06 15:01:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 17/52] tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows

From: Eric Dumazet <[email protected]>

[ Upstream commit 4e1fddc98d2585ddd4792b5e44433dcee7ece001 ]

While testing the BIG TCP patch series, I was expecting that TCP_RR
workloads with 80KB requests/answers would send one 80KB TSO packet,
which would then be received as a single GRO packet.

It turns out this was not happening, and the root cause was that the
cubic Hystart ACK train detection was triggering after a few (2 or 3)
rounds of RPC.

Hystart was wrongly setting CWND/SSTHRESH to 30, while my RPC
needed a budget of ~20 segments.

Ideally these TCP_RR flows should not exit slow start.

Cubic Hystart should reset itself at each round, instead of assuming
every TCP flow is a bulk one.

Note that even after this patch, Hystart can still trigger, depending
on scheduling artifacts, but at a higher CWND/SSTHRESH threshold,
keeping optimal TSO packet sizes.
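
The per-round reset can be sketched in a few lines of userspace C
(sequence-number helpers simplified; not the kernel code itself):

#include <stdio.h>

/* after(a, b): true if sequence number a is later than b, with
 * 32-bit wraparound, like the kernel's before()/after() helpers. */
static int after(unsigned int a, unsigned int b)
{
        return (int)(b - a) < 0;
}

static unsigned int end_seq;

static void hystart_reset(unsigned int snd_nxt)
{
        end_seq = snd_nxt;      /* current round ends at snd_nxt */
}

/* Mirrors the patched hystart_update(): reset per round based on
 * cumulative ACKs (snd_una), not only while cwnd-limited. */
static void hystart_update(unsigned int snd_una, unsigned int snd_nxt)
{
        if (after(snd_una, end_seq))
                hystart_reset(snd_nxt);
}

int main(void)
{
        hystart_reset(1000);
        hystart_update(900, 2000);      /* round not over: keep 1000 */
        printf("%u\n", end_seq);
        hystart_update(1500, 2000);     /* round over: reset to 2000 */
        printf("%u\n", end_seq);
        return 0;
}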

Tested:

ip link set dev eth0 gro_ipv6_max_size 131072 gso_ipv6_max_size 131072
nstat -n; netperf -H ... -t TCP_RR -l 5 -- -r 80000,80000 -K cubic; nstat|egrep "Ip6InReceives|Hystart|Ip6OutRequests"

Before:

8605
Ip6InReceives 87541 0.0
Ip6OutRequests 129496 0.0
TcpExtTCPHystartTrainDetect 1 0.0
TcpExtTCPHystartTrainCwnd 30 0.0

After:

8760
Ip6InReceives 88514 0.0
Ip6OutRequests 87975 0.0

Fixes: ae27e98a5152 ("[TCP] CUBIC v2.3")
Co-developed-by: Neal Cardwell <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Soheil Hassas Yeganeh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_cubic.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 9fb3a5e83a7c7..e0b3b194b6049 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -342,8 +342,6 @@ static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked)
return;

if (tcp_in_slow_start(tp)) {
- if (hystart && after(ack, ca->end_seq))
- bictcp_hystart_reset(sk);
acked = tcp_slow_start(tp, acked);
if (!acked)
return;
@@ -394,6 +392,9 @@ static void hystart_update(struct sock *sk, u32 delay)
if (ca->found & hystart_detect)
return;

+ if (after(tp->snd_una, ca->end_seq))
+ bictcp_hystart_reset(sk);
+
if (hystart_detect & HYSTART_ACK_TRAIN) {
u32 now = bictcp_clock();

--
2.33.0




2021-12-06 15:01:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 11/52] ARM: dts: BCM5301X: Add interrupt properties to GPIO node

From: Florian Fainelli <[email protected]>

[ Upstream commit 40f7342f0587639e5ad625adaa15efdd3cffb18f ]

The GPIO controller is also an interrupt controller provider and is
currently missing the appropriate 'interrupt-controller' and
'#interrupt-cells' properties to denote that.

Fixes: fb026d3de33b ("ARM: BCM5301X: Add Broadcom's bus-axi to the DTS file")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/arm/boot/dts/bcm5301x.dtsi | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm/boot/dts/bcm5301x.dtsi b/arch/arm/boot/dts/bcm5301x.dtsi
index de8ac998604de..47d721241408b 100644
--- a/arch/arm/boot/dts/bcm5301x.dtsi
+++ b/arch/arm/boot/dts/bcm5301x.dtsi
@@ -175,6 +175,8 @@ chipcommon: chipcommon@0 {

gpio-controller;
#gpio-cells = <2>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
};
};

--
2.33.0




2021-12-06 15:01:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 20/52] proc/vmcore: fix clearing user buffer by properly using clear_user()

From: David Hildenbrand <[email protected]>

commit c1e63117711977cc4295b2ce73de29dd17066c82 upstream.

To clear a user buffer we cannot simply use memset(); we have to use
clear_user(). With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":

systemd[1]: Starting Kdump Vmcore Save Service...
kdump[420]: Kdump is using the default log level(3).
kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[465]: saving vmcore-dmesg.txt complete
kdump[467]: saving vmcore
BUG: unable to handle page fault for address: 00007f2374e01000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
Call Trace:
read_vmcore+0x236/0x2c0
proc_reg_read+0x55/0xa0
vfs_read+0x95/0x190
ksys_read+0x4f/0xc0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access. In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().

To fix, properly use clear_user() when we're dealing with a user buffer.
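
The dispatch can be sketched in userspace like this (fake_clear_user()
stands in for the real clear_user(), which can fault; names invented
for the example):

#include <stdio.h>
#include <string.h>

/* Stand-in for clear_user(): returns 0 on success, nonzero if the
 * user mapping faults (never happens in this toy version). */
static unsigned long fake_clear_user(void *ubuf, size_t n)
{
        memset(ubuf, 0, n);
        return 0;
}

/* Mirrors the patched read_from_oldmem(): kernel buffers may be
 * memset() directly, user buffers must go through clear_user() so
 * SMAP and fault handling are respected. */
static long zero_buf(void *buf, size_t n, int userbuf)
{
        if (!userbuf)
                memset(buf, 0, n);
        else if (fake_clear_user(buf, n))
                return -14;     /* -EFAULT */
        return 0;
}

int main(void)
{
        char kbuf[8] = "xxxxxxx", ubuf[8] = "yyyyyyy";

        zero_buf(kbuf, sizeof(kbuf), 0);
        zero_buf(ubuf, sizeof(ubuf), 1);
        printf("%d %d\n", kbuf[0], ubuf[0]);    /* 0 0 */
        return 0;
}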

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Philipp Rudo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/proc/vmcore.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -105,14 +105,19 @@ static ssize_t read_from_oldmem(char *bu
nr_bytes = count;

/* If pfn is not ram, return zeros for sparse dump files */
- if (pfn_is_ram(pfn) == 0)
- memset(buf, 0, nr_bytes);
- else {
+ if (pfn_is_ram(pfn) == 0) {
+ tmp = 0;
+ if (!userbuf)
+ memset(buf, 0, nr_bytes);
+ else if (clear_user(buf, nr_bytes))
+ tmp = -EFAULT;
+ } else {
tmp = copy_oldmem_page(pfn, buf, nr_bytes,
offset, userbuf);
- if (tmp < 0)
- return tmp;
}
+ if (tmp < 0)
+ return tmp;
+
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;



2021-12-06 15:02:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 18/52] tracing: Check pid filtering when creating events

From: Steven Rostedt (VMware) <[email protected]>

commit 6cb206508b621a9a0a2c35b60540e399225c8243 upstream.

When pid filtering is activated in an instance, all of the event trace
files for that instance have the PID_FILTER flag set. This flag determines
whether or not pid filtering needs to be done on the event; otherwise the
event is executed as normal.

If pid filtering is enabled when an event is created (via a dynamic event
or modules), its flag is not updated to reflect the current state, and the
events are not filtered properly.
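
The idea of the fix, in a minimal userspace sketch (flag name kept,
everything else invented for illustration):

#include <stdio.h>

#define EVENT_FILE_FL_PID_FILTER 0x1

struct event_file { unsigned int flags; };

/* Stand-in for tr->filtered_pids: non-NULL means pid filtering is
 * already active in the instance. */
static void *filtered_pids;

/* A file created while filtering is active must start with the
 * PID_FILTER flag set instead of waiting for the next filter update,
 * which is what the patch makes trace_create_new_event() do. */
static void init_event_file(struct event_file *file)
{
        file->flags = 0;
        if (filtered_pids)
                file->flags |= EVENT_FILE_FL_PID_FILTER;
}

int main(void)
{
        struct event_file f;
        int on = 1;

        filtered_pids = &on;    /* filtering enabled before creation */
        init_event_file(&f);
        printf("%u\n", f.flags & EVENT_FILE_FL_PID_FILTER);     /* 1 */
        return 0;
}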

Cc: [email protected]
Fixes: 3fdaf80f4a836 ("tracing: Implement event pid filtering")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/trace/trace_events.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2341,12 +2341,19 @@ static struct trace_event_file *
trace_create_new_event(struct trace_event_call *call,
struct trace_array *tr)
{
+ struct trace_pid_list *pid_list;
struct trace_event_file *file;

file = kmem_cache_alloc(file_cachep, GFP_TRACE);
if (!file)
return NULL;

+ pid_list = rcu_dereference_protected(tr->filtered_pids,
+ lockdep_is_held(&event_mutex));
+
+ if (pid_list)
+ file->flags |= EVENT_FILE_FL_PID_FILTER;
+
file->event_call = call;
file->tr = tr;
atomic_set(&file->sm_ref, 0);



2021-12-06 15:02:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 19/52] hugetlbfs: flush TLBs correctly after huge_pmd_unshare

From: Nadav Amit <[email protected]>

commit a4a118f2eead1d6c49e00765de89878288d4b890 upstream.

When calls to huge_pmd_unshare() from __unmap_hugepage_range() succeed, a TLB
flush is missing. This TLB flush must be performed before releasing the
i_mmap_rwsem, in order to prevent an unshared PMD page from being
released and reused before the TLB flush takes place.

Arguably, a comprehensive solution would use the mmu_gather interface to
batch the TLB flushes and the PMD page releases, however it is not an
easy solution: (1) try_to_unmap_one() and try_to_migrate_one() also call
huge_pmd_unshare() and they cannot use the mmu_gather interface; and (2)
deferring the release of the page reference for the PMD page until
after i_mmap_rwsem is dropped can confuse huge_pmd_unshare() into
thinking PMDs are shared when they are not.

Fix __unmap_hugepage_range() by adding the missing TLB flush, and
forcing a flush when unshare is successful.
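
The generic helper the patch introduces just grows the pending flush
range; as a standalone sketch (struct fields simplified):

#include <stdio.h>

struct mmu_gather { unsigned long start, end; };

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

/* Grow the pending flush range to cover the unshared PMD's span,
 * mirroring the generic tlb_flush_pmd_range() added by the patch. */
static void tlb_flush_pmd_range(struct mmu_gather *tlb,
                                unsigned long address, unsigned long size)
{
        tlb->start = min(tlb->start, address);
        tlb->end = max(tlb->end, address + size);
}

int main(void)
{
        struct mmu_gather tlb = { .start = 0x2000, .end = 0x3000 };

        /* e.g. a PUD-sized region, as in the mm/hugetlb.c hunk */
        tlb_flush_pmd_range(&tlb, 0x40000000UL, 0x40000000UL);
        printf("%#lx-%#lx\n", tlb.start, tlb.end);
        return 0;
}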

Fixes: 24669e58477e ("hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages") # 3.6
Signed-off-by: Nadav Amit <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/include/asm/tlb.h | 8 ++++++++
arch/ia64/include/asm/tlb.h | 10 ++++++++++
arch/s390/include/asm/tlb.h | 13 +++++++++++++
arch/sh/include/asm/tlb.h | 10 ++++++++++
arch/um/include/asm/tlb.h | 12 ++++++++++++
include/asm-generic/tlb.h | 7 +++++++
mm/hugetlb.c | 5 ++++-
7 files changed, 64 insertions(+), 1 deletion(-)

--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -257,6 +257,14 @@ tlb_remove_pmd_tlb_entry(struct mmu_gath
tlb_add_flush(tlb, addr);
}

+static inline void
+tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address,
+ unsigned long size)
+{
+ tlb_add_flush(tlb, address);
+ tlb_add_flush(tlb, address + size - PMD_SIZE);
+}
+
#define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
#define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr)
#define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -251,6 +251,16 @@ __tlb_remove_tlb_entry (struct mmu_gathe
tlb->end_addr = address + PAGE_SIZE;
}

+static inline void
+tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address,
+ unsigned long size)
+{
+ if (tlb->start_addr > address)
+ tlb->start_addr = address;
+ if (tlb->end_addr < address + size)
+ tlb->end_addr = address + size;
+}
+
#define tlb_migrate_finish(mm) platform_tlb_migrate_finish(mm)

#define tlb_start_vma(tlb, vma) do { } while (0)
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -97,6 +97,19 @@ static inline void tlb_remove_page(struc
{
free_page_and_swap_cache(page);
}
+static inline void tlb_flush_pmd_range(struct mmu_gather *tlb,
+ unsigned long address, unsigned long size)
+{
+ /*
+ * the range might exceed the original range that was provided to
+ * tlb_gather_mmu(), so we need to update it despite the fact it is
+ * usually not updated.
+ */
+ if (tlb->start > address)
+ tlb->start = address;
+ if (tlb->end < address + size)
+ tlb->end = address + size;
+}

/*
* pte_free_tlb frees a pte table and clears the CRSTE for the
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -65,6 +65,16 @@ tlb_remove_tlb_entry(struct mmu_gather *
tlb->end = address + PAGE_SIZE;
}

+static inline void
+tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address,
+ unsigned long size)
+{
+ if (tlb->start > address)
+ tlb->start = address;
+ if (tlb->end < address + size)
+ tlb->end = address + size;
+}
+
/*
* In the case of tlb vma handling, we can optimise these away in the
* case where we're doing a full MM flush. When we're doing a munmap,
--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -110,6 +110,18 @@ static inline void tlb_remove_page(struc
__tlb_remove_page(tlb, page);
}

+static inline void
+tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address,
+ unsigned long size)
+{
+ tlb->need_flush = 1;
+
+ if (tlb->start > address)
+ tlb->start = address;
+ if (tlb->end < address + size)
+ tlb->end = address + size;
+}
+
/**
* tlb_remove_tlb_entry - remember a pte unmapping for later tlb invalidation.
*
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -165,6 +165,13 @@ static inline void __tlb_reset_range(str
#define tlb_end_vma __tlb_end_vma
#endif

+static inline void tlb_flush_pmd_range(struct mmu_gather *tlb,
+ unsigned long address, unsigned long size)
+{
+ tlb->start = min(tlb->start, address);
+ tlb->end = max(tlb->end, address + size);
+}
+
#ifndef __tlb_remove_tlb_entry
#define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
#endif
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3290,8 +3290,11 @@ again:
continue;

ptl = huge_pte_lock(h, mm, ptep);
- if (huge_pmd_unshare(mm, &address, ptep))
+ if (huge_pmd_unshare(mm, &address, ptep)) {
+ tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
+ force_flush = 1;
goto unlock;
+ }

pte = huge_ptep_get(ptep);
if (huge_pte_none(pte))



2021-12-06 15:02:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 52/52] serial: pl011: Add ACPI SBSA UART match id

From: Pierre Gondois <[email protected]>

commit ac442a077acf9a6bf1db4320ec0c3f303be092b3 upstream.

The document 'ACPI for Arm Components 1.0' defines the following
_HID mappings:
-'Prime cell UART (PL011)': ARMH0011
-'SBSA UART': ARMHB000

Use the sbsa-uart driver when a device is described with
the 'ARMHB000' _HID.

Note:
PL011 devices currently use the sbsa-uart driver instead of the
uart-pl011 driver. Indeed, PL011 devices are not bound to a clock
in ACPI, so it is not possible to change their baudrate.

Cc: <[email protected]>
Signed-off-by: Pierre Gondois <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/tty/serial/amba-pl011.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -2493,6 +2493,7 @@ MODULE_DEVICE_TABLE(of, sbsa_uart_of_mat

static const struct acpi_device_id sbsa_uart_acpi_match[] = {
{ "ARMH0011", 0 },
+ { "ARMHB000", 0 },
{},
};
MODULE_DEVICE_TABLE(acpi, sbsa_uart_acpi_match);



2021-12-06 15:02:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 49/52] parisc: Fix "make install" on newer debian releases

From: Helge Deller <[email protected]>

commit 0f9fee4cdebfbe695c297e5b603a275e2557c1cc upstream.

On newer debian releases the debian-provided "installkernel" script is
installed in /usr/sbin. Fix the kernel install.sh script to look for the
script in this directory as well.

Signed-off-by: Helge Deller <[email protected]>
Cc: <[email protected]> # v3.13+
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/parisc/install.sh | 1 +
1 file changed, 1 insertion(+)

--- a/arch/parisc/install.sh
+++ b/arch/parisc/install.sh
@@ -39,6 +39,7 @@ verify "$3"
if [ -n "${INSTALLKERNEL}" ]; then
if [ -x ~/bin/${INSTALLKERNEL} ]; then exec ~/bin/${INSTALLKERNEL} "$@"; fi
if [ -x /sbin/${INSTALLKERNEL} ]; then exec /sbin/${INSTALLKERNEL} "$@"; fi
+ if [ -x /usr/sbin/${INSTALLKERNEL} ]; then exec /usr/sbin/${INSTALLKERNEL} "$@"; fi
fi

# Default install



2021-12-06 15:02:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 45/52] fget: check that the fd still exists after getting a ref to it

From: Linus Torvalds <[email protected]>

commit 054aa8d439b9185d4f5eb9a90282d1ce74772969 upstream.

Jann Horn points out that there is another possible race wrt Unix domain
socket garbage collection, somewhat reminiscent of the one fixed in
commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").

See the extended comment about the garbage collection requirements added
to unix_peek_fds() by that commit for details.

The race comes from how we can locklessly look up a file descriptor just
as it is in the process of being closed, and with the right artificial
timing (Jann added a few strategic 'mdelay(500)' calls to do that), the
Unix domain socket garbage collector could see the reference count
decrement of the close() happen before fget() took its reference to the
file and the file was attached onto a new file descriptor.

This is all (intentionally) correct on the 'struct file *' side, with
RCU lookups and lockless reference counting very much part of the
design. Getting that reference count out of order isn't a problem per
se.

But the garbage collector can get confused by this situation: having
seen a file with no remaining external references, it then sees the file
being attached to an fd.

In commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK") the
fix was to serialize the file descriptor install with the garbage
collector by taking and releasing the unix_gc_lock.

That's not really an option here, but since this all happens when we are
in the process of looking up a file descriptor, we can instead simply
just re-check that the file hasn't been closed in the meantime, and just
re-do the lookup if we raced with a concurrent close() of the same file
descriptor.
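
The lookup-bump-recheck loop can be sketched in single-threaded
userspace C (refcounting and RCU reduced to plain loads; names
invented for illustration):

#include <stdio.h>
#include <stddef.h>

struct file { int refcount; };

/* fd table; a concurrent close() could clear or replace a slot
 * between the load and the refcount bump. */
static struct file *fdtable[4];

static struct file *lookup(int fd) { return fdtable[fd]; }

/* After taking a reference, re-check that the fd slot still points
 * at the same file; if not, drop the reference and retry. */
static struct file *fget_sketch(int fd)
{
        struct file *f;

        for (;;) {
                f = lookup(fd);
                if (!f)
                        return NULL;
                f->refcount++;          /* get_file_rcu_many() */
                if (lookup(fd) == f)    /* __fcheck_files() recheck */
                        return f;
                f->refcount--;          /* fput_many(), then retry */
        }
}

int main(void)
{
        struct file a = { .refcount = 1 };

        fdtable[0] = &a;
        printf("%d\n", fget_sketch(0) == &a);   /* 1 */
        return 0;
}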

Reported-and-tested-by: Jann Horn <[email protected]>
Acked-by: Miklos Szeredi <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/file.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/fs/file.c
+++ b/fs/file.c
@@ -708,6 +708,10 @@ loop:
file = NULL;
else if (!get_file_rcu_many(file, refs))
goto loop;
+ else if (__fcheck_files(files, fd) != file) {
+ fput_many(file, refs);
+ goto loop;
+ }
}
rcu_read_unlock();




2021-12-06 15:03:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 50/52] vgacon: Propagate console boot parameters before calling `vc_resize'

From: Maciej W. Rozycki <[email protected]>

commit 3dfac26e2ef29ff2abc2a75aa4cd48fce25a2c4b upstream.

Fix a division by zero in `vgacon_resize' with a backtrace like:

vgacon_resize
vc_do_resize
vgacon_init
do_bind_con_driver
do_unbind_con_driver
fbcon_fb_unbind
do_unregister_framebuffer
do_register_framebuffer
register_framebuffer
__drm_fb_helper_initial_config_and_unlock
drm_helper_hpd_irq_event
dw_hdmi_irq
irq_thread
kthread

caused by `c->vc_cell_height' not having been initialized. This has
only started to trigger with commit 860dafa90259 ("vt: Fix character
height handling with VT_RESIZEX"), however the ultimate offender is
commit 50ec42edd978 ("[PATCH] Detaching fbcon: fix vgacon to allow
retaking of the console").

Said commit has added a call to `vc_resize' whenever `vgacon_init' is
called with the `init' argument set to 0, which did not happen before.
And the call is made before a key vgacon boot parameter retrieved in
`vgacon_startup' has been propagated in `vgacon_init' for `vc_resize' to
use to the console structure being worked on. Previously the parameter
was `c->vc_font.height' and now it is `c->vc_cell_height'.

In this particular scenario the registration of fbcon has failed and vt
resorts to vgacon. Now fbcon does have initialized `c->vc_font.height'
somehow, unlike `c->vc_cell_height', which is why this code did not
crash before, but either way the boot parameters should have been copied
to the console structure ahead of the call to `vc_resize' rather than
afterwards, so that first the call has a chance to use them and second
they do not change the console structure to something possibly different
from what was used by `vc_resize'.

Move the propagation of the vgacon boot parameters ahead of the call to
`vc_resize' then. Adjust the comment accordingly.
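
The ordering problem reduces to a division by an uninitialized field;
a minimal sketch (values invented, not the console code):

#include <stdio.h>

struct console { unsigned int cell_height, rows; };

static unsigned int vga_scan_lines = 400;
static unsigned int vga_font_height = 16;

/* Divides by cell_height, as vgacon_resize() ultimately does;
 * this crashes if the boot parameters were not propagated first. */
static void vc_resize_sketch(struct console *c)
{
        c->rows = vga_scan_lines / c->cell_height;
}

int main(void)
{
        struct console c = { 0 };

        /* The fix: copy boot parameters *before* resizing. */
        c.cell_height = vga_font_height;
        vc_resize_sketch(&c);
        printf("%u rows\n", c.rows);    /* 25 */
        return 0;
}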

Fixes: 50ec42edd978 ("[PATCH] Detaching fbcon: fix vgacon to allow retaking of the console")
Cc: [email protected] # v2.6.18+
Reported-by: Wim Osterholt <[email protected]>
Reported-by: Pavel V. Panteleev <[email protected]>
Signed-off-by: Maciej W. Rozycki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/video/console/vgacon.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

--- a/drivers/video/console/vgacon.c
+++ b/drivers/video/console/vgacon.c
@@ -422,11 +422,17 @@ static void vgacon_init(struct vc_data *
struct uni_pagedir *p;

/*
- * We cannot be loaded as a module, therefore init is always 1,
- * but vgacon_init can be called more than once, and init will
- * not be 1.
+ * We cannot be loaded as a module, therefore init will be 1
+ * if we are the default console, however if we are a fallback
+ * console, for example if fbcon has failed registration, then
+ * init will be 0, so we need to make sure our boot parameters
+ * have been copied to the console structure for vgacon_resize
+ * ultimately called by vc_resize. Any subsequent calls to
+ * vgacon_init will have init set to 0 too.
*/
c->vc_can_do_color = vga_can_do_color;
+ c->vc_scan_lines = vga_scan_lines;
+ c->vc_font.height = c->vc_cell_height = vga_video_font_height;

/* set dimensions manually if init != 0 since vc_resize() will fail */
if (init) {
@@ -435,8 +441,6 @@ static void vgacon_init(struct vc_data *
} else
vc_resize(c, vga_video_num_columns, vga_video_num_lines);

- c->vc_scan_lines = vga_scan_lines;
- c->vc_font.height = c->vc_cell_height = vga_video_font_height;
c->vc_complement_mask = 0x7700;
if (vga_512_chars)
c->vc_hi_font_mask = 0x0800;



2021-12-06 15:03:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 51/52] tty: serial: msm_serial: Deactivate RX DMA for polling support

From: Sven Eckelmann <[email protected]>

commit 7492ffc90fa126afb67d4392d56cb4134780194a upstream.

The CONSOLE_POLLING mode is used for tools like k(g)db. In this kind of
setup, it often shares a serial device with the normal system console.
This is usually no problem because the polling helpers can consume input
values directly (when in kgdb context) and the normal Linux handlers
only consume new input values after kgdb has switched back.

This is not true anymore when RX DMA is enabled for UARTDM controllers.
Single input values can no longer be received correctly. Instead, the
following seems to happen:

* on the 1st input, some old input is read (continuously)
* on the 2nd input, two old inputs are read (continuously)
* on the 3rd input, three old input values are read (continuously)
* on the 4th input, the 4 previous inputs are received

This then repeats for each group of 4 input values.

This behavior changes slightly depending on what state the controller
was in when the first input was received. But this makes working with
kgdb basically impossible because control messages are always corrupted
when kgdboc tries to parse them.

RX DMA should therefore be off when CONSOLE_POLLING is enabled to avoid
this kind of problem. No such problem was noticed for TX DMA.
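
The guard itself is a compile-time check; a standalone sketch of the
pattern (the macro below stands in for the kernel's IS_ENABLED()):

#include <stdio.h>

/* With kconfig, CONFIG_CONSOLE_POLL is defined when polling support
 * is built in; IS_ENABLED() folds to a constant 0 or 1. */
#ifdef CONFIG_CONSOLE_POLL
#define POLLING_BUILD 1
#else
#define POLLING_BUILD 0
#endif

/* Mirrors the fix: bail out of RX DMA setup entirely on polling
 * builds, so kgdboc always reads fresh data from the FIFO. */
static void start_rx_dma(void)
{
        if (POLLING_BUILD) {
                printf("polling build: RX DMA stays off\n");
                return;
        }
        printf("RX DMA armed\n");
}

int main(void)
{
        start_rx_dma();         /* try building with -DCONFIG_CONSOLE_POLL */
        return 0;
}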

Fixes: 99693945013a ("tty: serial: msm: Add RX DMA support")
Cc: [email protected]
Signed-off-by: Sven Eckelmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/tty/serial/msm_serial.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/tty/serial/msm_serial.c
+++ b/drivers/tty/serial/msm_serial.c
@@ -446,6 +446,9 @@ static void msm_start_rx_dma(struct msm_
u32 val;
int ret;

+ if (IS_ENABLED(CONFIG_CONSOLE_POLL))
+ return;
+
if (!dma->chan)
return;




2021-12-06 15:04:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 48/52] siphash: use _unaligned version by default

From: Arnd Bergmann <[email protected]>

commit f7e5b9bfa6c8820407b64eabc1f29c9a87e8993d upstream.

On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
because the ordinary load/store instructions (ldr, ldrh, ldrb) can
tolerate any misalignment of the memory address. However, load/store
double and load/store multiple instructions (ldrd, ldm) may still only
be used on memory addresses that are 32-bit aligned, and so we have to
use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we
may end up with a severe performance hit due to alignment traps that
require fixups by the kernel. Testing shows that this currently happens
with clang-13 but not gcc-11. In theory, any compiler version can
produce this bug or other problems, as we are dealing with undefined
behavior in C99 even on architectures that support this in hardware,
see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363.

Fortunately, the get_unaligned() accessors do the right thing: when
building for ARMv6 or later, the compiler will emit unaligned accesses
using the ordinary load/store instructions (but avoid the ones that
require 32-bit alignment). When building for older ARM, those accessors
will emit the appropriate sequence of ldrb/mov/orr instructions. And on
architectures that can truly tolerate any kind of misalignment, the
get_unaligned() accessors resolve to the leXX_to_cpup accessors that
operate on aligned addresses.

Since the compiler will in fact emit ldrd or ldm instructions when
building this code for ARM v6 or later, the solution is to use the
unaligned accessors unconditionally on architectures where this is
known to be fast. The _aligned version of the hash function is
however still needed to get the best performance on architectures
that cannot do any unaligned access in hardware.

This new version avoids the undefined behavior and should produce
the fastest hash on all architectures we support.
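
The resulting dispatch can be sketched as follows (constants reduced
to plain macros; not the kernel implementation):

#include <stdio.h>
#include <stdint.h>

#define SIPHASH_ALIGNMENT sizeof(uint64_t)

/* 1 on architectures where unaligned loads are cheap
 * (CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS); the dispatch then
 * always takes the unaligned path, as in the patch. */
#define EFFICIENT_UNALIGNED_ACCESS 1

static int is_aligned(const void *p)
{
        return ((uintptr_t)p % SIPHASH_ALIGNMENT) == 0;
}

static const char *siphash_path(const void *data)
{
        if (EFFICIENT_UNALIGNED_ACCESS || !is_aligned(data))
                return "unaligned";
        return "aligned";
}

int main(void)
{
        uint64_t buf[2] = { 0 };
        char *p = (char *)buf;

        printf("%s %s\n", siphash_path(p), siphash_path(p + 1));
        return 0;
}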

Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEAqT=8dQfa0=cqQ@mail.gmail.com/
Reported-by: Ard Biesheuvel <[email protected]>
Fixes: 2c956a60778c ("siphash: add cryptographically secure PRF")
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Jason A. Donenfeld <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/siphash.h | 14 ++++----------
lib/siphash.c | 12 ++++++------
2 files changed, 10 insertions(+), 16 deletions(-)

--- a/include/linux/siphash.h
+++ b/include/linux/siphash.h
@@ -27,9 +27,7 @@ static inline bool siphash_key_is_zero(c
}

u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key);
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key);
-#endif

u64 siphash_1u64(const u64 a, const siphash_key_t *key);
u64 siphash_2u64(const u64 a, const u64 b, const siphash_key_t *key);
@@ -82,10 +80,9 @@ static inline u64 ___siphash_aligned(con
static inline u64 siphash(const void *data, size_t len,
const siphash_key_t *key)
{
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
- if (!IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT))
+ if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
+ !IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT))
return __siphash_unaligned(data, len, key);
-#endif
return ___siphash_aligned(data, len, key);
}

@@ -96,10 +93,8 @@ typedef struct {

u32 __hsiphash_aligned(const void *data, size_t len,
const hsiphash_key_t *key);
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 __hsiphash_unaligned(const void *data, size_t len,
const hsiphash_key_t *key);
-#endif

u32 hsiphash_1u32(const u32 a, const hsiphash_key_t *key);
u32 hsiphash_2u32(const u32 a, const u32 b, const hsiphash_key_t *key);
@@ -135,10 +130,9 @@ static inline u32 ___hsiphash_aligned(co
static inline u32 hsiphash(const void *data, size_t len,
const hsiphash_key_t *key)
{
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
- if (!IS_ALIGNED((unsigned long)data, HSIPHASH_ALIGNMENT))
+ if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
+ !IS_ALIGNED((unsigned long)data, HSIPHASH_ALIGNMENT))
return __hsiphash_unaligned(data, len, key);
-#endif
return ___hsiphash_aligned(data, len, key);
}

--- a/lib/siphash.c
+++ b/lib/siphash.c
@@ -49,6 +49,7 @@
SIPROUND; \
return (v0 ^ v1) ^ (v2 ^ v3);

+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key)
{
const u8 *end = data + len - (len % sizeof(u64));
@@ -80,8 +81,8 @@ u64 __siphash_aligned(const void *data,
POSTAMBLE
}
EXPORT_SYMBOL(__siphash_aligned);
+#endif

-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key)
{
const u8 *end = data + len - (len % sizeof(u64));
@@ -113,7 +114,6 @@ u64 __siphash_unaligned(const void *data
POSTAMBLE
}
EXPORT_SYMBOL(__siphash_unaligned);
-#endif

/**
* siphash_1u64 - compute 64-bit siphash PRF value of a u64
@@ -250,6 +250,7 @@ EXPORT_SYMBOL(siphash_3u32);
HSIPROUND; \
return (v0 ^ v1) ^ (v2 ^ v3);

+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 __hsiphash_aligned(const void *data, size_t len, const hsiphash_key_t *key)
{
const u8 *end = data + len - (len % sizeof(u64));
@@ -280,8 +281,8 @@ u32 __hsiphash_aligned(const void *data,
HPOSTAMBLE
}
EXPORT_SYMBOL(__hsiphash_aligned);
+#endif

-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 __hsiphash_unaligned(const void *data, size_t len,
const hsiphash_key_t *key)
{
@@ -313,7 +314,6 @@ u32 __hsiphash_unaligned(const void *dat
HPOSTAMBLE
}
EXPORT_SYMBOL(__hsiphash_unaligned);
-#endif

/**
* hsiphash_1u32 - compute 64-bit hsiphash PRF value of a u32
@@ -418,6 +418,7 @@ EXPORT_SYMBOL(hsiphash_4u32);
HSIPROUND; \
return v1 ^ v3;

+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 __hsiphash_aligned(const void *data, size_t len, const hsiphash_key_t *key)
{
const u8 *end = data + len - (len % sizeof(u32));
@@ -438,8 +439,8 @@ u32 __hsiphash_aligned(const void *data,
HPOSTAMBLE
}
EXPORT_SYMBOL(__hsiphash_aligned);
+#endif

-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 __hsiphash_unaligned(const void *data, size_t len,
const hsiphash_key_t *key)
{
@@ -461,7 +462,6 @@ u32 __hsiphash_unaligned(const void *dat
HPOSTAMBLE
}
EXPORT_SYMBOL(__hsiphash_unaligned);
-#endif

/**
* hsiphash_1u32 - compute 32-bit hsiphash PRF value of a u32



2021-12-06 15:04:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 47/52] net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()

From: Zhou Qingyang <[email protected]>

commit e2dabc4f7e7b60299c20a36d6a7b24ed9bf8e572 upstream.

In qlcnic_83xx_add_rings(), the indirect function of
ahw->hw_ops->alloc_mbx_args will be called to allocate memory for
cmd.req.arg, and there is a dereference of it in qlcnic_83xx_add_rings(),
which could lead to a NULL pointer dereference on failure of the
indirect function like qlcnic_83xx_alloc_mbx_args().

Fix this bug by adding a check of alloc_mbx_args(); this patch
imitates the logic of mbx_cmd()'s failure handling.
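
As a userspace sketch of that failure handling (structures and the
error value invented for illustration):

#include <stdio.h>

struct cmd_args { int *arg; };

struct hw_ops {
        /* indirect allocator; may fail and leave cmd->arg NULL */
        int (*alloc_mbx_args)(struct cmd_args *cmd);
};

static int failing_alloc(struct cmd_args *cmd)
{
        cmd->arg = NULL;
        return -12;     /* -ENOMEM */
}

/* Propagate the allocator's error instead of dereferencing
 * cmd.arg unconditionally, as the patch does. */
static int add_rings(const struct hw_ops *ops)
{
        struct cmd_args cmd;
        int err = ops->alloc_mbx_args(&cmd);

        if (err) {
                fprintf(stderr, "Failed to alloc mbx args %d\n", err);
                return err;
        }
        cmd.arg[0] = 1;
        return 0;
}

int main(void)
{
        struct hw_ops ops = { .alloc_mbx_args = failing_alloc };

        printf("%d\n", add_rings(&ops));        /* -12, no NULL deref */
        return 0;
}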

This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.

Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.

Builds with CONFIG_QLCNIC=m show no new warnings, and our
static analyzer no longer warns about this code.

Fixes: 7f9664525f9c ("qlcnic: 83xx memory map and HW access routine")
Signed-off-by: Zhou Qingyang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
@@ -1076,8 +1076,14 @@ static int qlcnic_83xx_add_rings(struct
sds_mbx_size = sizeof(struct qlcnic_sds_mbx);
context_id = recv_ctx->context_id;
num_sds = adapter->drv_sds_rings - QLCNIC_MAX_SDS_RINGS;
- ahw->hw_ops->alloc_mbx_args(&cmd, adapter,
- QLCNIC_CMD_ADD_RCV_RINGS);
+ err = ahw->hw_ops->alloc_mbx_args(&cmd, adapter,
+ QLCNIC_CMD_ADD_RCV_RINGS);
+ if (err) {
+ dev_err(&adapter->pdev->dev,
+ "Failed to alloc mbx args %d\n", err);
+ return err;
+ }
+
cmd.req.arg[1] = 0 | (num_sds << 8) | (context_id << 16);

/* set up status rings, mbx 2-81 */



2021-12-06 15:04:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 46/52] natsemi: xtensa: fix section mismatch warnings

From: Randy Dunlap <[email protected]>

commit b0f38e15979fa8851e88e8aa371367f264e7b6e9 upstream.

Fix section mismatch warnings in xtsonic. The first one appears to be
bogus, and after fixing the second one, the first one is gone.

WARNING: modpost: vmlinux.o(.text+0x529adc): Section mismatch in reference from the function sonic_get_stats() to the function .init.text:set_reset_devices()
The function sonic_get_stats() references
the function __init set_reset_devices().
This is often because sonic_get_stats lacks a __init
annotation or the annotation of set_reset_devices is wrong.

WARNING: modpost: vmlinux.o(.text+0x529b3b): Section mismatch in reference from the function xtsonic_probe() to the function .init.text:sonic_probe1()
The function xtsonic_probe() references
the function __init sonic_probe1().
This is often because xtsonic_probe lacks a __init
annotation or the annotation of sonic_probe1 is wrong.

Fixes: 74f2a5f0ef64 ("xtensa: Add support for the Sonic Ethernet device for the XT2000 board.")
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: Christophe JAILLET <[email protected]>
Cc: Finn Thain <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: [email protected]
Cc: Thomas Bogendoerfer <[email protected]>
Acked-by: Max Filippov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/natsemi/xtsonic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/natsemi/xtsonic.c
+++ b/drivers/net/ethernet/natsemi/xtsonic.c
@@ -128,7 +128,7 @@ static const struct net_device_ops xtson
.ndo_set_mac_address = eth_mac_addr,
};

-static int __init sonic_probe1(struct net_device *dev)
+static int sonic_probe1(struct net_device *dev)
{
static unsigned version_printed = 0;
unsigned int silicon_revision;



2021-12-06 19:34:20

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/52] 4.4.294-rc1 review

Hi!

> This is the start of the stable review cycle for the 4.4.294 release.
> There are 52 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.

CIP testing did not find any problems here:

https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-4.4.y

Tested-by: Pavel Machek (CIP) <[email protected]>

Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany



2021-12-06 21:58:45

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/52] 4.4.294-rc1 review

On 12/6/21 7:55 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.294 release.
> There are 52 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 08 Dec 2021 14:55:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.294-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

Tested-by: Shuah Khan <[email protected]>

thanks,
-- Shuah

2021-12-07 20:39:57

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/52] 4.4.294-rc1 review

On Mon, Dec 06, 2021 at 03:55:44PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.294 release.
> There are 52 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 08 Dec 2021 14:55:37 +0000.
> Anything received after that time might be too late.
>

Build results:
total: 160 pass: 160 fail: 0
Qemu test results:
total: 341 pass: 341 fail: 0

Tested-by: Guenter Roeck <[email protected]>

Guenter

2021-12-08 04:15:55

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/52] 4.4.294-rc1 review

On Mon, 6 Dec 2021 at 20:29, Greg Kroah-Hartman
<[email protected]> wrote:
>
> This is the start of the stable review cycle for the 4.4.294 release.
> There are 52 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 08 Dec 2021 14:55:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.294-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing <[email protected]>

## Build
* kernel: 4.4.294-rc1
* git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git branch: linux-4.4.y
* git commit: f5b0b7aefd9d2ea9170ed69c7ab2dfd751fba7d1
* git describe: v4.4.293-53-gf5b0b7aefd9d
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.4.y/build/v4.4.293-53-gf5b0b7aefd9d

## No Test Regressions (compared to v4.4.293-22-gd0a6005afb1e)

## No Test Fixes (compared to v4.4.293-22-gd0a6005afb1e)

## Test result summary
total: 56876, pass: 45655, fail: 214, skip: 9659, xfail: 1348

## Build Summary
* arm: 129 total, 129 passed, 0 failed
* arm64: 31 total, 31 passed, 0 failed
* i386: 18 total, 18 passed, 0 failed
* juno-r2: 1 total, 1 passed, 0 failed
* mips: 22 total, 22 passed, 0 failed
* sparc: 12 total, 12 passed, 0 failed
* x15: 1 total, 1 passed, 0 failed
* x86: 1 total, 1 passed, 0 failed
* x86_64: 18 total, 18 passed, 0 failed

## Test suites summary
* fwts
* kselftest-android
* kselftest-bpf
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-membarrier
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-vm
* kselftest-x86
* kselftest-zram
* kvm-unit-tests
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* packetdrill
* perf
* ssuite
* v4l2-compliance

--
Linaro LKFT
https://lkft.linaro.org