2022-03-28 21:03:57

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param

From: Casey Schaufler <[email protected]>

[ Upstream commit ecff30575b5ad0eda149aadad247b7f75411fd47 ]

The usual LSM hook "bail on fail" scheme doesn't work for cases where
a security module may return an error code indicating that it does not
recognize an input. In this particular case Smack sees a mount option
that it recognizes, and returns 0. A call to a BPF hook follows, which
returns -ENOPARAM, which confuses the caller because Smack has processed
its data.

The SELinux hook incorrectly returns 1 on success. There was a time
when this was correct, however the current expectation is that it
return 0 on success. This is repaired.

Reported-by: [email protected]
Signed-off-by: Casey Schaufler <[email protected]>
Acked-by: James Morris <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
security/security.c | 17 +++++++++++++++--
security/selinux/hooks.c | 5 ++---
2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/security/security.c b/security/security.c
index 22261d79f333..f101a53a63ed 100644
--- a/security/security.c
+++ b/security/security.c
@@ -884,9 +884,22 @@ int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc)
return call_int_hook(fs_context_dup, 0, fc, src_fc);
}

-int security_fs_context_parse_param(struct fs_context *fc, struct fs_parameter *param)
+int security_fs_context_parse_param(struct fs_context *fc,
+ struct fs_parameter *param)
{
- return call_int_hook(fs_context_parse_param, -ENOPARAM, fc, param);
+ struct security_hook_list *hp;
+ int trc;
+ int rc = -ENOPARAM;
+
+ hlist_for_each_entry(hp, &security_hook_heads.fs_context_parse_param,
+ list) {
+ trc = hp->hook.fs_context_parse_param(fc, param);
+ if (trc == 0)
+ rc = 0;
+ else if (trc != -ENOPARAM)
+ return trc;
+ }
+ return rc;
}

int security_sb_alloc(struct super_block *sb)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 5b6895e4fc29..371f67a37f9a 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2860,10 +2860,9 @@ static int selinux_fs_context_parse_param(struct fs_context *fc,
return opt;

rc = selinux_add_opt(opt, param->string, &fc->security);
- if (!rc) {
+ if (!rc)
param->string = NULL;
- rc = 1;
- }
+
return rc;
}

--
2.34.1


2022-03-28 21:05:19

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 39/43] parisc: Fix handling off probe non-access faults

From: John David Anglin <[email protected]>

[ Upstream commit e00b0a2ab8ec019c344e53bfc76e31c18bb587b7 ]

Currently, the parisc kernel does not fully support non-access TLB
fault handling for probe instructions. In the fast path, we set the
target register to zero if it is not a shadowed register. The slow
path is not implemented, so we call do_page_fault. The architecture
indicates that non-access faults should not cause a page fault from
disk.

This change adds to code to provide non-access fault support for
probe instructions. It also modifies the handling of faults on
userspace so that if the address lies in a valid VMA and the access
type matches that for the VMA, the probe target register is set to
one. Otherwise, the target register is set to zero.

This was done to make probe instructions more useful for userspace.
Probe instructions are not very useful if they set the target register
to zero whenever a page is not present in memory. Nominally, the
purpose of the probe instruction is determine whether read or write
access to a given address is allowed.

This fixes a problem in function pointer comparison noticed in the
glibc testsuite (stdio-common/tst-vfprintf-user-type). The same
problem is likely in glibc (_dl_lookup_address).

V2 adds flush and lpa instruction support to handle_nadtlb_fault.

Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/parisc/include/asm/traps.h | 1 +
arch/parisc/kernel/traps.c | 2 +
arch/parisc/mm/fault.c | 89 +++++++++++++++++++++++++++++++++
3 files changed, 92 insertions(+)

diff --git a/arch/parisc/include/asm/traps.h b/arch/parisc/include/asm/traps.h
index 34619f010c63..0ccdb738a9a3 100644
--- a/arch/parisc/include/asm/traps.h
+++ b/arch/parisc/include/asm/traps.h
@@ -18,6 +18,7 @@ unsigned long parisc_acctyp(unsigned long code, unsigned int inst);
const char *trap_name(unsigned long code);
void do_page_fault(struct pt_regs *regs, unsigned long code,
unsigned long address);
+int handle_nadtlb_fault(struct pt_regs *regs);
#endif

#endif
diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c
index b6fdebddc8e9..39576a9245c7 100644
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -662,6 +662,8 @@ void notrace handle_interruption(int code, struct pt_regs *regs)
by hand. Technically we need to emulate:
fdc,fdce,pdc,"fic,4f",prober,probeir,probew, probeiw
*/
+ if (code == 17 && handle_nadtlb_fault(regs))
+ return;
fault_address = regs->ior;
fault_space = regs->isr;
break;
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index e9eabf8f14d7..f114e102aaf2 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -425,3 +425,92 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
}
pagefault_out_of_memory();
}
+
+/* Handle non-access data TLB miss faults.
+ *
+ * For probe instructions, accesses to userspace are considered allowed
+ * if they lie in a valid VMA and the access type matches. We are not
+ * allowed to handle MM faults here so there may be situations where an
+ * actual access would fail even though a probe was successful.
+ */
+int
+handle_nadtlb_fault(struct pt_regs *regs)
+{
+ unsigned long insn = regs->iir;
+ int breg, treg, xreg, val = 0;
+ struct vm_area_struct *vma, *prev_vma;
+ struct task_struct *tsk;
+ struct mm_struct *mm;
+ unsigned long address;
+ unsigned long acc_type;
+
+ switch (insn & 0x380) {
+ case 0x280:
+ /* FDC instruction */
+ fallthrough;
+ case 0x380:
+ /* PDC and FIC instructions */
+ if (printk_ratelimit()) {
+ pr_warn("BUG: nullifying cache flush/purge instruction\n");
+ show_regs(regs);
+ }
+ if (insn & 0x20) {
+ /* Base modification */
+ breg = (insn >> 21) & 0x1f;
+ xreg = (insn >> 16) & 0x1f;
+ if (breg && xreg)
+ regs->gr[breg] += regs->gr[xreg];
+ }
+ regs->gr[0] |= PSW_N;
+ return 1;
+
+ case 0x180:
+ /* PROBE instruction */
+ treg = insn & 0x1f;
+ if (regs->isr) {
+ tsk = current;
+ mm = tsk->mm;
+ if (mm) {
+ /* Search for VMA */
+ address = regs->ior;
+ mmap_read_lock(mm);
+ vma = find_vma_prev(mm, address, &prev_vma);
+ mmap_read_unlock(mm);
+
+ /*
+ * Check if access to the VMA is okay.
+ * We don't allow for stack expansion.
+ */
+ acc_type = (insn & 0x40) ? VM_WRITE : VM_READ;
+ if (vma
+ && address >= vma->vm_start
+ && (vma->vm_flags & acc_type) == acc_type)
+ val = 1;
+ }
+ }
+ if (treg)
+ regs->gr[treg] = val;
+ regs->gr[0] |= PSW_N;
+ return 1;
+
+ case 0x300:
+ /* LPA instruction */
+ if (insn & 0x20) {
+ /* Base modification */
+ breg = (insn >> 21) & 0x1f;
+ xreg = (insn >> 16) & 0x1f;
+ if (breg && xreg)
+ regs->gr[breg] += regs->gr[xreg];
+ }
+ treg = insn & 0x1f;
+ if (treg)
+ regs->gr[treg] = 0;
+ regs->gr[0] |= PSW_N;
+ return 1;
+
+ default:
+ break;
+ }
+
+ return 0;
+}
--
2.34.1

2022-03-28 21:12:35

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 15/43] selinux: use correct type for context length

From: Christian Göttsche <[email protected]>

[ Upstream commit b97df7c098c531010e445da88d02b7bf7bf59ef6 ]

security_sid_to_context() expects a pointer to an u32 as the address
where to store the length of the computed context.

Reported by sparse:

security/selinux/xfrm.c:359:39: warning: incorrect type in arg 4
(different signedness)
security/selinux/xfrm.c:359:39: expected unsigned int
[usertype] *scontext_len
security/selinux/xfrm.c:359:39: got int *

Signed-off-by: Christian Göttsche <[email protected]>
[PM: wrapped commit description]
Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
security/selinux/xfrm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/xfrm.c b/security/selinux/xfrm.c
index 90697317895f..c576832febc6 100644
--- a/security/selinux/xfrm.c
+++ b/security/selinux/xfrm.c
@@ -347,7 +347,7 @@ int selinux_xfrm_state_alloc_acquire(struct xfrm_state *x,
int rc;
struct xfrm_sec_ctx *ctx;
char *ctx_str = NULL;
- int str_len;
+ u32 str_len;

if (!polsec)
return 0;
--
2.34.1

2022-03-28 21:16:15

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 40/43] nvme-tcp: lockdep: annotate in-kernel sockets

From: Chris Leech <[email protected]>

[ Upstream commit 841aee4d75f18fdfb53935080b03de0c65e9b92c ]

Put NVMe/TCP sockets in their own class to avoid some lockdep warnings.
Sockets created by nvme-tcp are not exposed to user-space, and will not
trigger certain code paths that the general socket API exposes.

Lockdep complains about a circular dependency between the socket and
filesystem locks, because setsockopt can trigger a page fault with a
socket lock held, but nvme-tcp sends requests on the socket while file
system locks are held.

======================================================
WARNING: possible circular locking dependency detected
5.15.0-rc3 #1 Not tainted
------------------------------------------------------
fio/1496 is trying to acquire lock:
(sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendpage+0x23/0x80

but task is already holding lock:
(&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]

which lock already depends on the new lock.

other info that might help us debug this:

chain exists of:
sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&xfs_dir_ilock_class/5);
lock(sb_internal);
lock(&xfs_dir_ilock_class/5);
lock(sk_lock-AF_INET);

*** DEADLOCK ***

6 locks held by fio/1496:
#0: (sb_writers#13){.+.+}-{0:0}, at: path_openat+0x9fc/0xa20
#1: (&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at: path_openat+0x296/0xa20
#2: (sb_internal){.+.+}-{0:0}, at: xfs_trans_alloc_icreate+0x41/0xd0 [xfs]
#3: (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]
#4: (hctx->srcu){....}-{0:0}, at: hctx_lock+0x51/0xd0
#5: (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0x33e/0x380 [nvme_tcp]

This annotation lets lockdep analyze nvme-tcp controlled sockets
independently of what the user-space sockets API does.

Link: https://lore.kernel.org/linux-nvme/CAHj4cs9MDYLJ+q+2_GXUK9HxFizv2pxUryUR0toX974M040z7g@mail.gmail.com/

Signed-off-by: Chris Leech <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/nvme/host/tcp.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 65e00c64a588..d66e2de044e0 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -30,6 +30,44 @@ static int so_priority;
module_param(so_priority, int, 0644);
MODULE_PARM_DESC(so_priority, "nvme tcp socket optimize priority");

+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+/* lockdep can detect a circular dependency of the form
+ * sk_lock -> mmap_lock (page fault) -> fs locks -> sk_lock
+ * because dependencies are tracked for both nvme-tcp and user contexts. Using
+ * a separate class prevents lockdep from conflating nvme-tcp socket use with
+ * user-space socket API use.
+ */
+static struct lock_class_key nvme_tcp_sk_key[2];
+static struct lock_class_key nvme_tcp_slock_key[2];
+
+static void nvme_tcp_reclassify_socket(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+
+ if (WARN_ON_ONCE(!sock_allow_reclassification(sk)))
+ return;
+
+ switch (sk->sk_family) {
+ case AF_INET:
+ sock_lock_init_class_and_name(sk, "slock-AF_INET-NVME",
+ &nvme_tcp_slock_key[0],
+ "sk_lock-AF_INET-NVME",
+ &nvme_tcp_sk_key[0]);
+ break;
+ case AF_INET6:
+ sock_lock_init_class_and_name(sk, "slock-AF_INET6-NVME",
+ &nvme_tcp_slock_key[1],
+ "sk_lock-AF_INET6-NVME",
+ &nvme_tcp_sk_key[1]);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ }
+}
+#else
+static void nvme_tcp_reclassify_socket(struct socket *sock) { }
+#endif
+
enum nvme_tcp_send_state {
NVME_TCP_SEND_CMD_PDU = 0,
NVME_TCP_SEND_H2C_PDU,
@@ -1469,6 +1507,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
goto err_destroy_mutex;
}

+ nvme_tcp_reclassify_socket(queue->sock);
+
/* Single syn retry */
tcp_sock_set_syncnt(queue->sock->sk, 1);

--
2.34.1

2022-03-28 21:22:48

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 26/43] irqchip/nvic: Release nvic_base upon failure

From: "Souptick Joarder (HPE)" <[email protected]>

[ Upstream commit e414c25e3399b2b3d7337dc47abccab5c71b7c8f ]

smatch warning was reported as below ->

smatch warnings:
drivers/irqchip/irq-nvic.c:131 nvic_of_init()
warn: 'nvic_base' not released on lines: 97.

Release nvic_base upon failure.

Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Souptick Joarder (HPE) <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/irqchip/irq-nvic.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/irqchip/irq-nvic.c b/drivers/irqchip/irq-nvic.c
index ba4759b3e269..94230306e0ee 100644
--- a/drivers/irqchip/irq-nvic.c
+++ b/drivers/irqchip/irq-nvic.c
@@ -107,6 +107,7 @@ static int __init nvic_of_init(struct device_node *node,

if (!nvic_irq_domain) {
pr_warn("Failed to allocate irq domain\n");
+ iounmap(nvic_base);
return -ENOMEM;
}

@@ -116,6 +117,7 @@ static int __init nvic_of_init(struct device_node *node,
if (ret) {
pr_warn("Failed to allocate irq chips\n");
irq_domain_remove(nvic_irq_domain);
+ iounmap(nvic_base);
return ret;
}

--
2.34.1

2022-03-28 21:25:21

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 41/43] spi: tegra20: Use of_device_get_match_data()

From: Minghao Chi <[email protected]>

[ Upstream commit c9839acfcbe20ce43d363c2a9d0772472d9921c0 ]

Use of_device_get_match_data() to simplify the code.

Reported-by: Zeal Robot <[email protected]>
Signed-off-by: Minghao Chi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/spi/spi-tegra20-slink.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/spi/spi-tegra20-slink.c b/drivers/spi/spi-tegra20-slink.c
index 2a03739a0c60..80c3787deea9 100644
--- a/drivers/spi/spi-tegra20-slink.c
+++ b/drivers/spi/spi-tegra20-slink.c
@@ -1006,14 +1006,8 @@ static int tegra_slink_probe(struct platform_device *pdev)
struct resource *r;
int ret, spi_irq;
const struct tegra_slink_chip_data *cdata = NULL;
- const struct of_device_id *match;

- match = of_match_device(tegra_slink_of_match, &pdev->dev);
- if (!match) {
- dev_err(&pdev->dev, "Error: No device match found\n");
- return -ENODEV;
- }
- cdata = match->data;
+ cdata = of_device_get_match_data(&pdev->dev);

master = spi_alloc_master(&pdev->dev, sizeof(*tspi));
if (!master) {
--
2.34.1

2022-03-28 21:25:38

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 22/43] selinux: allow FIOCLEX and FIONCLEX with policy capability

From: Richard Haines <[email protected]>

[ Upstream commit 65881e1db4e948614d9eb195b8e1197339822949 ]

These ioctls are equivalent to fcntl(fd, F_SETFD, flags), which SELinux
always allows too. Furthermore, a failed FIOCLEX could result in a file
descriptor being leaked to a process that should not have access to it.

As this patch removes access controls, a policy capability needs to be
enabled in policy to always allow these ioctls.

Based-on-patch-by: Demi Marie Obenour <[email protected]>
Signed-off-by: Richard Haines <[email protected]>
[PM: subject line tweak]
Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
security/selinux/hooks.c | 6 ++++++
security/selinux/include/policycap.h | 1 +
security/selinux/include/policycap_names.h | 3 ++-
security/selinux/include/security.h | 7 +++++++
4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 371f67a37f9a..141220308307 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3744,6 +3744,12 @@ static int selinux_file_ioctl(struct file *file, unsigned int cmd,
CAP_OPT_NONE, true);
break;

+ case FIOCLEX:
+ case FIONCLEX:
+ if (!selinux_policycap_ioctl_skip_cloexec())
+ error = ioctl_has_perm(cred, file, FILE__IOCTL, (u16) cmd);
+ break;
+
/* default case assumes that the command will go
* to the file's ioctl() function.
*/
diff --git a/security/selinux/include/policycap.h b/security/selinux/include/policycap.h
index 2ec038efbb03..a9e572ca4fd9 100644
--- a/security/selinux/include/policycap.h
+++ b/security/selinux/include/policycap.h
@@ -11,6 +11,7 @@ enum {
POLICYDB_CAPABILITY_CGROUPSECLABEL,
POLICYDB_CAPABILITY_NNP_NOSUID_TRANSITION,
POLICYDB_CAPABILITY_GENFS_SECLABEL_SYMLINKS,
+ POLICYDB_CAPABILITY_IOCTL_SKIP_CLOEXEC,
__POLICYDB_CAPABILITY_MAX
};
#define POLICYDB_CAPABILITY_MAX (__POLICYDB_CAPABILITY_MAX - 1)
diff --git a/security/selinux/include/policycap_names.h b/security/selinux/include/policycap_names.h
index b89289f092c9..ebd64afe1def 100644
--- a/security/selinux/include/policycap_names.h
+++ b/security/selinux/include/policycap_names.h
@@ -12,7 +12,8 @@ const char *selinux_policycap_names[__POLICYDB_CAPABILITY_MAX] = {
"always_check_network",
"cgroup_seclabel",
"nnp_nosuid_transition",
- "genfs_seclabel_symlinks"
+ "genfs_seclabel_symlinks",
+ "ioctl_skip_cloexec"
};

#endif /* _SELINUX_POLICYCAP_NAMES_H_ */
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index ac0ece01305a..c0d966020ebd 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -219,6 +219,13 @@ static inline bool selinux_policycap_genfs_seclabel_symlinks(void)
return READ_ONCE(state->policycap[POLICYDB_CAPABILITY_GENFS_SECLABEL_SYMLINKS]);
}

+static inline bool selinux_policycap_ioctl_skip_cloexec(void)
+{
+ struct selinux_state *state = &selinux_state;
+
+ return READ_ONCE(state->policycap[POLICYDB_CAPABILITY_IOCTL_SKIP_CLOEXEC]);
+}
+
struct selinux_policy_convert_data;

struct selinux_load_state {
--
2.34.1

2022-03-28 21:35:51

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 30/43] bfq: fix use-after-free in bfq_dispatch_request

From: Zhang Wensheng <[email protected]>

[ Upstream commit ab552fcb17cc9e4afe0e4ac4df95fc7b30e8490a ]

KASAN reports a use-after-free report when doing normal scsi-mq test

[69832.239032] ==================================================================
[69832.241810] BUG: KASAN: use-after-free in bfq_dispatch_request+0x1045/0x44b0
[69832.243267] Read of size 8 at addr ffff88802622ba88 by task kworker/3:1H/155
[69832.244656]
[69832.245007] CPU: 3 PID: 155 Comm: kworker/3:1H Not tainted 5.10.0-10295-g576c6382529e #8
[69832.246626] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[69832.249069] Workqueue: kblockd blk_mq_run_work_fn
[69832.250022] Call Trace:
[69832.250541] dump_stack+0x9b/0xce
[69832.251232] ? bfq_dispatch_request+0x1045/0x44b0
[69832.252243] print_address_description.constprop.6+0x3e/0x60
[69832.253381] ? __cpuidle_text_end+0x5/0x5
[69832.254211] ? vprintk_func+0x6b/0x120
[69832.254994] ? bfq_dispatch_request+0x1045/0x44b0
[69832.255952] ? bfq_dispatch_request+0x1045/0x44b0
[69832.256914] kasan_report.cold.9+0x22/0x3a
[69832.257753] ? bfq_dispatch_request+0x1045/0x44b0
[69832.258755] check_memory_region+0x1c1/0x1e0
[69832.260248] bfq_dispatch_request+0x1045/0x44b0
[69832.261181] ? bfq_bfqq_expire+0x2440/0x2440
[69832.262032] ? blk_mq_delay_run_hw_queues+0xf9/0x170
[69832.263022] __blk_mq_do_dispatch_sched+0x52f/0x830
[69832.264011] ? blk_mq_sched_request_inserted+0x100/0x100
[69832.265101] __blk_mq_sched_dispatch_requests+0x398/0x4f0
[69832.266206] ? blk_mq_do_dispatch_ctx+0x570/0x570
[69832.267147] ? __switch_to+0x5f4/0xee0
[69832.267898] blk_mq_sched_dispatch_requests+0xdf/0x140
[69832.268946] __blk_mq_run_hw_queue+0xc0/0x270
[69832.269840] blk_mq_run_work_fn+0x51/0x60
[69832.278170] process_one_work+0x6d4/0xfe0
[69832.278984] worker_thread+0x91/0xc80
[69832.279726] ? __kthread_parkme+0xb0/0x110
[69832.280554] ? process_one_work+0xfe0/0xfe0
[69832.281414] kthread+0x32d/0x3f0
[69832.282082] ? kthread_park+0x170/0x170
[69832.282849] ret_from_fork+0x1f/0x30
[69832.283573]
[69832.283886] Allocated by task 7725:
[69832.284599] kasan_save_stack+0x19/0x40
[69832.285385] __kasan_kmalloc.constprop.2+0xc1/0xd0
[69832.286350] kmem_cache_alloc_node+0x13f/0x460
[69832.287237] bfq_get_queue+0x3d4/0x1140
[69832.287993] bfq_get_bfqq_handle_split+0x103/0x510
[69832.289015] bfq_init_rq+0x337/0x2d50
[69832.289749] bfq_insert_requests+0x304/0x4e10
[69832.290634] blk_mq_sched_insert_requests+0x13e/0x390
[69832.291629] blk_mq_flush_plug_list+0x4b4/0x760
[69832.292538] blk_flush_plug_list+0x2c5/0x480
[69832.293392] io_schedule_prepare+0xb2/0xd0
[69832.294209] io_schedule_timeout+0x13/0x80
[69832.295014] wait_for_common_io.constprop.1+0x13c/0x270
[69832.296137] submit_bio_wait+0x103/0x1a0
[69832.296932] blkdev_issue_discard+0xe6/0x160
[69832.297794] blk_ioctl_discard+0x219/0x290
[69832.298614] blkdev_common_ioctl+0x50a/0x1750
[69832.304715] blkdev_ioctl+0x470/0x600
[69832.305474] block_ioctl+0xde/0x120
[69832.306232] vfs_ioctl+0x6c/0xc0
[69832.306877] __se_sys_ioctl+0x90/0xa0
[69832.307629] do_syscall_64+0x2d/0x40
[69832.308362] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[69832.309382]
[69832.309701] Freed by task 155:
[69832.310328] kasan_save_stack+0x19/0x40
[69832.311121] kasan_set_track+0x1c/0x30
[69832.311868] kasan_set_free_info+0x1b/0x30
[69832.312699] __kasan_slab_free+0x111/0x160
[69832.313524] kmem_cache_free+0x94/0x460
[69832.314367] bfq_put_queue+0x582/0x940
[69832.315112] __bfq_bfqd_reset_in_service+0x166/0x1d0
[69832.317275] bfq_bfqq_expire+0xb27/0x2440
[69832.318084] bfq_dispatch_request+0x697/0x44b0
[69832.318991] __blk_mq_do_dispatch_sched+0x52f/0x830
[69832.319984] __blk_mq_sched_dispatch_requests+0x398/0x4f0
[69832.321087] blk_mq_sched_dispatch_requests+0xdf/0x140
[69832.322225] __blk_mq_run_hw_queue+0xc0/0x270
[69832.323114] blk_mq_run_work_fn+0x51/0x60
[69832.323942] process_one_work+0x6d4/0xfe0
[69832.324772] worker_thread+0x91/0xc80
[69832.325518] kthread+0x32d/0x3f0
[69832.326205] ret_from_fork+0x1f/0x30
[69832.326932]
[69832.338297] The buggy address belongs to the object at ffff88802622b968
[69832.338297] which belongs to the cache bfq_queue of size 512
[69832.340766] The buggy address is located 288 bytes inside of
[69832.340766] 512-byte region [ffff88802622b968, ffff88802622bb68)
[69832.343091] The buggy address belongs to the page:
[69832.344097] page:ffffea0000988a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802622a528 pfn:0x26228
[69832.346214] head:ffffea0000988a00 order:2 compound_mapcount:0 compound_pincount:0
[69832.347719] flags: 0x1fffff80010200(slab|head)
[69832.348625] raw: 001fffff80010200 ffffea0000dbac08 ffff888017a57650 ffff8880179fe840
[69832.354972] raw: ffff88802622a528 0000000000120008 00000001ffffffff 0000000000000000
[69832.356547] page dumped because: kasan: bad access detected
[69832.357652]
[69832.357970] Memory state around the buggy address:
[69832.358926] ffff88802622b980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[69832.360358] ffff88802622ba00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[69832.361810] >ffff88802622ba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[69832.363273] ^
[69832.363975] ffff88802622bb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
[69832.375960] ffff88802622bb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[69832.377405] ==================================================================

In bfq_dispatch_requestfunction, it may have function call:

bfq_dispatch_request
__bfq_dispatch_request
bfq_select_queue
bfq_bfqq_expire
__bfq_bfqd_reset_in_service
bfq_put_queue
kmem_cache_free
In this function call, in_serv_queue has beed expired and meet the
conditions to free. In the function bfq_dispatch_request, the address
of in_serv_queue pointing to has been released. For getting the value
of idle_timer_disabled, it will get flags value from the address which
in_serv_queue pointing to, then the problem of use-after-free happens;

Fix the problem by check in_serv_queue == bfqd->in_service_queue, to
get the value of idle_timer_disabled if in_serve_queue is equel to
bfqd->in_service_queue. If the space of in_serv_queue pointing has
been released, this judge will aviod use-after-free problem.
And if in_serv_queue may be expired or finished, the idle_timer_disabled
will be false which would not give effects to bfq_update_dispatch_stats.

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zhang Wensheng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
block/bfq-iosched.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 36a66e97e3c2..8735f075230f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5181,7 +5181,7 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
struct request *rq;
struct bfq_queue *in_serv_queue;
- bool waiting_rq, idle_timer_disabled;
+ bool waiting_rq, idle_timer_disabled = false;

spin_lock_irq(&bfqd->lock);

@@ -5189,14 +5189,15 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);

rq = __bfq_dispatch_request(hctx);
-
- idle_timer_disabled =
- waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
+ if (in_serv_queue == bfqd->in_service_queue) {
+ idle_timer_disabled =
+ waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
+ }

spin_unlock_irq(&bfqd->lock);
-
- bfq_update_dispatch_stats(hctx->queue, rq, in_serv_queue,
- idle_timer_disabled);
+ bfq_update_dispatch_stats(hctx->queue, rq,
+ idle_timer_disabled ? in_serv_queue : NULL,
+ idle_timer_disabled);

return rq;
}
--
2.34.1

2022-03-28 21:36:57

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 02/43] regulator: rpi-panel: Handle I2C errors/timing to the Atmel

From: Dave Stevenson <[email protected]>

[ Upstream commit 5665eee7a3800430e7dc3ef6f25722476b603186 ]

The Atmel is doing some things in the I2C ISR, during which
period it will not respond to further commands. This is
particularly true of the POWERON command.

Increase delays appropriately, and retry should I2C errors be
reported.

Signed-off-by: Dave Stevenson <[email protected]>
Signed-off-by: Detlev Casanova <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
.../regulator/rpi-panel-attiny-regulator.c | 56 +++++++++++++++----
1 file changed, 46 insertions(+), 10 deletions(-)

diff --git a/drivers/regulator/rpi-panel-attiny-regulator.c b/drivers/regulator/rpi-panel-attiny-regulator.c
index ee46bfbf5eee..991b4730d768 100644
--- a/drivers/regulator/rpi-panel-attiny-regulator.c
+++ b/drivers/regulator/rpi-panel-attiny-regulator.c
@@ -37,11 +37,24 @@ static const struct regmap_config attiny_regmap_config = {
static int attiny_lcd_power_enable(struct regulator_dev *rdev)
{
unsigned int data;
+ int ret, i;

regmap_write(rdev->regmap, REG_POWERON, 1);
+ msleep(80);
+
/* Wait for nPWRDWN to go low to indicate poweron is done. */
- regmap_read_poll_timeout(rdev->regmap, REG_PORTB, data,
- data & BIT(0), 10, 1000000);
+ for (i = 0; i < 20; i++) {
+ ret = regmap_read(rdev->regmap, REG_PORTB, &data);
+ if (!ret) {
+ if (data & BIT(0))
+ break;
+ }
+ usleep_range(10000, 12000);
+ }
+ usleep_range(10000, 12000);
+
+ if (ret)
+ pr_err("%s: regmap_read_poll_timeout failed %d\n", __func__, ret);

/* Default to the same orientation as the closed source
* firmware used for the panel. Runtime rotation
@@ -57,23 +70,34 @@ static int attiny_lcd_power_disable(struct regulator_dev *rdev)
{
regmap_write(rdev->regmap, REG_PWM, 0);
regmap_write(rdev->regmap, REG_POWERON, 0);
- udelay(1);
+ msleep(30);
return 0;
}

static int attiny_lcd_power_is_enabled(struct regulator_dev *rdev)
{
unsigned int data;
- int ret;
+ int ret, i;

- ret = regmap_read(rdev->regmap, REG_POWERON, &data);
+ for (i = 0; i < 10; i++) {
+ ret = regmap_read(rdev->regmap, REG_POWERON, &data);
+ if (!ret)
+ break;
+ usleep_range(10000, 12000);
+ }
if (ret < 0)
return ret;

if (!(data & BIT(0)))
return 0;

- ret = regmap_read(rdev->regmap, REG_PORTB, &data);
+ for (i = 0; i < 10; i++) {
+ ret = regmap_read(rdev->regmap, REG_PORTB, &data);
+ if (!ret)
+ break;
+ usleep_range(10000, 12000);
+ }
+
if (ret < 0)
return ret;

@@ -103,20 +127,32 @@ static int attiny_update_status(struct backlight_device *bl)
{
struct regmap *regmap = bl_get_data(bl);
int brightness = bl->props.brightness;
+ int ret, i;

if (bl->props.power != FB_BLANK_UNBLANK ||
bl->props.fb_blank != FB_BLANK_UNBLANK)
brightness = 0;

- return regmap_write(regmap, REG_PWM, brightness);
+ for (i = 0; i < 10; i++) {
+ ret = regmap_write(regmap, REG_PWM, brightness);
+ if (!ret)
+ break;
+ }
+
+ return ret;
}

static int attiny_get_brightness(struct backlight_device *bl)
{
struct regmap *regmap = bl_get_data(bl);
- int ret, brightness;
+ int ret, brightness, i;
+
+ for (i = 0; i < 10; i++) {
+ ret = regmap_read(regmap, REG_PWM, &brightness);
+ if (!ret)
+ break;
+ }

- ret = regmap_read(regmap, REG_PWM, &brightness);
if (ret)
return ret;

@@ -166,7 +202,7 @@ static int attiny_i2c_probe(struct i2c_client *i2c,
}

regmap_write(regmap, REG_POWERON, 0);
- mdelay(1);
+ msleep(30);

config.dev = &i2c->dev;
config.regmap = regmap;
--
2.34.1

2022-03-28 21:38:09

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 06/43] rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion

From: David Woodhouse <[email protected]>

[ Upstream commit 82980b1622d97017053c6792382469d7dc26a486 ]

If we allow architectures to bring APs online in parallel, then we end
up requiring rcu_cpu_starting() to be reentrant. But currently, the
manipulation of rnp->ofl_seq is not thread-safe.

However, rnp->ofl_seq is also fairly much pointless anyway since both
rcu_cpu_starting() and rcu_report_dead() hold rcu_state.ofl_lock for
fairly much the whole time that rnp->ofl_seq is set to an odd number
to indicate that an operation is in progress.

So drop rnp->ofl_seq completely, and use only rcu_state.ofl_lock.

This has a couple of minor complexities: lockdep will complain when we
take rcu_state.ofl_lock, and currently accepts the 'excuse' of having
an odd value in rnp->ofl_seq. So switch it to an arch_spinlock_t to
avoid that false positive complaint. Since we're killing rnp->ofl_seq
of course that 'excuse' has to be changed too, so make it check for
arch_spin_is_locked(rcu_state.ofl_lock).

There's no arch_spin_lock_irqsave() so we have to manually save and
restore local interrupts around the locking.

At Paul's request based on Neeraj's analysis, make rcu_gp_init not just
wait but *exclude* any CPU online/offline activity, which was fairly
much true already by virtue of it holding rcu_state.ofl_lock.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/rcu/tree.c | 71 ++++++++++++++++++++++++-----------------------
kernel/rcu/tree.h | 4 +--
2 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4c25a6283b0..73a4c9d07b86 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -91,7 +91,7 @@ static struct rcu_state rcu_state = {
.abbr = RCU_ABBR,
.exp_mutex = __MUTEX_INITIALIZER(rcu_state.exp_mutex),
.exp_wake_mutex = __MUTEX_INITIALIZER(rcu_state.exp_wake_mutex),
- .ofl_lock = __RAW_SPIN_LOCK_UNLOCKED(rcu_state.ofl_lock),
+ .ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
};

/* Dump rcu_node combining tree at boot to verify correct setup. */
@@ -1175,7 +1175,15 @@ bool rcu_lockdep_current_cpu_online(void)
preempt_disable_notrace();
rdp = this_cpu_ptr(&rcu_data);
rnp = rdp->mynode;
- if (rdp->grpmask & rcu_rnp_online_cpus(rnp) || READ_ONCE(rnp->ofl_seq) & 0x1)
+ /*
+ * Strictly, we care here about the case where the current CPU is
+ * in rcu_cpu_starting() and thus has an excuse for rdp->grpmask
+ * not being up to date. So arch_spin_is_locked() might have a
+ * false positive if it's held by some *other* CPU, but that's
+ * OK because that just means a false *negative* on the warning.
+ */
+ if (rdp->grpmask & rcu_rnp_online_cpus(rnp) ||
+ arch_spin_is_locked(&rcu_state.ofl_lock))
ret = true;
preempt_enable_notrace();
return ret;
@@ -1739,7 +1747,6 @@ static void rcu_strict_gp_boundary(void *unused)
*/
static noinline_for_stack bool rcu_gp_init(void)
{
- unsigned long firstseq;
unsigned long flags;
unsigned long oldmask;
unsigned long mask;
@@ -1782,22 +1789,17 @@ static noinline_for_stack bool rcu_gp_init(void)
* of RCU's Requirements documentation.
*/
WRITE_ONCE(rcu_state.gp_state, RCU_GP_ONOFF);
+ /* Exclude CPU hotplug operations. */
rcu_for_each_leaf_node(rnp) {
- // Wait for CPU-hotplug operations that might have
- // started before this grace period did.
- smp_mb(); // Pair with barriers used when updating ->ofl_seq to odd values.
- firstseq = READ_ONCE(rnp->ofl_seq);
- if (firstseq & 0x1)
- while (firstseq == READ_ONCE(rnp->ofl_seq))
- schedule_timeout_idle(1); // Can't wake unless RCU is watching.
- smp_mb(); // Pair with barriers used when updating ->ofl_seq to even values.
- raw_spin_lock(&rcu_state.ofl_lock);
- raw_spin_lock_irq_rcu_node(rnp);
+ local_irq_save(flags);
+ arch_spin_lock(&rcu_state.ofl_lock);
+ raw_spin_lock_rcu_node(rnp);
if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
!rnp->wait_blkd_tasks) {
/* Nothing to do on this leaf rcu_node structure. */
- raw_spin_unlock_irq_rcu_node(rnp);
- raw_spin_unlock(&rcu_state.ofl_lock);
+ raw_spin_unlock_rcu_node(rnp);
+ arch_spin_unlock(&rcu_state.ofl_lock);
+ local_irq_restore(flags);
continue;
}

@@ -1832,8 +1834,9 @@ static noinline_for_stack bool rcu_gp_init(void)
rcu_cleanup_dead_rnp(rnp);
}

- raw_spin_unlock_irq_rcu_node(rnp);
- raw_spin_unlock(&rcu_state.ofl_lock);
+ raw_spin_unlock_rcu_node(rnp);
+ arch_spin_unlock(&rcu_state.ofl_lock);
+ local_irq_restore(flags);
}
rcu_gp_slow(gp_preinit_delay); /* Races with CPU hotplug. */

@@ -4287,11 +4290,10 @@ void rcu_cpu_starting(unsigned int cpu)

rnp = rdp->mynode;
mask = rdp->grpmask;
- WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
- WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
+ local_irq_save(flags);
+ arch_spin_lock(&rcu_state.ofl_lock);
rcu_dynticks_eqs_online();
- smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
- raw_spin_lock_irqsave_rcu_node(rnp, flags);
+ raw_spin_lock_rcu_node(rnp);
WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
newcpu = !(rnp->expmaskinitnext & mask);
rnp->expmaskinitnext |= mask;
@@ -4304,15 +4306,18 @@ void rcu_cpu_starting(unsigned int cpu)

/* An incoming CPU should never be blocking a grace period. */
if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
+ /* rcu_report_qs_rnp() *really* wants some flags to restore */
+ unsigned long flags2;
+
+ local_irq_save(flags2);
rcu_disable_urgency_upon_qs(rdp);
/* Report QS -after- changing ->qsmaskinitnext! */
- rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
+ rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags2);
} else {
- raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ raw_spin_unlock_rcu_node(rnp);
}
- smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
- WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
- WARN_ON_ONCE(rnp->ofl_seq & 0x1);
+ arch_spin_unlock(&rcu_state.ofl_lock);
+ local_irq_restore(flags);
smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
}

@@ -4326,7 +4331,7 @@ void rcu_cpu_starting(unsigned int cpu)
*/
void rcu_report_dead(unsigned int cpu)
{
- unsigned long flags;
+ unsigned long flags, seq_flags;
unsigned long mask;
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
@@ -4340,10 +4345,8 @@ void rcu_report_dead(unsigned int cpu)

/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
mask = rdp->grpmask;
- WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
- WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
- smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
- raw_spin_lock(&rcu_state.ofl_lock);
+ local_irq_save(seq_flags);
+ arch_spin_lock(&rcu_state.ofl_lock);
raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
rdp->rcu_ofl_gp_flags = READ_ONCE(rcu_state.gp_flags);
@@ -4354,10 +4357,8 @@ void rcu_report_dead(unsigned int cpu)
}
WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
- raw_spin_unlock(&rcu_state.ofl_lock);
- smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
- WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
- WARN_ON_ONCE(rnp->ofl_seq & 0x1);
+ arch_spin_unlock(&rcu_state.ofl_lock);
+ local_irq_restore(seq_flags);

rdp->cpu_started = false;
}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 486fc901bd08..4b4bcef8a974 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -56,8 +56,6 @@ struct rcu_node {
/* Initialized from ->qsmaskinitnext at the */
/* beginning of each grace period. */
unsigned long qsmaskinitnext;
- unsigned long ofl_seq; /* CPU-hotplug operation sequence count. */
- /* Online CPUs for next grace period. */
unsigned long expmask; /* CPUs or groups that need to check in */
/* to allow the current expedited GP */
/* to complete. */
@@ -355,7 +353,7 @@ struct rcu_state {
const char *name; /* Name of structure. */
char abbr; /* Abbreviated name. */

- raw_spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
+ arch_spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
/* Synchronize offline with */
/* GP pre-initialization. */
};
--
2.34.1

2022-03-28 21:41:48

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 35/43] Revert "Revert "block, bfq: honor already-setup queue merges""

From: Paolo Valente <[email protected]>

[ Upstream commit 15729ff8143f8135b03988a100a19e66d7cb7ecd ]

A crash [1] happened to be triggered in conjunction with commit
2d52c58b9c9b ("block, bfq: honor already-setup queue merges"). The
latter was then reverted by commit ebc69e897e17 ("Revert "block, bfq:
honor already-setup queue merges""). Yet, the reverted commit was not
the one introducing the bug. In fact, it actually triggered a UAF
introduced by a different commit, and now fixed by commit d29bd41428cf
("block, bfq: reset last_bfqq_created on group change").

So, there is no point in keeping commit 2d52c58b9c9b ("block, bfq:
honor already-setup queue merges") out. This commit restores it.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=214503

Reported-by: Holger Hoffstätte <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
block/bfq-iosched.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 8735f075230f..1dff82d34b44 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2782,6 +2782,15 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
* are likely to increase the throughput.
*/
bfqq->new_bfqq = new_bfqq;
+ /*
+ * The above assignment schedules the following redirections:
+ * each time some I/O for bfqq arrives, the process that
+ * generated that I/O is disassociated from bfqq and
+ * associated with new_bfqq. Here we increases new_bfqq->ref
+ * in advance, adding the number of processes that are
+ * expected to be associated with new_bfqq as they happen to
+ * issue I/O.
+ */
new_bfqq->ref += process_refs;
return new_bfqq;
}
@@ -2844,6 +2853,10 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
{
struct bfq_queue *in_service_bfqq, *new_bfqq;

+ /* if a merge has already been setup, then proceed with that first */
+ if (bfqq->new_bfqq)
+ return bfqq->new_bfqq;
+
/*
* Check delayed stable merge for rotational or non-queueing
* devs. For this branch to be executed, bfqq must not be
@@ -2945,9 +2958,6 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
if (bfq_too_late_for_merging(bfqq))
return NULL;

- if (bfqq->new_bfqq)
- return bfqq->new_bfqq;
-
if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
return NULL;

--
2.34.1

2022-03-28 21:46:41

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 05/43] gcc-plugins/stackleak: Exactly match strings instead of prefixes

From: Kees Cook <[email protected]>

[ Upstream commit 27e9faf415dbf94af19b9c827842435edbc1fbbc ]

Since STRING_CST may not be NUL terminated, strncmp() was used for check
for equality. However, this may lead to mismatches for longer section
names where the start matches the tested-for string. Test for exact
equality by checking for the presences of NUL termination.

Cc: Alexander Popov <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
scripts/gcc-plugins/stackleak_plugin.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
index e9db7dcb3e5f..b04aa8e91a41 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -429,6 +429,23 @@ static unsigned int stackleak_cleanup_execute(void)
return 0;
}

+/*
+ * STRING_CST may or may not be NUL terminated:
+ * https://gcc.gnu.org/onlinedocs/gccint/Constant-expressions.html
+ */
+static inline bool string_equal(tree node, const char *string, int length)
+{
+ if (TREE_STRING_LENGTH(node) < length)
+ return false;
+ if (TREE_STRING_LENGTH(node) > length + 1)
+ return false;
+ if (TREE_STRING_LENGTH(node) == length + 1 &&
+ TREE_STRING_POINTER(node)[length] != '\0')
+ return false;
+ return !memcmp(TREE_STRING_POINTER(node), string, length);
+}
+#define STRING_EQUAL(node, str) string_equal(node, str, strlen(str))
+
static bool stackleak_gate(void)
{
tree section;
@@ -438,13 +455,13 @@ static bool stackleak_gate(void)
if (section && TREE_VALUE(section)) {
section = TREE_VALUE(TREE_VALUE(section));

- if (!strncmp(TREE_STRING_POINTER(section), ".init.text", 10))
+ if (STRING_EQUAL(section, ".init.text"))
return false;
- if (!strncmp(TREE_STRING_POINTER(section), ".devinit.text", 13))
+ if (STRING_EQUAL(section, ".devinit.text"))
return false;
- if (!strncmp(TREE_STRING_POINTER(section), ".cpuinit.text", 13))
+ if (STRING_EQUAL(section, ".cpuinit.text"))
return false;
- if (!strncmp(TREE_STRING_POINTER(section), ".meminit.text", 13))
+ if (STRING_EQUAL(section, ".meminit.text"))
return false;
}

--
2.34.1

2022-03-28 21:50:19

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 34/43] lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3

From: Paul Menzel <[email protected]>

[ Upstream commit 633174a7046ec3b4572bec24ef98e6ee89bce14b ]

Buidling raid6test on Ubuntu 21.10 (ppc64le) with GNU Make 4.3 shows the
errors below:

$ cd lib/raid6/test/
$ make
<stdin>:1:1: error: stray ‘\’ in program
<stdin>:1:2: error: stray ‘#’ in program
<stdin>:1:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ \
before ‘<’ token

[...]

The errors come from the HAS_ALTIVEC test, which fails, and the POWER
optimized versions are not built. That’s also reason nobody noticed on the
other architectures.

GNU Make 4.3 does not remove the backslash anymore. From the 4.3 release
announcment:

> * WARNING: Backward-incompatibility!
> Number signs (#) appearing inside a macro reference or function invocation
> no longer introduce comments and should not be escaped with backslashes:
> thus a call such as:
> foo := $(shell echo '#')
> is legal. Previously the number sign needed to be escaped, for example:
> foo := $(shell echo '\#')
> Now this latter will resolve to "\#". If you want to write makefiles
> portable to both versions, assign the number sign to a variable:
> H := \#
> foo := $(shell echo '$H')
> This was claimed to be fixed in 3.81, but wasn't, for some reason.
> To detect this change search for 'nocomment' in the .FEATURES variable.

So, do the same as commit 9564a8cf422d ("Kbuild: fix # escaping in .cmd
files for future Make") and commit 929bef467771 ("bpf: Use $(pound) instead
of \# in Makefiles") and define and use a $(pound) variable.

Reference for the change in make:
https://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57

Cc: Matt Brown <[email protected]>
Signed-off-by: Paul Menzel <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
lib/raid6/test/Makefile | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/raid6/test/Makefile b/lib/raid6/test/Makefile
index a4c7cd74cff5..4fb7700a741b 100644
--- a/lib/raid6/test/Makefile
+++ b/lib/raid6/test/Makefile
@@ -4,6 +4,8 @@
# from userspace.
#

+pound := \#
+
CC = gcc
OPTFLAGS = -O2 # Adjust as desired
CFLAGS = -I.. -I ../../../include -g $(OPTFLAGS)
@@ -42,7 +44,7 @@ else ifeq ($(HAS_NEON),yes)
OBJS += neon.o neon1.o neon2.o neon4.o neon8.o recov_neon.o recov_neon_inner.o
CFLAGS += -DCONFIG_KERNEL_MODE_NEON=1
else
- HAS_ALTIVEC := $(shell printf '\#include <altivec.h>\nvector int a;\n' |\
+ HAS_ALTIVEC := $(shell printf '$(pound)include <altivec.h>\nvector int a;\n' |\
gcc -c -x c - >/dev/null && rm ./-.o && echo yes)
ifeq ($(HAS_ALTIVEC),yes)
CFLAGS += -I../../../arch/powerpc/include
--
2.34.1

2022-03-28 22:04:10

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 08/43] rcu: Mark writes to the rcu_segcblist structure's ->flags field

From: "Paul E. McKenney" <[email protected]>

[ Upstream commit c09929031018913b5783872a8b8cdddef4a543c7 ]

KCSAN reports data races between the rcu_segcblist_clear_flags() and
rcu_segcblist_set_flags() functions, though misreporting the latter
as a call to rcu_segcblist_is_enabled() from call_rcu(). This commit
converts the updates of this field to WRITE_ONCE(), relying on the
resulting unmarked reads to continue to detect buggy concurrent writes
to this field.

Reported-by: Zhouyi Zhou <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/rcu/rcu_segcblist.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index e373fbe44da5..431cee212467 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -56,13 +56,13 @@ static inline long rcu_segcblist_n_cbs(struct rcu_segcblist *rsclp)
static inline void rcu_segcblist_set_flags(struct rcu_segcblist *rsclp,
int flags)
{
- rsclp->flags |= flags;
+ WRITE_ONCE(rsclp->flags, rsclp->flags | flags);
}

static inline void rcu_segcblist_clear_flags(struct rcu_segcblist *rsclp,
int flags)
{
- rsclp->flags &= ~flags;
+ WRITE_ONCE(rsclp->flags, rsclp->flags & ~flags);
}

static inline bool rcu_segcblist_test_flags(struct rcu_segcblist *rsclp,
--
2.34.1

2022-03-28 22:09:40

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"

From: "Rafael J. Wysocki" <[email protected]>

[ Upstream commit 2ca8e6285250c07a2e5a22ecbfd59b5a4ef73484 ]

Revert commit 159d8c274fd9 ("ACPI: Pass the same capabilities to the
_OSC regardless of the query flag") which caused legitimate usage
scenarios (when the platform firmware does not want the OS to control
certain platform features controlled by the system bus scope _OSC) to
break and was misguided by some misleading language in the _OSC
definition in the ACPI specification (in particular, Section 6.2.11.1.3
"Sequence of _OSC Calls" that contradicts other perts of the _OSC
definition).

Link: https://lore.kernel.org/linux-acpi/CAJZ5v0iStA0JmO0H3z+VgQsVuQONVjKPpw0F5HKfiq=Gb6B5yw@mail.gmail.com
Reported-by: Mario Limonciello <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Tested-by: Mario Limonciello <[email protected]>
Acked-by: Huang Rui <[email protected]>
Reviewed-by: Mika Westerberg <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/acpi/bus.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 07f604832fd6..079b952ab59f 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -332,21 +332,32 @@ static void acpi_bus_osc_negotiate_platform_control(void)
if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
return;

- kfree(context.ret.pointer);
+ capbuf_ret = context.ret.pointer;
+ if (context.ret.length <= OSC_SUPPORT_DWORD) {
+ kfree(context.ret.pointer);
+ return;
+ }

- /* Now run _OSC again with query flag clear */
+ /*
+ * Now run _OSC again with query flag clear and with the caps
+ * supported by both the OS and the platform.
+ */
capbuf[OSC_QUERY_DWORD] = 0;
+ capbuf[OSC_SUPPORT_DWORD] = capbuf_ret[OSC_SUPPORT_DWORD];
+ kfree(context.ret.pointer);

if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
return;

capbuf_ret = context.ret.pointer;
- osc_sb_apei_support_acked =
- capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
- osc_pc_lpi_support_confirmed =
- capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
- osc_sb_native_usb4_support_confirmed =
- capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
+ if (context.ret.length > OSC_SUPPORT_DWORD) {
+ osc_sb_apei_support_acked =
+ capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
+ osc_pc_lpi_support_confirmed =
+ capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
+ osc_sb_native_usb4_support_confirmed =
+ capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
+ }

kfree(context.ret.pointer);
}
--
2.34.1

2022-03-28 22:10:29

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 20/43] random: round-robin registers as ulong, not u32

From: "Jason A. Donenfeld" <[email protected]>

[ Upstream commit da3951ebdcd1cb1d5c750e08cd05aee7b0c04d9a ]

When the interrupt handler does not have a valid cycle counter, it calls
get_reg() to read a register from the irq stack, in round-robin.
Currently it does this assuming that registers are 32-bit. This is
_probably_ the case, and probably all platforms without cycle counters
are in fact 32-bit platforms. But maybe not, and either way, it's not
quite correct. This commit fixes that to deal with `unsigned long`
rather than `u32`.

Cc: Theodore Ts'o <[email protected]>
Reviewed-by: Dominik Brodowski <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/char/random.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 2f21c5473d86..d2ce6b1a229d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1032,15 +1032,15 @@ static void add_interrupt_bench(cycles_t start)
#define add_interrupt_bench(x)
#endif

-static u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
+static unsigned long get_reg(struct fast_pool *f, struct pt_regs *regs)
{
- u32 *ptr = (u32 *)regs;
+ unsigned long *ptr = (unsigned long *)regs;
unsigned int idx;

if (regs == NULL)
return 0;
idx = READ_ONCE(f->reg_idx);
- if (idx >= sizeof(struct pt_regs) / sizeof(u32))
+ if (idx >= sizeof(struct pt_regs) / sizeof(unsigned long))
idx = 0;
ptr += idx++;
WRITE_ONCE(f->reg_idx, idx);
--
2.34.1

2022-03-28 22:14:28

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 23/43] loop: use sysfs_emit() in the sysfs xxx show()

From: Chaitanya Kulkarni <[email protected]>

[ Upstream commit b27824d31f09ea7b4a6ba2c1b18bd328df3e8bed ]

sprintf does not know the PAGE_SIZE maximum of the temporary buffer
used for outputting sysfs content and it's possible to overrun the
PAGE_SIZE buffer length.

Use a generic sysfs_emit function that knows the size of the
temporary buffer and ensures that no overrun is done for offset
attribute in
loop_attr_[offset|sizelimit|autoclear|partscan|dio]_show() callbacks.

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/block/loop.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 19fe19eaa50e..e65d1e24cab3 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -681,33 +681,33 @@ static ssize_t loop_attr_backing_file_show(struct loop_device *lo, char *buf)

static ssize_t loop_attr_offset_show(struct loop_device *lo, char *buf)
{
- return sprintf(buf, "%llu\n", (unsigned long long)lo->lo_offset);
+ return sysfs_emit(buf, "%llu\n", (unsigned long long)lo->lo_offset);
}

static ssize_t loop_attr_sizelimit_show(struct loop_device *lo, char *buf)
{
- return sprintf(buf, "%llu\n", (unsigned long long)lo->lo_sizelimit);
+ return sysfs_emit(buf, "%llu\n", (unsigned long long)lo->lo_sizelimit);
}

static ssize_t loop_attr_autoclear_show(struct loop_device *lo, char *buf)
{
int autoclear = (lo->lo_flags & LO_FLAGS_AUTOCLEAR);

- return sprintf(buf, "%s\n", autoclear ? "1" : "0");
+ return sysfs_emit(buf, "%s\n", autoclear ? "1" : "0");
}

static ssize_t loop_attr_partscan_show(struct loop_device *lo, char *buf)
{
int partscan = (lo->lo_flags & LO_FLAGS_PARTSCAN);

- return sprintf(buf, "%s\n", partscan ? "1" : "0");
+ return sysfs_emit(buf, "%s\n", partscan ? "1" : "0");
}

static ssize_t loop_attr_dio_show(struct loop_device *lo, char *buf)
{
int dio = (lo->lo_flags & LO_FLAGS_DIRECT_IO);

- return sprintf(buf, "%s\n", dio ? "1" : "0");
+ return sysfs_emit(buf, "%s\n", dio ? "1" : "0");
}

LOOP_ATTR_RO(backing_file);
--
2.34.1

2022-03-28 22:16:20

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 07/43] pinctrl: npcm: Fix broken references to chip->parent_device

From: Marc Zyngier <[email protected]>

[ Upstream commit f7e53e2255808ca3abcc8f38d18ad0823425e771 ]

The npcm driver has a bunch of references to the irq_chip parent_device
field, but never sets it.

Fix it by fishing that reference from somewhere else, but it is
obvious that these debug statements were never used. Also remove
an unused field in a local data structure.

Signed-off-by: Marc Zyngier <[email protected]>
Acked-by: Bartosz Golaszewski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c | 25 +++++++++++------------
1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c b/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c
index 4d81908d6725..ba536fd4d674 100644
--- a/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c
+++ b/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c
@@ -78,7 +78,6 @@ struct npcm7xx_gpio {
struct gpio_chip gc;
int irqbase;
int irq;
- void *priv;
struct irq_chip irq_chip;
u32 pinctrl_id;
int (*direction_input)(struct gpio_chip *chip, unsigned offset);
@@ -226,7 +225,7 @@ static void npcmgpio_irq_handler(struct irq_desc *desc)
chained_irq_enter(chip, desc);
sts = ioread32(bank->base + NPCM7XX_GP_N_EVST);
en = ioread32(bank->base + NPCM7XX_GP_N_EVEN);
- dev_dbg(chip->parent_device, "==> got irq sts %.8x %.8x\n", sts,
+ dev_dbg(bank->gc.parent, "==> got irq sts %.8x %.8x\n", sts,
en);

sts &= en;
@@ -241,33 +240,33 @@ static int npcmgpio_set_irq_type(struct irq_data *d, unsigned int type)
gpiochip_get_data(irq_data_get_irq_chip_data(d));
unsigned int gpio = BIT(d->hwirq);

- dev_dbg(d->chip->parent_device, "setirqtype: %u.%u = %u\n", gpio,
+ dev_dbg(bank->gc.parent, "setirqtype: %u.%u = %u\n", gpio,
d->irq, type);
switch (type) {
case IRQ_TYPE_EDGE_RISING:
- dev_dbg(d->chip->parent_device, "edge.rising\n");
+ dev_dbg(bank->gc.parent, "edge.rising\n");
npcm_gpio_clr(&bank->gc, bank->base + NPCM7XX_GP_N_EVBE, gpio);
npcm_gpio_clr(&bank->gc, bank->base + NPCM7XX_GP_N_POL, gpio);
break;
case IRQ_TYPE_EDGE_FALLING:
- dev_dbg(d->chip->parent_device, "edge.falling\n");
+ dev_dbg(bank->gc.parent, "edge.falling\n");
npcm_gpio_clr(&bank->gc, bank->base + NPCM7XX_GP_N_EVBE, gpio);
npcm_gpio_set(&bank->gc, bank->base + NPCM7XX_GP_N_POL, gpio);
break;
case IRQ_TYPE_EDGE_BOTH:
- dev_dbg(d->chip->parent_device, "edge.both\n");
+ dev_dbg(bank->gc.parent, "edge.both\n");
npcm_gpio_set(&bank->gc, bank->base + NPCM7XX_GP_N_EVBE, gpio);
break;
case IRQ_TYPE_LEVEL_LOW:
- dev_dbg(d->chip->parent_device, "level.low\n");
+ dev_dbg(bank->gc.parent, "level.low\n");
npcm_gpio_set(&bank->gc, bank->base + NPCM7XX_GP_N_POL, gpio);
break;
case IRQ_TYPE_LEVEL_HIGH:
- dev_dbg(d->chip->parent_device, "level.high\n");
+ dev_dbg(bank->gc.parent, "level.high\n");
npcm_gpio_clr(&bank->gc, bank->base + NPCM7XX_GP_N_POL, gpio);
break;
default:
- dev_dbg(d->chip->parent_device, "invalid irq type\n");
+ dev_dbg(bank->gc.parent, "invalid irq type\n");
return -EINVAL;
}

@@ -289,7 +288,7 @@ static void npcmgpio_irq_ack(struct irq_data *d)
gpiochip_get_data(irq_data_get_irq_chip_data(d));
unsigned int gpio = d->hwirq;

- dev_dbg(d->chip->parent_device, "irq_ack: %u.%u\n", gpio, d->irq);
+ dev_dbg(bank->gc.parent, "irq_ack: %u.%u\n", gpio, d->irq);
iowrite32(BIT(gpio), bank->base + NPCM7XX_GP_N_EVST);
}

@@ -301,7 +300,7 @@ static void npcmgpio_irq_mask(struct irq_data *d)
unsigned int gpio = d->hwirq;

/* Clear events */
- dev_dbg(d->chip->parent_device, "irq_mask: %u.%u\n", gpio, d->irq);
+ dev_dbg(bank->gc.parent, "irq_mask: %u.%u\n", gpio, d->irq);
iowrite32(BIT(gpio), bank->base + NPCM7XX_GP_N_EVENC);
}

@@ -313,7 +312,7 @@ static void npcmgpio_irq_unmask(struct irq_data *d)
unsigned int gpio = d->hwirq;

/* Enable events */
- dev_dbg(d->chip->parent_device, "irq_unmask: %u.%u\n", gpio, d->irq);
+ dev_dbg(bank->gc.parent, "irq_unmask: %u.%u\n", gpio, d->irq);
iowrite32(BIT(gpio), bank->base + NPCM7XX_GP_N_EVENS);
}

@@ -323,7 +322,7 @@ static unsigned int npcmgpio_irq_startup(struct irq_data *d)
unsigned int gpio = d->hwirq;

/* active-high, input, clear interrupt, enable interrupt */
- dev_dbg(d->chip->parent_device, "startup: %u.%u\n", gpio, d->irq);
+ dev_dbg(gc->parent, "startup: %u.%u\n", gpio, d->irq);
npcmgpio_direction_input(gc, gpio);
npcmgpio_irq_ack(d);
npcmgpio_irq_unmask(d);
--
2.34.1

2022-03-28 22:17:28

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 31/43] ACPICA: Avoid walking the ACPI Namespace if it is not there

From: "Rafael J. Wysocki" <[email protected]>

[ Upstream commit 0c9992315e738e7d6e927ef36839a466b080dba6 ]

ACPICA commit b1c3656ef4950098e530be68d4b589584f06cddc

Prevent acpi_ns_walk_namespace() from crashing when called with
start_node equal to ACPI_ROOT_OBJECT if the Namespace has not been
instantiated yet and acpi_gbl_root_node is NULL.

For instance, this can happen if the kernel is run with "acpi=off"
in the command line.

Link: https://github.com/acpica/acpica/commit/b1c3656ef4950098e530be68d4b589584f06cddc
Link: https://lore.kernel.org/linux-acpi/CAJZ5v0hJWW_vZ3wwajE7xT38aWjY7cZyvqMJpXHzUL98-SiCVQ@mail.gmail.com/
Reported-by: Hans de Goede <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/acpi/acpica/nswalk.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/acpi/acpica/nswalk.c b/drivers/acpi/acpica/nswalk.c
index 915c2433463d..e7c30ce06e18 100644
--- a/drivers/acpi/acpica/nswalk.c
+++ b/drivers/acpi/acpica/nswalk.c
@@ -169,6 +169,9 @@ acpi_ns_walk_namespace(acpi_object_type type,

if (start_node == ACPI_ROOT_OBJECT) {
start_node = acpi_gbl_root_node;
+ if (!start_node) {
+ return_ACPI_STATUS(AE_NO_NAMESPACE);
+ }
}

/* Null child means "get first node" */
--
2.34.1

2022-03-28 22:22:28

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 24/43] Fix incorrect type in assignment of ipv6 port for audit

From: Casey Schaufler <[email protected]>

[ Upstream commit a5cd1ab7ab679d252a6d2f483eee7d45ebf2040c ]

Remove inappropriate use of ntohs() and assign the
port value directly.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Casey Schaufler <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
security/smack/smack_lsm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 14b279cc75c9..6207762dbdb1 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -2510,7 +2510,7 @@ static int smk_ipv6_check(struct smack_known *subject,
#ifdef CONFIG_AUDIT
smk_ad_init_net(&ad, __func__, LSM_AUDIT_DATA_NET, &net);
ad.a.u.net->family = PF_INET6;
- ad.a.u.net->dport = ntohs(address->sin6_port);
+ ad.a.u.net->dport = address->sin6_port;
if (act == SMK_RECEIVING)
ad.a.u.net->v6info.saddr = address->sin6_addr;
else
--
2.34.1

2022-03-28 22:23:38

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 21/43] arm64: module: remove (NOLOAD) from linker script

From: Fangrui Song <[email protected]>

[ Upstream commit 4013e26670c590944abdab56c4fa797527b74325 ]

On ELF, (NOLOAD) sets the section type to SHT_NOBITS[1]. It is conceptually
inappropriate for .plt and .text.* sections which are always
SHT_PROGBITS.

In GNU ld, if PLT entries are needed, .plt will be SHT_PROGBITS anyway
and (NOLOAD) will be essentially ignored. In ld.lld, since
https://reviews.llvm.org/D118840 ("[ELF] Support (TYPE=<value>) to
customize the output section type"), ld.lld will report a `section type
mismatch` error. Just remove (NOLOAD) to fix the error.

[1] https://lld.llvm.org/ELF/linker_script.html As of today, "The
section should be marked as not loadable" on
https://sourceware.org/binutils/docs/ld/Output-Section-Type.html is
outdated for ELF.

Tested-by: Nathan Chancellor <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Signed-off-by: Fangrui Song <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/arm64/include/asm/module.lds.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
index a11ccadd47d2..094701ec5500 100644
--- a/arch/arm64/include/asm/module.lds.h
+++ b/arch/arm64/include/asm/module.lds.h
@@ -1,8 +1,8 @@
SECTIONS {
#ifdef CONFIG_ARM64_MODULE_PLTS
- .plt 0 (NOLOAD) : { BYTE(0) }
- .init.plt 0 (NOLOAD) : { BYTE(0) }
- .text.ftrace_trampoline 0 (NOLOAD) : { BYTE(0) }
+ .plt 0 : { BYTE(0) }
+ .init.plt 0 : { BYTE(0) }
+ .text.ftrace_trampoline 0 : { BYTE(0) }
#endif

#ifdef CONFIG_KASAN_SW_TAGS
--
2.34.1

2022-03-28 22:25:00

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 19/43] powercap/dtpm_cpu: Reset per_cpu variable in the release function

From: Daniel Lezcano <[email protected]>

[ Upstream commit 0aea2e4ec2a2bfa2d7e8820e37ba5b5ce04f20a5 ]

The release function does not reset the per cpu variable when it is
called. That will prevent creation again as the variable will be
already from the previous creation.

Fix it by resetting them.

Signed-off-by: Daniel Lezcano <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/powercap/dtpm_cpu.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
index b740866b228d..1e8cac699646 100644
--- a/drivers/powercap/dtpm_cpu.c
+++ b/drivers/powercap/dtpm_cpu.c
@@ -150,10 +150,17 @@ static int update_pd_power_uw(struct dtpm *dtpm)
static void pd_release(struct dtpm *dtpm)
{
struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
+ struct cpufreq_policy *policy;

if (freq_qos_request_active(&dtpm_cpu->qos_req))
freq_qos_remove_request(&dtpm_cpu->qos_req);

+ policy = cpufreq_cpu_get(dtpm_cpu->cpu);
+ if (policy) {
+ for_each_cpu(dtpm_cpu->cpu, policy->related_cpus)
+ per_cpu(dtpm_per_cpu, dtpm_cpu->cpu) = NULL;
+ }
+
kfree(dtpm_cpu);
}

--
2.34.1

2022-03-28 22:26:03

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels

From: Oleg Nesterov <[email protected]>

[ Upstream commit bf9ad37dc8a30cce22ae95d6c2ca6abf8731d305 ]

On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.

When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
one of these is the spin lock used in signal handling.

Some of the debug code (int3) causes do_trap() to send a signal.
This function calls a spinlock_t lock that has been converted to a
sleeping lock. If this happens, the above issues with the corrupted
stack is possible.

Instead of calling the signal right away, for PREEMPT_RT and x86,
the signal information is stored on the stacks task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.

[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
[bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
[ tglx: Use a config option ]

Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/Ygq5aBB/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/Kconfig | 1 +
include/linux/sched.h | 3 +++
kernel/Kconfig.preempt | 12 +++++++++++-
kernel/entry/common.c | 14 ++++++++++++++
kernel/signal.c | 40 ++++++++++++++++++++++++++++++++++++++++
5 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9f5bd41bf660..d557ac29b6cd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -120,6 +120,7 @@ config X86
select ARCH_WANTS_NO_INSTR
select ARCH_WANT_HUGE_PMD_SHARE
select ARCH_WANT_LD_ORPHAN_WARN
+ select ARCH_WANTS_RT_DELAYED_SIGNALS
select ARCH_WANTS_THP_SWAP if X86_64
select ARCH_HAS_PARANOID_L1D_FLUSH
select BUILDTIME_TABLE_SORT
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75ba8aa60248..098e37fd770a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1087,6 +1087,9 @@ struct task_struct {
/* Restored if set_restore_sigmask() was used: */
sigset_t saved_sigmask;
struct sigpending pending;
+#ifdef CONFIG_RT_DELAYED_SIGNALS
+ struct kernel_siginfo forced_info;
+#endif
unsigned long sas_ss_sp;
size_t sas_ss_size;
unsigned int sas_ss_flags;
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index ce77f0265660..5644abd5f8a8 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -132,4 +132,14 @@ config SCHED_CORE
which is the likely usage by Linux distributions, there should
be no measurable impact on performance.

-
+config ARCH_WANTS_RT_DELAYED_SIGNALS
+ bool
+ help
+ This option is selected by architectures where raising signals
+ can happen in atomic contexts on PREEMPT_RT enabled kernels. This
+ option delays raising the signal until the return to user space
+ loop where it is also delivered. X86 requires this to deliver
+ signals from trap handlers which run on IST stacks.
+
+config RT_DELAYED_SIGNALS
+ def_bool PREEMPT_RT && ARCH_WANTS_RT_DELAYED_SIGNALS
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index bad713684c2e..0543a2c92f20 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -148,6 +148,18 @@ static void handle_signal_work(struct pt_regs *regs, unsigned long ti_work)
arch_do_signal_or_restart(regs, ti_work & _TIF_SIGPENDING);
}

+#ifdef CONFIG_RT_DELAYED_SIGNALS
+static inline void raise_delayed_signal(void)
+{
+ if (unlikely(current->forced_info.si_signo)) {
+ force_sig_info(&current->forced_info);
+ current->forced_info.si_signo = 0;
+ }
+}
+#else
+static inline void raise_delayed_signal(void) { }
+#endif
+
static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
unsigned long ti_work)
{
@@ -162,6 +174,8 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
if (ti_work & _TIF_NEED_RESCHED)
schedule();

+ raise_delayed_signal();
+
if (ti_work & _TIF_UPROBE)
uprobe_notify_resume(regs);

diff --git a/kernel/signal.c b/kernel/signal.c
index 9b04631acde8..e93de6daa188 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1307,6 +1307,43 @@ enum sig_handler {
HANDLER_EXIT, /* Only visible as the process exit code */
};

+/*
+ * On some archictectures, PREEMPT_RT has to delay sending a signal from a
+ * trap since it cannot enable preemption, and the signal code's
+ * spin_locks turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME
+ * which will send the signal on exit of the trap.
+ */
+#ifdef CONFIG_RT_DELAYED_SIGNALS
+static inline bool force_sig_delayed(struct kernel_siginfo *info,
+ struct task_struct *t)
+{
+ if (!in_atomic())
+ return false;
+
+ if (WARN_ON_ONCE(t->forced_info.si_signo))
+ return true;
+
+ if (is_si_special(info)) {
+ WARN_ON_ONCE(info != SEND_SIG_PRIV);
+ t->forced_info.si_signo = info->si_signo;
+ t->forced_info.si_errno = 0;
+ t->forced_info.si_code = SI_KERNEL;
+ t->forced_info.si_pid = 0;
+ t->forced_info.si_uid = 0;
+ } else {
+ t->forced_info = *info;
+ }
+ set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
+ return true;
+}
+#else
+static inline bool force_sig_delayed(struct kernel_siginfo *info,
+ struct task_struct *t)
+{
+ return false;
+}
+#endif
+
/*
* Force a signal that the process can't ignore: if necessary
* we unblock the signal and change any SIG_IGN to SIG_DFL.
@@ -1327,6 +1364,9 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
struct k_sigaction *action;
int sig = info->si_signo;

+ if (force_sig_delayed(info, t))
+ return 0;
+
spin_lock_irqsave(&t->sighand->siglock, flags);
action = &t->sighand->action[sig-1];
ignored = action->sa.sa_handler == SIG_IGN;
--
2.34.1

2022-03-28 22:37:06

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 36/43] ACPI/APEI: Limit printable size of BERT table data

From: Darren Hart <[email protected]>

[ Upstream commit 3f8dec116210ca649163574ed5f8df1e3b837d07 ]

Platforms with large BERT table data can trigger soft lockup errors
while attempting to print the entire BERT table data to the console at
boot:

watchdog: BUG: soft lockup - CPU#160 stuck for 23s! [swapper/0:1]

Observed on Ampere Altra systems with a single BERT record of ~250KB.

The original bert driver appears to have assumed relatively small table
data. Since it is impractical to reassemble large table data from
interwoven console messages, and the table data is available in

/sys/firmware/acpi/tables/data/BERT

limit the size for tables printed to the console to 1024 (for no reason
other than it seemed like a good place to kick off the discussion, would
appreciate feedback from existing users in terms of what size would
maintain their current usage model).

Alternatively, we could make printing a CONFIG option, use the
bert_disable boot arg (or something similar), or use a debug log level.
However, all those solutions require extra steps or change the existing
behavior for small table data. Limiting the size preserves existing
behavior on existing platforms with small table data, and eliminates the
soft lockups for platforms with large table data, while still making it
available.

Signed-off-by: Darren Hart <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/acpi/apei/bert.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/bert.c b/drivers/acpi/apei/bert.c
index 19e50fcbf4d6..ad8ab3f12cf3 100644
--- a/drivers/acpi/apei/bert.c
+++ b/drivers/acpi/apei/bert.c
@@ -29,6 +29,7 @@

#undef pr_fmt
#define pr_fmt(fmt) "BERT: " fmt
+#define ACPI_BERT_PRINT_MAX_LEN 1024

static int bert_disable;

@@ -58,8 +59,11 @@ static void __init bert_print_all(struct acpi_bert_region *region,
}

pr_info_once("Error records from previous boot:\n");
-
- cper_estatus_print(KERN_INFO HW_ERR, estatus);
+ if (region_len < ACPI_BERT_PRINT_MAX_LEN)
+ cper_estatus_print(KERN_INFO HW_ERR, estatus);
+ else
+ pr_info_once("Max print length exceeded, table data is available at:\n"
+ "/sys/firmware/acpi/tables/data/BERT");

/*
* Because the boot error source is "one-time polled" type,
--
2.34.1

2022-03-28 22:38:02

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 27/43] fs/binfmt_elf: Fix AT_PHDR for unusual ELF files

From: Akira Kawata <[email protected]>

[ Upstream commit 0da1d5002745cdc721bc018b582a8a9704d56c42 ]

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=197921

As pointed out in the discussion of buglink, we cannot calculate AT_PHDR
as the sum of load_addr and exec->e_phoff.

: The AT_PHDR of ELF auxiliary vectors should point to the memory address
: of program header. But binfmt_elf.c calculates this address as follows:
:
: NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff);
:
: which is wrong since e_phoff is the file offset of program header and
: load_addr is the memory base address from PT_LOAD entry.
:
: The ld.so uses AT_PHDR as the memory address of program header. In normal
: case, since the e_phoff is usually 64 and in the first PT_LOAD region, it
: is the correct program header address.
:
: But if the address of program header isn't equal to the first PT_LOAD
: address + e_phoff (e.g. Put the program header in other non-consecutive
: PT_LOAD region), ld.so will try to read program header from wrong address
: then crash or use incorrect program header.

This is because exec->e_phoff
is the offset of PHDRs in the file and the address of PHDRs in the
memory may differ from it. This patch fixes the bug by calculating the
address of program headers from PT_LOADs directly.

Signed-off-by: Akira Kawata <[email protected]>
Reported-by: kernel test robot <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
fs/binfmt_elf.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index d61543fbd652..af0965c10619 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -170,8 +170,8 @@ static int padzero(unsigned long elf_bss)

static int
create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
- unsigned long load_addr, unsigned long interp_load_addr,
- unsigned long e_entry)
+ unsigned long interp_load_addr,
+ unsigned long e_entry, unsigned long phdr_addr)
{
struct mm_struct *mm = current->mm;
unsigned long p = bprm->p;
@@ -257,7 +257,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP);
NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
- NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff);
+ NEW_AUX_ENT(AT_PHDR, phdr_addr);
NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr));
NEW_AUX_ENT(AT_PHNUM, exec->e_phnum);
NEW_AUX_ENT(AT_BASE, interp_load_addr);
@@ -823,7 +823,7 @@ static int parse_elf_properties(struct file *f, const struct elf_phdr *phdr,
static int load_elf_binary(struct linux_binprm *bprm)
{
struct file *interpreter = NULL; /* to shut gcc up */
- unsigned long load_addr = 0, load_bias = 0;
+ unsigned long load_addr, load_bias = 0, phdr_addr = 0;
int load_addr_set = 0;
unsigned long error;
struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL;
@@ -1180,6 +1180,17 @@ static int load_elf_binary(struct linux_binprm *bprm)
reloc_func_desc = load_bias;
}
}
+
+ /*
+ * Figure out which segment in the file contains the Program
+ * Header table, and map to the associated memory address.
+ */
+ if (elf_ppnt->p_offset <= elf_ex->e_phoff &&
+ elf_ex->e_phoff < elf_ppnt->p_offset + elf_ppnt->p_filesz) {
+ phdr_addr = elf_ex->e_phoff - elf_ppnt->p_offset +
+ elf_ppnt->p_vaddr;
+ }
+
k = elf_ppnt->p_vaddr;
if ((elf_ppnt->p_flags & PF_X) && k < start_code)
start_code = k;
@@ -1215,6 +1226,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
}

e_entry = elf_ex->e_entry + load_bias;
+ phdr_addr += load_bias;
elf_bss += load_bias;
elf_brk += load_bias;
start_code += load_bias;
@@ -1278,8 +1290,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
goto out;
#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */

- retval = create_elf_tables(bprm, elf_ex,
- load_addr, interp_load_addr, e_entry);
+ retval = create_elf_tables(bprm, elf_ex, interp_load_addr,
+ e_entry, phdr_addr);
if (retval < 0)
goto out;

--
2.34.1

2022-03-28 22:38:40

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels


Thank you for cc'ing me. You probably want to hold off on back-porting
this patch. The appropriate fix requires some more conversation.

At a mininum this patch should not be using TIF_NOTIFY_RESUME.

Eric



Sasha Levin <[email protected]> writes:

> From: Oleg Nesterov <[email protected]>
>
> [ Upstream commit bf9ad37dc8a30cce22ae95d6c2ca6abf8731d305 ]
>
> On x86_64 we must disable preemption before we enable interrupts
> for stack faults, int3 and debugging, because the current task is using
> a per CPU debug stack defined by the IST. If we schedule out, another task
> can come in and use the same stack and cause the stack to be corrupted
> and crash the kernel on return.
>
> When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
> one of these is the spin lock used in signal handling.
>
> Some of the debug code (int3) causes do_trap() to send a signal.
> This function calls a spinlock_t lock that has been converted to a
> sleeping lock. If this happens, the above issues with the corrupted
> stack is possible.
>
> Instead of calling the signal right away, for PREEMPT_RT and x86,
> the signal information is stored on the stacks task_struct and
> TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
> code will send the signal when preemption is enabled.
>
> [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
> ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
> [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
> [ tglx: Use a config option ]
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Link: https://lore.kernel.org/r/Ygq5aBB/[email protected]
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> arch/x86/Kconfig | 1 +
> include/linux/sched.h | 3 +++
> kernel/Kconfig.preempt | 12 +++++++++++-
> kernel/entry/common.c | 14 ++++++++++++++
> kernel/signal.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 69 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 9f5bd41bf660..d557ac29b6cd 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -120,6 +120,7 @@ config X86
> select ARCH_WANTS_NO_INSTR
> select ARCH_WANT_HUGE_PMD_SHARE
> select ARCH_WANT_LD_ORPHAN_WARN
> + select ARCH_WANTS_RT_DELAYED_SIGNALS
> select ARCH_WANTS_THP_SWAP if X86_64
> select ARCH_HAS_PARANOID_L1D_FLUSH
> select BUILDTIME_TABLE_SORT
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 75ba8aa60248..098e37fd770a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1087,6 +1087,9 @@ struct task_struct {
> /* Restored if set_restore_sigmask() was used: */
> sigset_t saved_sigmask;
> struct sigpending pending;
> +#ifdef CONFIG_RT_DELAYED_SIGNALS
> + struct kernel_siginfo forced_info;
> +#endif
> unsigned long sas_ss_sp;
> size_t sas_ss_size;
> unsigned int sas_ss_flags;
> diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
> index ce77f0265660..5644abd5f8a8 100644
> --- a/kernel/Kconfig.preempt
> +++ b/kernel/Kconfig.preempt
> @@ -132,4 +132,14 @@ config SCHED_CORE
> which is the likely usage by Linux distributions, there should
> be no measurable impact on performance.
>
> -
> +config ARCH_WANTS_RT_DELAYED_SIGNALS
> + bool
> + help
> + This option is selected by architectures where raising signals
> + can happen in atomic contexts on PREEMPT_RT enabled kernels. This
> + option delays raising the signal until the return to user space
> + loop where it is also delivered. X86 requires this to deliver
> + signals from trap handlers which run on IST stacks.
> +
> +config RT_DELAYED_SIGNALS
> + def_bool PREEMPT_RT && ARCH_WANTS_RT_DELAYED_SIGNALS
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index bad713684c2e..0543a2c92f20 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -148,6 +148,18 @@ static void handle_signal_work(struct pt_regs *regs, unsigned long ti_work)
> arch_do_signal_or_restart(regs, ti_work & _TIF_SIGPENDING);
> }
>
> +#ifdef CONFIG_RT_DELAYED_SIGNALS
> +static inline void raise_delayed_signal(void)
> +{
> + if (unlikely(current->forced_info.si_signo)) {
> + force_sig_info(&current->forced_info);
> + current->forced_info.si_signo = 0;
> + }
> +}
> +#else
> +static inline void raise_delayed_signal(void) { }
> +#endif
> +
> static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> unsigned long ti_work)
> {
> @@ -162,6 +174,8 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> if (ti_work & _TIF_NEED_RESCHED)
> schedule();
>
> + raise_delayed_signal();
> +
> if (ti_work & _TIF_UPROBE)
> uprobe_notify_resume(regs);
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 9b04631acde8..e93de6daa188 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1307,6 +1307,43 @@ enum sig_handler {
> HANDLER_EXIT, /* Only visible as the process exit code */
> };
>
> +/*
> + * On some archictectures, PREEMPT_RT has to delay sending a signal from a
> + * trap since it cannot enable preemption, and the signal code's
> + * spin_locks turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME
> + * which will send the signal on exit of the trap.
> + */
> +#ifdef CONFIG_RT_DELAYED_SIGNALS
> +static inline bool force_sig_delayed(struct kernel_siginfo *info,
> + struct task_struct *t)
> +{
> + if (!in_atomic())
> + return false;
> +
> + if (WARN_ON_ONCE(t->forced_info.si_signo))
> + return true;
> +
> + if (is_si_special(info)) {
> + WARN_ON_ONCE(info != SEND_SIG_PRIV);
> + t->forced_info.si_signo = info->si_signo;
> + t->forced_info.si_errno = 0;
> + t->forced_info.si_code = SI_KERNEL;
> + t->forced_info.si_pid = 0;
> + t->forced_info.si_uid = 0;
> + } else {
> + t->forced_info = *info;
> + }
> + set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
> + return true;
> +}
> +#else
> +static inline bool force_sig_delayed(struct kernel_siginfo *info,
> + struct task_struct *t)
> +{
> + return false;
> +}
> +#endif
> +
> /*
> * Force a signal that the process can't ignore: if necessary
> * we unblock the signal and change any SIG_IGN to SIG_DFL.
> @@ -1327,6 +1364,9 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
> struct k_sigaction *action;
> int sig = info->si_signo;
>
> + if (force_sig_delayed(info, t))
> + return 0;
> +
> spin_lock_irqsave(&t->sighand->siglock, flags);
> action = &t->sighand->action[sig-1];
> ignored = action->sa.sa_handler == SIG_IGN;

2022-03-28 22:54:37

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 17/43] random: remove batched entropy locking

From: "Jason A. Donenfeld" <[email protected]>

[ Upstream commit 77760fd7f7ae3dfd03668204e708d1568d75447d ]

Rather than use spinlocks to protect batched entropy, we can instead
disable interrupts locally, since we're dealing with per-cpu data, and
manage resets with a basic generation counter. At the same time, we
can't quite do this on PREEMPT_RT, where we still want spinlocks-as-
mutexes semantics. So we use a local_lock_t, which provides the right
behavior for each. Because this is a per-cpu lock, that generation
counter is still doing the necessary CPU-to-CPU communication.

This should improve performance a bit. It will also fix the linked splat
that Jonathan received with a PROVE_RAW_LOCK_NESTING=y.

Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
Reviewed-by: Dominik Brodowski <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Reported-by: Jonathan Neuschäfer <[email protected]>
Tested-by: Jonathan Neuschäfer <[email protected]>
Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/char/random.c | 55 ++++++++++++++++++++++---------------------
1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 882f78829a24..34ee34b30993 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1876,13 +1876,16 @@ static int __init random_sysctls_init(void)
device_initcall(random_sysctls_init);
#endif /* CONFIG_SYSCTL */

+static atomic_t batch_generation = ATOMIC_INIT(0);
+
struct batched_entropy {
union {
u64 entropy_u64[CHACHA_BLOCK_SIZE / sizeof(u64)];
u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
};
+ local_lock_t lock;
unsigned int position;
- spinlock_t batch_lock;
+ int generation;
};

/*
@@ -1894,7 +1897,7 @@ struct batched_entropy {
* point prior.
*/
static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
+ .lock = INIT_LOCAL_LOCK(batched_entropy_u64.lock)
};

u64 get_random_u64(void)
@@ -1903,67 +1906,65 @@ u64 get_random_u64(void)
unsigned long flags;
struct batched_entropy *batch;
static void *previous;
+ int next_gen;

warn_unseeded_randomness(&previous);

+ local_lock_irqsave(&batched_entropy_u64.lock, flags);
batch = raw_cpu_ptr(&batched_entropy_u64);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
+
+ next_gen = atomic_read(&batch_generation);
+ if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0 ||
+ next_gen != batch->generation) {
extract_crng((u8 *)batch->entropy_u64);
batch->position = 0;
+ batch->generation = next_gen;
}
+
ret = batch->entropy_u64[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ local_unlock_irqrestore(&batched_entropy_u64.lock, flags);
return ret;
}
EXPORT_SYMBOL(get_random_u64);

static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
+ .lock = INIT_LOCAL_LOCK(batched_entropy_u32.lock)
};
+
u32 get_random_u32(void)
{
u32 ret;
unsigned long flags;
struct batched_entropy *batch;
static void *previous;
+ int next_gen;

warn_unseeded_randomness(&previous);

+ local_lock_irqsave(&batched_entropy_u32.lock, flags);
batch = raw_cpu_ptr(&batched_entropy_u32);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
+
+ next_gen = atomic_read(&batch_generation);
+ if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0 ||
+ next_gen != batch->generation) {
extract_crng((u8 *)batch->entropy_u32);
batch->position = 0;
+ batch->generation = next_gen;
}
+
ret = batch->entropy_u32[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ local_unlock_irqrestore(&batched_entropy_u32.lock, flags);
return ret;
}
EXPORT_SYMBOL(get_random_u32);

/* It's important to invalidate all potential batched entropy that might
* be stored before the crng is initialized, which we can do lazily by
- * simply resetting the counter to zero so that it's re-extracted on the
- * next usage. */
+ * bumping the generation counter.
+ */
static void invalidate_batched_entropy(void)
{
- int cpu;
- unsigned long flags;
-
- for_each_possible_cpu(cpu) {
- struct batched_entropy *batched_entropy;
-
- batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
- spin_lock_irqsave(&batched_entropy->batch_lock, flags);
- batched_entropy->position = 0;
- spin_unlock(&batched_entropy->batch_lock);
-
- batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
- spin_lock(&batched_entropy->batch_lock);
- batched_entropy->position = 0;
- spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
- }
+ atomic_inc(&batch_generation);
}

/**
--
2.34.1

2022-07-07 21:38:47

by Mario Limonciello

[permalink] [raw]
Subject: RE: [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"

[Public]



> -----Original Message-----
> From: Tom Crossland <[email protected]>
> Sent: Thursday, July 7, 2022 16:31
> To: Sasha Levin <[email protected]>; [email protected];
> [email protected]
> Cc: Rafael J. Wysocki <[email protected]>; Limonciello, Mario
> <[email protected]>; Huang, Ray <[email protected]>; Mika
> Westerberg <[email protected]>; [email protected]; linux-
> [email protected]
> Subject: Re: [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same
> capabilities to the _OSC regardless of the query flag"
>
> Hi, I'm observing the issue described here which I think is due to a
> recent regression:
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.c
> om%2Fintel%2Flinux-intel-
> lts%2Fissues%2F22&amp;data=05%7C01%7CMario.Limonciello%40amd.com%7
> C77419b612f9540e333ff08da606002ee%7C3dd8961fe4884e608e11a82d994e18
> 3d%7C0%7C0%7C637928263354159054%7CUnknown%7CTWFpbGZsb3d8eyJWI
> joiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C300
> 0%7C%7C%7C&amp;sdata=X%2FEAU9GbRD%2FfYxCMUmnWI1cJ8dk8sICk0iYu
> %2BKGqtl4%3D&amp;reserved=0
>
> sudo dmesg -t -l err
>
> ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> AE_NOT_FOUND (20211217/psargs-330)
> ACPI Error: Aborting method \_PR.PR01._CPC due to previous error
> (AE_NOT_FOUND) (20211217/psparse-529)
> ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> AE_NOT_FOUND (20211217/psargs-330)
> ACPI Error: Aborting method \_PR.PR02._CPC due to previous error
> (AE_NOT_FOUND) (20211217/psparse-529)
> ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> AE_NOT_FOUND (20211217/psargs-330)
> ACPI Error: Aborting method \_PR.PR03._CPC due to previous error
> (AE_NOT_FOUND) (20211217/psparse-529)
>
> System:
> ? Kernel: 5.18.9-arch1-1 arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
> ??? parameters: initrd=\intel-ucode.img initrd=\initramfs-linux.img
> ??? root=xxx intel_iommu=on iommu=pt
> ?Machine:
> ? Type: Desktop Mobo: Intel model: NUC7i5BNB v: J31144-304 serial: <filter>
> ??? UEFI: Intel v: BNKBL357.86A.0088.2022.0125.1102 date: 01/25/2022
>
> I hope this is the correct forum to report the issue. Apologies if not.
>

This is the fix for it:

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=7feec7430edddb87c24b0a86b08a03d0b496a755


> On 28/03/2022 13.18, Sasha Levin wrote:
> > From: "Rafael J. Wysocki" <[email protected]>
> >
> > [ Upstream commit 2ca8e6285250c07a2e5a22ecbfd59b5a4ef73484 ]
> >
> > Revert commit 159d8c274fd9 ("ACPI: Pass the same capabilities to the
> > _OSC regardless of the query flag") which caused legitimate usage
> > scenarios (when the platform firmware does not want the OS to control
> > certain platform features controlled by the system bus scope _OSC) to
> > break and was misguided by some misleading language in the _OSC
> > definition in the ACPI specification (in particular, Section 6.2.11.1.3
> > "Sequence of _OSC Calls" that contradicts other perts of the _OSC
> > definition).
> >
> > Link:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker
> nel.org%2Flinux-
> acpi%2FCAJZ5v0iStA0JmO0H3z%2BVgQsVuQONVjKPpw0F5HKfiq%3DGb6B5yw%
> 40mail.gmail.com&amp;data=05%7C01%7CMario.Limonciello%40amd.com%7C
> 77419b612f9540e333ff08da606002ee%7C3dd8961fe4884e608e11a82d994e183
> d%7C0%7C0%7C637928263354159054%7CUnknown%7CTWFpbGZsb3d8eyJWIj
> oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C300
> 0%7C%7C%7C&amp;sdata=Te3BK%2B0q2QmrqqoG5mbV%2FNguoMgiwzILNHl
> %2BhUMLFlY%3D&amp;reserved=0
> > Reported-by: Mario Limonciello <[email protected]>
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > Tested-by: Mario Limonciello <[email protected]>
> > Acked-by: Huang Rui <[email protected]>
> > Reviewed-by: Mika Westerberg <[email protected]>
> > Signed-off-by: Sasha Levin <[email protected]>
> > ---
> > drivers/acpi/bus.c | 27 +++++++++++++++++++--------
> > 1 file changed, 19 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > index 07f604832fd6..079b952ab59f 100644
> > --- a/drivers/acpi/bus.c
> > +++ b/drivers/acpi/bus.c
> > @@ -332,21 +332,32 @@ static void
> acpi_bus_osc_negotiate_platform_control(void)
> > if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
> > return;
> >
> > - kfree(context.ret.pointer);
> > + capbuf_ret = context.ret.pointer;
> > + if (context.ret.length <= OSC_SUPPORT_DWORD) {
> > + kfree(context.ret.pointer);
> > + return;
> > + }
> >
> > - /* Now run _OSC again with query flag clear */
> > + /*
> > + * Now run _OSC again with query flag clear and with the caps
> > + * supported by both the OS and the platform.
> > + */
> > capbuf[OSC_QUERY_DWORD] = 0;
> > + capbuf[OSC_SUPPORT_DWORD] =
> capbuf_ret[OSC_SUPPORT_DWORD];
> > + kfree(context.ret.pointer);
> >
> > if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
> > return;
> >
> > capbuf_ret = context.ret.pointer;
> > - osc_sb_apei_support_acked =
> > - capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_APEI_SUPPORT;
> > - osc_pc_lpi_support_confirmed =
> > - capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_PCLPI_SUPPORT;
> > - osc_sb_native_usb4_support_confirmed =
> > - capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_NATIVE_USB4_SUPPORT;
> > + if (context.ret.length > OSC_SUPPORT_DWORD) {
> > + osc_sb_apei_support_acked =
> > + capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_APEI_SUPPORT;
> > + osc_pc_lpi_support_confirmed =
> > + capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_PCLPI_SUPPORT;
> > + osc_sb_native_usb4_support_confirmed =
> > + capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_NATIVE_USB4_SUPPORT;
> > + }
> >
> > kfree(context.ret.pointer);
> > }

2022-07-07 21:49:49

by Tom Crossland

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"

Hi, I'm observing the issue described here which I think is due to a
recent regression:

https://github.com/intel/linux-intel-lts/issues/22

sudo dmesg -t -l err

ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
AE_NOT_FOUND (20211217/psargs-330)
ACPI Error: Aborting method \_PR.PR01._CPC due to previous error
(AE_NOT_FOUND) (20211217/psparse-529)
ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
AE_NOT_FOUND (20211217/psargs-330)
ACPI Error: Aborting method \_PR.PR02._CPC due to previous error
(AE_NOT_FOUND) (20211217/psparse-529)
ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
AE_NOT_FOUND (20211217/psargs-330)
ACPI Error: Aborting method \_PR.PR03._CPC due to previous error
(AE_NOT_FOUND) (20211217/psparse-529)

System:
  Kernel: 5.18.9-arch1-1 arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
    parameters: initrd=\intel-ucode.img initrd=\initramfs-linux.img
    root=xxx intel_iommu=on iommu=pt
 Machine:
  Type: Desktop Mobo: Intel model: NUC7i5BNB v: J31144-304 serial: <filter>
    UEFI: Intel v: BNKBL357.86A.0088.2022.0125.1102 date: 01/25/2022

I hope this is the correct forum to report the issue. Apologies if not.

On 28/03/2022 13.18, Sasha Levin wrote:
> From: "Rafael J. Wysocki" <[email protected]>
>
> [ Upstream commit 2ca8e6285250c07a2e5a22ecbfd59b5a4ef73484 ]
>
> Revert commit 159d8c274fd9 ("ACPI: Pass the same capabilities to the
> _OSC regardless of the query flag") which caused legitimate usage
> scenarios (when the platform firmware does not want the OS to control
> certain platform features controlled by the system bus scope _OSC) to
> break and was misguided by some misleading language in the _OSC
> definition in the ACPI specification (in particular, Section 6.2.11.1.3
> "Sequence of _OSC Calls" that contradicts other perts of the _OSC
> definition).
>
> Link: https://lore.kernel.org/linux-acpi/CAJZ5v0iStA0JmO0H3z+VgQsVuQONVjKPpw0F5HKfiq=Gb6B5yw@mail.gmail.com
> Reported-by: Mario Limonciello <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> Tested-by: Mario Limonciello <[email protected]>
> Acked-by: Huang Rui <[email protected]>
> Reviewed-by: Mika Westerberg <[email protected]>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> drivers/acpi/bus.c | 27 +++++++++++++++++++--------
> 1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 07f604832fd6..079b952ab59f 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -332,21 +332,32 @@ static void acpi_bus_osc_negotiate_platform_control(void)
> if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
> return;
>
> - kfree(context.ret.pointer);
> + capbuf_ret = context.ret.pointer;
> + if (context.ret.length <= OSC_SUPPORT_DWORD) {
> + kfree(context.ret.pointer);
> + return;
> + }
>
> - /* Now run _OSC again with query flag clear */
> + /*
> + * Now run _OSC again with query flag clear and with the caps
> + * supported by both the OS and the platform.
> + */
> capbuf[OSC_QUERY_DWORD] = 0;
> + capbuf[OSC_SUPPORT_DWORD] = capbuf_ret[OSC_SUPPORT_DWORD];
> + kfree(context.ret.pointer);
>
> if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
> return;
>
> capbuf_ret = context.ret.pointer;
> - osc_sb_apei_support_acked =
> - capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
> - osc_pc_lpi_support_confirmed =
> - capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
> - osc_sb_native_usb4_support_confirmed =
> - capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
> + if (context.ret.length > OSC_SUPPORT_DWORD) {
> + osc_sb_apei_support_acked =
> + capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
> + osc_pc_lpi_support_confirmed =
> + capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
> + osc_sb_native_usb4_support_confirmed =
> + capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
> + }
>
> kfree(context.ret.pointer);
> }

2022-07-08 09:35:26

by Tom Crossland

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"

I can confirm that the ACPI BIOS Errors no longer appear in the kernel
log using mainline 5.19.0-rc5 with the patch applied.

Many thanks

On Thu, Jul 7, 2022 at 11:36 PM Limonciello, Mario
<[email protected]> wrote:
>
> [Public]
>
>
>
> > -----Original Message-----
> > From: Tom Crossland <[email protected]>
> > Sent: Thursday, July 7, 2022 16:31
> > To: Sasha Levin <[email protected]>; [email protected];
> > [email protected]
> > Cc: Rafael J. Wysocki <[email protected]>; Limonciello, Mario
> > <[email protected]>; Huang, Ray <[email protected]>; Mika
> > Westerberg <[email protected]>; [email protected]; linux-
> > [email protected]
> > Subject: Re: [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same
> > capabilities to the _OSC regardless of the query flag"
> >
> > Hi, I'm observing the issue described here which I think is due to a
> > recent regression:
> >
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.c
> > om%2Fintel%2Flinux-intel-
> > lts%2Fissues%2F22&amp;data=05%7C01%7CMario.Limonciello%40amd.com%7
> > C77419b612f9540e333ff08da606002ee%7C3dd8961fe4884e608e11a82d994e18
> > 3d%7C0%7C0%7C637928263354159054%7CUnknown%7CTWFpbGZsb3d8eyJWI
> > joiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C300
> > 0%7C%7C%7C&amp;sdata=X%2FEAU9GbRD%2FfYxCMUmnWI1cJ8dk8sICk0iYu
> > %2BKGqtl4%3D&amp;reserved=0
> >
> > sudo dmesg -t -l err
> >
> > ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> > AE_NOT_FOUND (20211217/psargs-330)
> > ACPI Error: Aborting method \_PR.PR01._CPC due to previous error
> > (AE_NOT_FOUND) (20211217/psparse-529)
> > ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> > AE_NOT_FOUND (20211217/psargs-330)
> > ACPI Error: Aborting method \_PR.PR02._CPC due to previous error
> > (AE_NOT_FOUND) (20211217/psparse-529)
> > ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> > AE_NOT_FOUND (20211217/psargs-330)
> > ACPI Error: Aborting method \_PR.PR03._CPC due to previous error
> > (AE_NOT_FOUND) (20211217/psparse-529)
> >
> > System:
> > Kernel: 5.18.9-arch1-1 arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
> > parameters: initrd=\intel-ucode.img initrd=\initramfs-linux.img
> > root=xxx intel_iommu=on iommu=pt
> > Machine:
> > Type: Desktop Mobo: Intel model: NUC7i5BNB v: J31144-304 serial: <filter>
> > UEFI: Intel v: BNKBL357.86A.0088.2022.0125.1102 date: 01/25/2022
> >
> > I hope this is the correct forum to report the issue. Apologies if not.
> >
>
> This is the fix for it:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=7feec7430edddb87c24b0a86b08a03d0b496a755
>
>
> > On 28/03/2022 13.18, Sasha Levin wrote:
> > > From: "Rafael J. Wysocki" <[email protected]>
> > >
> > > [ Upstream commit 2ca8e6285250c07a2e5a22ecbfd59b5a4ef73484 ]
> > >
> > > Revert commit 159d8c274fd9 ("ACPI: Pass the same capabilities to the
> > > _OSC regardless of the query flag") which caused legitimate usage
> > > scenarios (when the platform firmware does not want the OS to control
> > > certain platform features controlled by the system bus scope _OSC) to
> > > break and was misguided by some misleading language in the _OSC
> > > definition in the ACPI specification (in particular, Section 6.2.11.1.3
> > > "Sequence of _OSC Calls" that contradicts other perts of the _OSC
> > > definition).
> > >
> > > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker
> > nel.org%2Flinux-
> > acpi%2FCAJZ5v0iStA0JmO0H3z%2BVgQsVuQONVjKPpw0F5HKfiq%3DGb6B5yw%
> > 40mail.gmail.com&amp;data=05%7C01%7CMario.Limonciello%40amd.com%7C
> > 77419b612f9540e333ff08da606002ee%7C3dd8961fe4884e608e11a82d994e183
> > d%7C0%7C0%7C637928263354159054%7CUnknown%7CTWFpbGZsb3d8eyJWIj
> > oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C300
> > 0%7C%7C%7C&amp;sdata=Te3BK%2B0q2QmrqqoG5mbV%2FNguoMgiwzILNHl
> > %2BhUMLFlY%3D&amp;reserved=0
> > > Reported-by: Mario Limonciello <[email protected]>
> > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > > Tested-by: Mario Limonciello <[email protected]>
> > > Acked-by: Huang Rui <[email protected]>
> > > Reviewed-by: Mika Westerberg <[email protected]>
> > > Signed-off-by: Sasha Levin <[email protected]>
> > > ---
> > > drivers/acpi/bus.c | 27 +++++++++++++++++++--------
> > > 1 file changed, 19 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > > index 07f604832fd6..079b952ab59f 100644
> > > --- a/drivers/acpi/bus.c
> > > +++ b/drivers/acpi/bus.c
> > > @@ -332,21 +332,32 @@ static void
> > acpi_bus_osc_negotiate_platform_control(void)
> > > if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
> > > return;
> > >
> > > - kfree(context.ret.pointer);
> > > + capbuf_ret = context.ret.pointer;
> > > + if (context.ret.length <= OSC_SUPPORT_DWORD) {
> > > + kfree(context.ret.pointer);
> > > + return;
> > > + }
> > >
> > > - /* Now run _OSC again with query flag clear */
> > > + /*
> > > + * Now run _OSC again with query flag clear and with the caps
> > > + * supported by both the OS and the platform.
> > > + */
> > > capbuf[OSC_QUERY_DWORD] = 0;
> > > + capbuf[OSC_SUPPORT_DWORD] =
> > capbuf_ret[OSC_SUPPORT_DWORD];
> > > + kfree(context.ret.pointer);
> > >
> > > if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
> > > return;
> > >
> > > capbuf_ret = context.ret.pointer;
> > > - osc_sb_apei_support_acked =
> > > - capbuf_ret[OSC_SUPPORT_DWORD] &
> > OSC_SB_APEI_SUPPORT;
> > > - osc_pc_lpi_support_confirmed =
> > > - capbuf_ret[OSC_SUPPORT_DWORD] &
> > OSC_SB_PCLPI_SUPPORT;
> > > - osc_sb_native_usb4_support_confirmed =
> > > - capbuf_ret[OSC_SUPPORT_DWORD] &
> > OSC_SB_NATIVE_USB4_SUPPORT;
> > > + if (context.ret.length > OSC_SUPPORT_DWORD) {
> > > + osc_sb_apei_support_acked =
> > > + capbuf_ret[OSC_SUPPORT_DWORD] &
> > OSC_SB_APEI_SUPPORT;
> > > + osc_pc_lpi_support_confirmed =
> > > + capbuf_ret[OSC_SUPPORT_DWORD] &
> > OSC_SB_PCLPI_SUPPORT;
> > > + osc_sb_native_usb4_support_confirmed =
> > > + capbuf_ret[OSC_SUPPORT_DWORD] &
> > OSC_SB_NATIVE_USB4_SUPPORT;
> > > + }
> > >
> > > kfree(context.ret.pointer);
> > > }