2020-10-31 11:45:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 00/70] 5.8.18-rc1 review

-------------------------
Note, this is going to be the LAST 5.8.y kernel release. After this
one, this branch is now end-of-life. Please move to the 5.9.y branch at
this point in time.
-------------------------

This is the start of the stable review cycle for the 5.8.18 release.
There are 70 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.18-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 5.8.18-rc1

Pali Rohár <[email protected]>
phy: marvell: comphy: Convert internal SMCC firmware return codes to errno

Ricky Wu <[email protected]>
misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()

Stafford Horne <[email protected]>
openrisc: Fix issue with get_user for 64-bit values

Souptick Joarder <[email protected]>
xen/gntdev.c: Mark pages as dirty

Geert Uytterhoeven <[email protected]>
ata: sata_rcar: Fix DMA boundary mask

Grygorii Strashko <[email protected]>
PM: runtime: Fix timer_expires data type on 32-bit arches

Peter Zijlstra <[email protected]>
serial: pl011: Fix lockdep splat when handling magic-sysrq interrupt

Paras Sharma <[email protected]>
serial: qcom_geni_serial: To correct QUP Version detection logic

Chris Wilson <[email protected]>
drm/i915/gem: Serialise debugfs i915_gem_objects with ctx->mutex

Gustavo A. R. Silva <[email protected]>
mtd: lpddr: Fix bad logic in print_drs_error

Jason Gunthorpe <[email protected]>
RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()

Frederic Barrat <[email protected]>
cxl: Rework error message for incompatible slots

Jia-Ju Bai <[email protected]>
p54: avoid accessing the data mapped to streaming DMA

Roberto Sassu <[email protected]>
evm: Check size of security.evm before using it

Song Liu <[email protected]>
bpf: Fix comment for helper bpf_current_task_under_cgroup()

Miklos Szeredi <[email protected]>
fuse: fix page dereference after free

Pali Rohár <[email protected]>
ata: ahci: mvebu: Make SATA PHY optional for Armada 3720

Pali Rohár <[email protected]>
PCI: aardvark: Fix initialization with old Marvell's Arm Trusted Firmware

Juergen Gross <[email protected]>
x86/xen: disable Firmware First mode for correctable memory errors

Thomas Gleixner <[email protected]>
x86/traps: Fix #DE Oops message regression

Kim Phillips <[email protected]>
arch/x86/amd/ibs: Fix re-arming IBS Fetch

Gao Xiang <[email protected]>
erofs: avoid duplicated permission check for "trusted." xattrs

Leon Romanovsky <[email protected]>
net: protect tcf_block_unbind with block lock

Tung Nguyen <[email protected]>
tipc: fix memory leak caused by tipc_buf_append()

Arjun Roy <[email protected]>
tcp: Prevent low rmem stalls with SO_RCVLOWAT.

Andrew Gabbasov <[email protected]>
ravb: Fix bit fields checking in ravb_hwtstamp_get()

Heiner Kallweit <[email protected]>
r8169: fix issue with forced threading in combination with shared interrupts

Guillaume Nault <[email protected]>
net/sched: act_mpls: Add softdep on mpls_gso.ko

Alex Elder <[email protected]>
net: ipa: command payloads already mapped

Zenghui Yu <[email protected]>
net: hns3: Clear the CMDQ registers before unmapping BAR region

Aleksandr Nogikh <[email protected]>
netem: fix zero division in tabledist

Ido Schimmel <[email protected]>
mlxsw: core: Fix memory leak on module removal

Lijun Pan <[email protected]>
ibmvnic: fix ibmvnic_set_mac

Thomas Bogendoerfer <[email protected]>
ibmveth: Fix use of ibmveth in a bridge.

Masahiro Fujiwara <[email protected]>
gtp: fix an use-before-init in gtp_newlink()

Raju Rangoju <[email protected]>
cxgb4: set up filter action after rewrites

Vinay Kumar Yadav <[email protected]>
chelsio/chtls: fix tls record info to user

Vinay Kumar Yadav <[email protected]>
chelsio/chtls: fix memory leaks in CPL handlers

Vinay Kumar Yadav <[email protected]>
chelsio/chtls: fix deadlock issue

Vasundhara Volam <[email protected]>
bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.

Vasundhara Volam <[email protected]>
bnxt_en: Re-write PCI BARs after PCI fatal error.

Vasundhara Volam <[email protected]>
bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.

Vasundhara Volam <[email protected]>
bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().

Michael Chan <[email protected]>
bnxt_en: Check abort error state in bnxt_open_nic().

Michael Schaller <[email protected]>
efivarfs: Replace invalid slashes with exclamation marks in dentries.

Dan Williams <[email protected]>
x86/copy_mc: Introduce copy_mc_enhanced_fast_string()

Dan Williams <[email protected]>
x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()

Randy Dunlap <[email protected]>
x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled

Nick Desaulniers <[email protected]>
arm64: link with -z norelro regardless of CONFIG_RELOCATABLE

Marc Zyngier <[email protected]>
arm64: Run ARCH_WORKAROUND_2 enabling code on all CPUs

Marc Zyngier <[email protected]>
arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs

Kees Cook <[email protected]>
fs/kernel_read_file: Remove FIRMWARE_EFI_EMBEDDED enum

Ard Biesheuvel <[email protected]>
efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure

Rasmus Villemoes <[email protected]>
scripts/setlocalversion: make git describe output more reliable

Matthew Wilcox (Oracle) <[email protected]>
io_uring: Convert advanced XArray uses to the normal API

Matthew Wilcox (Oracle) <[email protected]>
io_uring: Fix XArray usage in io_uring_add_task_file

Matthew Wilcox (Oracle) <[email protected]>
io_uring: Fix use of XArray in __io_uring_files_cancel

Jens Axboe <[email protected]>
io_uring: no need to call xa_destroy() on empty xarray

Hillf Danton <[email protected]>
io-wq: fix use-after-free in io_wq_worker_running

Sebastian Andrzej Siewior <[email protected]>
io_wq: Make io_wqe::lock a raw_spinlock_t

Jens Axboe <[email protected]>
io_uring: reference ->nsproxy for file table commands

Jens Axboe <[email protected]>
io_uring: don't rely on weak ->files references

Jens Axboe <[email protected]>
io_uring: enable task/files specific overflow flushing

Jens Axboe <[email protected]>
io_uring: return cancelation status from poll/timeout/files handlers

Jens Axboe <[email protected]>
io_uring: unconditionally grab req->task

Jens Axboe <[email protected]>
io_uring: stash ctx task reference for SQPOLL

Jens Axboe <[email protected]>
io_uring: move dropping of files into separate helper

Jens Axboe <[email protected]>
io_uring: allow timeout/poll/files killing to take task into account

Jens Axboe <[email protected]>
io_uring: don't run task work on an exiting task

Saeed Mirzamohammadi <[email protected]>
netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create


-------------

Diffstat:

Makefile | 4 +-
arch/arm64/Makefile | 4 +-
arch/arm64/kernel/cpu_errata.c | 15 +
arch/openrisc/include/asm/uaccess.h | 35 +-
arch/powerpc/Kconfig | 2 +-
arch/powerpc/include/asm/string.h | 2 -
arch/powerpc/include/asm/uaccess.h | 40 +-
arch/powerpc/lib/Makefile | 2 +-
.../lib/{memcpy_mcsafe_64.S => copy_mc_64.S} | 4 +-
arch/x86/Kconfig | 2 +-
arch/x86/Kconfig.debug | 2 +-
arch/x86/events/amd/ibs.c | 15 +-
arch/x86/include/asm/copy_mc_test.h | 75 ++++
arch/x86/include/asm/mce.h | 9 +
arch/x86/include/asm/mcsafe_test.h | 75 ----
arch/x86/include/asm/string_64.h | 32 --
arch/x86/include/asm/uaccess.h | 9 +
arch/x86/include/asm/uaccess_64.h | 20 -
arch/x86/kernel/cpu/mce/core.c | 8 +-
arch/x86/kernel/quirks.c | 10 +-
arch/x86/kernel/traps.c | 2 +-
arch/x86/lib/Makefile | 1 +
arch/x86/lib/copy_mc.c | 96 +++++
arch/x86/lib/copy_mc_64.S | 163 +++++++
arch/x86/lib/memcpy_64.S | 115 -----
arch/x86/lib/usercopy_64.c | 21 -
arch/x86/pci/intel_mid_pci.c | 1 +
arch/x86/xen/enlighten_pv.c | 9 +
drivers/ata/ahci.h | 2 +
drivers/ata/ahci_mvebu.c | 2 +-
drivers/ata/libahci_platform.c | 2 +-
drivers/ata/sata_rcar.c | 2 +-
drivers/base/firmware_loader/fallback_platform.c | 2 +-
drivers/crypto/chelsio/chtls/chtls_cm.c | 29 +-
drivers/crypto/chelsio/chtls/chtls_io.c | 7 +-
drivers/firmware/efi/libstub/arm64-stub.c | 8 +-
drivers/firmware/efi/libstub/fdt.c | 4 +-
drivers/gpu/drm/i915/i915_debugfs.c | 2 +
drivers/infiniband/core/addr.c | 11 +-
drivers/md/dm-writecache.c | 15 +-
drivers/misc/cardreader/rtsx_pcr.c | 4 -
drivers/misc/cxl/pci.c | 4 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 49 ++-
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 +
drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 56 ++-
drivers/net/ethernet/chelsio/cxgb4/t4_tcb.h | 4 +
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 2 +-
drivers/net/ethernet/ibm/ibmveth.c | 6 -
drivers/net/ethernet/ibm/ibmvnic.c | 8 +-
drivers/net/ethernet/mellanox/mlxsw/core.c | 2 +
drivers/net/ethernet/realtek/r8169_main.c | 4 +-
drivers/net/ethernet/renesas/ravb_main.c | 10 +-
drivers/net/gtp.c | 16 +-
drivers/net/ipa/gsi_trans.c | 21 +-
drivers/net/wireless/intersil/p54/p54pci.c | 4 +-
drivers/nvdimm/claim.c | 2 +-
drivers/nvdimm/pmem.c | 6 +-
drivers/pci/controller/pci-aardvark.c | 4 +-
drivers/phy/marvell/phy-mvebu-a3700-comphy.c | 14 +-
drivers/phy/marvell/phy-mvebu-cp110-comphy.c | 14 +-
drivers/tty/serial/amba-pl011.c | 11 +-
drivers/tty/serial/qcom_geni_serial.c | 2 +-
drivers/xen/gntdev.c | 17 +-
fs/efivarfs/super.c | 3 +
fs/erofs/xattr.c | 2 -
fs/exec.c | 6 +
fs/file.c | 2 +
fs/fuse/dev.c | 28 +-
fs/io-wq.c | 172 ++++----
fs/io-wq.h | 1 +
fs/io_uring.c | 475 ++++++++++++++++-----
include/linux/fs.h | 1 -
include/linux/io_uring.h | 53 +++
include/linux/mtd/pfow.h | 2 +-
include/linux/pm.h | 2 +-
include/linux/qcom-geni-se.h | 3 +
include/linux/sched.h | 5 +
include/linux/string.h | 9 +-
include/linux/uaccess.h | 13 +
include/linux/uio.h | 10 +-
include/net/netfilter/nf_tables.h | 6 +
include/uapi/linux/bpf.h | 4 +-
init/init_task.c | 3 +
kernel/fork.c | 6 +
lib/Kconfig | 7 +-
lib/iov_iter.c | 48 +--
net/ipv4/tcp.c | 2 +
net/ipv4/tcp_input.c | 3 +-
net/netfilter/nf_tables_api.c | 6 +-
net/netfilter/nf_tables_offload.c | 4 +-
net/sched/act_mpls.c | 1 +
net/sched/cls_api.c | 4 +-
net/sched/sch_netem.c | 9 +-
net/tipc/msg.c | 5 +-
scripts/setlocalversion | 21 +-
security/integrity/evm/evm_main.c | 6 +
tools/arch/x86/include/asm/mcsafe_test.h | 13 -
tools/arch/x86/lib/memcpy_64.S | 115 -----
tools/include/uapi/linux/bpf.h | 4 +-
tools/objtool/check.c | 5 +-
tools/perf/bench/Build | 1 -
tools/perf/bench/mem-memcpy-x86-64-lib.c | 24 --
tools/testing/nvdimm/test/nfit.c | 49 +--
.../testing/selftests/powerpc/copyloops/.gitignore | 2 +-
tools/testing/selftests/powerpc/copyloops/Makefile | 6 +-
.../selftests/powerpc/copyloops/copy_mc_64.S | 242 +++++++++++
106 files changed, 1595 insertions(+), 908 deletions(-)



2020-10-31 11:45:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 50/70] arch/x86/amd/ibs: Fix re-arming IBS Fetch

From: Kim Phillips <[email protected]>

commit 221bfce5ebbdf72ff08b3bf2510ae81058ee568b upstream.

Stephane Eranian found a bug in that IBS' current Fetch counter was not
being reset when the driver would write the new value to clear it along
with the enable bit set, and found that adding an MSR write that would
first disable IBS Fetch would make IBS Fetch reset its current count.

Indeed, the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54 - Sep 12,
2019 states "The periodic fetch counter is set to IbsFetchCnt [...] when
IbsFetchEn is changed from 0 to 1."

Explicitly set IbsFetchEn to 0 and then to 1 when re-enabling IBS Fetch,
so the driver properly resets the internal counter to 0 and IBS
Fetch starts counting again.

A family 15h machine tested does not have this problem, and the extra
wrmsr is also not needed on Family 19h, so only do the extra wrmsr on
families 16h through 18h.

Reported-by: Stephane Eranian <[email protected]>
Signed-off-by: Kim Phillips <[email protected]>
[peterz: optimized]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/events/amd/ibs.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -89,6 +89,7 @@ struct perf_ibs {
u64 max_period;
unsigned long offset_mask[1];
int offset_max;
+ unsigned int fetch_count_reset_broken : 1;
struct cpu_perf_ibs __percpu *pcpu;

struct attribute **format_attrs;
@@ -363,7 +364,12 @@ perf_ibs_event_update(struct perf_ibs *p
static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
struct hw_perf_event *hwc, u64 config)
{
- wrmsrl(hwc->config_base, hwc->config | config | perf_ibs->enable_mask);
+ u64 tmp = hwc->config | config;
+
+ if (perf_ibs->fetch_count_reset_broken)
+ wrmsrl(hwc->config_base, tmp & ~perf_ibs->enable_mask);
+
+ wrmsrl(hwc->config_base, tmp | perf_ibs->enable_mask);
}

/*
@@ -733,6 +739,13 @@ static __init void perf_event_ibs_init(v
{
struct attribute **attr = ibs_op_format_attrs;

+ /*
+ * Some chips fail to reset the fetch count when it is written; instead
+ * they need a 0-1 transition of IbsFetchEn.
+ */
+ if (boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18)
+ perf_ibs_fetch.fetch_count_reset_broken = 1;
+
perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");

if (ibs_caps & IBS_CAPS_OPCNT) {


2020-10-31 11:45:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 34/70] chelsio/chtls: fix tls record info to user

From: Vinay Kumar Yadav <[email protected]>

[ Upstream commit 4f3391ce8f5a69e7e6d66d0a3fc654eb6dbdc919 ]

chtls_pt_recvmsg() receives a skb with tls header and subsequent
skb with data, need to finalize the data copy whenever next skb
with tls header is available. but here current tls header is
overwritten by next available tls header, ends up corrupting
user buffer data. fixing it by finalizing current record whenever
next skb contains tls header.

v1->v2:
- Improved commit message.

Fixes: 17a7d24aa89d ("crypto: chtls - generic handling of data and hdr")
Signed-off-by: Vinay Kumar Yadav <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/crypto/chelsio/chtls/chtls_io.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/crypto/chelsio/chtls/chtls_io.c
+++ b/drivers/crypto/chelsio/chtls/chtls_io.c
@@ -1585,6 +1585,7 @@ skip_copy:
tp->urg_data = 0;

if ((avail + offset) >= skb->len) {
+ struct sk_buff *next_skb;
if (ULP_SKB_CB(skb)->flags & ULPCB_FLAG_TLS_HDR) {
tp->copied_seq += skb->len;
hws->rcvpld = skb->hdr_len;
@@ -1595,8 +1596,10 @@ skip_copy:
chtls_free_skb(sk, skb);
buffers_freed++;
hws->copied_seq = 0;
- if (copied >= target &&
- !skb_peek(&sk->sk_receive_queue))
+ next_skb = skb_peek(&sk->sk_receive_queue);
+ if (copied >= target && !next_skb)
+ break;
+ if (ULP_SKB_CB(next_skb)->flags & ULPCB_FLAG_TLS_HDR)
break;
}
} while (len > 0);


2020-10-31 11:45:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 24/70] x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()

From: Dan Williams <[email protected]>

commit ec6347bb43395cb92126788a1a5b25302543f815 upstream.

In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named
relative to communicating the scope of the interface. Specifically what
addresses are valid to pass as source, destination, and what faults /
exceptions are handled.

Of particular concern is that even though x86 might be able to handle
the semantics of copy_mc_to_user() with its common copy_user_generic()
implementation other archs likely need / want an explicit path for this
case:

On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <[email protected]> wrote:
>
> On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <[email protected]> wrote:
> >
> > However now I see that copy_user_generic() works for the wrong reason.
> > It works because the exception on the source address due to poison
> > looks no different than a write fault on the user address to the
> > caller, it's still just a short copy. So it makes copy_to_user() work
> > for the wrong reason relative to the name.
>
> Right.
>
> And it won't work that way on other architectures. On x86, we have a
> generic function that can take faults on either side, and we use it
> for both cases (and for the "in_user" case too), but that's an
> artifact of the architecture oddity.
>
> In fact, it's probably wrong even on x86 - because it can hide bugs -
> but writing those things is painful enough that everybody prefers
> having just one function.

Replace a single top-level memcpy_mcsafe() with either
copy_mc_to_user(), or copy_mc_to_kernel().

Introduce an x86 copy_mc_fragile() name as the rename for the
low-level x86 implementation formerly named memcpy_mcsafe(). It is used
as the slow / careful backend that is supplanted by a fast
copy_mc_generic() in a follow-on patch.

One side-effect of this reorganization is that separating copy_mc_64.S
to its own file means that perf no longer needs to track dependencies
for its memcpy_64.S benchmarks.

[ bp: Massage a bit. ]

Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Cc: <[email protected]>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/Kconfig | 2
arch/powerpc/include/asm/string.h | 2
arch/powerpc/include/asm/uaccess.h | 40 +-
arch/powerpc/lib/Makefile | 2
arch/powerpc/lib/copy_mc_64.S | 242 +++++++++++++++++
arch/powerpc/lib/memcpy_mcsafe_64.S | 242 -----------------
arch/x86/Kconfig | 2
arch/x86/Kconfig.debug | 2
arch/x86/include/asm/copy_mc_test.h | 75 +++++
arch/x86/include/asm/mce.h | 9
arch/x86/include/asm/mcsafe_test.h | 75 -----
arch/x86/include/asm/string_64.h | 32 --
arch/x86/include/asm/uaccess.h | 9
arch/x86/include/asm/uaccess_64.h | 20 -
arch/x86/kernel/cpu/mce/core.c | 8
arch/x86/kernel/quirks.c | 10
arch/x86/lib/Makefile | 1
arch/x86/lib/copy_mc.c | 82 +++++
arch/x86/lib/copy_mc_64.S | 127 ++++++++
arch/x86/lib/memcpy_64.S | 115 --------
arch/x86/lib/usercopy_64.c | 21 -
drivers/md/dm-writecache.c | 15 -
drivers/nvdimm/claim.c | 2
drivers/nvdimm/pmem.c | 6
include/linux/string.h | 9
include/linux/uaccess.h | 13
include/linux/uio.h | 10
lib/Kconfig | 7
lib/iov_iter.c | 48 +--
tools/arch/x86/include/asm/mcsafe_test.h | 13
tools/arch/x86/lib/memcpy_64.S | 115 --------
tools/objtool/check.c | 4
tools/perf/bench/Build | 1
tools/perf/bench/mem-memcpy-x86-64-lib.c | 24 -
tools/testing/nvdimm/test/nfit.c | 49 +--
tools/testing/selftests/powerpc/copyloops/.gitignore | 2
tools/testing/selftests/powerpc/copyloops/Makefile | 6
tools/testing/selftests/powerpc/copyloops/copy_mc_64.S | 242 +++++++++++++++++
38 files changed, 914 insertions(+), 770 deletions(-)

--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -135,7 +135,7 @@ config PPC
select ARCH_HAS_STRICT_KERNEL_RWX if (PPC32 && !HIBERNATION)
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE
- select ARCH_HAS_UACCESS_MCSAFE if PPC64
+ select ARCH_HAS_COPY_MC if PPC64
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_KEEP_MEMBLOCK
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -53,9 +53,7 @@ void *__memmove(void *to, const void *fr
#ifndef CONFIG_KASAN
#define __HAVE_ARCH_MEMSET32
#define __HAVE_ARCH_MEMSET64
-#define __HAVE_ARCH_MEMCPY_MCSAFE

-extern int memcpy_mcsafe(void *dst, const void *src, __kernel_size_t sz);
extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t);
extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t);
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -436,6 +436,32 @@ do { \
extern unsigned long __copy_tofrom_user(void __user *to,
const void __user *from, unsigned long size);

+#ifdef CONFIG_ARCH_HAS_COPY_MC
+unsigned long __must_check
+copy_mc_generic(void *to, const void *from, unsigned long size);
+
+static inline unsigned long __must_check
+copy_mc_to_kernel(void *to, const void *from, unsigned long size)
+{
+ return copy_mc_generic(to, from, size);
+}
+#define copy_mc_to_kernel copy_mc_to_kernel
+
+static inline unsigned long __must_check
+copy_mc_to_user(void __user *to, const void *from, unsigned long n)
+{
+ if (likely(check_copy_size(from, n, true))) {
+ if (access_ok(to, n)) {
+ allow_write_to_user(to, n);
+ n = copy_mc_generic((void *)to, from, n);
+ prevent_write_to_user(to, n);
+ }
+ }
+
+ return n;
+}
+#endif
+
#ifdef __powerpc64__
static inline unsigned long
raw_copy_in_user(void __user *to, const void __user *from, unsigned long n)
@@ -524,20 +550,6 @@ raw_copy_to_user(void __user *to, const
return ret;
}

-static __always_inline unsigned long __must_check
-copy_to_user_mcsafe(void __user *to, const void *from, unsigned long n)
-{
- if (likely(check_copy_size(from, n, true))) {
- if (access_ok(to, n)) {
- allow_write_to_user(to, n);
- n = memcpy_mcsafe((void *)to, from, n);
- prevent_write_to_user(to, n);
- }
- }
-
- return n;
-}
-
unsigned long __arch_clear_user(void __user *addr, unsigned long size);

static inline unsigned long clear_user(void __user *addr, unsigned long size)
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -39,7 +39,7 @@ obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_
memcpy_power7.o

obj64-y += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
- memcpy_64.o memcpy_mcsafe_64.o
+ memcpy_64.o copy_mc_64.o

obj64-$(CONFIG_SMP) += locks.o
obj64-$(CONFIG_ALTIVEC) += vmx-helper.o
--- /dev/null
+++ b/arch/powerpc/lib/copy_mc_64.S
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) IBM Corporation, 2011
+ * Derived from copyuser_power7.s by Anton Blanchard <[email protected]>
+ * Author - Balbir Singh <[email protected]>
+ */
+#include <asm/ppc_asm.h>
+#include <asm/errno.h>
+#include <asm/export.h>
+
+ .macro err1
+100:
+ EX_TABLE(100b,.Ldo_err1)
+ .endm
+
+ .macro err2
+200:
+ EX_TABLE(200b,.Ldo_err2)
+ .endm
+
+ .macro err3
+300: EX_TABLE(300b,.Ldone)
+ .endm
+
+.Ldo_err2:
+ ld r22,STK_REG(R22)(r1)
+ ld r21,STK_REG(R21)(r1)
+ ld r20,STK_REG(R20)(r1)
+ ld r19,STK_REG(R19)(r1)
+ ld r18,STK_REG(R18)(r1)
+ ld r17,STK_REG(R17)(r1)
+ ld r16,STK_REG(R16)(r1)
+ ld r15,STK_REG(R15)(r1)
+ ld r14,STK_REG(R14)(r1)
+ addi r1,r1,STACKFRAMESIZE
+.Ldo_err1:
+ /* Do a byte by byte copy to get the exact remaining size */
+ mtctr r7
+46:
+err3; lbz r0,0(r4)
+ addi r4,r4,1
+err3; stb r0,0(r3)
+ addi r3,r3,1
+ bdnz 46b
+ li r3,0
+ blr
+
+.Ldone:
+ mfctr r3
+ blr
+
+
+_GLOBAL(copy_mc_generic)
+ mr r7,r5
+ cmpldi r5,16
+ blt .Lshort_copy
+
+.Lcopy:
+ /* Get the source 8B aligned */
+ neg r6,r4
+ mtocrf 0x01,r6
+ clrldi r6,r6,(64-3)
+
+ bf cr7*4+3,1f
+err1; lbz r0,0(r4)
+ addi r4,r4,1
+err1; stb r0,0(r3)
+ addi r3,r3,1
+ subi r7,r7,1
+
+1: bf cr7*4+2,2f
+err1; lhz r0,0(r4)
+ addi r4,r4,2
+err1; sth r0,0(r3)
+ addi r3,r3,2
+ subi r7,r7,2
+
+2: bf cr7*4+1,3f
+err1; lwz r0,0(r4)
+ addi r4,r4,4
+err1; stw r0,0(r3)
+ addi r3,r3,4
+ subi r7,r7,4
+
+3: sub r5,r5,r6
+ cmpldi r5,128
+
+ mflr r0
+ stdu r1,-STACKFRAMESIZE(r1)
+ std r14,STK_REG(R14)(r1)
+ std r15,STK_REG(R15)(r1)
+ std r16,STK_REG(R16)(r1)
+ std r17,STK_REG(R17)(r1)
+ std r18,STK_REG(R18)(r1)
+ std r19,STK_REG(R19)(r1)
+ std r20,STK_REG(R20)(r1)
+ std r21,STK_REG(R21)(r1)
+ std r22,STK_REG(R22)(r1)
+ std r0,STACKFRAMESIZE+16(r1)
+
+ blt 5f
+ srdi r6,r5,7
+ mtctr r6
+
+ /* Now do cacheline (128B) sized loads and stores. */
+ .align 5
+4:
+err2; ld r0,0(r4)
+err2; ld r6,8(r4)
+err2; ld r8,16(r4)
+err2; ld r9,24(r4)
+err2; ld r10,32(r4)
+err2; ld r11,40(r4)
+err2; ld r12,48(r4)
+err2; ld r14,56(r4)
+err2; ld r15,64(r4)
+err2; ld r16,72(r4)
+err2; ld r17,80(r4)
+err2; ld r18,88(r4)
+err2; ld r19,96(r4)
+err2; ld r20,104(r4)
+err2; ld r21,112(r4)
+err2; ld r22,120(r4)
+ addi r4,r4,128
+err2; std r0,0(r3)
+err2; std r6,8(r3)
+err2; std r8,16(r3)
+err2; std r9,24(r3)
+err2; std r10,32(r3)
+err2; std r11,40(r3)
+err2; std r12,48(r3)
+err2; std r14,56(r3)
+err2; std r15,64(r3)
+err2; std r16,72(r3)
+err2; std r17,80(r3)
+err2; std r18,88(r3)
+err2; std r19,96(r3)
+err2; std r20,104(r3)
+err2; std r21,112(r3)
+err2; std r22,120(r3)
+ addi r3,r3,128
+ subi r7,r7,128
+ bdnz 4b
+
+ clrldi r5,r5,(64-7)
+
+ /* Up to 127B to go */
+5: srdi r6,r5,4
+ mtocrf 0x01,r6
+
+6: bf cr7*4+1,7f
+err2; ld r0,0(r4)
+err2; ld r6,8(r4)
+err2; ld r8,16(r4)
+err2; ld r9,24(r4)
+err2; ld r10,32(r4)
+err2; ld r11,40(r4)
+err2; ld r12,48(r4)
+err2; ld r14,56(r4)
+ addi r4,r4,64
+err2; std r0,0(r3)
+err2; std r6,8(r3)
+err2; std r8,16(r3)
+err2; std r9,24(r3)
+err2; std r10,32(r3)
+err2; std r11,40(r3)
+err2; std r12,48(r3)
+err2; std r14,56(r3)
+ addi r3,r3,64
+ subi r7,r7,64
+
+7: ld r14,STK_REG(R14)(r1)
+ ld r15,STK_REG(R15)(r1)
+ ld r16,STK_REG(R16)(r1)
+ ld r17,STK_REG(R17)(r1)
+ ld r18,STK_REG(R18)(r1)
+ ld r19,STK_REG(R19)(r1)
+ ld r20,STK_REG(R20)(r1)
+ ld r21,STK_REG(R21)(r1)
+ ld r22,STK_REG(R22)(r1)
+ addi r1,r1,STACKFRAMESIZE
+
+ /* Up to 63B to go */
+ bf cr7*4+2,8f
+err1; ld r0,0(r4)
+err1; ld r6,8(r4)
+err1; ld r8,16(r4)
+err1; ld r9,24(r4)
+ addi r4,r4,32
+err1; std r0,0(r3)
+err1; std r6,8(r3)
+err1; std r8,16(r3)
+err1; std r9,24(r3)
+ addi r3,r3,32
+ subi r7,r7,32
+
+ /* Up to 31B to go */
+8: bf cr7*4+3,9f
+err1; ld r0,0(r4)
+err1; ld r6,8(r4)
+ addi r4,r4,16
+err1; std r0,0(r3)
+err1; std r6,8(r3)
+ addi r3,r3,16
+ subi r7,r7,16
+
+9: clrldi r5,r5,(64-4)
+
+ /* Up to 15B to go */
+.Lshort_copy:
+ mtocrf 0x01,r5
+ bf cr7*4+0,12f
+err1; lwz r0,0(r4) /* Less chance of a reject with word ops */
+err1; lwz r6,4(r4)
+ addi r4,r4,8
+err1; stw r0,0(r3)
+err1; stw r6,4(r3)
+ addi r3,r3,8
+ subi r7,r7,8
+
+12: bf cr7*4+1,13f
+err1; lwz r0,0(r4)
+ addi r4,r4,4
+err1; stw r0,0(r3)
+ addi r3,r3,4
+ subi r7,r7,4
+
+13: bf cr7*4+2,14f
+err1; lhz r0,0(r4)
+ addi r4,r4,2
+err1; sth r0,0(r3)
+ addi r3,r3,2
+ subi r7,r7,2
+
+14: bf cr7*4+3,15f
+err1; lbz r0,0(r4)
+err1; stb r0,0(r3)
+
+15: li r3,0
+ blr
+
+EXPORT_SYMBOL_GPL(copy_mc_generic);
--- a/arch/powerpc/lib/memcpy_mcsafe_64.S
+++ /dev/null
@@ -1,242 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Copyright (C) IBM Corporation, 2011
- * Derived from copyuser_power7.s by Anton Blanchard <[email protected]>
- * Author - Balbir Singh <[email protected]>
- */
-#include <asm/ppc_asm.h>
-#include <asm/errno.h>
-#include <asm/export.h>
-
- .macro err1
-100:
- EX_TABLE(100b,.Ldo_err1)
- .endm
-
- .macro err2
-200:
- EX_TABLE(200b,.Ldo_err2)
- .endm
-
- .macro err3
-300: EX_TABLE(300b,.Ldone)
- .endm
-
-.Ldo_err2:
- ld r22,STK_REG(R22)(r1)
- ld r21,STK_REG(R21)(r1)
- ld r20,STK_REG(R20)(r1)
- ld r19,STK_REG(R19)(r1)
- ld r18,STK_REG(R18)(r1)
- ld r17,STK_REG(R17)(r1)
- ld r16,STK_REG(R16)(r1)
- ld r15,STK_REG(R15)(r1)
- ld r14,STK_REG(R14)(r1)
- addi r1,r1,STACKFRAMESIZE
-.Ldo_err1:
- /* Do a byte by byte copy to get the exact remaining size */
- mtctr r7
-46:
-err3; lbz r0,0(r4)
- addi r4,r4,1
-err3; stb r0,0(r3)
- addi r3,r3,1
- bdnz 46b
- li r3,0
- blr
-
-.Ldone:
- mfctr r3
- blr
-
-
-_GLOBAL(memcpy_mcsafe)
- mr r7,r5
- cmpldi r5,16
- blt .Lshort_copy
-
-.Lcopy:
- /* Get the source 8B aligned */
- neg r6,r4
- mtocrf 0x01,r6
- clrldi r6,r6,(64-3)
-
- bf cr7*4+3,1f
-err1; lbz r0,0(r4)
- addi r4,r4,1
-err1; stb r0,0(r3)
- addi r3,r3,1
- subi r7,r7,1
-
-1: bf cr7*4+2,2f
-err1; lhz r0,0(r4)
- addi r4,r4,2
-err1; sth r0,0(r3)
- addi r3,r3,2
- subi r7,r7,2
-
-2: bf cr7*4+1,3f
-err1; lwz r0,0(r4)
- addi r4,r4,4
-err1; stw r0,0(r3)
- addi r3,r3,4
- subi r7,r7,4
-
-3: sub r5,r5,r6
- cmpldi r5,128
-
- mflr r0
- stdu r1,-STACKFRAMESIZE(r1)
- std r14,STK_REG(R14)(r1)
- std r15,STK_REG(R15)(r1)
- std r16,STK_REG(R16)(r1)
- std r17,STK_REG(R17)(r1)
- std r18,STK_REG(R18)(r1)
- std r19,STK_REG(R19)(r1)
- std r20,STK_REG(R20)(r1)
- std r21,STK_REG(R21)(r1)
- std r22,STK_REG(R22)(r1)
- std r0,STACKFRAMESIZE+16(r1)
-
- blt 5f
- srdi r6,r5,7
- mtctr r6
-
- /* Now do cacheline (128B) sized loads and stores. */
- .align 5
-4:
-err2; ld r0,0(r4)
-err2; ld r6,8(r4)
-err2; ld r8,16(r4)
-err2; ld r9,24(r4)
-err2; ld r10,32(r4)
-err2; ld r11,40(r4)
-err2; ld r12,48(r4)
-err2; ld r14,56(r4)
-err2; ld r15,64(r4)
-err2; ld r16,72(r4)
-err2; ld r17,80(r4)
-err2; ld r18,88(r4)
-err2; ld r19,96(r4)
-err2; ld r20,104(r4)
-err2; ld r21,112(r4)
-err2; ld r22,120(r4)
- addi r4,r4,128
-err2; std r0,0(r3)
-err2; std r6,8(r3)
-err2; std r8,16(r3)
-err2; std r9,24(r3)
-err2; std r10,32(r3)
-err2; std r11,40(r3)
-err2; std r12,48(r3)
-err2; std r14,56(r3)
-err2; std r15,64(r3)
-err2; std r16,72(r3)
-err2; std r17,80(r3)
-err2; std r18,88(r3)
-err2; std r19,96(r3)
-err2; std r20,104(r3)
-err2; std r21,112(r3)
-err2; std r22,120(r3)
- addi r3,r3,128
- subi r7,r7,128
- bdnz 4b
-
- clrldi r5,r5,(64-7)
-
- /* Up to 127B to go */
-5: srdi r6,r5,4
- mtocrf 0x01,r6
-
-6: bf cr7*4+1,7f
-err2; ld r0,0(r4)
-err2; ld r6,8(r4)
-err2; ld r8,16(r4)
-err2; ld r9,24(r4)
-err2; ld r10,32(r4)
-err2; ld r11,40(r4)
-err2; ld r12,48(r4)
-err2; ld r14,56(r4)
- addi r4,r4,64
-err2; std r0,0(r3)
-err2; std r6,8(r3)
-err2; std r8,16(r3)
-err2; std r9,24(r3)
-err2; std r10,32(r3)
-err2; std r11,40(r3)
-err2; std r12,48(r3)
-err2; std r14,56(r3)
- addi r3,r3,64
- subi r7,r7,64
-
-7: ld r14,STK_REG(R14)(r1)
- ld r15,STK_REG(R15)(r1)
- ld r16,STK_REG(R16)(r1)
- ld r17,STK_REG(R17)(r1)
- ld r18,STK_REG(R18)(r1)
- ld r19,STK_REG(R19)(r1)
- ld r20,STK_REG(R20)(r1)
- ld r21,STK_REG(R21)(r1)
- ld r22,STK_REG(R22)(r1)
- addi r1,r1,STACKFRAMESIZE
-
- /* Up to 63B to go */
- bf cr7*4+2,8f
-err1; ld r0,0(r4)
-err1; ld r6,8(r4)
-err1; ld r8,16(r4)
-err1; ld r9,24(r4)
- addi r4,r4,32
-err1; std r0,0(r3)
-err1; std r6,8(r3)
-err1; std r8,16(r3)
-err1; std r9,24(r3)
- addi r3,r3,32
- subi r7,r7,32
-
- /* Up to 31B to go */
-8: bf cr7*4+3,9f
-err1; ld r0,0(r4)
-err1; ld r6,8(r4)
- addi r4,r4,16
-err1; std r0,0(r3)
-err1; std r6,8(r3)
- addi r3,r3,16
- subi r7,r7,16
-
-9: clrldi r5,r5,(64-4)
-
- /* Up to 15B to go */
-.Lshort_copy:
- mtocrf 0x01,r5
- bf cr7*4+0,12f
-err1; lwz r0,0(r4) /* Less chance of a reject with word ops */
-err1; lwz r6,4(r4)
- addi r4,r4,8
-err1; stw r0,0(r3)
-err1; stw r6,4(r3)
- addi r3,r3,8
- subi r7,r7,8
-
-12: bf cr7*4+1,13f
-err1; lwz r0,0(r4)
- addi r4,r4,4
-err1; stw r0,0(r3)
- addi r3,r3,4
- subi r7,r7,4
-
-13: bf cr7*4+2,14f
-err1; lhz r0,0(r4)
- addi r4,r4,2
-err1; sth r0,0(r3)
- addi r3,r3,2
- subi r7,r7,2
-
-14: bf cr7*4+3,15f
-err1; lbz r0,0(r4)
-err1; stb r0,0(r3)
-
-15: li r3,0
- blr
-
-EXPORT_SYMBOL_GPL(memcpy_mcsafe);
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -75,7 +75,7 @@ config X86
select ARCH_HAS_PTE_DEVMAP if X86_64
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_UACCESS_FLUSHCACHE if X86_64
- select ARCH_HAS_UACCESS_MCSAFE if X86_64 && X86_MCE
+ select ARCH_HAS_COPY_MC if X86_64
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_STRICT_KERNEL_RWX
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -59,7 +59,7 @@ config EARLY_PRINTK_USB_XDBC
You should normally say N here, unless you want to debug early
crashes or need a very simple printk logging facility.

-config MCSAFE_TEST
+config COPY_MC_TEST
def_bool n

config EFI_PGT_DUMP
--- /dev/null
+++ b/arch/x86/include/asm/copy_mc_test.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _COPY_MC_TEST_H_
+#define _COPY_MC_TEST_H_
+
+#ifndef __ASSEMBLY__
+#ifdef CONFIG_COPY_MC_TEST
+extern unsigned long copy_mc_test_src;
+extern unsigned long copy_mc_test_dst;
+
+static inline void copy_mc_inject_src(void *addr)
+{
+ if (addr)
+ copy_mc_test_src = (unsigned long) addr;
+ else
+ copy_mc_test_src = ~0UL;
+}
+
+static inline void copy_mc_inject_dst(void *addr)
+{
+ if (addr)
+ copy_mc_test_dst = (unsigned long) addr;
+ else
+ copy_mc_test_dst = ~0UL;
+}
+#else /* CONFIG_COPY_MC_TEST */
+static inline void copy_mc_inject_src(void *addr)
+{
+}
+
+static inline void copy_mc_inject_dst(void *addr)
+{
+}
+#endif /* CONFIG_COPY_MC_TEST */
+
+#else /* __ASSEMBLY__ */
+#include <asm/export.h>
+
+#ifdef CONFIG_COPY_MC_TEST
+.macro COPY_MC_TEST_CTL
+ .pushsection .data
+ .align 8
+ .globl copy_mc_test_src
+ copy_mc_test_src:
+ .quad 0
+ EXPORT_SYMBOL_GPL(copy_mc_test_src)
+ .globl copy_mc_test_dst
+ copy_mc_test_dst:
+ .quad 0
+ EXPORT_SYMBOL_GPL(copy_mc_test_dst)
+ .popsection
+.endm
+
+.macro COPY_MC_TEST_SRC reg count target
+ leaq \count(\reg), %r9
+ cmp copy_mc_test_src, %r9
+ ja \target
+.endm
+
+.macro COPY_MC_TEST_DST reg count target
+ leaq \count(\reg), %r9
+ cmp copy_mc_test_dst, %r9
+ ja \target
+.endm
+#else
+.macro COPY_MC_TEST_CTL
+.endm
+
+.macro COPY_MC_TEST_SRC reg count target
+.endm
+
+.macro COPY_MC_TEST_DST reg count target
+.endm
+#endif /* CONFIG_COPY_MC_TEST */
+#endif /* __ASSEMBLY__ */
+#endif /* _COPY_MC_TEST_H_ */
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -174,6 +174,15 @@ extern void mce_unregister_decode_chain(

extern int mce_p5_enabled;

+#ifdef CONFIG_ARCH_HAS_COPY_MC
+extern void enable_copy_mc_fragile(void);
+unsigned long __must_check copy_mc_fragile(void *dst, const void *src, unsigned cnt);
+#else
+static inline void enable_copy_mc_fragile(void)
+{
+}
+#endif
+
#ifdef CONFIG_X86_MCE
int mcheck_init(void);
void mcheck_cpu_init(struct cpuinfo_x86 *c);
--- a/arch/x86/include/asm/mcsafe_test.h
+++ /dev/null
@@ -1,75 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _MCSAFE_TEST_H_
-#define _MCSAFE_TEST_H_
-
-#ifndef __ASSEMBLY__
-#ifdef CONFIG_MCSAFE_TEST
-extern unsigned long mcsafe_test_src;
-extern unsigned long mcsafe_test_dst;
-
-static inline void mcsafe_inject_src(void *addr)
-{
- if (addr)
- mcsafe_test_src = (unsigned long) addr;
- else
- mcsafe_test_src = ~0UL;
-}
-
-static inline void mcsafe_inject_dst(void *addr)
-{
- if (addr)
- mcsafe_test_dst = (unsigned long) addr;
- else
- mcsafe_test_dst = ~0UL;
-}
-#else /* CONFIG_MCSAFE_TEST */
-static inline void mcsafe_inject_src(void *addr)
-{
-}
-
-static inline void mcsafe_inject_dst(void *addr)
-{
-}
-#endif /* CONFIG_MCSAFE_TEST */
-
-#else /* __ASSEMBLY__ */
-#include <asm/export.h>
-
-#ifdef CONFIG_MCSAFE_TEST
-.macro MCSAFE_TEST_CTL
- .pushsection .data
- .align 8
- .globl mcsafe_test_src
- mcsafe_test_src:
- .quad 0
- EXPORT_SYMBOL_GPL(mcsafe_test_src)
- .globl mcsafe_test_dst
- mcsafe_test_dst:
- .quad 0
- EXPORT_SYMBOL_GPL(mcsafe_test_dst)
- .popsection
-.endm
-
-.macro MCSAFE_TEST_SRC reg count target
- leaq \count(\reg), %r9
- cmp mcsafe_test_src, %r9
- ja \target
-.endm
-
-.macro MCSAFE_TEST_DST reg count target
- leaq \count(\reg), %r9
- cmp mcsafe_test_dst, %r9
- ja \target
-.endm
-#else
-.macro MCSAFE_TEST_CTL
-.endm
-
-.macro MCSAFE_TEST_SRC reg count target
-.endm
-
-.macro MCSAFE_TEST_DST reg count target
-.endm
-#endif /* CONFIG_MCSAFE_TEST */
-#endif /* __ASSEMBLY__ */
-#endif /* _MCSAFE_TEST_H_ */
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -82,38 +82,6 @@ int strcmp(const char *cs, const char *c

#endif

-#define __HAVE_ARCH_MEMCPY_MCSAFE 1
-__must_check unsigned long __memcpy_mcsafe(void *dst, const void *src,
- size_t cnt);
-DECLARE_STATIC_KEY_FALSE(mcsafe_key);
-
-/**
- * memcpy_mcsafe - copy memory with indication if a machine check happened
- *
- * @dst: destination address
- * @src: source address
- * @cnt: number of bytes to copy
- *
- * Low level memory copy function that catches machine checks
- * We only call into the "safe" function on systems that can
- * actually do machine check recovery. Everyone else can just
- * use memcpy().
- *
- * Return 0 for success, or number of bytes not copied if there was an
- * exception.
- */
-static __always_inline __must_check unsigned long
-memcpy_mcsafe(void *dst, const void *src, size_t cnt)
-{
-#ifdef CONFIG_X86_MCE
- if (static_branch_unlikely(&mcsafe_key))
- return __memcpy_mcsafe(dst, src, cnt);
- else
-#endif
- memcpy(dst, src, cnt);
- return 0;
-}
-
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
#define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1
void __memcpy_flushcache(void *dst, const void *src, size_t cnt);
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -455,6 +455,15 @@ extern __must_check long strnlen_user(co
unsigned long __must_check clear_user(void __user *mem, unsigned long len);
unsigned long __must_check __clear_user(void __user *mem, unsigned long len);

+#ifdef CONFIG_ARCH_HAS_COPY_MC
+unsigned long __must_check
+copy_mc_to_kernel(void *to, const void *from, unsigned len);
+#define copy_mc_to_kernel copy_mc_to_kernel
+
+unsigned long __must_check
+copy_mc_to_user(void *to, const void *from, unsigned len);
+#endif
+
/*
* movsl can be slow when source and dest are not both 8-byte aligned
*/
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -47,22 +47,6 @@ copy_user_generic(void *to, const void *
}

static __always_inline __must_check unsigned long
-copy_to_user_mcsafe(void *to, const void *from, unsigned len)
-{
- unsigned long ret;
-
- __uaccess_begin();
- /*
- * Note, __memcpy_mcsafe() is explicitly used since it can
- * handle exceptions / faults. memcpy_mcsafe() may fall back to
- * memcpy() which lacks this handling.
- */
- ret = __memcpy_mcsafe(to, from, len);
- __uaccess_end();
- return ret;
-}
-
-static __always_inline __must_check unsigned long
raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
{
return copy_user_generic(dst, (__force void *)src, size);
@@ -102,8 +86,4 @@ __copy_from_user_flushcache(void *dst, c
kasan_check_write(dst, size);
return __copy_user_flushcache(dst, src, size);
}
-
-unsigned long
-mcsafe_handle_tail(char *to, char *from, unsigned len);
-
#endif /* _ASM_X86_UACCESS_64_H */
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -40,7 +40,6 @@
#include <linux/debugfs.h>
#include <linux/irq_work.h>
#include <linux/export.h>
-#include <linux/jump_label.h>
#include <linux/set_memory.h>
#include <linux/task_work.h>
#include <linux/hardirq.h>
@@ -2122,7 +2121,7 @@ void mce_disable_bank(int bank)
and older.
* mce=nobootlog Don't log MCEs from before booting.
* mce=bios_cmci_threshold Don't program the CMCI threshold
- * mce=recovery force enable memcpy_mcsafe()
+ * mce=recovery force enable copy_mc_fragile()
*/
static int __init mcheck_enable(char *str)
{
@@ -2730,13 +2729,10 @@ static void __init mcheck_debugfs_init(v
static void __init mcheck_debugfs_init(void) { }
#endif

-DEFINE_STATIC_KEY_FALSE(mcsafe_key);
-EXPORT_SYMBOL_GPL(mcsafe_key);
-
static int __init mcheck_late_init(void)
{
if (mca_cfg.recovery)
- static_branch_inc(&mcsafe_key);
+ enable_copy_mc_fragile();

mcheck_debugfs_init();

--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -8,6 +8,7 @@

#include <asm/hpet.h>
#include <asm/setup.h>
+#include <asm/mce.h>

#if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI)

@@ -624,10 +625,6 @@ static void amd_disable_seq_and_redirect
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F3,
amd_disable_seq_and_redirect_scrub);

-#if defined(CONFIG_X86_64) && defined(CONFIG_X86_MCE)
-#include <linux/jump_label.h>
-#include <asm/string_64.h>
-
/* Ivy Bridge, Haswell, Broadwell */
static void quirk_intel_brickland_xeon_ras_cap(struct pci_dev *pdev)
{
@@ -636,7 +633,7 @@ static void quirk_intel_brickland_xeon_r
pci_read_config_dword(pdev, 0x84, &capid0);

if (capid0 & 0x10)
- static_branch_inc(&mcsafe_key);
+ enable_copy_mc_fragile();
}

/* Skylake */
@@ -653,7 +650,7 @@ static void quirk_intel_purley_xeon_ras_
* enabled, so memory machine check recovery is also enabled.
*/
if ((capid0 & 0xc0) == 0xc0 || (capid5 & 0x1e0))
- static_branch_inc(&mcsafe_key);
+ enable_copy_mc_fragile();

}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0ec3, quirk_intel_brickland_xeon_ras_cap);
@@ -661,7 +658,6 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IN
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fc0, quirk_intel_brickland_xeon_ras_cap);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2083, quirk_intel_purley_xeon_ras_cap);
#endif
-#endif

bool x86_apple_machine;
EXPORT_SYMBOL(x86_apple_machine);
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_SMP) += msr-smp.o cache-smp
lib-y := delay.o misc.o cmdline.o cpu.o
lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
lib-y += memcpy_$(BITS).o
+lib-$(CONFIG_ARCH_HAS_COPY_MC) += copy_mc.o copy_mc_64.o
lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
lib-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
--- /dev/null
+++ b/arch/x86/lib/copy_mc.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2016-2020 Intel Corporation. All rights reserved. */
+
+#include <linux/jump_label.h>
+#include <linux/uaccess.h>
+#include <linux/export.h>
+#include <linux/string.h>
+#include <linux/types.h>
+
+#include <asm/mce.h>
+
+#ifdef CONFIG_X86_MCE
+/*
+ * See COPY_MC_TEST for self-test of the copy_mc_fragile()
+ * implementation.
+ */
+static DEFINE_STATIC_KEY_FALSE(copy_mc_fragile_key);
+
+void enable_copy_mc_fragile(void)
+{
+ static_branch_inc(&copy_mc_fragile_key);
+}
+#define copy_mc_fragile_enabled (static_branch_unlikely(&copy_mc_fragile_key))
+
+/*
+ * Similar to copy_user_handle_tail, probe for the write fault point, or
+ * source exception point.
+ */
+__visible notrace unsigned long
+copy_mc_fragile_handle_tail(char *to, char *from, unsigned len)
+{
+ for (; len; --len, to++, from++)
+ if (copy_mc_fragile(to, from, 1))
+ break;
+ return len;
+}
+#else
+/*
+ * No point in doing careful copying, or consulting a static key when
+ * there is no #MC handler in the CONFIG_X86_MCE=n case.
+ */
+void enable_copy_mc_fragile(void)
+{
+}
+#define copy_mc_fragile_enabled (0)
+#endif
+
+/**
+ * copy_mc_to_kernel - memory copy that handles source exceptions
+ *
+ * @dst: destination address
+ * @src: source address
+ * @len: number of bytes to copy
+ *
+ * Call into the 'fragile' version on systems that have trouble
+ * actually do machine check recovery. Everyone else can just
+ * use memcpy().
+ *
+ * Return 0 for success, or number of bytes not copied if there was an
+ * exception.
+ */
+unsigned long __must_check copy_mc_to_kernel(void *dst, const void *src, unsigned len)
+{
+ if (copy_mc_fragile_enabled)
+ return copy_mc_fragile(dst, src, len);
+ memcpy(dst, src, len);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(copy_mc_to_kernel);
+
+unsigned long __must_check copy_mc_to_user(void *dst, const void *src, unsigned len)
+{
+ unsigned long ret;
+
+ if (!copy_mc_fragile_enabled)
+ return copy_user_generic(dst, src, len);
+
+ __uaccess_begin();
+ ret = copy_mc_fragile(dst, src, len);
+ __uaccess_end();
+ return ret;
+}
--- /dev/null
+++ b/arch/x86/lib/copy_mc_64.S
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2016-2020 Intel Corporation. All rights reserved. */
+
+#include <linux/linkage.h>
+#include <asm/copy_mc_test.h>
+#include <asm/export.h>
+#include <asm/asm.h>
+
+#ifndef CONFIG_UML
+
+#ifdef CONFIG_X86_MCE
+COPY_MC_TEST_CTL
+
+/*
+ * copy_mc_fragile - copy memory with indication if an exception / fault happened
+ *
+ * The 'fragile' version is opted into by platform quirks and takes
+ * pains to avoid unrecoverable corner cases like 'fast-string'
+ * instruction sequences, and consuming poison across a cacheline
+ * boundary. The non-fragile version is equivalent to memcpy()
+ * regardless of CPU machine-check-recovery capability.
+ */
+SYM_FUNC_START(copy_mc_fragile)
+ cmpl $8, %edx
+ /* Less than 8 bytes? Go to byte copy loop */
+ jb .L_no_whole_words
+
+ /* Check for bad alignment of source */
+ testl $7, %esi
+ /* Already aligned */
+ jz .L_8byte_aligned
+
+ /* Copy one byte at a time until source is 8-byte aligned */
+ movl %esi, %ecx
+ andl $7, %ecx
+ subl $8, %ecx
+ negl %ecx
+ subl %ecx, %edx
+.L_read_leading_bytes:
+ movb (%rsi), %al
+ COPY_MC_TEST_SRC %rsi 1 .E_leading_bytes
+ COPY_MC_TEST_DST %rdi 1 .E_leading_bytes
+.L_write_leading_bytes:
+ movb %al, (%rdi)
+ incq %rsi
+ incq %rdi
+ decl %ecx
+ jnz .L_read_leading_bytes
+
+.L_8byte_aligned:
+ movl %edx, %ecx
+ andl $7, %edx
+ shrl $3, %ecx
+ jz .L_no_whole_words
+
+.L_read_words:
+ movq (%rsi), %r8
+ COPY_MC_TEST_SRC %rsi 8 .E_read_words
+ COPY_MC_TEST_DST %rdi 8 .E_write_words
+.L_write_words:
+ movq %r8, (%rdi)
+ addq $8, %rsi
+ addq $8, %rdi
+ decl %ecx
+ jnz .L_read_words
+
+ /* Any trailing bytes? */
+.L_no_whole_words:
+ andl %edx, %edx
+ jz .L_done_memcpy_trap
+
+ /* Copy trailing bytes */
+ movl %edx, %ecx
+.L_read_trailing_bytes:
+ movb (%rsi), %al
+ COPY_MC_TEST_SRC %rsi 1 .E_trailing_bytes
+ COPY_MC_TEST_DST %rdi 1 .E_trailing_bytes
+.L_write_trailing_bytes:
+ movb %al, (%rdi)
+ incq %rsi
+ incq %rdi
+ decl %ecx
+ jnz .L_read_trailing_bytes
+
+ /* Copy successful. Return zero */
+.L_done_memcpy_trap:
+ xorl %eax, %eax
+.L_done:
+ ret
+SYM_FUNC_END(copy_mc_fragile)
+EXPORT_SYMBOL_GPL(copy_mc_fragile)
+
+ .section .fixup, "ax"
+ /*
+ * Return number of bytes not copied for any failure. Note that
+ * there is no "tail" handling since the source buffer is 8-byte
+ * aligned and poison is cacheline aligned.
+ */
+.E_read_words:
+ shll $3, %ecx
+.E_leading_bytes:
+ addl %edx, %ecx
+.E_trailing_bytes:
+ mov %ecx, %eax
+ jmp .L_done
+
+ /*
+ * For write fault handling, given the destination is unaligned,
+ * we handle faults on multi-byte writes with a byte-by-byte
+ * copy up to the write-protected page.
+ */
+.E_write_words:
+ shll $3, %ecx
+ addl %edx, %ecx
+ movl %ecx, %edx
+ jmp copy_mc_fragile_handle_tail
+
+ .previous
+
+ _ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
+ _ASM_EXTABLE_FAULT(.L_read_words, .E_read_words)
+ _ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
+ _ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes)
+ _ASM_EXTABLE(.L_write_words, .E_write_words)
+ _ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
+#endif /* CONFIG_X86_MCE */
+#endif /* !CONFIG_UML */
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -4,7 +4,6 @@
#include <linux/linkage.h>
#include <asm/errno.h>
#include <asm/cpufeatures.h>
-#include <asm/mcsafe_test.h>
#include <asm/alternative-asm.h>
#include <asm/export.h>

@@ -187,117 +186,3 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
SYM_FUNC_END(memcpy_orig)

.popsection
-
-#ifndef CONFIG_UML
-
-MCSAFE_TEST_CTL
-
-/*
- * __memcpy_mcsafe - memory copy with machine check exception handling
- * Note that we only catch machine checks when reading the source addresses.
- * Writes to target are posted and don't generate machine checks.
- */
-SYM_FUNC_START(__memcpy_mcsafe)
- cmpl $8, %edx
- /* Less than 8 bytes? Go to byte copy loop */
- jb .L_no_whole_words
-
- /* Check for bad alignment of source */
- testl $7, %esi
- /* Already aligned */
- jz .L_8byte_aligned
-
- /* Copy one byte at a time until source is 8-byte aligned */
- movl %esi, %ecx
- andl $7, %ecx
- subl $8, %ecx
- negl %ecx
- subl %ecx, %edx
-.L_read_leading_bytes:
- movb (%rsi), %al
- MCSAFE_TEST_SRC %rsi 1 .E_leading_bytes
- MCSAFE_TEST_DST %rdi 1 .E_leading_bytes
-.L_write_leading_bytes:
- movb %al, (%rdi)
- incq %rsi
- incq %rdi
- decl %ecx
- jnz .L_read_leading_bytes
-
-.L_8byte_aligned:
- movl %edx, %ecx
- andl $7, %edx
- shrl $3, %ecx
- jz .L_no_whole_words
-
-.L_read_words:
- movq (%rsi), %r8
- MCSAFE_TEST_SRC %rsi 8 .E_read_words
- MCSAFE_TEST_DST %rdi 8 .E_write_words
-.L_write_words:
- movq %r8, (%rdi)
- addq $8, %rsi
- addq $8, %rdi
- decl %ecx
- jnz .L_read_words
-
- /* Any trailing bytes? */
-.L_no_whole_words:
- andl %edx, %edx
- jz .L_done_memcpy_trap
-
- /* Copy trailing bytes */
- movl %edx, %ecx
-.L_read_trailing_bytes:
- movb (%rsi), %al
- MCSAFE_TEST_SRC %rsi 1 .E_trailing_bytes
- MCSAFE_TEST_DST %rdi 1 .E_trailing_bytes
-.L_write_trailing_bytes:
- movb %al, (%rdi)
- incq %rsi
- incq %rdi
- decl %ecx
- jnz .L_read_trailing_bytes
-
- /* Copy successful. Return zero */
-.L_done_memcpy_trap:
- xorl %eax, %eax
-.L_done:
- ret
-SYM_FUNC_END(__memcpy_mcsafe)
-EXPORT_SYMBOL_GPL(__memcpy_mcsafe)
-
- .section .fixup, "ax"
- /*
- * Return number of bytes not copied for any failure. Note that
- * there is no "tail" handling since the source buffer is 8-byte
- * aligned and poison is cacheline aligned.
- */
-.E_read_words:
- shll $3, %ecx
-.E_leading_bytes:
- addl %edx, %ecx
-.E_trailing_bytes:
- mov %ecx, %eax
- jmp .L_done
-
- /*
- * For write fault handling, given the destination is unaligned,
- * we handle faults on multi-byte writes with a byte-by-byte
- * copy up to the write-protected page.
- */
-.E_write_words:
- shll $3, %ecx
- addl %edx, %ecx
- movl %ecx, %edx
- jmp mcsafe_handle_tail
-
- .previous
-
- _ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
- _ASM_EXTABLE_FAULT(.L_read_words, .E_read_words)
- _ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
- _ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes)
- _ASM_EXTABLE(.L_write_words, .E_write_words)
- _ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
-#endif
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -56,27 +56,6 @@ unsigned long clear_user(void __user *to
}
EXPORT_SYMBOL(clear_user);

-/*
- * Similar to copy_user_handle_tail, probe for the write fault point,
- * but reuse __memcpy_mcsafe in case a new read error is encountered.
- * clac() is handled in _copy_to_iter_mcsafe().
- */
-__visible notrace unsigned long
-mcsafe_handle_tail(char *to, char *from, unsigned len)
-{
- for (; len; --len, to++, from++) {
- /*
- * Call the assembly routine back directly since
- * memcpy_mcsafe() may silently fallback to memcpy.
- */
- unsigned long rem = __memcpy_mcsafe(to, from, 1);
-
- if (rem)
- break;
- }
- return len;
-}
-
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
/**
* clean_cache_range - write back a cache range with CLWB
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -49,7 +49,7 @@ do { \
#define pmem_assign(dest, src) ((dest) = (src))
#endif

-#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(DM_WRITECACHE_HAS_PMEM)
+#if IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC) && defined(DM_WRITECACHE_HAS_PMEM)
#define DM_WRITECACHE_HANDLE_HARDWARE_ERRORS
#endif

@@ -992,7 +992,8 @@ static void writecache_resume(struct dm_
}
wc->freelist_size = 0;

- r = memcpy_mcsafe(&sb_seq_count, &sb(wc)->seq_count, sizeof(uint64_t));
+ r = copy_mc_to_kernel(&sb_seq_count, &sb(wc)->seq_count,
+ sizeof(uint64_t));
if (r) {
writecache_error(wc, r, "hardware memory error when reading superblock: %d", r);
sb_seq_count = cpu_to_le64(0);
@@ -1008,7 +1009,8 @@ static void writecache_resume(struct dm_
e->seq_count = -1;
continue;
}
- r = memcpy_mcsafe(&wme, memory_entry(wc, e), sizeof(struct wc_memory_entry));
+ r = copy_mc_to_kernel(&wme, memory_entry(wc, e),
+ sizeof(struct wc_memory_entry));
if (r) {
writecache_error(wc, r, "hardware memory error when reading metadata entry %lu: %d",
(unsigned long)b, r);
@@ -1206,7 +1208,7 @@ static void bio_copy_block(struct dm_wri

if (rw == READ) {
int r;
- r = memcpy_mcsafe(buf, data, size);
+ r = copy_mc_to_kernel(buf, data, size);
flush_dcache_page(bio_page(bio));
if (unlikely(r)) {
writecache_error(wc, r, "hardware memory error when reading data: %d", r);
@@ -2349,7 +2351,7 @@ invalid_optional:
}
}

- r = memcpy_mcsafe(&s, sb(wc), sizeof(struct wc_memory_superblock));
+ r = copy_mc_to_kernel(&s, sb(wc), sizeof(struct wc_memory_superblock));
if (r) {
ti->error = "Hardware memory error when reading superblock";
goto bad;
@@ -2360,7 +2362,8 @@ invalid_optional:
ti->error = "Unable to initialize device";
goto bad;
}
- r = memcpy_mcsafe(&s, sb(wc), sizeof(struct wc_memory_superblock));
+ r = copy_mc_to_kernel(&s, sb(wc),
+ sizeof(struct wc_memory_superblock));
if (r) {
ti->error = "Hardware memory error when reading superblock";
goto bad;
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -268,7 +268,7 @@ static int nsio_rw_bytes(struct nd_names
if (rw == READ) {
if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align)))
return -EIO;
- if (memcpy_mcsafe(buf, nsio->addr + offset, size) != 0)
+ if (copy_mc_to_kernel(buf, nsio->addr + offset, size) != 0)
return -EIO;
return 0;
}
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -125,7 +125,7 @@ static blk_status_t read_pmem(struct pag
while (len) {
mem = kmap_atomic(page);
chunk = min_t(unsigned int, len, PAGE_SIZE - off);
- rem = memcpy_mcsafe(mem + off, pmem_addr, chunk);
+ rem = copy_mc_to_kernel(mem + off, pmem_addr, chunk);
kunmap_atomic(mem);
if (rem)
return BLK_STS_IOERR;
@@ -305,7 +305,7 @@ static long pmem_dax_direct_access(struc

/*
* Use the 'no check' versions of copy_from_iter_flushcache() and
- * copy_to_iter_mcsafe() to bypass HARDENED_USERCOPY overhead. Bounds
+ * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
* checking, both file offset and device offset, is handled by
* dax_iomap_actor()
*/
@@ -318,7 +318,7 @@ static size_t pmem_copy_from_iter(struct
static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
{
- return _copy_to_iter_mcsafe(addr, bytes, i);
+ return _copy_mc_to_iter(addr, bytes, i);
}

static const struct dax_operations pmem_dax_ops = {
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -161,20 +161,13 @@ extern int bcmp(const void *,const void
#ifndef __HAVE_ARCH_MEMCHR
extern void * memchr(const void *,int,__kernel_size_t);
#endif
-#ifndef __HAVE_ARCH_MEMCPY_MCSAFE
-static inline __must_check unsigned long memcpy_mcsafe(void *dst,
- const void *src, size_t cnt)
-{
- memcpy(dst, src, cnt);
- return 0;
-}
-#endif
#ifndef __HAVE_ARCH_MEMCPY_FLUSHCACHE
static inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
{
memcpy(dst, src, cnt);
}
#endif
+
void *memchr_inv(const void *s, int c, size_t n);
char *strreplace(char *s, char old, char new);

--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -163,6 +163,19 @@ copy_in_user(void __user *to, const void
}
#endif

+#ifndef copy_mc_to_kernel
+/*
+ * Without arch opt-in this generic copy_mc_to_kernel() will not handle
+ * #MC (or arch equivalent) during source read.
+ */
+static inline unsigned long __must_check
+copy_mc_to_kernel(void *dst, const void *src, size_t cnt)
+{
+ memcpy(dst, src, cnt);
+ return 0;
+}
+#endif
+
static __always_inline void pagefault_disabled_inc(void)
{
current->pagefault_disabled++;
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -186,10 +186,10 @@ size_t _copy_from_iter_flushcache(void *
#define _copy_from_iter_flushcache _copy_from_iter_nocache
#endif

-#ifdef CONFIG_ARCH_HAS_UACCESS_MCSAFE
-size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i);
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
#else
-#define _copy_to_iter_mcsafe _copy_to_iter
+#define _copy_mc_to_iter _copy_to_iter
#endif

static __always_inline __must_check
@@ -202,12 +202,12 @@ size_t copy_from_iter_flushcache(void *a
}

static __always_inline __must_check
-size_t copy_to_iter_mcsafe(void *addr, size_t bytes, struct iov_iter *i)
+size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
{
if (unlikely(!check_copy_size(addr, bytes, true)))
return 0;
else
- return _copy_to_iter_mcsafe(addr, bytes, i);
+ return _copy_mc_to_iter(addr, bytes, i);
}

size_t iov_iter_zero(size_t bytes, struct iov_iter *);
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -631,7 +631,12 @@ config UACCESS_MEMCPY
config ARCH_HAS_UACCESS_FLUSHCACHE
bool

-config ARCH_HAS_UACCESS_MCSAFE
+# arch has a concept of a recoverable synchronous exception due to a
+# memory-read error like x86 machine-check or ARM data-abort, and
+# implements copy_mc_to_{user,kernel} to abort and report
+# 'bytes-transferred' if that exception fires when accessing the source
+# buffer.
+config ARCH_HAS_COPY_MC
bool

# Temporary. Goes away when all archs are cleaned up
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -636,30 +636,30 @@ size_t _copy_to_iter(const void *addr, s
}
EXPORT_SYMBOL(_copy_to_iter);

-#ifdef CONFIG_ARCH_HAS_UACCESS_MCSAFE
-static int copyout_mcsafe(void __user *to, const void *from, size_t n)
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+static int copyout_mc(void __user *to, const void *from, size_t n)
{
if (access_ok(to, n)) {
instrument_copy_to_user(to, from, n);
- n = copy_to_user_mcsafe((__force void *) to, from, n);
+ n = copy_mc_to_user((__force void *) to, from, n);
}
return n;
}

-static unsigned long memcpy_mcsafe_to_page(struct page *page, size_t offset,
+static unsigned long copy_mc_to_page(struct page *page, size_t offset,
const char *from, size_t len)
{
unsigned long ret;
char *to;

to = kmap_atomic(page);
- ret = memcpy_mcsafe(to + offset, from, len);
+ ret = copy_mc_to_kernel(to + offset, from, len);
kunmap_atomic(to);

return ret;
}

-static size_t copy_pipe_to_iter_mcsafe(const void *addr, size_t bytes,
+static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
struct iov_iter *i)
{
struct pipe_inode_info *pipe = i->pipe;
@@ -677,7 +677,7 @@ static size_t copy_pipe_to_iter_mcsafe(c
size_t chunk = min_t(size_t, n, PAGE_SIZE - off);
unsigned long rem;

- rem = memcpy_mcsafe_to_page(pipe->bufs[i_head & p_mask].page,
+ rem = copy_mc_to_page(pipe->bufs[i_head & p_mask].page,
off, addr, chunk);
i->head = i_head;
i->iov_offset = off + chunk - rem;
@@ -694,18 +694,17 @@ static size_t copy_pipe_to_iter_mcsafe(c
}

/**
- * _copy_to_iter_mcsafe - copy to user with source-read error exception handling
+ * _copy_mc_to_iter - copy to iter with source memory error exception handling
* @addr: source kernel address
* @bytes: total transfer length
* @iter: destination iterator
*
- * The pmem driver arranges for filesystem-dax to use this facility via
- * dax_copy_to_iter() for protecting read/write to persistent memory.
- * Unless / until an architecture can guarantee identical performance
- * between _copy_to_iter_mcsafe() and _copy_to_iter() it would be a
- * performance regression to switch more users to the mcsafe version.
+ * The pmem driver deploys this for the dax operation
+ * (dax_copy_to_iter()) for dax reads (bypass page-cache and the
+ * block-layer). Upon #MC read(2) aborts and returns EIO or the bytes
+ * successfully copied.
*
- * Otherwise, the main differences between this and typical _copy_to_iter().
+ * The main differences between this and typical _copy_to_iter().
*
* * Typical tail/residue handling after a fault retries the copy
* byte-by-byte until the fault happens again. Re-triggering machine
@@ -716,23 +715,22 @@ static size_t copy_pipe_to_iter_mcsafe(c
* * ITER_KVEC, ITER_PIPE, and ITER_BVEC can return short copies.
* Compare to copy_to_iter() where only ITER_IOVEC attempts might return
* a short copy.
- *
- * See MCSAFE_TEST for self-test.
*/
-size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i)
+size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
{
const char *from = addr;
unsigned long rem, curr_addr, s_addr = (unsigned long) addr;

if (unlikely(iov_iter_is_pipe(i)))
- return copy_pipe_to_iter_mcsafe(addr, bytes, i);
+ return copy_mc_pipe_to_iter(addr, bytes, i);
if (iter_is_iovec(i))
might_fault();
iterate_and_advance(i, bytes, v,
- copyout_mcsafe(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
+ copyout_mc(v.iov_base, (from += v.iov_len) - v.iov_len,
+ v.iov_len),
({
- rem = memcpy_mcsafe_to_page(v.bv_page, v.bv_offset,
- (from += v.bv_len) - v.bv_len, v.bv_len);
+ rem = copy_mc_to_page(v.bv_page, v.bv_offset,
+ (from += v.bv_len) - v.bv_len, v.bv_len);
if (rem) {
curr_addr = (unsigned long) from;
bytes = curr_addr - s_addr - rem;
@@ -740,8 +738,8 @@ size_t _copy_to_iter_mcsafe(const void *
}
}),
({
- rem = memcpy_mcsafe(v.iov_base, (from += v.iov_len) - v.iov_len,
- v.iov_len);
+ rem = copy_mc_to_kernel(v.iov_base, (from += v.iov_len)
+ - v.iov_len, v.iov_len);
if (rem) {
curr_addr = (unsigned long) from;
bytes = curr_addr - s_addr - rem;
@@ -752,8 +750,8 @@ size_t _copy_to_iter_mcsafe(const void *

return bytes;
}
-EXPORT_SYMBOL_GPL(_copy_to_iter_mcsafe);
-#endif /* CONFIG_ARCH_HAS_UACCESS_MCSAFE */
+EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
+#endif /* CONFIG_ARCH_HAS_COPY_MC */

size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
{
--- a/tools/arch/x86/include/asm/mcsafe_test.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _MCSAFE_TEST_H_
-#define _MCSAFE_TEST_H_
-
-.macro MCSAFE_TEST_CTL
-.endm
-
-.macro MCSAFE_TEST_SRC reg count target
-.endm
-
-.macro MCSAFE_TEST_DST reg count target
-.endm
-#endif /* _MCSAFE_TEST_H_ */
--- a/tools/arch/x86/lib/memcpy_64.S
+++ b/tools/arch/x86/lib/memcpy_64.S
@@ -4,7 +4,6 @@
#include <linux/linkage.h>
#include <asm/errno.h>
#include <asm/cpufeatures.h>
-#include <asm/mcsafe_test.h>
#include <asm/alternative-asm.h>
#include <asm/export.h>

@@ -187,117 +186,3 @@ SYM_FUNC_START(memcpy_orig)
SYM_FUNC_END(memcpy_orig)

.popsection
-
-#ifndef CONFIG_UML
-
-MCSAFE_TEST_CTL
-
-/*
- * __memcpy_mcsafe - memory copy with machine check exception handling
- * Note that we only catch machine checks when reading the source addresses.
- * Writes to target are posted and don't generate machine checks.
- */
-SYM_FUNC_START(__memcpy_mcsafe)
- cmpl $8, %edx
- /* Less than 8 bytes? Go to byte copy loop */
- jb .L_no_whole_words
-
- /* Check for bad alignment of source */
- testl $7, %esi
- /* Already aligned */
- jz .L_8byte_aligned
-
- /* Copy one byte at a time until source is 8-byte aligned */
- movl %esi, %ecx
- andl $7, %ecx
- subl $8, %ecx
- negl %ecx
- subl %ecx, %edx
-.L_read_leading_bytes:
- movb (%rsi), %al
- MCSAFE_TEST_SRC %rsi 1 .E_leading_bytes
- MCSAFE_TEST_DST %rdi 1 .E_leading_bytes
-.L_write_leading_bytes:
- movb %al, (%rdi)
- incq %rsi
- incq %rdi
- decl %ecx
- jnz .L_read_leading_bytes
-
-.L_8byte_aligned:
- movl %edx, %ecx
- andl $7, %edx
- shrl $3, %ecx
- jz .L_no_whole_words
-
-.L_read_words:
- movq (%rsi), %r8
- MCSAFE_TEST_SRC %rsi 8 .E_read_words
- MCSAFE_TEST_DST %rdi 8 .E_write_words
-.L_write_words:
- movq %r8, (%rdi)
- addq $8, %rsi
- addq $8, %rdi
- decl %ecx
- jnz .L_read_words
-
- /* Any trailing bytes? */
-.L_no_whole_words:
- andl %edx, %edx
- jz .L_done_memcpy_trap
-
- /* Copy trailing bytes */
- movl %edx, %ecx
-.L_read_trailing_bytes:
- movb (%rsi), %al
- MCSAFE_TEST_SRC %rsi 1 .E_trailing_bytes
- MCSAFE_TEST_DST %rdi 1 .E_trailing_bytes
-.L_write_trailing_bytes:
- movb %al, (%rdi)
- incq %rsi
- incq %rdi
- decl %ecx
- jnz .L_read_trailing_bytes
-
- /* Copy successful. Return zero */
-.L_done_memcpy_trap:
- xorl %eax, %eax
-.L_done:
- ret
-SYM_FUNC_END(__memcpy_mcsafe)
-EXPORT_SYMBOL_GPL(__memcpy_mcsafe)
-
- .section .fixup, "ax"
- /*
- * Return number of bytes not copied for any failure. Note that
- * there is no "tail" handling since the source buffer is 8-byte
- * aligned and poison is cacheline aligned.
- */
-.E_read_words:
- shll $3, %ecx
-.E_leading_bytes:
- addl %edx, %ecx
-.E_trailing_bytes:
- mov %ecx, %eax
- jmp .L_done
-
- /*
- * For write fault handling, given the destination is unaligned,
- * we handle faults on multi-byte writes with a byte-by-byte
- * copy up to the write-protected page.
- */
-.E_write_words:
- shll $3, %ecx
- addl %edx, %ecx
- movl %ecx, %edx
- jmp mcsafe_handle_tail
-
- .previous
-
- _ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
- _ASM_EXTABLE_FAULT(.L_read_words, .E_read_words)
- _ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
- _ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes)
- _ASM_EXTABLE(.L_write_words, .E_write_words)
- _ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
-#endif
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -548,8 +548,8 @@ static const char *uaccess_safe_builtin[
"__ubsan_handle_shift_out_of_bounds",
/* misc */
"csum_partial_copy_generic",
- "__memcpy_mcsafe",
- "mcsafe_handle_tail",
+ "copy_mc_fragile",
+ "copy_mc_fragile_handle_tail",
"ftrace_likely_update", /* CONFIG_TRACE_BRANCH_PROFILING */
NULL
};
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -11,7 +11,6 @@ perf-y += epoll-ctl.o
perf-y += synthesize.o
perf-y += kallsyms-parse.o

-perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o
perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
perf-$(CONFIG_X86_64) += mem-memset-x86-64-asm.o

--- a/tools/perf/bench/mem-memcpy-x86-64-lib.c
+++ /dev/null
@@ -1,24 +0,0 @@
-/*
- * From code in arch/x86/lib/usercopy_64.c, copied to keep tools/ copy
- * of the kernel's arch/x86/lib/memcpy_64.s used in 'perf bench mem memcpy'
- * happy.
- */
-#include <linux/types.h>
-
-unsigned long __memcpy_mcsafe(void *dst, const void *src, size_t cnt);
-unsigned long mcsafe_handle_tail(char *to, char *from, unsigned len);
-
-unsigned long mcsafe_handle_tail(char *to, char *from, unsigned len)
-{
- for (; len; --len, to++, from++) {
- /*
- * Call the assembly routine back directly since
- * memcpy_mcsafe() may silently fallback to memcpy.
- */
- unsigned long rem = __memcpy_mcsafe(to, from, 1);
-
- if (rem)
- break;
- }
- return len;
-}
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -23,7 +23,8 @@
#include "nfit_test.h"
#include "../watermark.h"

-#include <asm/mcsafe_test.h>
+#include <asm/copy_mc_test.h>
+#include <asm/mce.h>

/*
* Generate an NFIT table to describe the following topology:
@@ -3052,7 +3053,7 @@ static struct platform_driver nfit_test_
.id_table = nfit_test_id,
};

-static char mcsafe_buf[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE)));
+static char copy_mc_buf[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE)));

enum INJECT {
INJECT_NONE,
@@ -3060,7 +3061,7 @@ enum INJECT {
INJECT_DST,
};

-static void mcsafe_test_init(char *dst, char *src, size_t size)
+static void copy_mc_test_init(char *dst, char *src, size_t size)
{
size_t i;

@@ -3069,7 +3070,7 @@ static void mcsafe_test_init(char *dst,
src[i] = (char) i;
}

-static bool mcsafe_test_validate(unsigned char *dst, unsigned char *src,
+static bool copy_mc_test_validate(unsigned char *dst, unsigned char *src,
size_t size, unsigned long rem)
{
size_t i;
@@ -3090,12 +3091,12 @@ static bool mcsafe_test_validate(unsigne
return true;
}

-void mcsafe_test(void)
+void copy_mc_test(void)
{
char *inject_desc[] = { "none", "source", "destination" };
enum INJECT inj;

- if (IS_ENABLED(CONFIG_MCSAFE_TEST)) {
+ if (IS_ENABLED(CONFIG_COPY_MC_TEST)) {
pr_info("%s: run...\n", __func__);
} else {
pr_info("%s: disabled, skip.\n", __func__);
@@ -3113,31 +3114,31 @@ void mcsafe_test(void)

switch (inj) {
case INJECT_NONE:
- mcsafe_inject_src(NULL);
- mcsafe_inject_dst(NULL);
- dst = &mcsafe_buf[2048];
- src = &mcsafe_buf[1024 - i];
+ copy_mc_inject_src(NULL);
+ copy_mc_inject_dst(NULL);
+ dst = &copy_mc_buf[2048];
+ src = &copy_mc_buf[1024 - i];
expect = 0;
break;
case INJECT_SRC:
- mcsafe_inject_src(&mcsafe_buf[1024]);
- mcsafe_inject_dst(NULL);
- dst = &mcsafe_buf[2048];
- src = &mcsafe_buf[1024 - i];
+ copy_mc_inject_src(&copy_mc_buf[1024]);
+ copy_mc_inject_dst(NULL);
+ dst = &copy_mc_buf[2048];
+ src = &copy_mc_buf[1024 - i];
expect = 512 - i;
break;
case INJECT_DST:
- mcsafe_inject_src(NULL);
- mcsafe_inject_dst(&mcsafe_buf[2048]);
- dst = &mcsafe_buf[2048 - i];
- src = &mcsafe_buf[1024];
+ copy_mc_inject_src(NULL);
+ copy_mc_inject_dst(&copy_mc_buf[2048]);
+ dst = &copy_mc_buf[2048 - i];
+ src = &copy_mc_buf[1024];
expect = 512 - i;
break;
}

- mcsafe_test_init(dst, src, 512);
- rem = __memcpy_mcsafe(dst, src, 512);
- valid = mcsafe_test_validate(dst, src, 512, expect);
+ copy_mc_test_init(dst, src, 512);
+ rem = copy_mc_fragile(dst, src, 512);
+ valid = copy_mc_test_validate(dst, src, 512, expect);
if (rem == expect && valid)
continue;
pr_info("%s: copy(%#lx, %#lx, %d) off: %d rem: %ld %s expect: %ld\n",
@@ -3149,8 +3150,8 @@ void mcsafe_test(void)
}
}

- mcsafe_inject_src(NULL);
- mcsafe_inject_dst(NULL);
+ copy_mc_inject_src(NULL);
+ copy_mc_inject_dst(NULL);
}

static __init int nfit_test_init(void)
@@ -3161,7 +3162,7 @@ static __init int nfit_test_init(void)
libnvdimm_test();
acpi_nfit_test();
device_dax_test();
- mcsafe_test();
+ copy_mc_test();
dax_pmem_test();
dax_pmem_core_test();
#ifdef CONFIG_DEV_DAX_PMEM_COMPAT
--- a/tools/testing/selftests/powerpc/copyloops/.gitignore
+++ b/tools/testing/selftests/powerpc/copyloops/.gitignore
@@ -12,4 +12,4 @@ memcpy_p7_t1
copyuser_64_exc_t0
copyuser_64_exc_t1
copyuser_64_exc_t2
-memcpy_mcsafe_64
+copy_mc_64
--- a/tools/testing/selftests/powerpc/copyloops/Makefile
+++ b/tools/testing/selftests/powerpc/copyloops/Makefile
@@ -12,7 +12,7 @@ ASFLAGS = $(CFLAGS) -Wa,-mpower4
TEST_GEN_PROGS := copyuser_64_t0 copyuser_64_t1 copyuser_64_t2 \
copyuser_p7_t0 copyuser_p7_t1 \
memcpy_64_t0 memcpy_64_t1 memcpy_64_t2 \
- memcpy_p7_t0 memcpy_p7_t1 memcpy_mcsafe_64 \
+ memcpy_p7_t0 memcpy_p7_t1 copy_mc_64 \
copyuser_64_exc_t0 copyuser_64_exc_t1 copyuser_64_exc_t2

EXTRA_SOURCES := validate.c ../harness.c stubs.S
@@ -45,9 +45,9 @@ $(OUTPUT)/memcpy_p7_t%: memcpy_power7.S
-D SELFTEST_CASE=$(subst memcpy_p7_t,,$(notdir $@)) \
-o $@ $^

-$(OUTPUT)/memcpy_mcsafe_64: memcpy_mcsafe_64.S $(EXTRA_SOURCES)
+$(OUTPUT)/copy_mc_64: copy_mc_64.S $(EXTRA_SOURCES)
$(CC) $(CPPFLAGS) $(CFLAGS) \
- -D COPY_LOOP=test_memcpy_mcsafe \
+ -D COPY_LOOP=test_copy_mc_generic \
-o $@ $^

$(OUTPUT)/copyuser_64_exc_t%: copyuser_64.S exc_validate.c ../harness.c \
--- /dev/null
+++ b/tools/testing/selftests/powerpc/copyloops/copy_mc_64.S
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) IBM Corporation, 2011
+ * Derived from copyuser_power7.s by Anton Blanchard <[email protected]>
+ * Author - Balbir Singh <[email protected]>
+ */
+#include <asm/ppc_asm.h>
+#include <asm/errno.h>
+#include <asm/export.h>
+
+ .macro err1
+100:
+ EX_TABLE(100b,.Ldo_err1)
+ .endm
+
+ .macro err2
+200:
+ EX_TABLE(200b,.Ldo_err2)
+ .endm
+
+ .macro err3
+300: EX_TABLE(300b,.Ldone)
+ .endm
+
+.Ldo_err2:
+ ld r22,STK_REG(R22)(r1)
+ ld r21,STK_REG(R21)(r1)
+ ld r20,STK_REG(R20)(r1)
+ ld r19,STK_REG(R19)(r1)
+ ld r18,STK_REG(R18)(r1)
+ ld r17,STK_REG(R17)(r1)
+ ld r16,STK_REG(R16)(r1)
+ ld r15,STK_REG(R15)(r1)
+ ld r14,STK_REG(R14)(r1)
+ addi r1,r1,STACKFRAMESIZE
+.Ldo_err1:
+ /* Do a byte by byte copy to get the exact remaining size */
+ mtctr r7
+46:
+err3; lbz r0,0(r4)
+ addi r4,r4,1
+err3; stb r0,0(r3)
+ addi r3,r3,1
+ bdnz 46b
+ li r3,0
+ blr
+
+.Ldone:
+ mfctr r3
+ blr
+
+
+_GLOBAL(copy_mc_generic)
+ mr r7,r5
+ cmpldi r5,16
+ blt .Lshort_copy
+
+.Lcopy:
+ /* Get the source 8B aligned */
+ neg r6,r4
+ mtocrf 0x01,r6
+ clrldi r6,r6,(64-3)
+
+ bf cr7*4+3,1f
+err1; lbz r0,0(r4)
+ addi r4,r4,1
+err1; stb r0,0(r3)
+ addi r3,r3,1
+ subi r7,r7,1
+
+1: bf cr7*4+2,2f
+err1; lhz r0,0(r4)
+ addi r4,r4,2
+err1; sth r0,0(r3)
+ addi r3,r3,2
+ subi r7,r7,2
+
+2: bf cr7*4+1,3f
+err1; lwz r0,0(r4)
+ addi r4,r4,4
+err1; stw r0,0(r3)
+ addi r3,r3,4
+ subi r7,r7,4
+
+3: sub r5,r5,r6
+ cmpldi r5,128
+
+ mflr r0
+ stdu r1,-STACKFRAMESIZE(r1)
+ std r14,STK_REG(R14)(r1)
+ std r15,STK_REG(R15)(r1)
+ std r16,STK_REG(R16)(r1)
+ std r17,STK_REG(R17)(r1)
+ std r18,STK_REG(R18)(r1)
+ std r19,STK_REG(R19)(r1)
+ std r20,STK_REG(R20)(r1)
+ std r21,STK_REG(R21)(r1)
+ std r22,STK_REG(R22)(r1)
+ std r0,STACKFRAMESIZE+16(r1)
+
+ blt 5f
+ srdi r6,r5,7
+ mtctr r6
+
+ /* Now do cacheline (128B) sized loads and stores. */
+ .align 5
+4:
+err2; ld r0,0(r4)
+err2; ld r6,8(r4)
+err2; ld r8,16(r4)
+err2; ld r9,24(r4)
+err2; ld r10,32(r4)
+err2; ld r11,40(r4)
+err2; ld r12,48(r4)
+err2; ld r14,56(r4)
+err2; ld r15,64(r4)
+err2; ld r16,72(r4)
+err2; ld r17,80(r4)
+err2; ld r18,88(r4)
+err2; ld r19,96(r4)
+err2; ld r20,104(r4)
+err2; ld r21,112(r4)
+err2; ld r22,120(r4)
+ addi r4,r4,128
+err2; std r0,0(r3)
+err2; std r6,8(r3)
+err2; std r8,16(r3)
+err2; std r9,24(r3)
+err2; std r10,32(r3)
+err2; std r11,40(r3)
+err2; std r12,48(r3)
+err2; std r14,56(r3)
+err2; std r15,64(r3)
+err2; std r16,72(r3)
+err2; std r17,80(r3)
+err2; std r18,88(r3)
+err2; std r19,96(r3)
+err2; std r20,104(r3)
+err2; std r21,112(r3)
+err2; std r22,120(r3)
+ addi r3,r3,128
+ subi r7,r7,128
+ bdnz 4b
+
+ clrldi r5,r5,(64-7)
+
+ /* Up to 127B to go */
+5: srdi r6,r5,4
+ mtocrf 0x01,r6
+
+6: bf cr7*4+1,7f
+err2; ld r0,0(r4)
+err2; ld r6,8(r4)
+err2; ld r8,16(r4)
+err2; ld r9,24(r4)
+err2; ld r10,32(r4)
+err2; ld r11,40(r4)
+err2; ld r12,48(r4)
+err2; ld r14,56(r4)
+ addi r4,r4,64
+err2; std r0,0(r3)
+err2; std r6,8(r3)
+err2; std r8,16(r3)
+err2; std r9,24(r3)
+err2; std r10,32(r3)
+err2; std r11,40(r3)
+err2; std r12,48(r3)
+err2; std r14,56(r3)
+ addi r3,r3,64
+ subi r7,r7,64
+
+7: ld r14,STK_REG(R14)(r1)
+ ld r15,STK_REG(R15)(r1)
+ ld r16,STK_REG(R16)(r1)
+ ld r17,STK_REG(R17)(r1)
+ ld r18,STK_REG(R18)(r1)
+ ld r19,STK_REG(R19)(r1)
+ ld r20,STK_REG(R20)(r1)
+ ld r21,STK_REG(R21)(r1)
+ ld r22,STK_REG(R22)(r1)
+ addi r1,r1,STACKFRAMESIZE
+
+ /* Up to 63B to go */
+ bf cr7*4+2,8f
+err1; ld r0,0(r4)
+err1; ld r6,8(r4)
+err1; ld r8,16(r4)
+err1; ld r9,24(r4)
+ addi r4,r4,32
+err1; std r0,0(r3)
+err1; std r6,8(r3)
+err1; std r8,16(r3)
+err1; std r9,24(r3)
+ addi r3,r3,32
+ subi r7,r7,32
+
+ /* Up to 31B to go */
+8: bf cr7*4+3,9f
+err1; ld r0,0(r4)
+err1; ld r6,8(r4)
+ addi r4,r4,16
+err1; std r0,0(r3)
+err1; std r6,8(r3)
+ addi r3,r3,16
+ subi r7,r7,16
+
+9: clrldi r5,r5,(64-4)
+
+ /* Up to 15B to go */
+.Lshort_copy:
+ mtocrf 0x01,r5
+ bf cr7*4+0,12f
+err1; lwz r0,0(r4) /* Less chance of a reject with word ops */
+err1; lwz r6,4(r4)
+ addi r4,r4,8
+err1; stw r0,0(r3)
+err1; stw r6,4(r3)
+ addi r3,r3,8
+ subi r7,r7,8
+
+12: bf cr7*4+1,13f
+err1; lwz r0,0(r4)
+ addi r4,r4,4
+err1; stw r0,0(r3)
+ addi r3,r3,4
+ subi r7,r7,4
+
+13: bf cr7*4+2,14f
+err1; lhz r0,0(r4)
+ addi r4,r4,2
+err1; sth r0,0(r3)
+ addi r3,r3,2
+ subi r7,r7,2
+
+14: bf cr7*4+3,15f
+err1; lbz r0,0(r4)
+err1; stb r0,0(r3)
+
+15: li r3,0
+ blr
+
+EXPORT_SYMBOL_GPL(copy_mc_generic);


2020-10-31 11:45:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 51/70] x86/traps: Fix #DE Oops message regression

From: Thomas Gleixner <[email protected]>

commit 5f1ec1fd32252af5130dac23b5542e8e66fe0bcb upstream.

The conversion of #DE to the idtentry mechanism introduced a change in the
Ooops message which confuses tools which parse crash information in dmesg.

Remove the underscore from 'divide_error' to restore previous behaviour.

Fixes: 9d06c4027f21 ("x86/entry: Convert Divide Error to IDTENTRY")
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/CACT4Y+bTZFkuZd7+bPArowOv-7Die+WZpfOWnEO_Wgs3U59+oA@mail.gmail.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/traps.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -196,7 +196,7 @@ static __always_inline void __user *erro

DEFINE_IDTENTRY(exc_divide_error)
{
- do_error_trap(regs, 0, "divide_error", X86_TRAP_DE, SIGFPE,
+ do_error_trap(regs, 0, "divide error", X86_TRAP_DE, SIGFPE,
FPE_INTDIV, error_get_trap_addr(regs));
}



2020-10-31 11:46:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 20/70] arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs

From: Marc Zyngier <[email protected]>

commit 18fce56134c987e5b4eceddafdbe4b00c07e2ae1 upstream.

Commit 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack
thereof") changed the way we deal with ARCH_WORKAROUND_1, by moving most
of the enabling code to the .matches() callback.

This has the unfortunate effect that the workaround gets only enabled on
the first affected CPU, and no other.

In order to address this, forcefully call the .matches() callback from a
.cpu_enable() callback, which brings us back to the original behaviour.

Fixes: 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof")
Cc: <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm64/kernel/cpu_errata.c | 8 ++++++++
1 file changed, 8 insertions(+)

--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -599,6 +599,12 @@ check_branch_predictor(const struct arm6
return (need_wa > 0);
}

+static void
+cpu_enable_branch_predictor_hardening(const struct arm64_cpu_capabilities *cap)
+{
+ cap->matches(cap, SCOPE_LOCAL_CPU);
+}
+
static const __maybe_unused struct midr_range tx2_family_cpus[] = {
MIDR_ALL_VERSIONS(MIDR_BRCM_VULCAN),
MIDR_ALL_VERSIONS(MIDR_CAVIUM_THUNDERX2),
@@ -890,9 +896,11 @@ const struct arm64_cpu_capabilities arm6
},
#endif
{
+ .desc = "Branch predictor hardening",
.capability = ARM64_HARDEN_BRANCH_PREDICTOR,
.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
.matches = check_branch_predictor,
+ .cpu_enable = cpu_enable_branch_predictor_hardening,
},
#ifdef CONFIG_HARDEN_EL2_VECTORS
{


2020-10-31 11:46:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 61/70] mtd: lpddr: Fix bad logic in print_drs_error

From: Gustavo A. R. Silva <[email protected]>

commit 1c9c02bb22684f6949d2e7ddc0a3ff364fd5a6fc upstream.

Update logic for broken test. Use a more common logging style.

It appears the logic in this function is broken for the
consecutive tests of

if (prog_status & 0x3)
...
else if (prog_status & 0x2)
...
else (prog_status & 0x1)
...

Likely the first test should be

if ((prog_status & 0x3) == 0x3)

Found by inspection of include files using printk.

Fixes: eb3db27507f7 ("[MTD] LPDDR PFOW definition")
Cc: [email protected]
Reported-by: Joe Perches <[email protected]>
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Acked-by: Miquel Raynal <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/linux-mtd/3fb0e29f5b601db8be2938a01d974b00c8788501.1588016644.git.gustavo@embeddedor.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/mtd/pfow.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/mtd/pfow.h
+++ b/include/linux/mtd/pfow.h
@@ -128,7 +128,7 @@ static inline void print_drs_error(unsig

if (!(dsr & DSR_AVAILABLE))
printk(KERN_NOTICE"DSR.15: (0) Device not Available\n");
- if (prog_status & 0x03)
+ if ((prog_status & 0x03) == 0x03)
printk(KERN_NOTICE"DSR.9,8: (11) Attempt to program invalid "
"half with 41h command\n");
else if (prog_status & 0x02)


2020-10-31 11:46:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 44/70] r8169: fix issue with forced threading in combination with shared interrupts

From: Heiner Kallweit <[email protected]>

[ Upstream commit 2734a24e6e5d18522fbf599135c59b82ec9b2c9e ]

As reported by Serge flag IRQF_NO_THREAD causes an error if the
interrupt is actually shared and the other driver(s) don't have this
flag set. This situation can occur if a PCI(e) legacy interrupt is
used in combination with forced threading.
There's no good way to deal with this properly, therefore we have to
remove flag IRQF_NO_THREAD. For fixing the original forced threading
issue switch to napi_schedule().

Fixes: 424a646e072a ("r8169: fix operation under forced interrupt threading")
Link: https://www.spinics.net/lists/netdev/msg694960.html
Reported-by: Serge Belyshev <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Tested-by: Serge Belyshev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/realtek/r8169_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4559,7 +4559,7 @@ static irqreturn_t rtl8169_interrupt(int
}

rtl_irq_disable(tp);
- napi_schedule_irqoff(&tp->napi);
+ napi_schedule(&tp->napi);
out:
rtl_ack_events(tp, status);

@@ -4727,7 +4727,7 @@ static int rtl_open(struct net_device *d
rtl_request_firmware(tp);

retval = request_irq(pci_irq_vector(pdev, 0), rtl8169_interrupt,
- IRQF_NO_THREAD | IRQF_SHARED, dev->name, tp);
+ IRQF_SHARED, dev->name, tp);
if (retval < 0)
goto err_release_fw_2;



2020-10-31 11:46:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 49/70] erofs: avoid duplicated permission check for "trusted." xattrs

From: Gao Xiang <[email protected]>

commit d578b46db69d125a654f509bdc9091d84e924dc8 upstream.

Don't recheck it since xattr_permission() already
checks CAP_SYS_ADMIN capability.

Just follow 5d3ce4f70172 ("f2fs: avoid duplicated permission check for "trusted." xattrs")

Reported-by: Hongyu Jin <[email protected]>
[ Gao Xiang: since it could cause some complex Android overlay
permission issue as well on android-5.4+, it'd be better to
backport to 5.4+ rather than pure cleanup on mainline. ]
Cc: <[email protected]> # 5.4+
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/erofs/xattr.c | 2 --
1 file changed, 2 deletions(-)

--- a/fs/erofs/xattr.c
+++ b/fs/erofs/xattr.c
@@ -473,8 +473,6 @@ static int erofs_xattr_generic_get(const
return -EOPNOTSUPP;
break;
case EROFS_XATTR_INDEX_TRUSTED:
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
break;
case EROFS_XATTR_INDEX_SECURITY:
break;


2020-10-31 11:46:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 23/70] x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled

From: Randy Dunlap <[email protected]>

commit 035fff1f7aab43e420e0098f0854470a5286fb83 upstream.

Fix build error when CONFIG_ACPI is not set/enabled by adding the header
file <asm/acpi.h> which contains a stub for the function in the build
error.

../arch/x86/pci/intel_mid_pci.c: In function ‘intel_mid_pci_init’:
../arch/x86/pci/intel_mid_pci.c:303:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration]
acpi_noirq_set();

Fixes: a912a7584ec3 ("x86/platform/intel-mid: Move PCI initialization to arch_init()")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reviewed-by: Jesse Barnes <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: [email protected] # v4.16+
Cc: Jacob Pan <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Jesse Barnes <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/pci/intel_mid_pci.c | 1 +
1 file changed, 1 insertion(+)

--- a/arch/x86/pci/intel_mid_pci.c
+++ b/arch/x86/pci/intel_mid_pci.c
@@ -33,6 +33,7 @@
#include <asm/hw_irq.h>
#include <asm/io_apic.h>
#include <asm/intel-mid.h>
+#include <asm/acpi.h>

#define PCIE_CAP_OFFSET 0x100



2020-10-31 11:46:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 55/70] fuse: fix page dereference after free

From: Miklos Szeredi <[email protected]>

commit d78092e4937de9ce55edcb4ee4c5e3c707be0190 upstream.

After unlock_request() pages from the ap->pages[] array may be put (e.g. by
aborting the connection) and the pages can be freed.

Prevent use after free by grabbing a reference to the page before calling
unlock_request().

The original patch was created by Pradeep P V K.

Reported-by: Pradeep P V K <[email protected]>
Cc: <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/fuse/dev.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -785,15 +785,16 @@ static int fuse_try_move_page(struct fus
struct page *newpage;
struct pipe_buffer *buf = cs->pipebufs;

+ get_page(oldpage);
err = unlock_request(cs->req);
if (err)
- return err;
+ goto out_put_old;

fuse_copy_finish(cs);

err = pipe_buf_confirm(cs->pipe, buf);
if (err)
- return err;
+ goto out_put_old;

BUG_ON(!cs->nr_segs);
cs->currbuf = buf;
@@ -833,7 +834,7 @@ static int fuse_try_move_page(struct fus
err = replace_page_cache_page(oldpage, newpage, GFP_KERNEL);
if (err) {
unlock_page(newpage);
- return err;
+ goto out_put_old;
}

get_page(newpage);
@@ -852,14 +853,19 @@ static int fuse_try_move_page(struct fus
if (err) {
unlock_page(newpage);
put_page(newpage);
- return err;
+ goto out_put_old;
}

unlock_page(oldpage);
+ /* Drop ref for ap->pages[] array */
put_page(oldpage);
cs->len = 0;

- return 0;
+ err = 0;
+out_put_old:
+ /* Drop ref obtained in this function */
+ put_page(oldpage);
+ return err;

out_fallback_unlock:
unlock_page(newpage);
@@ -868,10 +874,10 @@ out_fallback:
cs->offset = buf->offset;

err = lock_request(cs->req);
- if (err)
- return err;
+ if (!err)
+ err = 1;

- return 1;
+ goto out_put_old;
}

static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
@@ -883,14 +889,16 @@ static int fuse_ref_page(struct fuse_cop
if (cs->nr_segs >= cs->pipe->max_usage)
return -EIO;

+ get_page(page);
err = unlock_request(cs->req);
- if (err)
+ if (err) {
+ put_page(page);
return err;
+ }

fuse_copy_finish(cs);

buf = cs->pipebufs;
- get_page(page);
buf->page = page;
buf->offset = offset;
buf->len = count;


2020-10-31 11:47:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 62/70] drm/i915/gem: Serialise debugfs i915_gem_objects with ctx->mutex

From: Chris Wilson <[email protected]>

commit 4fe9af8e881d946bf60790eeb37a7c4f96e28382 upstream.

Since the debugfs may peek into the GEM contexts as the corresponding
client/fd is being closed, we may try and follow a dangling pointer.
However, the context closure itself is serialised with the ctx->mutex,
so if we hold that mutex as we inspect the state coupled in the context,
we know the pointers within the context are stable and will remain valid
as we inspect their tables.

Signed-off-by: Chris Wilson <[email protected]>
Cc: CQ Tang <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 102f5aa491f262c818e607fc4fee08a724a76c69)
Signed-off-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/i915/i915_debugfs.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -323,6 +323,7 @@ static void print_context_stats(struct s
}
i915_gem_context_unlock_engines(ctx);

+ mutex_lock(&ctx->mutex);
if (!IS_ERR_OR_NULL(ctx->file_priv)) {
struct file_stats stats = {
.vm = rcu_access_pointer(ctx->vm),
@@ -343,6 +344,7 @@ static void print_context_stats(struct s

print_file_stats(m, name, stats);
}
+ mutex_unlock(&ctx->mutex);

spin_lock(&i915->gem.contexts.lock);
list_safe_reset_next(ctx, cn, link);


2020-10-31 11:47:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 70/70] phy: marvell: comphy: Convert internal SMCC firmware return codes to errno

From: Pali Rohár <[email protected]>

commit ea17a0f153af2cd890e4ce517130dcccaa428c13 upstream.

Driver ->power_on and ->power_off callbacks leaks internal SMCC firmware
return codes to phy caller. This patch converts SMCC error codes to
standard linux errno codes. Include file linux/arm-smccc.h already provides
defines for SMCC error codes, so use them instead of custom driver defines.
Note that return value is signed 32bit, but stored in unsigned long type
with zero padding.

Tested-by: Tomasz Maciej Nowak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Pali Rohár <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/phy/marvell/phy-mvebu-a3700-comphy.c | 14 +++++++++++---
drivers/phy/marvell/phy-mvebu-cp110-comphy.c | 14 +++++++++++---
2 files changed, 22 insertions(+), 6 deletions(-)

--- a/drivers/phy/marvell/phy-mvebu-a3700-comphy.c
+++ b/drivers/phy/marvell/phy-mvebu-a3700-comphy.c
@@ -26,7 +26,6 @@
#define COMPHY_SIP_POWER_ON 0x82000001
#define COMPHY_SIP_POWER_OFF 0x82000002
#define COMPHY_SIP_PLL_LOCK 0x82000003
-#define COMPHY_FW_NOT_SUPPORTED (-1)

#define COMPHY_FW_MODE_SATA 0x1
#define COMPHY_FW_MODE_SGMII 0x2
@@ -112,10 +111,19 @@ static int mvebu_a3700_comphy_smc(unsign
unsigned long mode)
{
struct arm_smccc_res res;
+ s32 ret;

arm_smccc_smc(function, lane, mode, 0, 0, 0, 0, 0, &res);
+ ret = res.a0;

- return res.a0;
+ switch (ret) {
+ case SMCCC_RET_SUCCESS:
+ return 0;
+ case SMCCC_RET_NOT_SUPPORTED:
+ return -EOPNOTSUPP;
+ default:
+ return -EINVAL;
+ }
}

static int mvebu_a3700_comphy_get_fw_mode(int lane, int port,
@@ -220,7 +228,7 @@ static int mvebu_a3700_comphy_power_on(s
}

ret = mvebu_a3700_comphy_smc(COMPHY_SIP_POWER_ON, lane->id, fw_param);
- if (ret == COMPHY_FW_NOT_SUPPORTED)
+ if (ret == -EOPNOTSUPP)
dev_err(lane->dev,
"unsupported SMC call, try updating your firmware\n");

--- a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
+++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
@@ -123,7 +123,6 @@

#define COMPHY_SIP_POWER_ON 0x82000001
#define COMPHY_SIP_POWER_OFF 0x82000002
-#define COMPHY_FW_NOT_SUPPORTED (-1)

/*
* A lane is described by the following bitfields:
@@ -273,10 +272,19 @@ static int mvebu_comphy_smc(unsigned lon
unsigned long lane, unsigned long mode)
{
struct arm_smccc_res res;
+ s32 ret;

arm_smccc_smc(function, phys, lane, mode, 0, 0, 0, 0, &res);
+ ret = res.a0;

- return res.a0;
+ switch (ret) {
+ case SMCCC_RET_SUCCESS:
+ return 0;
+ case SMCCC_RET_NOT_SUPPORTED:
+ return -EOPNOTSUPP;
+ default:
+ return -EINVAL;
+ }
}

static int mvebu_comphy_get_mode(bool fw_mode, int lane, int port,
@@ -819,7 +827,7 @@ static int mvebu_comphy_power_on(struct
if (!ret)
return ret;

- if (ret == COMPHY_FW_NOT_SUPPORTED)
+ if (ret == -EOPNOTSUPP)
dev_err(priv->dev,
"unsupported SMC call, try updating your firmware\n");



2020-10-31 11:47:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 53/70] PCI: aardvark: Fix initialization with old Marvells Arm Trusted Firmware

From: Pali Rohár <[email protected]>

commit b0c6ae0f8948a2be6bf4e8b4bbab9ca1343289b6 upstream.

Old ATF automatically power on pcie phy and does not provide SMC call for
phy power on functionality which leads to aardvark initialization failure:

[ 0.330134] mvebu-a3700-comphy d0018300.phy: unsupported SMC call, try updating your firmware
[ 0.338846] phy phy-d0018300.phy.1: phy poweron failed --> -95
[ 0.344753] advk-pcie d0070000.pcie: Failed to initialize PHY (-95)
[ 0.351160] advk-pcie: probe of d0070000.pcie failed with error -95

This patch fixes above failure by ignoring 'not supported' error in
aardvark driver. In this case it is expected that phy is already power on.

Tested-by: Tomasz Maciej Nowak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: 366697018c9a ("PCI: aardvark: Add PHY support")
Signed-off-by: Pali Rohár <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Cc: <[email protected]> # 5.8+: ea17a0f153af: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/pci/controller/pci-aardvark.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -1068,7 +1068,9 @@ static int advk_pcie_enable_phy(struct a
}

ret = phy_power_on(pcie->phy);
- if (ret) {
+ if (ret == -EOPNOTSUPP) {
+ dev_warn(&pcie->pdev->dev, "PHY unsupported by firmware\n");
+ } else if (ret) {
phy_exit(pcie->phy);
return ret;
}


2020-10-31 11:47:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 59/70] cxl: Rework error message for incompatible slots

From: Frederic Barrat <[email protected]>

commit 40ac790d99c6dd16b367d5c2339e446a5f1b0593 upstream.

Improve the error message shown if a capi adapter is plugged on a
capi-incompatible slot directly under the PHB (no intermediate switch).

Fixes: 5632874311db ("cxl: Add support for POWER9 DD2")
Cc: [email protected] # 4.14+
Signed-off-by: Frederic Barrat <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/misc/cxl/pci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -393,8 +393,8 @@ int cxl_calc_capp_routing(struct pci_dev
*capp_unit_id = get_capp_unit_id(np, *phb_index);
of_node_put(np);
if (!*capp_unit_id) {
- pr_err("cxl: invalid capp unit id (phb_index: %d)\n",
- *phb_index);
+ pr_err("cxl: No capp unit found for PHB[%lld,%d]. Make sure the adapter is on a capi-compatible slot\n",
+ *chipid, *phb_index);
return -ENODEV;
}



2020-10-31 11:50:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 56/70] bpf: Fix comment for helper bpf_current_task_under_cgroup()

From: Song Liu <[email protected]>

commit 1aef5b4391f0c75c0a1523706a7b0311846ee12f upstream.

This should be "current" not "skb".

Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/uapi/linux/bpf.h | 4 ++--
tools/include/uapi/linux/bpf.h | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1416,8 +1416,8 @@ union bpf_attr {
* Return
* The return value depends on the result of the test, and can be:
*
- * * 0, if the *skb* task belongs to the cgroup2.
- * * 1, if the *skb* task does not belong to the cgroup2.
+ * * 0, if current task belongs to the cgroup2.
+ * * 1, if current task does not belong to the cgroup2.
* * A negative error code, if an error occurred.
*
* int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1416,8 +1416,8 @@ union bpf_attr {
* Return
* The return value depends on the result of the test, and can be:
*
- * * 0, if the *skb* task belongs to the cgroup2.
- * * 1, if the *skb* task does not belong to the cgroup2.
+ * * 0, if current task belongs to the cgroup2.
+ * * 1, if current task does not belong to the cgroup2.
* * A negative error code, if an error occurred.
*
* int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)


2020-10-31 11:51:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 58/70] p54: avoid accessing the data mapped to streaming DMA

From: Jia-Ju Bai <[email protected]>

commit 478762855b5ae9f68fa6ead1edf7abada70fcd5f upstream.

In p54p_tx(), skb->data is mapped to streaming DMA on line 337:
mapping = pci_map_single(..., skb->data, ...);

Then skb->data is accessed on line 349:
desc->device_addr = ((struct p54_hdr *)skb->data)->req_id;

This access may cause data inconsistency between CPU cache and hardware.

To fix this problem, ((struct p54_hdr *)skb->data)->req_id is stored in
a local variable before DMA mapping, and then the driver accesses this
local variable instead of skb->data.

Cc: <[email protected]>
Signed-off-by: Jia-Ju Bai <[email protected]>
Acked-by: Christian Lamparter <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/wireless/intersil/p54/p54pci.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/net/wireless/intersil/p54/p54pci.c
+++ b/drivers/net/wireless/intersil/p54/p54pci.c
@@ -329,10 +329,12 @@ static void p54p_tx(struct ieee80211_hw
struct p54p_desc *desc;
dma_addr_t mapping;
u32 idx, i;
+ __le32 device_addr;

spin_lock_irqsave(&priv->lock, flags);
idx = le32_to_cpu(ring_control->host_idx[1]);
i = idx % ARRAY_SIZE(ring_control->tx_data);
+ device_addr = ((struct p54_hdr *)skb->data)->req_id;

mapping = pci_map_single(priv->pdev, skb->data, skb->len,
PCI_DMA_TODEVICE);
@@ -346,7 +348,7 @@ static void p54p_tx(struct ieee80211_hw

desc = &ring_control->tx_data[i];
desc->host_addr = cpu_to_le32(mapping);
- desc->device_addr = ((struct p54_hdr *)skb->data)->req_id;
+ desc->device_addr = device_addr;
desc->len = cpu_to_le16(skb->len);
desc->flags = 0;



2020-10-31 11:51:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 57/70] evm: Check size of security.evm before using it

From: Roberto Sassu <[email protected]>

commit 455b6c9112eff8d249e32ba165742085678a80a4 upstream.

This patch checks the size for the EVM_IMA_XATTR_DIGSIG and
EVM_XATTR_PORTABLE_DIGSIG types to ensure that the algorithm is read from
the buffer returned by vfs_getxattr_alloc().

Cc: [email protected] # 4.19.x
Fixes: 5feeb61183dde ("evm: Allow non-SHA1 digital signatures")
Signed-off-by: Roberto Sassu <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
security/integrity/evm/evm_main.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/security/integrity/evm/evm_main.c
+++ b/security/integrity/evm/evm_main.c
@@ -181,6 +181,12 @@ static enum integrity_status evm_verify_
break;
case EVM_IMA_XATTR_DIGSIG:
case EVM_XATTR_PORTABLE_DIGSIG:
+ /* accept xattr with non-empty signature field */
+ if (xattr_len <= sizeof(struct signature_v2_hdr)) {
+ evm_status = INTEGRITY_FAIL;
+ goto out;
+ }
+
hdr = (struct signature_v2_hdr *)xattr_data;
digest.hdr.algo = hdr->hash_algo;
rc = evm_calc_hash(dentry, xattr_name, xattr_value,


2020-10-31 11:51:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 68/70] openrisc: Fix issue with get_user for 64-bit values

From: Stafford Horne <[email protected]>

commit d877322bc1adcab9850732275670409e8bcca4c4 upstream.

A build failure was raised by kbuild with the following error.

drivers/android/binder.c: Assembler messages:
drivers/android/binder.c:3861: Error: unrecognized keyword/register name `l.lwz ?ap,4(r24)'
drivers/android/binder.c:3866: Error: unrecognized keyword/register name `l.addi ?ap,r0,0'

The issue is with 64-bit get_user() calls on openrisc. I traced this to
a problem where in the internally in the get_user macros there is a cast
to long __gu_val this causes GCC to think the get_user call is 32-bit.
This binder code is really long and GCC allocates register r30, which
triggers the issue. The 64-bit get_user asm tries to get the 64-bit pair
register, which for r30 overflows the general register names and returns
the dummy register ?ap.

The fix here is to move the temporary variables into the asm macros. We
use a 32-bit __gu_tmp for 32-bit and smaller macro and a 64-bit tmp in
the 64-bit macro. The cast in the 64-bit macro has a trick of casting
through __typeof__((x)-(x)) which avoids the below warning. This was
barrowed from riscv.

arch/openrisc/include/asm/uaccess.h:240:8: warning: cast to pointer from integer of different size

I tested this in a small unit test to check reading between 64-bit and
32-bit pointers to 64-bit and 32-bit values in all combinations. Also I
ran make C=1 to confirm no new sparse warnings came up. It all looks
clean to me.

Link: https://lore.kernel.org/lkml/202008200453.ohnhqkjQ%[email protected]/
Signed-off-by: Stafford Horne <[email protected]>
Reviewed-by: Luc Van Oostenryck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
arch/openrisc/include/asm/uaccess.h | 35 ++++++++++++++++++++++-------------
1 file changed, 22 insertions(+), 13 deletions(-)

--- a/arch/openrisc/include/asm/uaccess.h
+++ b/arch/openrisc/include/asm/uaccess.h
@@ -164,19 +164,19 @@ struct __large_struct {

#define __get_user_nocheck(x, ptr, size) \
({ \
- long __gu_err, __gu_val; \
- __get_user_size(__gu_val, (ptr), (size), __gu_err); \
- (x) = (__force __typeof__(*(ptr)))__gu_val; \
+ long __gu_err; \
+ __get_user_size((x), (ptr), (size), __gu_err); \
__gu_err; \
})

#define __get_user_check(x, ptr, size) \
({ \
- long __gu_err = -EFAULT, __gu_val = 0; \
- const __typeof__(*(ptr)) * __gu_addr = (ptr); \
- if (access_ok(__gu_addr, size)) \
- __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \
- (x) = (__force __typeof__(*(ptr)))__gu_val; \
+ long __gu_err = -EFAULT; \
+ const __typeof__(*(ptr)) *__gu_addr = (ptr); \
+ if (access_ok(__gu_addr, size)) \
+ __get_user_size((x), __gu_addr, (size), __gu_err); \
+ else \
+ (x) = (__typeof__(*(ptr))) 0; \
__gu_err; \
})

@@ -190,11 +190,13 @@ do { \
case 2: __get_user_asm(x, ptr, retval, "l.lhz"); break; \
case 4: __get_user_asm(x, ptr, retval, "l.lwz"); break; \
case 8: __get_user_asm2(x, ptr, retval); break; \
- default: (x) = __get_user_bad(); \
+ default: (x) = (__typeof__(*(ptr)))__get_user_bad(); \
} \
} while (0)

#define __get_user_asm(x, addr, err, op) \
+{ \
+ unsigned long __gu_tmp; \
__asm__ __volatile__( \
"1: "op" %1,0(%2)\n" \
"2:\n" \
@@ -208,10 +210,14 @@ do { \
" .align 2\n" \
" .long 1b,3b\n" \
".previous" \
- : "=r"(err), "=r"(x) \
- : "r"(addr), "i"(-EFAULT), "0"(err))
+ : "=r"(err), "=r"(__gu_tmp) \
+ : "r"(addr), "i"(-EFAULT), "0"(err)); \
+ (x) = (__typeof__(*(addr)))__gu_tmp; \
+}

#define __get_user_asm2(x, addr, err) \
+{ \
+ unsigned long long __gu_tmp; \
__asm__ __volatile__( \
"1: l.lwz %1,0(%2)\n" \
"2: l.lwz %H1,4(%2)\n" \
@@ -228,8 +234,11 @@ do { \
" .long 1b,4b\n" \
" .long 2b,4b\n" \
".previous" \
- : "=r"(err), "=&r"(x) \
- : "r"(addr), "i"(-EFAULT), "0"(err))
+ : "=r"(err), "=&r"(__gu_tmp) \
+ : "r"(addr), "i"(-EFAULT), "0"(err)); \
+ (x) = (__typeof__(*(addr)))( \
+ (__typeof__((x)-(x)))__gu_tmp); \
+}

/* more complex routines */



2020-10-31 11:51:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 67/70] xen/gntdev.c: Mark pages as dirty

From: Souptick Joarder <[email protected]>

commit 779055842da5b2e508f3ccf9a8153cb1f704f566 upstream.

There seems to be a bug in the original code when gntdev_get_page()
is called with writeable=true then the page needs to be marked dirty
before being put.

To address this, a bool writeable is added in gnt_dev_copy_batch, set
it in gntdev_grant_copy_seg() (and drop `writeable` argument to
gntdev_get_page()) and then, based on batch->writeable, use
set_page_dirty_lock().

Fixes: a4cdb556cae0 (xen/gntdev: add ioctl for grant copy)
Suggested-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Souptick Joarder <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/xen/gntdev.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)

--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -720,17 +720,18 @@ struct gntdev_copy_batch {
s16 __user *status[GNTDEV_COPY_BATCH];
unsigned int nr_ops;
unsigned int nr_pages;
+ bool writeable;
};

static int gntdev_get_page(struct gntdev_copy_batch *batch, void __user *virt,
- bool writeable, unsigned long *gfn)
+ unsigned long *gfn)
{
unsigned long addr = (unsigned long)virt;
struct page *page;
unsigned long xen_pfn;
int ret;

- ret = get_user_pages_fast(addr, 1, writeable ? FOLL_WRITE : 0, &page);
+ ret = get_user_pages_fast(addr, 1, batch->writeable ? FOLL_WRITE : 0, &page);
if (ret < 0)
return ret;

@@ -746,9 +747,13 @@ static void gntdev_put_pages(struct gntd
{
unsigned int i;

- for (i = 0; i < batch->nr_pages; i++)
+ for (i = 0; i < batch->nr_pages; i++) {
+ if (batch->writeable && !PageDirty(batch->pages[i]))
+ set_page_dirty_lock(batch->pages[i]);
put_page(batch->pages[i]);
+ }
batch->nr_pages = 0;
+ batch->writeable = false;
}

static int gntdev_copy(struct gntdev_copy_batch *batch)
@@ -837,8 +842,9 @@ static int gntdev_grant_copy_seg(struct
virt = seg->source.virt + copied;
off = (unsigned long)virt & ~XEN_PAGE_MASK;
len = min(len, (size_t)XEN_PAGE_SIZE - off);
+ batch->writeable = false;

- ret = gntdev_get_page(batch, virt, false, &gfn);
+ ret = gntdev_get_page(batch, virt, &gfn);
if (ret < 0)
return ret;

@@ -856,8 +862,9 @@ static int gntdev_grant_copy_seg(struct
virt = seg->dest.virt + copied;
off = (unsigned long)virt & ~XEN_PAGE_MASK;
len = min(len, (size_t)XEN_PAGE_SIZE - off);
+ batch->writeable = true;

- ret = gntdev_get_page(batch, virt, true, &gfn);
+ ret = gntdev_get_page(batch, virt, &gfn);
if (ret < 0)
return ret;



2020-10-31 11:51:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 18/70] efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure

From: Ard Biesheuvel <[email protected]>

commit d32de9130f6c79533508e2c7879f18997bfbe2a0 upstream.

Currently, on arm64, we abort on any failure from efi_get_random_bytes()
other than EFI_NOT_FOUND when it comes to setting the physical seed for
KASLR, but ignore such failures when obtaining the seed for virtual
KASLR or for early seeding of the kernel's entropy pool via the config
table. This is inconsistent, and may lead to unexpected boot failures.

So let's permit any failure for the physical seed, and simply report
the error code if it does not equal EFI_NOT_FOUND.

Cc: <[email protected]> # v5.8+
Reported-by: Heinrich Schuchardt <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/firmware/efi/libstub/arm64-stub.c | 8 +++++---
drivers/firmware/efi/libstub/fdt.c | 4 +---
2 files changed, 6 insertions(+), 6 deletions(-)

--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -62,10 +62,12 @@ efi_status_t handle_kernel_image(unsigne
status = efi_get_random_bytes(sizeof(phys_seed),
(u8 *)&phys_seed);
if (status == EFI_NOT_FOUND) {
- efi_info("EFI_RNG_PROTOCOL unavailable, no randomness supplied\n");
+ efi_info("EFI_RNG_PROTOCOL unavailable, KASLR will be disabled\n");
+ efi_nokaslr = true;
} else if (status != EFI_SUCCESS) {
- efi_err("efi_get_random_bytes() failed\n");
- return status;
+ efi_err("efi_get_random_bytes() failed (0x%lx), KASLR will be disabled\n",
+ status);
+ efi_nokaslr = true;
}
} else {
efi_info("KASLR disabled on kernel command line\n");
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -136,7 +136,7 @@ static efi_status_t update_fdt(void *ori
if (status)
goto fdt_set_fail;

- if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
+ if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && !efi_nokaslr) {
efi_status_t efi_status;

efi_status = efi_get_random_bytes(sizeof(fdt_val64),
@@ -145,8 +145,6 @@ static efi_status_t update_fdt(void *ori
status = fdt_setprop_var(fdt, node, "kaslr-seed", fdt_val64);
if (status)
goto fdt_set_fail;
- } else if (efi_status != EFI_NOT_FOUND) {
- return efi_status;
}
}



2020-10-31 11:51:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 27/70] bnxt_en: Check abort error state in bnxt_open_nic().

From: Michael Chan <[email protected]>

[ Upstream commit a1301f08c5acf992d9c1fafddc84c3a822844b04 ]

bnxt_open_nic() is called during configuration changes that require
the NIC to be closed and then opened. This call is protected by
rtnl_lock. Firmware reset can be happening at the same time. Only
critical portions of the entire firmware reset sequence are protected
by the rtnl_lock. It is possible that bnxt_open_nic() can be called
when the firmware reset sequence is aborting. In that case,
bnxt_open_nic() needs to check if the ABORT_ERR flag is set and
abort if it is. The configuration change that resulted in the
bnxt_open_nic() call will fail but the NIC will be brought to a
consistent IF_DOWN state.

Without this patch, if bnxt_open_nic() were to continue in this error
state, it may crash like this:

[ 1648.659736] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1648.659768] IP: [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.659796] PGD 101e1b3067 PUD 101e1b2067 PMD 0
[ 1648.659813] Oops: 0000 [#1] SMP
[ 1648.659825] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dell_smbios dell_wmi_descriptor dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper vfat cryptd fat pcspkr ipmi_ssif sg k10temp i2c_piix4 wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci megaraid_sas crct10dif_pclmul crct10dif_common
[ 1648.660063] tg3 libata crc32c_intel bnxt_en(OE) drm_panel_orientation_quirks devlink ptp pps_core dm_mirror dm_region_hash dm_log dm_mod fuse
[ 1648.660105] CPU: 13 PID: 3867 Comm: ethtool Kdump: loaded Tainted: G OE ------------ 3.10.0-1152.el7.x86_64 #1
[ 1648.660911] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.2.14 01/28/2020
[ 1648.661662] task: ffff94e64cbc9080 ti: ffff94f55df1c000 task.ti: ffff94f55df1c000
[ 1648.662409] RIP: 0010:[<ffffffffc01e9b3a>] [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.663171] RSP: 0018:ffff94f55df1fba8 EFLAGS: 00010202
[ 1648.663927] RAX: 0000000000000000 RBX: ffff94e6827e0000 RCX: 0000000000000000
[ 1648.664684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94e6827e08c0
[ 1648.665433] RBP: ffff94f55df1fc20 R08: 00000000000001ff R09: 0000000000000008
[ 1648.666184] R10: 0000000000000d53 R11: ffff94f55df1f7ce R12: ffff94e6827e08c0
[ 1648.666940] R13: ffff94e6827e08c0 R14: ffff94e6827e08c0 R15: ffffffffb9115e40
[ 1648.667695] FS: 00007f8aadba5740(0000) GS:ffff94f57eb40000(0000) knlGS:0000000000000000
[ 1648.668447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1648.669202] CR2: 0000000000000000 CR3: 0000001022772000 CR4: 0000000000340fe0
[ 1648.669966] Call Trace:
[ 1648.670730] [<ffffffffc01f1d5d>] ? bnxt_need_reserve_rings+0x9d/0x170 [bnxt_en]
[ 1648.671496] [<ffffffffc01fa7ea>] __bnxt_open_nic+0x8a/0x9a0 [bnxt_en]
[ 1648.672263] [<ffffffffc01f7479>] ? bnxt_close_nic+0x59/0x1b0 [bnxt_en]
[ 1648.673031] [<ffffffffc01fb11b>] bnxt_open_nic+0x1b/0x50 [bnxt_en]
[ 1648.673793] [<ffffffffc020037c>] bnxt_set_ringparam+0x6c/0xa0 [bnxt_en]
[ 1648.674550] [<ffffffffb8a5f564>] dev_ethtool+0x1334/0x21a0
[ 1648.675306] [<ffffffffb8a719ff>] dev_ioctl+0x1ef/0x5f0
[ 1648.676061] [<ffffffffb8a324bd>] sock_do_ioctl+0x4d/0x60
[ 1648.676810] [<ffffffffb8a326bb>] sock_ioctl+0x1eb/0x2d0
[ 1648.677548] [<ffffffffb8663230>] do_vfs_ioctl+0x3a0/0x5b0
[ 1648.678282] [<ffffffffb8b8e678>] ? __do_page_fault+0x238/0x500
[ 1648.679016] [<ffffffffb86634e1>] SyS_ioctl+0xa1/0xc0
[ 1648.679745] [<ffffffffb8b93f92>] system_call_fastpath+0x25/0x2a
[ 1648.680461] Code: 9e 60 01 00 00 0f 1f 40 00 45 8b 8e 48 01 00 00 31 c9 45 85 c9 0f 8e 73 01 00 00 66 0f 1f 44 00 00 49 8b 86 a8 00 00 00 48 63 d1 <48> 8b 14 d0 48 85 d2 0f 84 46 01 00 00 41 8b 86 44 01 00 00 c7
[ 1648.681986] RIP [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.682724] RSP <ffff94f55df1fba8>
[ 1648.683451] CR2: 0000000000000000

Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.")
Reviewed-by: Vasundhara Volam <[email protected]>
Reviewed-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -9247,7 +9247,10 @@ int bnxt_open_nic(struct bnxt *bp, bool
{
int rc = 0;

- rc = __bnxt_open_nic(bp, irq_re_init, link_re_init);
+ if (test_bit(BNXT_STATE_ABORT_ERR, &bp->state))
+ rc = -EIO;
+ if (!rc)
+ rc = __bnxt_open_nic(bp, irq_re_init, link_re_init);
if (rc) {
netdev_err(bp->dev, "nic open fail (rc: %x)\n", rc);
dev_close(bp->dev);


2020-10-31 11:51:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 48/70] net: protect tcf_block_unbind with block lock

From: Leon Romanovsky <[email protected]>

[ Upstream commit d6535dca28859d8d9ef80894eb287b2ac35a32e8 ]

The tcf_block_unbind() expects that the caller will take block->cb_lock
before calling it, however the code took RTNL lock and dropped cb_lock
instead. This causes to the following kernel panic.

WARNING: CPU: 1 PID: 13524 at net/sched/cls_api.c:1488 tcf_block_unbind+0x2db/0x420
Modules linked in: mlx5_ib mlx5_core mlxfw ptp pps_core act_mirred act_tunnel_key cls_flower vxlan ip6_udp_tunnel udp_tunnel dummy sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm ib_uverbs ib_core overlay [last unloaded: mlxfw]
CPU: 1 PID: 13524 Comm: test-ecmp-add-v Tainted: G W 5.9.0+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:tcf_block_unbind+0x2db/0x420
Code: ff 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8d bc 24 30 01 00 00 be ff ff ff ff e8 7d 7f 70 00 85 c0 0f 85 7b fd ff ff <0f> 0b e9 74 fd ff ff 48 c7 c7 dc 6a 24 84 e8 02 ec fe fe e9 55 fd
RSP: 0018:ffff888117d17968 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88812f713c00 RCX: 1ffffffff0848d5b
RDX: 0000000000000001 RSI: ffff88814fbc8130 RDI: ffff888107f2b878
RBP: 1ffff11022fa2f3f R08: 0000000000000000 R09: ffffffff84115a87
R10: fffffbfff0822b50 R11: ffff888107f2b898 R12: ffff88814fbc8000
R13: ffff88812f713c10 R14: ffff888117d17a38 R15: ffff88814fbc80c0
FS: 00007f6593d36740(0000) GS:ffff8882a4f00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005607a00758f8 CR3: 0000000131aea006 CR4: 0000000000170ea0
Call Trace:
tc_block_indr_cleanup+0x3e0/0x5a0
? tcf_block_unbind+0x420/0x420
? __mutex_unlock_slowpath+0xe7/0x610
flow_indr_dev_unregister+0x5e2/0x930
? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core]
? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core]
? flow_indr_block_cb_alloc+0x3c0/0x3c0
? mlx5_db_free+0x37c/0x4b0 [mlx5_core]
mlx5e_cleanup_rep_tx+0x8b/0xc0 [mlx5_core]
mlx5e_detach_netdev+0xe5/0x120 [mlx5_core]
mlx5e_vport_rep_unload+0x155/0x260 [mlx5_core]
esw_offloads_disable+0x227/0x2b0 [mlx5_core]
mlx5_eswitch_disable_locked.cold+0x38e/0x699 [mlx5_core]
mlx5_eswitch_disable+0x94/0xf0 [mlx5_core]
mlx5_device_disable_sriov+0x183/0x1f0 [mlx5_core]
mlx5_core_sriov_configure+0xfd/0x230 [mlx5_core]
sriov_numvfs_store+0x261/0x2f0
? sriov_drivers_autoprobe_store+0x110/0x110
? sysfs_file_ops+0x170/0x170
? sysfs_file_ops+0x117/0x170
? sysfs_file_ops+0x170/0x170
kernfs_fop_write+0x1ff/0x3f0
? rcu_read_lock_any_held+0x6e/0x90
vfs_write+0x1f3/0x620
ksys_write+0xf9/0x1d0
? __x64_sys_read+0xb0/0xb0
? lockdep_hardirqs_on_prepare+0x273/0x3f0
? syscall_enter_from_user_mode+0x1d/0x50
do_syscall_64+0x2d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9

<...>

---[ end trace bfdd028ada702879 ]---

Fixes: 0fdcf78d5973 ("net: use flow_indr_dev_setup_offload()")
Signed-off-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sched/cls_api.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -650,12 +650,12 @@ static void tc_block_indr_cleanup(struct
block_cb->indr.binder_type,
&block->flow_block, tcf_block_shared(block),
&extack);
+ rtnl_lock();
down_write(&block->cb_lock);
list_del(&block_cb->driver_list);
list_move(&block_cb->list, &bo.cb_list);
- up_write(&block->cb_lock);
- rtnl_lock();
tcf_block_unbind(block, &bo);
+ up_write(&block->cb_lock);
rtnl_unlock();
}



2020-10-31 11:51:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 21/70] arm64: Run ARCH_WORKAROUND_2 enabling code on all CPUs

From: Marc Zyngier <[email protected]>

commit 39533e12063be7f55e3d6ae21ffe067799d542a4 upstream.

Commit 606f8e7b27bf ("arm64: capabilities: Use linear array for
detection and verification") changed the way we deal with per-CPU errata
by only calling the .matches() callback until one CPU is found to be
affected. At this point, .matches() stop being called, and .cpu_enable()
will be called on all CPUs.

This breaks the ARCH_WORKAROUND_2 handling, as only a single CPU will be
mitigated.

In order to address this, forcefully call the .matches() callback from a
.cpu_enable() callback, which brings us back to the original behaviour.

Fixes: 606f8e7b27bf ("arm64: capabilities: Use linear array for detection and verification")
Cc: <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm64/kernel/cpu_errata.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -457,6 +457,12 @@ out_printmsg:
return required;
}

+static void cpu_enable_ssbd_mitigation(const struct arm64_cpu_capabilities *cap)
+{
+ if (ssbd_state != ARM64_SSBD_FORCE_DISABLE)
+ cap->matches(cap, SCOPE_LOCAL_CPU);
+}
+
/* known invulnerable cores */
static const struct midr_range arm64_ssb_cpus[] = {
MIDR_ALL_VERSIONS(MIDR_CORTEX_A35),
@@ -914,6 +920,7 @@ const struct arm64_cpu_capabilities arm6
.capability = ARM64_SSBD,
.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
.matches = has_ssbd_mitigation,
+ .cpu_enable = cpu_enable_ssbd_mitigation,
.midr_range_list = arm64_ssb_cpus,
},
#ifdef CONFIG_ARM64_ERRATUM_1418040


2020-10-31 11:52:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 41/70] net: hns3: Clear the CMDQ registers before unmapping BAR region

From: Zenghui Yu <[email protected]>

[ Upstream commit e3364c5ff3ff975b943a7bf47e21a2a4bf20f3fe ]

When unbinding the hns3 driver with the HNS3 VF, I got the following
kernel panic:

[ 265.709989] Unable to handle kernel paging request at virtual address ffff800054627000
[ 265.717928] Mem abort info:
[ 265.720740] ESR = 0x96000047
[ 265.723810] EC = 0x25: DABT (current EL), IL = 32 bits
[ 265.729126] SET = 0, FnV = 0
[ 265.732195] EA = 0, S1PTW = 0
[ 265.735351] Data abort info:
[ 265.738227] ISV = 0, ISS = 0x00000047
[ 265.742071] CM = 0, WnR = 1
[ 265.745055] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000009b54000
[ 265.751753] [ffff800054627000] pgd=0000202ffffff003, p4d=0000202ffffff003, pud=00002020020eb003, pmd=00000020a0dfc003, pte=0000000000000000
[ 265.764314] Internal error: Oops: 96000047 [#1] SMP
[ 265.830357] CPU: 61 PID: 20319 Comm: bash Not tainted 5.9.0+ #206
[ 265.836423] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019
[ 265.843873] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[ 265.843890] pc : hclgevf_cmd_uninit+0xbc/0x300
[ 265.861988] lr : hclgevf_cmd_uninit+0xb0/0x300
[ 265.861992] sp : ffff80004c983b50
[ 265.881411] pmr_save: 000000e0
[ 265.884453] x29: ffff80004c983b50 x28: ffff20280bbce500
[ 265.889744] x27: 0000000000000000 x26: 0000000000000000
[ 265.895034] x25: ffff800011a1f000 x24: ffff800011a1fe90
[ 265.900325] x23: ffff0020ce9b00d8 x22: ffff0020ce9b0150
[ 265.905616] x21: ffff800010d70e90 x20: ffff800010d70e90
[ 265.910906] x19: ffff0020ce9b0080 x18: 0000000000000004
[ 265.916198] x17: 0000000000000000 x16: ffff800011ae32e8
[ 265.916201] x15: 0000000000000028 x14: 0000000000000002
[ 265.916204] x13: ffff800011ae32e8 x12: 0000000000012ad8
[ 265.946619] x11: ffff80004c983b50 x10: 0000000000000000
[ 265.951911] x9 : ffff8000115d0888 x8 : 0000000000000000
[ 265.951914] x7 : ffff800011890b20 x6 : c0000000ffff7fff
[ 265.951917] x5 : ffff80004c983930 x4 : 0000000000000001
[ 265.951919] x3 : ffffa027eec1b000 x2 : 2b78ccbbff369100
[ 265.964487] x1 : 0000000000000000 x0 : ffff800054627000
[ 265.964491] Call trace:
[ 265.964494] hclgevf_cmd_uninit+0xbc/0x300
[ 265.964496] hclgevf_uninit_ae_dev+0x9c/0xe8
[ 265.964501] hnae3_unregister_ae_dev+0xb0/0x130
[ 265.964516] hns3_remove+0x34/0x88 [hns3]
[ 266.009683] pci_device_remove+0x48/0xf0
[ 266.009692] device_release_driver_internal+0x114/0x1e8
[ 266.030058] device_driver_detach+0x28/0x38
[ 266.034224] unbind_store+0xd4/0x108
[ 266.037784] drv_attr_store+0x40/0x58
[ 266.041435] sysfs_kf_write+0x54/0x80
[ 266.045081] kernfs_fop_write+0x12c/0x250
[ 266.049076] vfs_write+0xc4/0x248
[ 266.052378] ksys_write+0x74/0xf8
[ 266.055677] __arm64_sys_write+0x24/0x30
[ 266.059584] el0_svc_common.constprop.3+0x84/0x270
[ 266.064354] do_el0_svc+0x34/0xa0
[ 266.067658] el0_svc+0x38/0x40
[ 266.070700] el0_sync_handler+0x8c/0xb0
[ 266.074519] el0_sync+0x140/0x180

It looks like the BAR memory region had already been unmapped before we
start clearing CMDQ registers in it, which is pretty bad and the kernel
happily kills itself because of a Current EL Data Abort (on arm64).

Moving the CMDQ uninitialization a bit early fixes the issue for me.

Fixes: 862d969a3a4d ("net: hns3: do VF's pci re-initialization while PF doing FLR")
Signed-off-by: Zenghui Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -3146,8 +3146,8 @@ static void hclgevf_uninit_hdev(struct h
hclgevf_uninit_msi(hdev);
}

- hclgevf_pci_uninit(hdev);
hclgevf_cmd_uninit(hdev);
+ hclgevf_pci_uninit(hdev);
hclgevf_uninit_mac_list(hdev);
}



2020-10-31 11:52:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 37/70] ibmveth: Fix use of ibmveth in a bridge.

From: Thomas Bogendoerfer <[email protected]>

[ Upstream commit 2ac8af0967aaa2b67cb382727e784900d2f4d0da ]

The check for src mac address in ibmveth_is_packet_unsupported is wrong.
Commit 6f2275433a2f wanted to shut down messages for loopback packets,
but now suppresses bridged frames, which are accepted by the hypervisor
otherwise bridging won't work at all.

Fixes: 6f2275433a2f ("ibmveth: Detect unsupported packets before sending to the hypervisor")
Signed-off-by: Michal Suchanek <[email protected]>
Signed-off-by: Thomas Bogendoerfer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/ibm/ibmveth.c | 6 ------
1 file changed, 6 deletions(-)

--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1031,12 +1031,6 @@ static int ibmveth_is_packet_unsupported
ret = -EOPNOTSUPP;
}

- if (!ether_addr_equal(ether_header->h_source, netdev->dev_addr)) {
- netdev_dbg(netdev, "source packet MAC address does not match veth device's, dropping packet.\n");
- netdev->stats.tx_dropped++;
- ret = -EOPNOTSUPP;
- }
-
return ret;
}



2020-10-31 11:52:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 30/70] bnxt_en: Re-write PCI BARs after PCI fatal error.

From: Vasundhara Volam <[email protected]>

[ Upstream commit f75d9a0aa96721d20011cd5f8c7a24eb32728589 ]

When a PCIe fatal error occurs, the internal latched BAR addresses
in the chip get reset even though the BAR register values in config
space are retained.

pci_restore_state() will not rewrite the BAR addresses if the
BAR address values are valid, causing the chip's internal BAR addresses
to stay invalid. So we need to zero the BAR registers during PCIe fatal
error to force pci_restore_state() to restore the BAR addresses. These
write cycles to the BAR registers will cause the proper BAR addresses to
latch internally.

Fixes: 6316ea6db93d ("bnxt_en: Enable AER support.")
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 19 ++++++++++++++++++-
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 +
2 files changed, 19 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -12233,6 +12233,9 @@ static pci_ers_result_t bnxt_io_error_de
return PCI_ERS_RESULT_DISCONNECT;
}

+ if (state == pci_channel_io_frozen)
+ set_bit(BNXT_STATE_PCI_CHANNEL_IO_FROZEN, &bp->state);
+
if (netif_running(netdev))
bnxt_close(netdev);

@@ -12259,7 +12262,7 @@ static pci_ers_result_t bnxt_io_slot_res
{
struct net_device *netdev = pci_get_drvdata(pdev);
struct bnxt *bp = netdev_priv(netdev);
- int err = 0;
+ int err = 0, off;
pci_ers_result_t result = PCI_ERS_RESULT_DISCONNECT;

netdev_info(bp->dev, "PCI Slot Reset\n");
@@ -12271,6 +12274,20 @@ static pci_ers_result_t bnxt_io_slot_res
"Cannot re-enable PCI device after reset.\n");
} else {
pci_set_master(pdev);
+ /* Upon fatal error, our device internal logic that latches to
+ * BAR value is getting reset and will restore only upon
+ * rewritting the BARs.
+ *
+ * As pci_restore_state() does not re-write the BARs if the
+ * value is same as saved value earlier, driver needs to
+ * write the BARs to 0 to force restore, in case of fatal error.
+ */
+ if (test_and_clear_bit(BNXT_STATE_PCI_CHANNEL_IO_FROZEN,
+ &bp->state)) {
+ for (off = PCI_BASE_ADDRESS_0;
+ off <= PCI_BASE_ADDRESS_5; off += 4)
+ pci_write_config_dword(bp->pdev, off, 0);
+ }
pci_restore_state(pdev);
pci_save_state(pdev);

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1672,6 +1672,7 @@ struct bnxt {
#define BNXT_STATE_ABORT_ERR 5
#define BNXT_STATE_FW_FATAL_COND 6
#define BNXT_STATE_DRV_REGISTERED 7
+#define BNXT_STATE_PCI_CHANNEL_IO_FROZEN 8

#define BNXT_NO_FW_ACCESS(bp) \
(test_bit(BNXT_STATE_FW_FATAL_COND, &(bp)->state) || \


2020-10-31 11:52:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 32/70] chelsio/chtls: fix deadlock issue

From: Vinay Kumar Yadav <[email protected]>

[ Upstream commit 28e9dcd9172028263c8225c15c4e329e08475e89 ]

In chtls_pass_establish() we hold child socket lock using bh_lock_sock
and we are again trying bh_lock_sock in add_to_reap_list, causing deadlock.
Remove bh_lock_sock in add_to_reap_list() as lock is already held.

Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/crypto/chelsio/chtls/chtls_cm.c | 2 --
1 file changed, 2 deletions(-)

--- a/drivers/crypto/chelsio/chtls/chtls_cm.c
+++ b/drivers/crypto/chelsio/chtls/chtls_cm.c
@@ -1513,7 +1513,6 @@ static void add_to_reap_list(struct sock
struct chtls_sock *csk = sk->sk_user_data;

local_bh_disable();
- bh_lock_sock(sk);
release_tcp_port(sk); /* release the port immediately */

spin_lock(&reap_list_lock);
@@ -1522,7 +1521,6 @@ static void add_to_reap_list(struct sock
if (!csk->passive_reap_next)
schedule_work(&reap_task);
spin_unlock(&reap_list_lock);
- bh_unlock_sock(sk);
local_bh_enable();
}



2020-10-31 11:52:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 54/70] ata: ahci: mvebu: Make SATA PHY optional for Armada 3720

From: Pali Rohár <[email protected]>

commit 45aefe3d2251e4e229d7662052739f96ad1d08d9 upstream.

Older ATF does not provide SMC call for SATA phy power on functionality and
therefore initialization of ahci_mvebu is failing when older version of ATF
is using. In this case phy_power_on() function returns -EOPNOTSUPP.

This patch adds a new hflag AHCI_HFLAG_IGN_NOTSUPP_POWER_ON which cause
that ahci_platform_enable_phys() would ignore -EOPNOTSUPP errors from
phy_power_on() call.

It fixes initialization of ahci_mvebu on Espressobin boards where is older
Marvell's Arm Trusted Firmware without SMC call for SATA phy power.

This is regression introduced in commit 8e18c8e58da64 ("arm64: dts: marvell:
armada-3720-espressobin: declare SATA PHY property") where SATA phy was
defined and therefore ahci_platform_enable_phys() on Espressobin started
failing.

Fixes: 8e18c8e58da64 ("arm64: dts: marvell: armada-3720-espressobin: declare SATA PHY property")
Signed-off-by: Pali Rohár <[email protected]>
Tested-by: Tomasz Maciej Nowak <[email protected]>
Cc: <[email protected]> # 5.1+: ea17a0f153af: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/ata/ahci.h | 2 ++
drivers/ata/ahci_mvebu.c | 2 +-
drivers/ata/libahci_platform.c | 2 +-
3 files changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/ata/ahci.h
+++ b/drivers/ata/ahci.h
@@ -240,6 +240,8 @@ enum {
as default lpm_policy */
AHCI_HFLAG_SUSPEND_PHYS = (1 << 26), /* handle PHYs during
suspend/resume */
+ AHCI_HFLAG_IGN_NOTSUPP_POWER_ON = (1 << 27), /* ignore -EOPNOTSUPP
+ from phy_power_on() */

/* ap->flags bits */

--- a/drivers/ata/ahci_mvebu.c
+++ b/drivers/ata/ahci_mvebu.c
@@ -227,7 +227,7 @@ static const struct ahci_mvebu_plat_data

static const struct ahci_mvebu_plat_data ahci_mvebu_armada_3700_plat_data = {
.plat_config = ahci_mvebu_armada_3700_config,
- .flags = AHCI_HFLAG_SUSPEND_PHYS,
+ .flags = AHCI_HFLAG_SUSPEND_PHYS | AHCI_HFLAG_IGN_NOTSUPP_POWER_ON,
};

static const struct of_device_id ahci_mvebu_of_match[] = {
--- a/drivers/ata/libahci_platform.c
+++ b/drivers/ata/libahci_platform.c
@@ -59,7 +59,7 @@ int ahci_platform_enable_phys(struct ahc
}

rc = phy_power_on(hpriv->phys[i]);
- if (rc) {
+ if (rc && !(rc == -EOPNOTSUPP && (hpriv->flags & AHCI_HFLAG_IGN_NOTSUPP_POWER_ON))) {
phy_exit(hpriv->phys[i]);
goto disable_phys;
}


2020-10-31 11:52:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 29/70] bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.

From: Vasundhara Volam <[email protected]>

[ Upstream commit 631ce27a3006fc0b732bfd589c6df505f62eadd9 ]

As part of the commit b148bb238c02
("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."),
cancel_delayed_work_sync() is called only for VFs to fix a possible
crash by cancelling any pending delayed work items. It was assumed
by mistake that the flush_workqueue() call on the PF would flush
delayed work items as well.

As flush_workqueue() does not cancel the delayed workqueue, extend
the fix for PFs. This fix will avoid the system crash, if there are
any pending delayed work items in fw_reset_task() during driver's
.remove() call.

Unify the workqueue cleanup logic for both PF and VF by calling
cancel_work_sync() and cancel_delayed_work_sync() directly in
bnxt_remove_one().

Fixes: b148bb238c02 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().")
Reviewed-by: Pavan Chebbi <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1158,16 +1158,6 @@ static void bnxt_queue_sp_work(struct bn
schedule_work(&bp->sp_task);
}

-static void bnxt_cancel_sp_work(struct bnxt *bp)
-{
- if (BNXT_PF(bp)) {
- flush_workqueue(bnxt_pf_wq);
- } else {
- cancel_work_sync(&bp->sp_task);
- cancel_delayed_work_sync(&bp->fw_reset_task);
- }
-}
-
static void bnxt_sched_reset(struct bnxt *bp, struct bnxt_rx_ring_info *rxr)
{
if (!rxr->bnapi->in_reset) {
@@ -11514,7 +11504,8 @@ static void bnxt_remove_one(struct pci_d
unregister_netdev(dev);
clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
/* Flush any pending tasks */
- bnxt_cancel_sp_work(bp);
+ cancel_work_sync(&bp->sp_task);
+ cancel_delayed_work_sync(&bp->fw_reset_task);
bp->sp_event = 0;

bnxt_dl_fw_reporters_destroy(bp, true);


2020-10-31 11:52:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 09/70] io_uring: dont rely on weak ->files references

From: Jens Axboe <[email protected]>

commit 0f2122045b946241a9e549c2a76cea54fa58a7ff upstream.

Grab actual references to the files_struct. To avoid circular references
issues due to this, we add a per-task note that keeps track of what
io_uring contexts a task has used. When the tasks execs or exits its
assigned files, we cancel requests based on this tracking.

With that, we can grab proper references to the files table, and no
longer need to rely on stashing away ring_fd and ring_file to check
if the ring_fd may have been closed.

Cc: [email protected] # v5.5+
Reviewed-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/exec.c | 6
fs/file.c | 2
fs/io_uring.c | 301 +++++++++++++++++++++++++++++++++++++++++------
include/linux/io_uring.h | 53 ++++++++
include/linux/sched.h | 5
init/init_task.c | 3
kernel/fork.c | 6
7 files changed, 340 insertions(+), 36 deletions(-)
create mode 100644 include/linux/io_uring.h

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -62,6 +62,7 @@
#include <linux/oom.h>
#include <linux/compat.h>
#include <linux/vmalloc.h>
+#include <linux/io_uring.h>

#include <linux/uaccess.h>
#include <asm/mmu_context.h>
@@ -1847,6 +1848,11 @@ static int __do_execve_file(int fd, stru
* further execve() calls fail. */
current->flags &= ~PF_NPROC_EXCEEDED;

+ /*
+ * Cancel any io_uring activity across execve
+ */
+ io_uring_task_cancel();
+
retval = unshare_files(&displaced);
if (retval)
goto out_ret;
--- a/fs/file.c
+++ b/fs/file.c
@@ -18,6 +18,7 @@
#include <linux/bitops.h>
#include <linux/spinlock.h>
#include <linux/rcupdate.h>
+#include <linux/io_uring.h>

unsigned int sysctl_nr_open __read_mostly = 1024*1024;
unsigned int sysctl_nr_open_min = BITS_PER_LONG;
@@ -439,6 +440,7 @@ void exit_files(struct task_struct *tsk)
struct files_struct * files = tsk->files;

if (files) {
+ io_uring_files_cancel(files);
task_lock(tsk);
tsk->files = NULL;
task_unlock(tsk);
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -78,6 +78,7 @@
#include <linux/fs_struct.h>
#include <linux/splice.h>
#include <linux/task_work.h>
+#include <linux/io_uring.h>

#define CREATE_TRACE_POINTS
#include <trace/events/io_uring.h>
@@ -283,8 +284,6 @@ struct io_ring_ctx {
*/
struct fixed_file_data *file_data;
unsigned nr_user_files;
- int ring_fd;
- struct file *ring_file;

/* if used, fixed mapped user buffers */
unsigned nr_user_bufs;
@@ -1335,7 +1334,12 @@ static void __io_cqring_fill_event(struc
WRITE_ONCE(cqe->user_data, req->user_data);
WRITE_ONCE(cqe->res, res);
WRITE_ONCE(cqe->flags, cflags);
- } else if (ctx->cq_overflow_flushed) {
+ } else if (ctx->cq_overflow_flushed || req->task->io_uring->in_idle) {
+ /*
+ * If we're in ring overflow flush mode, or in task cancel mode,
+ * then we cannot store the request for later flushing, we need
+ * to drop it on the floor.
+ */
WRITE_ONCE(ctx->rings->cq_overflow,
atomic_inc_return(&ctx->cached_cq_overflow));
} else {
@@ -1451,17 +1455,22 @@ static void io_req_drop_files(struct io_
wake_up(&ctx->inflight_wait);
spin_unlock_irqrestore(&ctx->inflight_lock, flags);
req->flags &= ~REQ_F_INFLIGHT;
+ put_files_struct(req->work.files);
req->work.files = NULL;
}

static void __io_req_aux_free(struct io_kiocb *req)
{
+ struct io_uring_task *tctx = req->task->io_uring;
if (req->flags & REQ_F_NEED_CLEANUP)
io_cleanup_req(req);

kfree(req->io);
if (req->file)
io_put_file(req, req->file, (req->flags & REQ_F_FIXED_FILE));
+ atomic_long_inc(&tctx->req_complete);
+ if (tctx->in_idle)
+ wake_up(&tctx->wait);
put_task_struct(req->task);
io_req_work_drop_env(req);
}
@@ -3532,8 +3541,7 @@ static int io_close_prep(struct io_kiocb
return -EBADF;

req->close.fd = READ_ONCE(sqe->fd);
- if ((req->file && req->file->f_op == &io_uring_fops) ||
- req->close.fd == req->ctx->ring_fd)
+ if ((req->file && req->file->f_op == &io_uring_fops))
return -EBADF;

req->close.put_file = NULL;
@@ -5671,32 +5679,18 @@ static int io_req_set_file(struct io_sub

static int io_grab_files(struct io_kiocb *req)
{
- int ret = -EBADF;
struct io_ring_ctx *ctx = req->ctx;

if (req->work.files || (req->flags & REQ_F_NO_FILE_TABLE))
return 0;
- if (!ctx->ring_file)
- return -EBADF;

- rcu_read_lock();
+ req->work.files = get_files_struct(current);
+ req->flags |= REQ_F_INFLIGHT;
+
spin_lock_irq(&ctx->inflight_lock);
- /*
- * We use the f_ops->flush() handler to ensure that we can flush
- * out work accessing these files if the fd is closed. Check if
- * the fd has changed since we started down this path, and disallow
- * this operation if it has.
- */
- if (fcheck(ctx->ring_fd) == ctx->ring_file) {
- list_add(&req->inflight_entry, &ctx->inflight_list);
- req->flags |= REQ_F_INFLIGHT;
- req->work.files = current->files;
- ret = 0;
- }
+ list_add(&req->inflight_entry, &ctx->inflight_list);
spin_unlock_irq(&ctx->inflight_lock);
- rcu_read_unlock();
-
- return ret;
+ return 0;
}

static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
@@ -6067,6 +6061,7 @@ static int io_init_req(struct io_ring_ct
refcount_set(&req->refs, 2);
req->task = current;
get_task_struct(req->task);
+ atomic_long_inc(&req->task->io_uring->req_issue);
req->result = 0;

if (unlikely(req->opcode >= IORING_OP_LAST))
@@ -6102,8 +6097,7 @@ static int io_init_req(struct io_ring_ct
return io_req_set_file(state, req, READ_ONCE(sqe->fd));
}

-static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
- struct file *ring_file, int ring_fd)
+static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
{
struct io_submit_state state, *statep = NULL;
struct io_kiocb *link = NULL;
@@ -6127,9 +6121,6 @@ static int io_submit_sqes(struct io_ring
statep = &state;
}

- ctx->ring_fd = ring_fd;
- ctx->ring_file = ring_file;
-
for (i = 0; i < nr; i++) {
const struct io_uring_sqe *sqe;
struct io_kiocb *req;
@@ -6290,7 +6281,7 @@ static int io_sq_thread(void *data)

mutex_lock(&ctx->uring_lock);
if (likely(!percpu_ref_is_dying(&ctx->refs)))
- ret = io_submit_sqes(ctx, to_submit, NULL, -1);
+ ret = io_submit_sqes(ctx, to_submit);
mutex_unlock(&ctx->uring_lock);
timeout = jiffies + ctx->sq_thread_idle;
}
@@ -7119,6 +7110,34 @@ out_fput:
return ret;
}

+static int io_uring_alloc_task_context(struct task_struct *task)
+{
+ struct io_uring_task *tctx;
+
+ tctx = kmalloc(sizeof(*tctx), GFP_KERNEL);
+ if (unlikely(!tctx))
+ return -ENOMEM;
+
+ xa_init(&tctx->xa);
+ init_waitqueue_head(&tctx->wait);
+ tctx->last = NULL;
+ tctx->in_idle = 0;
+ atomic_long_set(&tctx->req_issue, 0);
+ atomic_long_set(&tctx->req_complete, 0);
+ task->io_uring = tctx;
+ return 0;
+}
+
+void __io_uring_free(struct task_struct *tsk)
+{
+ struct io_uring_task *tctx = tsk->io_uring;
+
+ WARN_ON_ONCE(!xa_empty(&tctx->xa));
+ xa_destroy(&tctx->xa);
+ kfree(tctx);
+ tsk->io_uring = NULL;
+}
+
static int io_sq_offload_start(struct io_ring_ctx *ctx,
struct io_uring_params *p)
{
@@ -7154,6 +7173,9 @@ static int io_sq_offload_start(struct io
ctx->sqo_thread = NULL;
goto err;
}
+ ret = io_uring_alloc_task_context(ctx->sqo_thread);
+ if (ret)
+ goto err;
wake_up_process(ctx->sqo_thread);
} else if (p->flags & IORING_SETUP_SQ_AFF) {
/* Can't have SQ_AFF without SQPOLL */
@@ -7633,7 +7655,7 @@ static bool io_wq_files_match(struct io_
{
struct files_struct *files = data;

- return work->files == files;
+ return !files || work->files == files;
}

/*
@@ -7787,7 +7809,7 @@ static bool io_uring_cancel_files(struct

spin_lock_irq(&ctx->inflight_lock);
list_for_each_entry(req, &ctx->inflight_list, inflight_entry) {
- if (req->work.files != files)
+ if (files && req->work.files != files)
continue;
/* req is being completed, ignore */
if (!refcount_inc_not_zero(&req->refs))
@@ -7850,18 +7872,217 @@ static bool io_cancel_task_cb(struct io_
return io_task_match(req, task);
}

+static bool __io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
+ struct task_struct *task,
+ struct files_struct *files)
+{
+ bool ret;
+
+ ret = io_uring_cancel_files(ctx, files);
+ if (!files) {
+ enum io_wq_cancel cret;
+
+ cret = io_wq_cancel_cb(ctx->io_wq, io_cancel_task_cb, task, true);
+ if (cret != IO_WQ_CANCEL_NOTFOUND)
+ ret = true;
+
+ /* SQPOLL thread does its own polling */
+ if (!(ctx->flags & IORING_SETUP_SQPOLL)) {
+ if (!list_empty_careful(&ctx->poll_list)) {
+ io_iopoll_reap_events(ctx);
+ ret = true;
+ }
+ }
+
+ ret |= io_poll_remove_all(ctx, task);
+ ret |= io_kill_timeouts(ctx, task);
+ }
+
+ return ret;
+}
+
+/*
+ * We need to iteratively cancel requests, in case a request has dependent
+ * hard links. These persist even for failure of cancelations, hence keep
+ * looping until none are found.
+ */
+static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
+ struct files_struct *files)
+{
+ struct task_struct *task = current;
+
+ if (ctx->flags & IORING_SETUP_SQPOLL)
+ task = ctx->sqo_thread;
+
+ io_cqring_overflow_flush(ctx, true, task, files);
+
+ while (__io_uring_cancel_task_requests(ctx, task, files)) {
+ io_run_task_work();
+ cond_resched();
+ }
+}
+
+/*
+ * Note that this task has used io_uring. We use it for cancelation purposes.
+ */
+static int io_uring_add_task_file(struct file *file)
+{
+ if (unlikely(!current->io_uring)) {
+ int ret;
+
+ ret = io_uring_alloc_task_context(current);
+ if (unlikely(ret))
+ return ret;
+ }
+ if (current->io_uring->last != file) {
+ XA_STATE(xas, &current->io_uring->xa, (unsigned long) file);
+ void *old;
+
+ rcu_read_lock();
+ old = xas_load(&xas);
+ if (old != file) {
+ get_file(file);
+ xas_lock(&xas);
+ xas_store(&xas, file);
+ xas_unlock(&xas);
+ }
+ rcu_read_unlock();
+ current->io_uring->last = file;
+ }
+
+ return 0;
+}
+
+/*
+ * Remove this io_uring_file -> task mapping.
+ */
+static void io_uring_del_task_file(struct file *file)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ XA_STATE(xas, &tctx->xa, (unsigned long) file);
+
+ if (tctx->last == file)
+ tctx->last = NULL;
+
+ xas_lock(&xas);
+ file = xas_store(&xas, NULL);
+ xas_unlock(&xas);
+
+ if (file)
+ fput(file);
+}
+
+static void __io_uring_attempt_task_drop(struct file *file)
+{
+ XA_STATE(xas, &current->io_uring->xa, (unsigned long) file);
+ struct file *old;
+
+ rcu_read_lock();
+ old = xas_load(&xas);
+ rcu_read_unlock();
+
+ if (old == file)
+ io_uring_del_task_file(file);
+}
+
+/*
+ * Drop task note for this file if we're the only ones that hold it after
+ * pending fput()
+ */
+static void io_uring_attempt_task_drop(struct file *file, bool exiting)
+{
+ if (!current->io_uring)
+ return;
+ /*
+ * fput() is pending, will be 2 if the only other ref is our potential
+ * task file note. If the task is exiting, drop regardless of count.
+ */
+ if (!exiting && atomic_long_read(&file->f_count) != 2)
+ return;
+
+ __io_uring_attempt_task_drop(file);
+}
+
+void __io_uring_files_cancel(struct files_struct *files)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ XA_STATE(xas, &tctx->xa, 0);
+
+ /* make sure overflow events are dropped */
+ tctx->in_idle = true;
+
+ do {
+ struct io_ring_ctx *ctx;
+ struct file *file;
+
+ xas_lock(&xas);
+ file = xas_next_entry(&xas, ULONG_MAX);
+ xas_unlock(&xas);
+
+ if (!file)
+ break;
+
+ ctx = file->private_data;
+
+ io_uring_cancel_task_requests(ctx, files);
+ if (files)
+ io_uring_del_task_file(file);
+ } while (1);
+}
+
+static inline bool io_uring_task_idle(struct io_uring_task *tctx)
+{
+ return atomic_long_read(&tctx->req_issue) ==
+ atomic_long_read(&tctx->req_complete);
+}
+
+/*
+ * Find any io_uring fd that this task has registered or done IO on, and cancel
+ * requests.
+ */
+void __io_uring_task_cancel(void)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ DEFINE_WAIT(wait);
+ long completions;
+
+ /* make sure overflow events are dropped */
+ tctx->in_idle = true;
+
+ while (!io_uring_task_idle(tctx)) {
+ /* read completions before cancelations */
+ completions = atomic_long_read(&tctx->req_complete);
+ __io_uring_files_cancel(NULL);
+
+ prepare_to_wait(&tctx->wait, &wait, TASK_UNINTERRUPTIBLE);
+
+ /*
+ * If we've seen completions, retry. This avoids a race where
+ * a completion comes in before we did prepare_to_wait().
+ */
+ if (completions != atomic_long_read(&tctx->req_complete))
+ continue;
+ if (io_uring_task_idle(tctx))
+ break;
+ schedule();
+ }
+
+ finish_wait(&tctx->wait, &wait);
+ tctx->in_idle = false;
+}
+
static int io_uring_flush(struct file *file, void *data)
{
struct io_ring_ctx *ctx = file->private_data;

- io_uring_cancel_files(ctx, data);
-
/*
* If the task is going away, cancel work it may have pending
*/
if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
- io_wq_cancel_cb(ctx->io_wq, io_cancel_task_cb, current, true);
+ data = NULL;

+ io_uring_cancel_task_requests(ctx, data);
+ io_uring_attempt_task_drop(file, !data);
return 0;
}

@@ -7975,8 +8196,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned
wake_up(&ctx->sqo_wait);
submitted = to_submit;
} else if (to_submit) {
+ ret = io_uring_add_task_file(f.file);
+ if (unlikely(ret))
+ goto out;
mutex_lock(&ctx->uring_lock);
- submitted = io_submit_sqes(ctx, to_submit, f.file, fd);
+ submitted = io_submit_sqes(ctx, to_submit);
mutex_unlock(&ctx->uring_lock);

if (submitted != to_submit)
@@ -8188,6 +8412,7 @@ static int io_uring_get_fd(struct io_rin
file = anon_inode_getfile("[io_uring]", &io_uring_fops, ctx,
O_RDWR | O_CLOEXEC);
if (IS_ERR(file)) {
+err_fd:
put_unused_fd(ret);
ret = PTR_ERR(file);
goto err;
@@ -8196,6 +8421,10 @@ static int io_uring_get_fd(struct io_rin
#if defined(CONFIG_UNIX)
ctx->ring_sock->file = file;
#endif
+ if (unlikely(io_uring_add_task_file(file))) {
+ file = ERR_PTR(-ENOMEM);
+ goto err_fd;
+ }
fd_install(ret, file);
return ret;
err:
--- /dev/null
+++ b/include/linux/io_uring.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_IO_URING_H
+#define _LINUX_IO_URING_H
+
+#include <linux/sched.h>
+#include <linux/xarray.h>
+#include <linux/percpu-refcount.h>
+
+struct io_uring_task {
+ /* submission side */
+ struct xarray xa;
+ struct wait_queue_head wait;
+ struct file *last;
+ atomic_long_t req_issue;
+
+ /* completion side */
+ bool in_idle ____cacheline_aligned_in_smp;
+ atomic_long_t req_complete;
+};
+
+#if defined(CONFIG_IO_URING)
+void __io_uring_task_cancel(void);
+void __io_uring_files_cancel(struct files_struct *files);
+void __io_uring_free(struct task_struct *tsk);
+
+static inline void io_uring_task_cancel(void)
+{
+ if (current->io_uring && !xa_empty(&current->io_uring->xa))
+ __io_uring_task_cancel();
+}
+static inline void io_uring_files_cancel(struct files_struct *files)
+{
+ if (current->io_uring && !xa_empty(&current->io_uring->xa))
+ __io_uring_files_cancel(files);
+}
+static inline void io_uring_free(struct task_struct *tsk)
+{
+ if (tsk->io_uring)
+ __io_uring_free(tsk);
+}
+#else
+static inline void io_uring_task_cancel(void)
+{
+}
+static inline void io_uring_files_cancel(struct files_struct *files)
+{
+}
+static inline void io_uring_free(struct task_struct *tsk)
+{
+}
+#endif
+
+#endif
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -61,6 +61,7 @@ struct sighand_struct;
struct signal_struct;
struct task_delay_info;
struct task_group;
+struct io_uring_task;

/*
* Task state bitmask. NOTE! These bits are also
@@ -923,6 +924,10 @@ struct task_struct {
/* Open file information: */
struct files_struct *files;

+#ifdef CONFIG_IO_URING
+ struct io_uring_task *io_uring;
+#endif
+
/* Namespaces: */
struct nsproxy *nsproxy;

--- a/init/init_task.c
+++ b/init/init_task.c
@@ -113,6 +113,9 @@ struct task_struct init_task
.thread = INIT_THREAD,
.fs = &init_fs,
.files = &init_files,
+#ifdef CONFIG_IO_URING
+ .io_uring = NULL,
+#endif
.signal = &init_signals,
.sighand = &init_sighand,
.nsproxy = &init_nsproxy,
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -95,6 +95,7 @@
#include <linux/stackleak.h>
#include <linux/kasan.h>
#include <linux/scs.h>
+#include <linux/io_uring.h>

#include <asm/pgalloc.h>
#include <linux/uaccess.h>
@@ -745,6 +746,7 @@ void __put_task_struct(struct task_struc
WARN_ON(refcount_read(&tsk->usage));
WARN_ON(tsk == current);

+ io_uring_free(tsk);
cgroup_free(tsk);
task_numa_free(tsk, true);
security_task_free(tsk);
@@ -2022,6 +2024,10 @@ static __latent_entropy struct task_stru
p->vtime.state = VTIME_INACTIVE;
#endif

+#ifdef CONFIG_IO_URING
+ p->io_uring = NULL;
+#endif
+
#if defined(SPLIT_RSS_COUNTING)
memset(&p->rss_stat, 0, sizeof(p->rss_stat));
#endif


2020-10-31 11:52:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 08/70] io_uring: enable task/files specific overflow flushing

From: Jens Axboe <[email protected]>

commit e6c8aa9ac33bd7c968af7816240fc081401fddcd upstream.

This allows us to selectively flush out pending overflows, depending on
the task and/or files_struct being passed in.

No intended functional changes in this patch.

Reviewed-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/io_uring.c | 41 ++++++++++++++++++++++++++---------------
1 file changed, 26 insertions(+), 15 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1240,12 +1240,24 @@ static void io_cqring_ev_posted(struct i
eventfd_signal(ctx->cq_ev_fd, 1);
}

+static inline bool io_match_files(struct io_kiocb *req,
+ struct files_struct *files)
+{
+ if (!files)
+ return true;
+ if (req->flags & REQ_F_WORK_INITIALIZED)
+ return req->work.files == files;
+ return false;
+}
+
/* Returns true if there are no backlogged entries after the flush */
-static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
+static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force,
+ struct task_struct *tsk,
+ struct files_struct *files)
{
struct io_rings *rings = ctx->rings;
+ struct io_kiocb *req, *tmp;
struct io_uring_cqe *cqe;
- struct io_kiocb *req;
unsigned long flags;
LIST_HEAD(list);

@@ -1264,7 +1276,12 @@ static bool io_cqring_overflow_flush(str
ctx->cq_overflow_flushed = 1;

cqe = NULL;
- while (!list_empty(&ctx->cq_overflow_list)) {
+ list_for_each_entry_safe(req, tmp, &ctx->cq_overflow_list, list) {
+ if (tsk && req->task != tsk)
+ continue;
+ if (!io_match_files(req, files))
+ continue;
+
cqe = io_get_cqring(ctx);
if (!cqe && !force)
break;
@@ -1734,7 +1751,7 @@ static unsigned io_cqring_events(struct
if (noflush && !list_empty(&ctx->cq_overflow_list))
return -1U;

- io_cqring_overflow_flush(ctx, false);
+ io_cqring_overflow_flush(ctx, false, NULL, NULL);
}

/* See comment at the top of this file */
@@ -6095,7 +6112,7 @@ static int io_submit_sqes(struct io_ring
/* if we have a backlog and couldn't flush it all, return BUSY */
if (test_bit(0, &ctx->sq_check_overflow)) {
if (!list_empty(&ctx->cq_overflow_list) &&
- !io_cqring_overflow_flush(ctx, false))
+ !io_cqring_overflow_flush(ctx, false, NULL, NULL))
return -EBUSY;
}

@@ -7556,7 +7573,7 @@ static void io_ring_exit_work(struct wor

ctx = container_of(work, struct io_ring_ctx, exit_work);
if (ctx->rings)
- io_cqring_overflow_flush(ctx, true);
+ io_cqring_overflow_flush(ctx, true, NULL, NULL);

/*
* If we're doing polled IO and end up having requests being
@@ -7567,7 +7584,7 @@ static void io_ring_exit_work(struct wor
while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20)) {
io_iopoll_reap_events(ctx);
if (ctx->rings)
- io_cqring_overflow_flush(ctx, true);
+ io_cqring_overflow_flush(ctx, true, NULL, NULL);
}
io_ring_ctx_free(ctx);
}
@@ -7587,7 +7604,7 @@ static void io_ring_ctx_wait_and_kill(st
io_iopoll_reap_events(ctx);
/* if we failed setting up the ctx, we might not have any rings */
if (ctx->rings)
- io_cqring_overflow_flush(ctx, true);
+ io_cqring_overflow_flush(ctx, true, NULL, NULL);
idr_for_each(&ctx->personality_idr, io_remove_personalities, ctx);

/*
@@ -7637,12 +7654,6 @@ static bool io_match_link(struct io_kioc
return false;
}

-static inline bool io_match_files(struct io_kiocb *req,
- struct files_struct *files)
-{
- return (req->flags & REQ_F_WORK_INITIALIZED) && req->work.files == files;
-}
-
static bool io_match_link_files(struct io_kiocb *req,
struct files_struct *files)
{
@@ -7959,7 +7970,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned
ret = 0;
if (ctx->flags & IORING_SETUP_SQPOLL) {
if (!list_empty_careful(&ctx->cq_overflow_list))
- io_cqring_overflow_flush(ctx, false);
+ io_cqring_overflow_flush(ctx, false, NULL, NULL);
if (flags & IORING_ENTER_SQ_WAKEUP)
wake_up(&ctx->sqo_wait);
submitted = to_submit;


2020-10-31 11:52:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 63/70] serial: qcom_geni_serial: To correct QUP Version detection logic

From: Paras Sharma <[email protected]>

commit c9ca43d42ed8d5fd635d327a664ed1d8579eb2af upstream.

For QUP IP versions 2.5 and above the oversampling rate is
halved from 32 to 16.

Commit ce734600545f ("tty: serial: qcom_geni_serial: Update
the oversampling rate") is pushed to handle this scenario.
But the existing logic is failing to classify QUP Version 3.0
into the correct group ( 2.5 and above).

As result Serial Engine clocks are not configured properly for
baud rate and garbage data is sampled to FIFOs from the line.

So, fix the logic to detect QUP with versions 2.5 and above.

Fixes: ce734600545f ("tty: serial: qcom_geni_serial: Update the oversampling rate")
Cc: stable <[email protected]>
Signed-off-by: Paras Sharma <[email protected]>
Reviewed-by: Akash Asthana <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/tty/serial/qcom_geni_serial.c | 2 +-
include/linux/qcom-geni-se.h | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -954,7 +954,7 @@ static void qcom_geni_serial_set_termios
sampling_rate = UART_OVERSAMPLING;
/* Sampling rate is halved for IP versions >= 2.5 */
ver = geni_se_get_qup_hw_version(&port->se);
- if (GENI_SE_VERSION_MAJOR(ver) >= 2 && GENI_SE_VERSION_MINOR(ver) >= 5)
+ if (ver >= QUP_SE_VERSION_2_5)
sampling_rate /= 2;

clk_rate = get_clk_div_rate(baud, sampling_rate, &clk_div);
--- a/include/linux/qcom-geni-se.h
+++ b/include/linux/qcom-geni-se.h
@@ -229,6 +229,9 @@ struct geni_se {
#define GENI_SE_VERSION_MINOR(ver) ((ver & HW_VER_MINOR_MASK) >> HW_VER_MINOR_SHFT)
#define GENI_SE_VERSION_STEP(ver) (ver & HW_VER_STEP_MASK)

+/* QUP SE VERSION value for major number 2 and minor number 5 */
+#define QUP_SE_VERSION_2_5 0x20050000
+
#if IS_ENABLED(CONFIG_QCOM_GENI_SE)

u32 geni_se_get_qup_hw_version(struct geni_se *se);


2020-10-31 11:52:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 33/70] chelsio/chtls: fix memory leaks in CPL handlers

From: Vinay Kumar Yadav <[email protected]>

[ Upstream commit 6daa1da4e262b0cd52ef0acc1989ff22b5540264 ]

CPL handler functions chtls_pass_open_rpl() and
chtls_close_listsrv_rpl() should return CPL_RET_BUF_DONE
so that caller function will do skb free to avoid leak.

Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/crypto/chelsio/chtls/chtls_cm.c | 27 ++++++++++++---------------
1 file changed, 12 insertions(+), 15 deletions(-)

--- a/drivers/crypto/chelsio/chtls/chtls_cm.c
+++ b/drivers/crypto/chelsio/chtls/chtls_cm.c
@@ -772,14 +772,13 @@ static int chtls_pass_open_rpl(struct ch
if (rpl->status != CPL_ERR_NONE) {
pr_info("Unexpected PASS_OPEN_RPL status %u for STID %u\n",
rpl->status, stid);
- return CPL_RET_BUF_DONE;
+ } else {
+ cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family);
+ sock_put(listen_ctx->lsk);
+ kfree(listen_ctx);
+ module_put(THIS_MODULE);
}
- cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family);
- sock_put(listen_ctx->lsk);
- kfree(listen_ctx);
- module_put(THIS_MODULE);
-
- return 0;
+ return CPL_RET_BUF_DONE;
}

static int chtls_close_listsrv_rpl(struct chtls_dev *cdev, struct sk_buff *skb)
@@ -796,15 +795,13 @@ static int chtls_close_listsrv_rpl(struc
if (rpl->status != CPL_ERR_NONE) {
pr_info("Unexpected CLOSE_LISTSRV_RPL status %u for STID %u\n",
rpl->status, stid);
- return CPL_RET_BUF_DONE;
+ } else {
+ cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family);
+ sock_put(listen_ctx->lsk);
+ kfree(listen_ctx);
+ module_put(THIS_MODULE);
}
-
- cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family);
- sock_put(listen_ctx->lsk);
- kfree(listen_ctx);
- module_put(THIS_MODULE);
-
- return 0;
+ return CPL_RET_BUF_DONE;
}

static void chtls_purge_wr_queue(struct sock *sk)


2020-10-31 11:52:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 14/70] io_uring: Fix use of XArray in __io_uring_files_cancel

From: "Matthew Wilcox (Oracle)" <[email protected]>

commit ce765372bc443573d1d339a2bf4995de385dea3a upstream.

We have to drop the lock during each iteration, so there's no advantage
to using the advanced API. Convert this to a standard xa_for_each() loop.

Reported-by: [email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/io_uring.c | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8008,28 +8008,19 @@ static void io_uring_attempt_task_drop(s
void __io_uring_files_cancel(struct files_struct *files)
{
struct io_uring_task *tctx = current->io_uring;
- XA_STATE(xas, &tctx->xa, 0);
+ struct file *file;
+ unsigned long index;

/* make sure overflow events are dropped */
tctx->in_idle = true;

- do {
- struct io_ring_ctx *ctx;
- struct file *file;
-
- xas_lock(&xas);
- file = xas_next_entry(&xas, ULONG_MAX);
- xas_unlock(&xas);
-
- if (!file)
- break;
-
- ctx = file->private_data;
+ xa_for_each(&tctx->xa, index, file) {
+ struct io_ring_ctx *ctx = file->private_data;

io_uring_cancel_task_requests(ctx, files);
if (files)
io_uring_del_task_file(file);
- } while (1);
+ }
}

static inline bool io_uring_task_idle(struct io_uring_task *tctx)


2020-10-31 11:52:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 16/70] io_uring: Convert advanced XArray uses to the normal API

From: "Matthew Wilcox (Oracle)" <[email protected]>

commit 5e2ed8c4f45093698855b1f45cdf43efbf6dd498 upstream.

There are no bugs here that I've spotted, it's just easier to use the
normal API and there are no performance advantages to using the more
verbose advanced API.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/io_uring.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7958,27 +7958,17 @@ static int io_uring_add_task_file(struct
static void io_uring_del_task_file(struct file *file)
{
struct io_uring_task *tctx = current->io_uring;
- XA_STATE(xas, &tctx->xa, (unsigned long) file);

if (tctx->last == file)
tctx->last = NULL;
-
- xas_lock(&xas);
- file = xas_store(&xas, NULL);
- xas_unlock(&xas);
-
+ file = xa_erase(&tctx->xa, (unsigned long)file);
if (file)
fput(file);
}

static void __io_uring_attempt_task_drop(struct file *file)
{
- XA_STATE(xas, &current->io_uring->xa, (unsigned long) file);
- struct file *old;
-
- rcu_read_lock();
- old = xas_load(&xas);
- rcu_read_unlock();
+ struct file *old = xa_load(&current->io_uring->xa, (unsigned long)file);

if (old == file)
io_uring_del_task_file(file);


2020-10-31 11:53:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 60/70] RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()

From: Jason Gunthorpe <[email protected]>

commit 2ee9bf346fbfd1dad0933b9eb3a4c2c0979b633e upstream.

This three thread race can result in the work being run once the callback
becomes NULL:

CPU1 CPU2 CPU3
netevent_callback()
process_one_req() rdma_addr_cancel()
[..]
spin_lock_bh()
set_timeout()
spin_unlock_bh()

spin_lock_bh()
list_del_init(&req->list);
spin_unlock_bh()

req->callback = NULL
spin_lock_bh()
if (!list_empty(&req->list))
// Skipped!
// cancel_delayed_work(&req->work);
spin_unlock_bh()

process_one_req() // again
req->callback() // BOOM
cancel_delayed_work_sync()

The solution is to always cancel the work once it is completed so any
in between set_timeout() does not result in it running again.

Cc: [email protected]
Fixes: 44e75052bc2a ("RDMA/rdma_cm: Make rdma_addr_cancel into a fence")
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Dan Aloni <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/core/addr.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -647,13 +647,12 @@ static void process_one_req(struct work_
req->callback = NULL;

spin_lock_bh(&lock);
+ /*
+ * Although the work will normally have been canceled by the workqueue,
+ * it can still be requeued as long as it is on the req_list.
+ */
+ cancel_delayed_work(&req->work);
if (!list_empty(&req->list)) {
- /*
- * Although the work will normally have been canceled by the
- * workqueue, it can still be requeued as long as it is on the
- * req_list.
- */
- cancel_delayed_work(&req->work);
list_del_init(&req->list);
kfree(req);
}


2020-10-31 11:53:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 39/70] mlxsw: core: Fix memory leak on module removal

From: Ido Schimmel <[email protected]>

[ Upstream commit adc80b6cfedff6dad8b93d46a5ea2775fd5af9ec ]

Free the devlink instance during the teardown sequence in the non-reload
case to avoid the following memory leak.

unreferenced object 0xffff888232895000 (size 2048):
comm "modprobe", pid 1073, jiffies 4295568857 (age 164.871s)
hex dump (first 32 bytes):
00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........".......
10 50 89 32 82 88 ff ff 10 50 89 32 82 88 ff ff .P.2.....P.2....
backtrace:
[<00000000c704e9a6>] __kmalloc+0x13a/0x2a0
[<00000000ee30129d>] devlink_alloc+0xff/0x760
[<0000000092ab3e5d>] 0xffffffffa042e5b0
[<000000004f3f8a31>] 0xffffffffa042f6ad
[<0000000092800b4b>] 0xffffffffa0491df3
[<00000000c4843903>] local_pci_probe+0xcb/0x170
[<000000006993ded7>] pci_device_probe+0x2c2/0x4e0
[<00000000a8e0de75>] really_probe+0x2c5/0xf90
[<00000000d42ba75d>] driver_probe_device+0x1eb/0x340
[<00000000bcc95e05>] device_driver_attach+0x294/0x300
[<000000000e2bc177>] __driver_attach+0x167/0x2f0
[<000000007d44cd6e>] bus_for_each_dev+0x148/0x1f0
[<000000003cd5a91e>] driver_attach+0x45/0x60
[<000000000041ce51>] bus_add_driver+0x3b8/0x720
[<00000000f5215476>] driver_register+0x230/0x4e0
[<00000000d79356f5>] __pci_register_driver+0x190/0x200

Fixes: a22712a96291 ("mlxsw: core: Fix devlink unregister flow")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-by: Vadim Pasternak <[email protected]>
Tested-by: Oleksandr Shamray <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1483,6 +1483,8 @@ void mlxsw_core_bus_device_unregister(st
if (!reload)
devlink_resources_unregister(devlink, NULL);
mlxsw_core->bus->fini(mlxsw_core->bus_priv);
+ if (!reload)
+ devlink_free(devlink);

return;



2020-10-31 11:53:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 03/70] io_uring: allow timeout/poll/files killing to take task into account

From: Jens Axboe <[email protected]>

commit f3606e3a92ddd36299642c78592fc87609abb1f6 upstream.

We currently cancel these when the ring exits, and we cancel all of
them. This is in preparation for killing only the ones associated
with a given task.

Reviewed-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/io_uring.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1141,13 +1141,25 @@ static void io_kill_timeout(struct io_ki
}
}

-static void io_kill_timeouts(struct io_ring_ctx *ctx)
+static bool io_task_match(struct io_kiocb *req, struct task_struct *tsk)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ if (!tsk || req->task == tsk)
+ return true;
+ if ((ctx->flags & IORING_SETUP_SQPOLL) && req->task == ctx->sqo_thread)
+ return true;
+ return false;
+}
+
+static void io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk)
{
struct io_kiocb *req, *tmp;

spin_lock_irq(&ctx->completion_lock);
list_for_each_entry_safe(req, tmp, &ctx->timeout_list, list)
- io_kill_timeout(req);
+ if (io_task_match(req, tsk))
+ io_kill_timeout(req);
spin_unlock_irq(&ctx->completion_lock);
}

@@ -4641,7 +4653,7 @@ static bool io_poll_remove_one(struct io
return do_complete;
}

-static void io_poll_remove_all(struct io_ring_ctx *ctx)
+static void io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk)
{
struct hlist_node *tmp;
struct io_kiocb *req;
@@ -4652,8 +4664,10 @@ static void io_poll_remove_all(struct io
struct hlist_head *list;

list = &ctx->cancel_hash[i];
- hlist_for_each_entry_safe(req, tmp, list, hash_node)
- posted += io_poll_remove_one(req);
+ hlist_for_each_entry_safe(req, tmp, list, hash_node) {
+ if (io_task_match(req, tsk))
+ posted += io_poll_remove_one(req);
+ }
}
spin_unlock_irq(&ctx->completion_lock);

@@ -7556,8 +7570,8 @@ static void io_ring_ctx_wait_and_kill(st
percpu_ref_kill(&ctx->refs);
mutex_unlock(&ctx->uring_lock);

- io_kill_timeouts(ctx);
- io_poll_remove_all(ctx);
+ io_kill_timeouts(ctx, NULL);
+ io_poll_remove_all(ctx, NULL);

if (ctx->io_wq)
io_wq_cancel_all(ctx->io_wq);
@@ -7809,7 +7823,7 @@ static bool io_cancel_task_cb(struct io_
struct io_kiocb *req = container_of(work, struct io_kiocb, work);
struct task_struct *task = data;

- return req->task == task;
+ return io_task_match(req, task);
}

static int io_uring_flush(struct file *file, void *data)


2020-10-31 11:53:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 22/70] arm64: link with -z norelro regardless of CONFIG_RELOCATABLE

From: Nick Desaulniers <[email protected]>

commit 3b92fa7485eba16b05166fddf38ab42f2ff6ab95 upstream.

With CONFIG_EXPERT=y, CONFIG_KASAN=y, CONFIG_RANDOMIZE_BASE=n,
CONFIG_RELOCATABLE=n, we observe the following failure when trying to
link the kernel image with LD=ld.lld:

error: section: .exit.data is not contiguous with other relro sections

ld.lld defaults to -z relro while ld.bfd defaults to -z norelro. This
was previously fixed, but only for CONFIG_RELOCATABLE=y.

Fixes: 3bbd3db86470 ("arm64: relocatable: fix inconsistencies in linker script and options")
Signed-off-by: Nick Desaulniers <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm64/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -10,14 +10,14 @@
#
# Copyright (C) 1995-2001 by Russell King

-LDFLAGS_vmlinux :=--no-undefined -X
+LDFLAGS_vmlinux :=--no-undefined -X -z norelro
CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET)

ifeq ($(CONFIG_RELOCATABLE), y)
# Pass --no-apply-dynamic-relocs to restore pre-binutils-2.27 behaviour
# for relative relocs, since this leads to better Image compression
# with the relocation offsets always being zero.
-LDFLAGS_vmlinux += -shared -Bsymbolic -z notext -z norelro \
+LDFLAGS_vmlinux += -shared -Bsymbolic -z notext \
$(call ld-option, --no-apply-dynamic-relocs)
endif



2020-10-31 11:53:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 25/70] x86/copy_mc: Introduce copy_mc_enhanced_fast_string()

From: Dan Williams <[email protected]>

commit 5da8e4a658109e3b7e1f45ae672b7c06ac3e7158 upstream.

The motivations to go rework memcpy_mcsafe() are that the benefit of
doing slow and careful copies is obviated on newer CPUs, and that the
current opt-in list of CPUs to instrument recovery is broken relative to
those CPUs. There is no need to keep an opt-in list up to date on an
ongoing basis if pmem/dax operations are instrumented for recovery by
default. With recovery enabled by default the old "mcsafe_key" opt-in to
careful copying can be made a "fragile" opt-out. Where the "fragile"
list takes steps to not consume poison across cachelines.

The discussion with Linus made clear that the current "_mcsafe" suffix
was imprecise to a fault. The operations that are needed by pmem/dax are
to copy from a source address that might throw #MC to a destination that
may write-fault, if it is a user page.

So copy_to_user_mcsafe() becomes copy_mc_to_user() to indicate
the separate precautions taken on source and destination.
copy_mc_to_kernel() is introduced as a non-SMAP version that does not
expect write-faults on the destination, but is still prepared to abort
with an error code upon taking #MC.

The original copy_mc_fragile() implementation had negative performance
implications since it did not use the fast-string instruction sequence
to perform copies. For this reason copy_mc_to_kernel() fell back to
plain memcpy() to preserve performance on platforms that did not indicate
the capability to recover from machine check exceptions. However, that
capability detection was not architectural and now that some platforms
can recover from fast-string consumption of memory errors the memcpy()
fallback now causes these more capable platforms to fail.

Introduce copy_mc_enhanced_fast_string() as the fast default
implementation of copy_mc_to_kernel() and finalize the transition of
copy_mc_fragile() to be a platform quirk to indicate 'copy-carefully'.
With this in place, copy_mc_to_kernel() is fast and recovery-ready by
default regardless of hardware capability.

Thanks to Vivek for identifying that copy_user_generic() is not suitable
as the copy_mc_to_user() backend since the #MC handler explicitly checks
ex_has_fault_handler(). Thanks to the 0day robot for catching a
performance bug in the x86/copy_mc_to_user implementation.

[ bp: Add the "why" for this change from the 0/2th message, massage. ]

Fixes: 92b0729c34ca ("x86/mm, x86/mce: Add memcpy_mcsafe()")
Reported-by: Erwin Tsaur <[email protected]>
Reported-by: 0day robot <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Tested-by: Erwin Tsaur <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/160195562556.2163339.18063423034951948973.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/lib/copy_mc.c | 32 +++++++++++++++++++++++---------
arch/x86/lib/copy_mc_64.S | 36 ++++++++++++++++++++++++++++++++++++
tools/objtool/check.c | 1 +
3 files changed, 60 insertions(+), 9 deletions(-)

--- a/arch/x86/lib/copy_mc.c
+++ b/arch/x86/lib/copy_mc.c
@@ -45,6 +45,8 @@ void enable_copy_mc_fragile(void)
#define copy_mc_fragile_enabled (0)
#endif

+unsigned long copy_mc_enhanced_fast_string(void *dst, const void *src, unsigned len);
+
/**
* copy_mc_to_kernel - memory copy that handles source exceptions
*
@@ -52,9 +54,11 @@ void enable_copy_mc_fragile(void)
* @src: source address
* @len: number of bytes to copy
*
- * Call into the 'fragile' version on systems that have trouble
- * actually do machine check recovery. Everyone else can just
- * use memcpy().
+ * Call into the 'fragile' version on systems that benefit from avoiding
+ * corner case poison consumption scenarios, For example, accessing
+ * poison across 2 cachelines with a single instruction. Almost all
+ * other uses case can use copy_mc_enhanced_fast_string() for a fast
+ * recoverable copy, or fallback to plain memcpy.
*
* Return 0 for success, or number of bytes not copied if there was an
* exception.
@@ -63,6 +67,8 @@ unsigned long __must_check copy_mc_to_ke
{
if (copy_mc_fragile_enabled)
return copy_mc_fragile(dst, src, len);
+ if (static_cpu_has(X86_FEATURE_ERMS))
+ return copy_mc_enhanced_fast_string(dst, src, len);
memcpy(dst, src, len);
return 0;
}
@@ -72,11 +78,19 @@ unsigned long __must_check copy_mc_to_us
{
unsigned long ret;

- if (!copy_mc_fragile_enabled)
- return copy_user_generic(dst, src, len);
+ if (copy_mc_fragile_enabled) {
+ __uaccess_begin();
+ ret = copy_mc_fragile(dst, src, len);
+ __uaccess_end();
+ return ret;
+ }
+
+ if (static_cpu_has(X86_FEATURE_ERMS)) {
+ __uaccess_begin();
+ ret = copy_mc_enhanced_fast_string(dst, src, len);
+ __uaccess_end();
+ return ret;
+ }

- __uaccess_begin();
- ret = copy_mc_fragile(dst, src, len);
- __uaccess_end();
- return ret;
+ return copy_user_generic(dst, src, len);
}
--- a/arch/x86/lib/copy_mc_64.S
+++ b/arch/x86/lib/copy_mc_64.S
@@ -124,4 +124,40 @@ EXPORT_SYMBOL_GPL(copy_mc_fragile)
_ASM_EXTABLE(.L_write_words, .E_write_words)
_ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
#endif /* CONFIG_X86_MCE */
+
+/*
+ * copy_mc_enhanced_fast_string - memory copy with exception handling
+ *
+ * Fast string copy + fault / exception handling. If the CPU does
+ * support machine check exception recovery, but does not support
+ * recovering from fast-string exceptions then this CPU needs to be
+ * added to the copy_mc_fragile_key set of quirks. Otherwise, absent any
+ * machine check recovery support this version should be no slower than
+ * standard memcpy.
+ */
+SYM_FUNC_START(copy_mc_enhanced_fast_string)
+ movq %rdi, %rax
+ movq %rdx, %rcx
+.L_copy:
+ rep movsb
+ /* Copy successful. Return zero */
+ xorl %eax, %eax
+ ret
+SYM_FUNC_END(copy_mc_enhanced_fast_string)
+
+ .section .fixup, "ax"
+.E_copy:
+ /*
+ * On fault %rcx is updated such that the copy instruction could
+ * optionally be restarted at the fault position, i.e. it
+ * contains 'bytes remaining'. A non-zero return indicates error
+ * to copy_mc_generic() users, or indicate short transfers to
+ * user-copy routines.
+ */
+ movq %rcx, %rax
+ ret
+
+ .previous
+
+ _ASM_EXTABLE_FAULT(.L_copy, .E_copy)
#endif /* !CONFIG_UML */
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -550,6 +550,7 @@ static const char *uaccess_safe_builtin[
"csum_partial_copy_generic",
"copy_mc_fragile",
"copy_mc_fragile_handle_tail",
+ "copy_mc_enhanced_fast_string",
"ftrace_likely_update", /* CONFIG_TRACE_BRANCH_PROFILING */
NULL
};


2020-10-31 11:54:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 19/70] fs/kernel_read_file: Remove FIRMWARE_EFI_EMBEDDED enum

From: Kees Cook <[email protected]>

commit 06e67b849ab910a49a629445f43edb074153d0eb upstream.

The "FIRMWARE_EFI_EMBEDDED" enum is a "where", not a "what". It
should not be distinguished separately from just "FIRMWARE", as this
confuses the LSMs about what is being loaded. Additionally, there was
no actual validation of the firmware contents happening.

Fixes: e4c2c0ff00ec ("firmware: Add new platform fallback mechanism and firmware_request_platform()")
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Acked-by: Scott Branden <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/base/firmware_loader/fallback_platform.c | 2 +-
include/linux/fs.h | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/base/firmware_loader/fallback_platform.c
+++ b/drivers/base/firmware_loader/fallback_platform.c
@@ -17,7 +17,7 @@ int firmware_fallback_platform(struct fw
if (!(opt_flags & FW_OPT_FALLBACK_PLATFORM))
return -ENOENT;

- rc = security_kernel_load_data(LOADING_FIRMWARE_EFI_EMBEDDED);
+ rc = security_kernel_load_data(LOADING_FIRMWARE);
if (rc)
return rc;

--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3011,7 +3011,6 @@ extern int do_pipe_flags(int *, int);
id(UNKNOWN, unknown) \
id(FIRMWARE, firmware) \
id(FIRMWARE_PREALLOC_BUFFER, firmware) \
- id(FIRMWARE_EFI_EMBEDDED, firmware) \
id(MODULE, kernel-module) \
id(KEXEC_IMAGE, kexec-image) \
id(KEXEC_INITRAMFS, kexec-initramfs) \


2020-10-31 11:54:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.8 17/70] scripts/setlocalversion: make git describe output more reliable

From: Rasmus Villemoes <[email protected]>

commit 548b8b5168c90c42e88f70fcf041b4ce0b8e7aa8 upstream.

When building for an embedded target using Yocto, we're sometimes
observing that the version string that gets built into vmlinux (and
thus what uname -a reports) differs from the path under /lib/modules/
where modules get installed in the rootfs, but only in the length of
the -gabc123def suffix. Hence modprobe always fails.

The problem is that Yocto has the concept of "sstate" (shared state),
which allows different developers/buildbots/etc. to share build
artifacts, based on a hash of all the metadata that went into building
that artifact - and that metadata includes all dependencies (e.g. the
compiler used etc.). That normally works quite well; usually a clean
build (without using any sstate cache) done by one developer ends up
being binary identical to a build done on another host. However, one
thing that can cause two developers to end up with different builds
[and thus make one's vmlinux package incompatible with the other's
kernel-dev package], which is not captured by the metadata hashing, is
this `git describe`: The output of that can be affected by

(1) git version: before 2.11 git defaulted to a minimum of 7, since
2.11 (git.git commit e6c587) the default is dynamic based on the
number of objects in the repo
(2) hence even if both run the same git version, the output can differ
based on how many remotes are being tracked (or just lots of local
development branches or plain old garbage)
(3) and of course somebody could have a core.abbrev config setting in
~/.gitconfig

So in order to avoid `uname -a` output relying on such random details
of the build environment which are rather hard to ensure are
consistent between developers and buildbots, make sure the abbreviated
sha1 always consists of exactly 12 hex characters. That is consistent
with the current rule for -stable patches, and is almost always enough
to identify the head commit unambigously - in the few cases where it
does not, the v5.4.3-00021- prefix would certainly nail it down.

Signed-off-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
scripts/setlocalversion | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)

--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -45,7 +45,7 @@ scm_version()

# Check for git and a git repo.
if test -z "$(git rev-parse --show-cdup 2>/dev/null)" &&
- head=$(git rev-parse --verify --short HEAD 2>/dev/null); then
+ head=$(git rev-parse --verify HEAD 2>/dev/null); then

# If we are at a tagged commit (like "v2.6.30-rc6"), we ignore
# it, because this version is defined in the top level Makefile.
@@ -59,11 +59,22 @@ scm_version()
fi
# If we are past a tagged commit (like
# "v2.6.30-rc5-302-g72357d5"), we pretty print it.
- if atag="$(git describe 2>/dev/null)"; then
- echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}'
+ #
+ # Ensure the abbreviated sha1 has exactly 12
+ # hex characters, to make the output
+ # independent of git version, local
+ # core.abbrev settings and/or total number of
+ # objects in the current repository - passing
+ # --abbrev=12 ensures a minimum of 12, and the
+ # awk substr() then picks the 'g' and first 12
+ # hex chars.
+ if atag="$(git describe --abbrev=12 2>/dev/null)"; then
+ echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),substr($(NF),0,13))}'

- # If we don't have a tag at all we print -g{commitish}.
+ # If we don't have a tag at all we print -g{commitish},
+ # again using exactly 12 hex chars.
else
+ head="$(echo $head | cut -c1-12)"
printf '%s%s' -g $head
fi
fi


2020-10-31 20:11:26

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 5.8 00/70] 5.8.18-rc1 review

On Sat, Oct 31, 2020 at 12:35:32PM +0100, Greg Kroah-Hartman wrote:
> -------------------------
> Note, this is going to be the LAST 5.8.y kernel release. After this
> one, this branch is now end-of-life. Please move to the 5.9.y branch at
> this point in time.
> -------------------------
>
> This is the start of the stable review cycle for the 5.8.18 release.
> There are 70 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000.
> Anything received after that time might be too late.
>

Build results:
total: 154 pass: 154 fail: 0
Qemu test results:
total: 426 pass: 426 fail: 0

Tested-by: Guenter Roeck <[email protected]>

Guenter

2020-11-01 07:23:16

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [PATCH 5.8 00/70] 5.8.18-rc1 review

On Sat, 31 Oct 2020 at 17:11, Greg Kroah-Hartman
<[email protected]> wrote:
>
> -------------------------
> Note, this is going to be the LAST 5.8.y kernel release. After this
> one, this branch is now end-of-life. Please move to the 5.9.y branch at
> this point in time.
> -------------------------
>
> This is the start of the stable review cycle for the 5.8.18 release.
> There are 70 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.18-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h


Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing <[email protected]>

NOTE:
LTP version upgrade to 20200930. Due to this change we have noticed few test
failures and fixes which are not related to kernel changes.

Summary
------------------------------------------------------------------------

kernel: 5.8.18-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.8.y
git commit: 46e8244bb94fd0c961f1df918b14b2d3a3970398
git describe: v5.8.17-71-g46e8244bb94f
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.8.y/build/v5.8.17-71-g46e8244bb94f

No regressions (compared to build v5.8.17)

No fixes (compared to build v5.8.17)

Fixes (compared to LTP 20200515)
These fixes are coming from LTP upgrade 20200930.

ltp-commands-tests:
* logrotate_sh

ltp-containers-tests:
* netns_netlink

ltp-controllers-tests:
* cpuset_hotplug

ltp-crypto-tests:
* af_alg02

ltp-cve-tests:
* cve-2017-17805
* cve-2018-1000199

ltp-syscalls-tests:
* clock_gettime03
* clone302
* copy_file_range02
* mknod07
* ptrace08
* syslog01
* syslog02
* syslog03
* syslog04
* syslog05
* syslog07
* syslog08
* syslog09
* syslog10

ltp-open-posix-tests:
* clocks_invaliddates

--
Linaro LKFT
https://lkft.linaro.org