2021-06-28 14:19:53

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 000/110] 5.12.14-rc1 review


This is the start of the stable review cycle for the 5.12.14 release.
There are 110 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed 30 Jun 2021 02:18:05 PM UTC.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.12.y&id2=v5.12.13
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.12.y
and the diffstat can be found below.

Thanks,
Sasha

-------------
Pseudo-Shortlog of commits:

Alper Gun (1):
KVM: SVM: Call SEV Guest Decommission if ASID binding fails

Andy Shevchenko (1):
pinctrl: microchip-sgpio: Put fwnode in error case during ->probe()

Arnd Bergmann (1):
ARM: 9081/1: fix gcc-10 thumb2-kernel regression

Austin Kim (1):
net: ethtool: clear heap allocations for ethtool function

Bumyong Lee (1):
swiotlb: manipulate orig_addr when tlb_addr has offset

Christian König (3):
drm/nouveau: wait for moving fence after pinning v2
drm/radeon: wait for moving fence after pinning
drm/amdgpu: wait for moving fence after pinning

Christoph Hellwig (1):
scsi: sd: Call sd_revalidate_disk() for ioctl(BLKRRPART)

Daniel Borkmann (1):
bpf, selftests: Adjust few selftest outcomes wrt unreachable code

Daniel Vetter (1):
Revert "drm: add a locked version of drm_is_current_master"

David Abdurachmanov (1):
riscv: dts: fu740: fix cache-controller interrupts

Desmond Cheong Zhi Xi (1):
drm: add a locked version of drm_is_current_master

Du Cheng (1):
cfg80211: call cfg80211_leave_ocb when switching away from OCB

Eric Dumazet (6):
inet: annotate data race in inet_send_prepare() and
inet_dgram_connect()
net: annotate data race in sock_error()
inet: annotate date races around sk->sk_txhash
net/packet: annotate data race in packet_sendmsg()
net/packet: annotate accesses to po->bind
net/packet: annotate accesses to po->ifindex

Eric Snowberg (4):
certs: Add EFI_CERT_X509_GUID support for dbx entries
certs: Move load_system_certificate_list to a common function
certs: Add ability to preload revocation certs
integrity: Load mokx variables into the blacklist keyring

Esben Haabendal (2):
net: ll_temac: Add memory-barriers for TX BD access
net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY

Fabien Dessenne (1):
pinctrl: stm32: fix the reported number of GPIO lines per bank

Fuad Tabba (1):
KVM: selftests: Fix kvm_check_cap() assertion

Gabriel Knezek (1):
gpiolib: cdev: zero padding during conversion to gpioline_info_changed

Guillaume Ranquet (3):
dmaengine: mediatek: free the proper desc in desc_free handler
dmaengine: mediatek: do not issue a new desc if one is still current
dmaengine: mediatek: use GFP_NOWAIT instead of GFP_ATOMIC in prep_dma

Haibo Chen (1):
spi: spi-nxp-fspi: move the register operation after the clock enable

Heikki Krogerus (1):
software node: Handle software node injection to an existing device
properly

Heiko Carstens (1):
s390/stack: fix possible register corruption with stack switch helper

Heiner Kallweit (1):
i2c: i801: Ensure that SMBHSTSTS_INUSE_STS is cleared when leaving
i801_access

Hugh Dickins (16):
mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
mm/thp: make is_huge_zero_pmd() safe and quicker
mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
mm/thp: fix vma_address() if virtual address below file offset
mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
mm: page_vma_mapped_walk(): use page for pvmw->page
mm: page_vma_mapped_walk(): settle PageHuge on entry
mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
mm: page_vma_mapped_walk(): crossing page table boundary
mm: page_vma_mapped_walk(): add a level of indentation
mm: page_vma_mapped_walk(): use goto instead of while (1)
mm: page_vma_mapped_walk(): get vma_address_end() earlier
mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
mm, futex: fix shared futex pgoff on shmem huge page

Jeff Layton (2):
ceph: must hold snap_rwsem when filling inode for async create
netfs: fix test for whether we can skip read when writing beyond EOF

Jiapeng Chong (1):
dmaengine: idxd: Fix missing error code in idxd_cdev_open()

Johan Hovold (1):
i2c: robotfuzz-osif: fix control-request directions

Johannes Berg (5):
mac80211: remove warning in ieee80211_get_sband()
mac80211_hwsim: drop pending frames on stop
mac80211: drop multicast fragments
mac80211: reset profile_periodicity/ema_ap
mac80211: handle various extensible elements correctly

Johannes Weiner (1):
psi: Fix psi state corruption when schedule() races with cgroup move

Jue Wang (1):
mm/thp: fix page_address_in_vma() on file THP tails

Juergen Gross (1):
xen/events: reset active flag for lateeoi events later

Kan Liang (1):
perf/x86: Track pmu in per-CPU cpu_hw_events

Kees Cook (4):
r8152: Avoid memcpy() over-reading of ETH_SS_STATS
sh_eth: Avoid memcpy() over-reading of ETH_SS_STATS
r8169: Avoid memcpy() over-reading of ETH_SS_STATS
net: qed: Fix memcpy() overflow of qed_dcbx_params()

Khem Raj (1):
riscv32: Use medany C model for modules

Kristian Evensen (1):
qmi_wwan: Do not call netif_rx from rx_fixup

Laurent Pinchart (2):
dmaengine: xilinx: dpdma: Add missing dependencies to Kconfig
dmaengine: xilinx: dpdma: Limit descriptor IDs to 16 bits

Like Xu (1):
perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context

Maxime Ripard (2):
drm/vc4: hdmi: Move the HSM clock enable to runtime_pm
drm/vc4: hdmi: Make sure the controller is powered in detect

Mikel Rychliski (1):
PCI: Add AMD RS690 quirk to enable 64-bit DMA

Mimi Zohar (1):
module: limit enabling module.sig_enforce

Naoya Horiguchi (1):
mm/hwpoison: do not lock page again when me_huge_page() successfully
recovers

Neil Armstrong (1):
mmc: meson-gx: use memcpy_to/fromio for dram-access-quirk

Nicholas Piggin (1):
KVM: do not allow mapping valid but non-reference-counted pages

Pavel Skripkin (2):
net: caif: fix memory leak in ldisc_open
nilfs2: fix memory leak in nilfs_sysfs_delete_device_group

Peter Zijlstra (5):
x86/entry: Fix noinstr fail in __do_fast_syscall_32()
x86/xen: Fix noinstr fail in xen_pv_evtchn_do_upcall()
x86/xen: Fix noinstr fail in exc_xen_unknown_trap()
locking/lockdep: Improve noinstr vs errors
recordmcount: Correct st_shndx handling

Petr Mladek (2):
kthread_worker: split code for canceling the delayed work timer
kthread: prevent deadlock when kthread_mod_delayed_work() races with
kthread_cancel_delayed_work_sync()

Praneeth Bajjuri (1):
net: phy: dp83867: perform soft reset and retain established link

Rafael J. Wysocki (1):
Revert "PCI: PM: Do not read power state in pci_enable_device_flags()"

Sasha Levin (1):
Linux 5.12.14-rc1

Sven Schnelle (3):
s390/topology: clear thread/group maps for offline cpus
s390: fix system call restart with multiple signals
s390: clear pt_regs::flags on irq entry

Thomas Gleixner (3):
perf/x86/intel/lbr: Zero the xstate buffer on allocation
x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate()
x86/fpu: Make init_fpstate correct with optimized XSAVE

Tony Luck (1):
mm/memory-failure: use a mutex to avoid memory_failure() races

Xu Yu (1):
mm, thp: use head page in __migration_entry_wait()

Yang Shi (1):
mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split

Yifan Zhang (2):
Revert "drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue."
Revert "drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover
full doorbell."

Yu Kuai (2):
dmaengine: zynqmp_dma: Fix PM reference leak in
zynqmp_dma_alloc_chan_resourc()
dmaengine: stm32-mdma: fix PM reference leak in
stm32_mdma_alloc_chan_resourc()

Zhen Lei (1):
drm/kmb: Fix error return code in kmb_hw_init()

Zheng Yongjun (2):
net: ipv4: Remove unneed BUG() function
ping: Check return value of function 'ping_queue_rcv_skb'

Zou Wei (1):
dmaengine: rcar-dmac: Fix PM reference leak in rcar_dmac_probe()

Makefile | 4 +-
arch/arm/kernel/setup.c | 16 +-
arch/riscv/Makefile | 2 +-
arch/riscv/boot/dts/sifive/fu740-c000.dtsi | 2 +-
arch/s390/include/asm/stacktrace.h | 18 +-
arch/s390/kernel/entry.S | 1 +
arch/s390/kernel/signal.c | 1 -
arch/s390/kernel/topology.c | 12 +-
arch/x86/entry/common.c | 5 +-
arch/x86/events/core.c | 23 ++-
arch/x86/events/intel/core.c | 2 +-
arch/x86/events/intel/ds.c | 4 +-
arch/x86/events/intel/lbr.c | 36 ++--
arch/x86/events/perf_event.h | 10 +-
arch/x86/include/asm/fpu/internal.h | 30 +---
arch/x86/kernel/fpu/signal.c | 26 +--
arch/x86/kernel/fpu/xstate.c | 41 ++++-
arch/x86/kvm/svm/sev.c | 32 ++--
arch/x86/pci/fixup.c | 44 +++++
arch/x86/xen/enlighten_pv.c | 2 +
certs/Kconfig | 17 ++
certs/Makefile | 21 ++-
certs/blacklist.c | 64 +++++++
certs/blacklist.h | 2 +
certs/common.c | 57 +++++++
certs/common.h | 9 +
certs/revocation_certificates.S | 21 +++
certs/system_keyring.c | 55 +-----
drivers/base/swnode.c | 16 +-
drivers/dma/Kconfig | 1 +
drivers/dma/idxd/cdev.c | 1 +
drivers/dma/mediatek/mtk-uart-apdma.c | 27 +--
drivers/dma/sh/rcar-dmac.c | 2 +-
drivers/dma/stm32-mdma.c | 4 +-
drivers/dma/xilinx/xilinx_dpdma.c | 7 +-
drivers/dma/xilinx/zynqmp_dma.c | 2 +-
drivers/gpio/gpiolib-cdev.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 +-
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +-
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 +-
drivers/gpu/drm/kmb/kmb_drv.c | 1 +
drivers/gpu/drm/nouveau/nouveau_prime.c | 17 +-
drivers/gpu/drm/radeon/radeon_prime.c | 16 +-
drivers/gpu/drm/vc4/vc4_hdmi.c | 44 +++--
drivers/i2c/busses/i2c-i801.c | 3 +
drivers/i2c/busses/i2c-robotfuzz-osif.c | 4 +-
drivers/mmc/host/meson-gx-mmc.c | 50 +++++-
drivers/net/caif/caif_serial.c | 1 +
drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 4 +-
drivers/net/ethernet/realtek/r8169_main.c | 2 +-
drivers/net/ethernet/renesas/sh_eth.c | 2 +-
drivers/net/ethernet/xilinx/ll_temac_main.c | 19 ++-
drivers/net/phy/dp83867.c | 6 +-
drivers/net/usb/qmi_wwan.c | 2 +-
drivers/net/usb/r8152.c | 2 +-
drivers/net/wireless/mac80211_hwsim.c | 5 +
drivers/pci/pci.c | 16 +-
drivers/pinctrl/pinctrl-microchip-sgpio.c | 4 +-
drivers/pinctrl/stm32/pinctrl-stm32.c | 9 +-
drivers/scsi/sd.c | 22 ++-
drivers/spi/spi-nxp-fspi.c | 11 +-
drivers/xen/events/events_base.c | 11 +-
fs/ceph/addr.c | 54 ++++--
fs/ceph/file.c | 3 +
fs/ceph/inode.c | 2 +
fs/nilfs2/sysfs.c | 1 +
include/keys/system_keyring.h | 15 ++
include/linux/debug_locks.h | 2 +
include/linux/huge_mm.h | 8 +-
include/linux/hugetlb.h | 16 --
include/linux/mm.h | 3 +
include/linux/pagemap.h | 13 +-
include/linux/rmap.h | 1 +
include/net/sock.h | 17 +-
kernel/dma/swiotlb.c | 8 +-
kernel/futex.c | 3 +-
kernel/kthread.c | 77 ++++++---
kernel/locking/lockdep.c | 4 +-
kernel/module.c | 14 +-
kernel/sched/psi.c | 36 ++--
lib/debug_locks.c | 2 +-
mm/huge_memory.c | 56 +++---
mm/hugetlb.c | 5 +-
mm/internal.h | 53 ++++--
mm/memory-failure.c | 80 ++++++---
mm/memory.c | 41 +++++
mm/migrate.c | 1 +
mm/page_vma_mapped.c | 160 +++++++++++-------
mm/pgtable-generic.c | 5 +-
mm/rmap.c | 39 +++--
mm/truncate.c | 43 +++--
net/ethtool/ioctl.c | 10 +-
net/ipv4/af_inet.c | 4 +-
net/ipv4/devinet.c | 2 +-
net/ipv4/ping.c | 12 +-
net/ipv6/addrconf.c | 2 +-
net/mac80211/ieee80211_i.h | 2 +-
net/mac80211/mlme.c | 8 +
net/mac80211/rx.c | 9 +-
net/mac80211/util.c | 22 +--
net/packet/af_packet.c | 41 +++--
net/wireless/util.c | 3 +
scripts/Makefile | 1 +
scripts/recordmcount.h | 15 +-
.../platform_certs/keyring_handler.c | 11 ++
security/integrity/platform_certs/load_uefi.c | 20 ++-
tools/testing/selftests/bpf/test_verifier.c | 2 +-
tools/testing/selftests/bpf/verifier/and.c | 2 +
tools/testing/selftests/bpf/verifier/bounds.c | 14 ++
.../selftests/bpf/verifier/dead_code.c | 2 +
tools/testing/selftests/bpf/verifier/jmp32.c | 22 +++
tools/testing/selftests/bpf/verifier/jset.c | 10 +-
tools/testing/selftests/bpf/verifier/unpriv.c | 2 +
.../selftests/bpf/verifier/value_ptr_arith.c | 7 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 2 +-
virt/kvm/kvm_main.c | 19 ++-
116 files changed, 1345 insertions(+), 556 deletions(-)
create mode 100644 certs/common.c
create mode 100644 certs/common.h
create mode 100644 certs/revocation_certificates.S

--
2.30.2


2021-06-28 14:20:02

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 001/110] module: limit enabling module.sig_enforce

From: Mimi Zohar <[email protected]>

[ Upstream commit 0c18f29aae7ce3dadd26d8ee3505d07cc982df75 ]

Irrespective as to whether CONFIG_MODULE_SIG is configured, specifying
"module.sig_enforce=1" on the boot command line sets "sig_enforce".
Only allow "sig_enforce" to be set when CONFIG_MODULE_SIG is configured.

This patch makes the presence of /sys/module/module/parameters/sig_enforce
dependent on CONFIG_MODULE_SIG=y.

Fixes: fda784e50aac ("module: export module signature enforcement status")
Reported-by: Nayna Jain <[email protected]>
Tested-by: Mimi Zohar <[email protected]>
Tested-by: Jessica Yu <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>
Signed-off-by: Jessica Yu <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/module.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 30479355ab85..260d6f3f6d68 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -266,9 +266,18 @@ static void module_assert_mutex_or_preempt(void)
#endif
}

+#ifdef CONFIG_MODULE_SIG
static bool sig_enforce = IS_ENABLED(CONFIG_MODULE_SIG_FORCE);
module_param(sig_enforce, bool_enable_only, 0644);

+void set_module_sig_enforced(void)
+{
+ sig_enforce = true;
+}
+#else
+#define sig_enforce false
+#endif
+
/*
* Export sig_enforce kernel cmdline parameter to allow other subsystems rely
* on that instead of directly to CONFIG_MODULE_SIG_FORCE config.
@@ -279,11 +288,6 @@ bool is_module_sig_enforced(void)
}
EXPORT_SYMBOL(is_module_sig_enforced);

-void set_module_sig_enforced(void)
-{
- sig_enforce = true;
-}
-
/* Block module loading/unloading? */
int modules_disabled = 0;
core_param(nomodule, modules_disabled, bint, 0);
--
2.30.2

2021-06-28 14:20:13

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 004/110] drm: add a locked version of drm_is_current_master

From: Desmond Cheong Zhi Xi <[email protected]>

commit 1815d9c86e3090477fbde066ff314a7e9721ee0f upstream.

While checking the master status of the DRM file in
drm_is_current_master(), the device's master mutex should be
held. Without the mutex, the pointer fpriv->master may be freed
concurrently by another process calling drm_setmaster_ioctl(). This
could lead to use-after-free errors when the pointer is subsequently
dereferenced in drm_lease_owner().

The callers of drm_is_current_master() from drm_auth.c hold the
device's master mutex, but external callers do not. Hence, we implement
drm_is_current_master_locked() to be used within drm_auth.c, and
modify drm_is_current_master() to grab the device's master mutex
before checking the master status.

Reported-by: Daniel Vetter <[email protected]>
Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/drm_auth.c | 51 ++++++++++++++++++++++++--------------
1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index 232abbba3686..86d4b72e95cb 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -61,6 +61,35 @@
* trusted clients.
*/

+static bool drm_is_current_master_locked(struct drm_file *fpriv)
+{
+ lockdep_assert_held_once(&fpriv->master->dev->master_mutex);
+
+ return fpriv->is_master && drm_lease_owner(fpriv->master) == fpriv->minor->dev->master;
+}
+
+/**
+ * drm_is_current_master - checks whether @priv is the current master
+ * @fpriv: DRM file private
+ *
+ * Checks whether @fpriv is current master on its device. This decides whether a
+ * client is allowed to run DRM_MASTER IOCTLs.
+ *
+ * Most of the modern IOCTL which require DRM_MASTER are for kernel modesetting
+ * - the current master is assumed to own the non-shareable display hardware.
+ */
+bool drm_is_current_master(struct drm_file *fpriv)
+{
+ bool ret;
+
+ mutex_lock(&fpriv->master->dev->master_mutex);
+ ret = drm_is_current_master_locked(fpriv);
+ mutex_unlock(&fpriv->master->dev->master_mutex);
+
+ return ret;
+}
+EXPORT_SYMBOL(drm_is_current_master);
+
int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
{
struct drm_auth *auth = data;
@@ -223,7 +252,7 @@ int drm_setmaster_ioctl(struct drm_device *dev, void *data,
if (ret)
goto out_unlock;

- if (drm_is_current_master(file_priv))
+ if (drm_is_current_master_locked(file_priv))
goto out_unlock;

if (dev->master) {
@@ -272,7 +301,7 @@ int drm_dropmaster_ioctl(struct drm_device *dev, void *data,
if (ret)
goto out_unlock;

- if (!drm_is_current_master(file_priv)) {
+ if (!drm_is_current_master_locked(file_priv)) {
ret = -EINVAL;
goto out_unlock;
}
@@ -321,7 +350,7 @@ void drm_master_release(struct drm_file *file_priv)
if (file_priv->magic)
idr_remove(&file_priv->master->magic_map, file_priv->magic);

- if (!drm_is_current_master(file_priv))
+ if (!drm_is_current_master_locked(file_priv))
goto out;

drm_legacy_lock_master_cleanup(dev, master);
@@ -342,22 +371,6 @@ out:
mutex_unlock(&dev->master_mutex);
}

-/**
- * drm_is_current_master - checks whether @priv is the current master
- * @fpriv: DRM file private
- *
- * Checks whether @fpriv is current master on its device. This decides whether a
- * client is allowed to run DRM_MASTER IOCTLs.
- *
- * Most of the modern IOCTL which require DRM_MASTER are for kernel modesetting
- * - the current master is assumed to own the non-shareable display hardware.
- */
-bool drm_is_current_master(struct drm_file *fpriv)
-{
- return fpriv->is_master && drm_lease_owner(fpriv->master) == fpriv->minor->dev->master;
-}
-EXPORT_SYMBOL(drm_is_current_master);
-
/**
* drm_master_get - reference a master pointer
* @master: &struct drm_master
--
2.30.2

2021-06-28 14:20:20

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 003/110] Revert "drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell."

From: Yifan Zhang <[email protected]>

commit baacf52a473b24e10322b67757ddb92ab8d86717 upstream.

This reverts commit 1c0b0efd148d5b24c4932ddb3fa03c8edd6097b3.

Reason for revert: Side effect of enlarging CP_MEC_DOORBELL_RANGE may
cause some APUs fail to enter gfxoff in certain user cases.

Signed-off-by: Yifan Zhang <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 72d23651501d..2342c5d216f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -6769,12 +6769,8 @@ static int gfx_v10_0_kiq_init_register(struct amdgpu_ring *ring)
if (ring->use_doorbell) {
WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER,
(adev->doorbell_index.kiq * 2) << 2);
- /* If GC has entered CGPG, ringing doorbell > first page doesn't
- * wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround
- * this issue.
- */
WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER,
- (adev->doorbell.size - 4));
+ (adev->doorbell_index.userqueue_end * 2) << 2);
}

WREG32_SOC15(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL,
--
2.30.2

2021-06-28 14:20:25

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 007/110] drm/amdgpu: wait for moving fence after pinning

From: Christian König <[email protected]>

commit 8ddf5b9bb479570a3825d70fecfb9399bc15700c upstream.

We actually need to wait for the moving fence after pinning
the BO to make sure that the pin is completed.

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
References: https://lore.kernel.org/dri-devel/[email protected]/
CC: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 47e0b48dc26f..1c4623d25a62 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -214,9 +214,21 @@ static int amdgpu_dma_buf_pin(struct dma_buf_attachment *attach)
{
struct drm_gem_object *obj = attach->dmabuf->priv;
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+ int r;

/* pin buffer into GTT */
- return amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
+ r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
+ if (r)
+ return r;
+
+ if (bo->tbo.moving) {
+ r = dma_fence_wait(bo->tbo.moving, true);
+ if (r) {
+ amdgpu_bo_unpin(bo);
+ return r;
+ }
+ }
+ return 0;
}

/**
--
2.30.2

2021-06-28 14:20:28

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 008/110] ARM: 9081/1: fix gcc-10 thumb2-kernel regression

From: Arnd Bergmann <[email protected]>

commit dad7b9896a5dbac5da8275d5a6147c65c81fb5f2 upstream.

When building the kernel wtih gcc-10 or higher using the
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y flag, the compiler picks a slightly
different set of registers for the inline assembly in cpu_init() that
subsequently results in a corrupt kernel stack as well as remaining in
FIQ mode. If a banked register is used for the last argument, the wrong
version of that register gets loaded into CPSR_c. When building in Arm
mode, the arguments are passed as immediate values and the bug cannot
happen.

This got introduced when Daniel reworked the FIQ handling and was
technically always broken, but happened to work with both clang and gcc
before gcc-10 as long as they picked one of the lower registers.
This is probably an indication that still very few people build the
kernel in Thumb2 mode.

Marek pointed out the problem on IRC, Arnd narrowed it down to this
inline assembly and Russell pinpointed the exact bug.

Change the constraints to force the final mode switch to use a non-banked
register for the argument to ensure that the correct constant gets loaded.
Another alternative would be to always use registers for the constant
arguments to avoid the #ifdef that has now become more complex.

Cc: <[email protected]> # v3.18+
Cc: Daniel Thompson <[email protected]>
Reported-by: Marek Vasut <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Fixes: c0e7f7ee717e ("ARM: 8150/3: fiq: Replace default FIQ handler")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/kernel/setup.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 1a5edf562e85..73ca7797b92f 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -545,9 +545,11 @@ void notrace cpu_init(void)
* In Thumb-2, msr with an immediate value is not allowed.
*/
#ifdef CONFIG_THUMB2_KERNEL
-#define PLC "r"
+#define PLC_l "l"
+#define PLC_r "r"
#else
-#define PLC "I"
+#define PLC_l "I"
+#define PLC_r "I"
#endif

/*
@@ -569,15 +571,15 @@ void notrace cpu_init(void)
"msr cpsr_c, %9"
:
: "r" (stk),
- PLC (PSR_F_BIT | PSR_I_BIT | IRQ_MODE),
+ PLC_r (PSR_F_BIT | PSR_I_BIT | IRQ_MODE),
"I" (offsetof(struct stack, irq[0])),
- PLC (PSR_F_BIT | PSR_I_BIT | ABT_MODE),
+ PLC_r (PSR_F_BIT | PSR_I_BIT | ABT_MODE),
"I" (offsetof(struct stack, abt[0])),
- PLC (PSR_F_BIT | PSR_I_BIT | UND_MODE),
+ PLC_r (PSR_F_BIT | PSR_I_BIT | UND_MODE),
"I" (offsetof(struct stack, und[0])),
- PLC (PSR_F_BIT | PSR_I_BIT | FIQ_MODE),
+ PLC_r (PSR_F_BIT | PSR_I_BIT | FIQ_MODE),
"I" (offsetof(struct stack, fiq[0])),
- PLC (PSR_F_BIT | PSR_I_BIT | SVC_MODE)
+ PLC_l (PSR_F_BIT | PSR_I_BIT | SVC_MODE)
: "r14");
#endif
}
--
2.30.2

2021-06-28 14:20:30

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 005/110] drm/nouveau: wait for moving fence after pinning v2

From: Christian König <[email protected]>

commit 17b11f71795abdce46f62a808f906857e525cea8 upstream.

We actually need to wait for the moving fence after pinning
the BO to make sure that the pin is completed.

v2: grab the lock while waiting

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
References: https://lore.kernel.org/dri-devel/[email protected]/
CC: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_prime.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c
index 347488685f74..60019d0532fc 100644
--- a/drivers/gpu/drm/nouveau/nouveau_prime.c
+++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
@@ -93,7 +93,22 @@ int nouveau_gem_prime_pin(struct drm_gem_object *obj)
if (ret)
return -EINVAL;

- return 0;
+ ret = ttm_bo_reserve(&nvbo->bo, false, false, NULL);
+ if (ret)
+ goto error;
+
+ if (nvbo->bo.moving)
+ ret = dma_fence_wait(nvbo->bo.moving, true);
+
+ ttm_bo_unreserve(&nvbo->bo);
+ if (ret)
+ goto error;
+
+ return ret;
+
+error:
+ nouveau_bo_unpin(nvbo);
+ return ret;
}

void nouveau_gem_prime_unpin(struct drm_gem_object *obj)
--
2.30.2

2021-06-28 14:20:42

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 017/110] x86/xen: Fix noinstr fail in exc_xen_unknown_trap()

From: Peter Zijlstra <[email protected]>

[ Upstream commit 4c9c26f1e67648f41f28f8c997c5c9467a3dbbe4 ]

Fix:

vmlinux.o: warning: objtool: exc_xen_unknown_trap()+0x7: call to printk() leaves .noinstr.text section

Fixes: 2e92493637a0 ("x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/xen/enlighten_pv.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 8183ddb3700c..64db5852432e 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -592,8 +592,10 @@ DEFINE_IDTENTRY_RAW(xenpv_exc_debug)
DEFINE_IDTENTRY_RAW(exc_xen_unknown_trap)
{
/* This should never happen and there is no way to handle it. */
+ instrumentation_begin();
pr_err("Unknown trap in Xen PV mode.");
BUG();
+ instrumentation_end();
}

#ifdef CONFIG_X86_MCE
--
2.30.2

2021-06-28 14:20:42

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 011/110] spi: spi-nxp-fspi: move the register operation after the clock enable

From: Haibo Chen <[email protected]>

[ Upstream commit f422316c8e9d3c4aff3c56549dfb44a677d02f14 ]

Move the register operation after the clock enable, otherwise system
will stuck when this driver probe.

Fixes: 71d80563b076 ("spi: spi-nxp-fspi: fix fspi panic by unexpected interrupts")
Signed-off-by: Haibo Chen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/spi/spi-nxp-fspi.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/spi/spi-nxp-fspi.c b/drivers/spi/spi-nxp-fspi.c
index ab9035662717..bcc0b5a3a459 100644
--- a/drivers/spi/spi-nxp-fspi.c
+++ b/drivers/spi/spi-nxp-fspi.c
@@ -1033,12 +1033,6 @@ static int nxp_fspi_probe(struct platform_device *pdev)
goto err_put_ctrl;
}

- /* Clear potential interrupts */
- reg = fspi_readl(f, f->iobase + FSPI_INTR);
- if (reg)
- fspi_writel(f, reg, f->iobase + FSPI_INTR);
-
-
/* find the resources - controller memory mapped space */
if (is_acpi_node(f->dev->fwnode))
res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
@@ -1076,6 +1070,11 @@ static int nxp_fspi_probe(struct platform_device *pdev)
}
}

+ /* Clear potential interrupts */
+ reg = fspi_readl(f, f->iobase + FSPI_INTR);
+ if (reg)
+ fspi_writel(f, reg, f->iobase + FSPI_INTR);
+
/* find the irq */
ret = platform_get_irq(pdev, 0);
if (ret < 0)
--
2.30.2

2021-06-28 14:20:42

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 009/110] mmc: meson-gx: use memcpy_to/fromio for dram-access-quirk

From: Neil Armstrong <[email protected]>

commit 103a5348c22c3fca8b96c735a9e353b8a0801842 upstream.

It has been reported that usage of memcpy() to/from an iomem mapping is invalid,
and a recent arm64 memcpy update [1] triggers a memory abort when dram-access-quirk
is used on the G12A/G12B platforms.

This adds a local sg_copy_to_buffer which makes usage of io versions of memcpy
when dram-access-quirk is enabled.

[1] 285133040e6c ("arm64: Import latest memcpy()/memmove() implementation")

Fixes: acdc8e71d9bb ("mmc: meson-gx: add dram-access-quirk")
Reported-by: Marek Szyprowski <[email protected]>
Suggested-by: Mark Rutland <[email protected]>
Signed-off-by: Neil Armstrong <[email protected]>
Tested-by: Marek Szyprowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Ulf Hansson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/mmc/host/meson-gx-mmc.c | 50 +++++++++++++++++++++++++++++----
1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c
index 016a6106151a..3f28eb4d17fe 100644
--- a/drivers/mmc/host/meson-gx-mmc.c
+++ b/drivers/mmc/host/meson-gx-mmc.c
@@ -165,6 +165,7 @@ struct meson_host {

unsigned int bounce_buf_size;
void *bounce_buf;
+ void __iomem *bounce_iomem_buf;
dma_addr_t bounce_dma_addr;
struct sd_emmc_desc *descs;
dma_addr_t descs_dma_addr;
@@ -745,6 +746,47 @@ static void meson_mmc_desc_chain_transfer(struct mmc_host *mmc, u32 cmd_cfg)
writel(start, host->regs + SD_EMMC_START);
}

+/* local sg copy to buffer version with _to/fromio usage for dram_access_quirk */
+static void meson_mmc_copy_buffer(struct meson_host *host, struct mmc_data *data,
+ size_t buflen, bool to_buffer)
+{
+ unsigned int sg_flags = SG_MITER_ATOMIC;
+ struct scatterlist *sgl = data->sg;
+ unsigned int nents = data->sg_len;
+ struct sg_mapping_iter miter;
+ unsigned int offset = 0;
+
+ if (to_buffer)
+ sg_flags |= SG_MITER_FROM_SG;
+ else
+ sg_flags |= SG_MITER_TO_SG;
+
+ sg_miter_start(&miter, sgl, nents, sg_flags);
+
+ while ((offset < buflen) && sg_miter_next(&miter)) {
+ unsigned int len;
+
+ len = min(miter.length, buflen - offset);
+
+ /* When dram_access_quirk, the bounce buffer is a iomem mapping */
+ if (host->dram_access_quirk) {
+ if (to_buffer)
+ memcpy_toio(host->bounce_iomem_buf + offset, miter.addr, len);
+ else
+ memcpy_fromio(miter.addr, host->bounce_iomem_buf + offset, len);
+ } else {
+ if (to_buffer)
+ memcpy(host->bounce_buf + offset, miter.addr, len);
+ else
+ memcpy(miter.addr, host->bounce_buf + offset, len);
+ }
+
+ offset += len;
+ }
+
+ sg_miter_stop(&miter);
+}
+
static void meson_mmc_start_cmd(struct mmc_host *mmc, struct mmc_command *cmd)
{
struct meson_host *host = mmc_priv(mmc);
@@ -788,8 +830,7 @@ static void meson_mmc_start_cmd(struct mmc_host *mmc, struct mmc_command *cmd)
if (data->flags & MMC_DATA_WRITE) {
cmd_cfg |= CMD_CFG_DATA_WR;
WARN_ON(xfer_bytes > host->bounce_buf_size);
- sg_copy_to_buffer(data->sg, data->sg_len,
- host->bounce_buf, xfer_bytes);
+ meson_mmc_copy_buffer(host, data, xfer_bytes, true);
dma_wmb();
}

@@ -958,8 +999,7 @@ static irqreturn_t meson_mmc_irq_thread(int irq, void *dev_id)
if (meson_mmc_bounce_buf_read(data)) {
xfer_bytes = data->blksz * data->blocks;
WARN_ON(xfer_bytes > host->bounce_buf_size);
- sg_copy_from_buffer(data->sg, data->sg_len,
- host->bounce_buf, xfer_bytes);
+ meson_mmc_copy_buffer(host, data, xfer_bytes, false);
}

next_cmd = meson_mmc_get_next_command(cmd);
@@ -1179,7 +1219,7 @@ static int meson_mmc_probe(struct platform_device *pdev)
* instead of the DDR memory
*/
host->bounce_buf_size = SD_EMMC_SRAM_DATA_BUF_LEN;
- host->bounce_buf = host->regs + SD_EMMC_SRAM_DATA_BUF_OFF;
+ host->bounce_iomem_buf = host->regs + SD_EMMC_SRAM_DATA_BUF_OFF;
host->bounce_dma_addr = res->start + SD_EMMC_SRAM_DATA_BUF_OFF;
} else {
/* data bounce buffer */
--
2.30.2

2021-06-28 14:20:50

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 018/110] locking/lockdep: Improve noinstr vs errors

From: Peter Zijlstra <[email protected]>

[ Upstream commit 49faa77759b211fff344898edc23bb780707fff5 ]

Better handle the failure paths.

vmlinux.o: warning: objtool: debug_locks_off()+0x23: call to console_verbose() leaves .noinstr.text section
vmlinux.o: warning: objtool: debug_locks_off()+0x19: call to __kasan_check_write() leaves .noinstr.text section

debug_locks_off+0x19/0x40:
instrument_atomic_write at include/linux/instrumented.h:86
(inlined by) __debug_locks_off at include/linux/debug_locks.h:17
(inlined by) debug_locks_off at lib/debug_locks.c:41

Fixes: 6eebad1ad303 ("lockdep: __always_inline more for noinstr")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
include/linux/debug_locks.h | 2 ++
kernel/locking/lockdep.c | 4 +++-
lib/debug_locks.c | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 2915f56ad421..edb5c186b0b7 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -27,8 +27,10 @@ extern int debug_locks_off(void);
int __ret = 0; \
\
if (!oops_in_progress && unlikely(c)) { \
+ instrumentation_begin(); \
if (debug_locks_off() && !debug_locks_silent) \
WARN(1, "DEBUG_LOCKS_WARN_ON(%s)", #c); \
+ instrumentation_end(); \
__ret = 1; \
} \
__ret; \
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index f39c383c7180..5bf6b1659215 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -842,7 +842,7 @@ static int count_matching_names(struct lock_class *new_class)
}

/* used from NMI context -- must be lockless */
-static __always_inline struct lock_class *
+static noinstr struct lock_class *
look_up_lock_class(const struct lockdep_map *lock, unsigned int subclass)
{
struct lockdep_subclass_key *key;
@@ -850,12 +850,14 @@ look_up_lock_class(const struct lockdep_map *lock, unsigned int subclass)
struct lock_class *class;

if (unlikely(subclass >= MAX_LOCKDEP_SUBCLASSES)) {
+ instrumentation_begin();
debug_locks_off();
printk(KERN_ERR
"BUG: looking up invalid subclass: %u\n", subclass);
printk(KERN_ERR
"turning off the locking correctness validator.\n");
dump_stack();
+ instrumentation_end();
return NULL;
}

diff --git a/lib/debug_locks.c b/lib/debug_locks.c
index 06d3135bd184..a75ee30b77cb 100644
--- a/lib/debug_locks.c
+++ b/lib/debug_locks.c
@@ -36,7 +36,7 @@ EXPORT_SYMBOL_GPL(debug_locks_silent);
/*
* Generic 'turn off all lock debugging' function:
*/
-noinstr int debug_locks_off(void)
+int debug_locks_off(void)
{
if (debug_locks && __debug_locks_off()) {
if (!debug_locks_silent) {
--
2.30.2

2021-06-28 14:20:57

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 019/110] drm/kmb: Fix error return code in kmb_hw_init()

From: Zhen Lei <[email protected]>

[ Upstream commit 6fd8f323b3e4e5290d02174559308669507c00dd ]

When the call to platform_get_irq() to obtain the IRQ of the lcd fails, the
returned error code should be propagated. However, we currently do not
explicitly assign this error code to 'ret'. As a result, 0 was incorrectly
returned.

Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Signed-off-by: Anitha Chrisanthus <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/gpu/drm/kmb/kmb_drv.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/kmb/kmb_drv.c b/drivers/gpu/drm/kmb/kmb_drv.c
index f64e06e1067d..96ea1a2c11dd 100644
--- a/drivers/gpu/drm/kmb/kmb_drv.c
+++ b/drivers/gpu/drm/kmb/kmb_drv.c
@@ -137,6 +137,7 @@ static int kmb_hw_init(struct drm_device *drm, unsigned long flags)
/* Allocate LCD interrupt resources */
irq_lcd = platform_get_irq(pdev, 0);
if (irq_lcd < 0) {
+ ret = irq_lcd;
drm_err(&kmb->drm, "irq_lcd not found");
goto setup_fail;
}
--
2.30.2

2021-06-28 14:20:58

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 016/110] x86/xen: Fix noinstr fail in xen_pv_evtchn_do_upcall()

From: Peter Zijlstra <[email protected]>

[ Upstream commit 84e60065df9ef03759115a7e48c04bbc0d292165 ]

Fix:

vmlinux.o: warning: objtool: xen_pv_evtchn_do_upcall()+0x23: call to irq_enter_rcu() leaves .noinstr.text section

Fixes: 359f01d1816f ("x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/entry/common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index cbe19c87e6be..8767dc53b569 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -266,15 +266,16 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
irqentry_state_t state = irqentry_enter(regs);
bool inhcall;

+ instrumentation_begin();
run_sysvec_on_irqstack_cond(__xen_pv_evtchn_do_upcall, regs);

inhcall = get_and_clear_inhcall();
if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
- instrumentation_begin();
irqentry_exit_cond_resched();
instrumentation_end();
restore_inhcall(inhcall);
} else {
+ instrumentation_end();
irqentry_exit(regs, state);
}
}
--
2.30.2

2021-06-28 14:21:03

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 010/110] psi: Fix psi state corruption when schedule() races with cgroup move

From: Johannes Weiner <[email protected]>

commit d583d360a620e6229422b3455d0be082b8255f5e upstream.

4117cebf1a9f ("psi: Optimize task switch inside shared cgroups")
introduced a race condition that corrupts internal psi state. This
manifests as kernel warnings, sometimes followed by bogusly high IO
pressure:

psi: task underflow! cpu=1 t=2 tasks=[0 0 0 0] clear=c set=0
(schedule() decreasing RUNNING and ONCPU, both of which are 0)

psi: incosistent task state! task=2412744:systemd cpu=17 psi_flags=e clear=3 set=0
(cgroup_move_task() clearing MEMSTALL and IOWAIT, but task is MEMSTALL | RUNNING | ONCPU)

What the offending commit does is batch the two psi callbacks in
schedule() to reduce the number of cgroup tree updates. When prev is
deactivated and removed from the runqueue, nothing is done in psi at
first; when the task switch completes, TSK_RUNNING and TSK_IOWAIT are
updated along with TSK_ONCPU.

However, the deactivation and the task switch inside schedule() aren't
atomic: pick_next_task() may drop the rq lock for load balancing. When
this happens, cgroup_move_task() can run after the task has been
physically dequeued, but the psi updates are still pending. Since it
looks at the task's scheduler state, it doesn't move everything to the
new cgroup that the task switch that follows is about to clear from
it. cgroup_move_task() will leak the TSK_RUNNING count in the old
cgroup, and psi_sched_switch() will underflow it in the new cgroup.

A similar thing can happen for iowait. TSK_IOWAIT is usually set when
a p->in_iowait task is dequeued, but again this update is deferred to
the switch. cgroup_move_task() can see an unqueued p->in_iowait task
and move a non-existent TSK_IOWAIT. This results in the inconsistent
task state warning, as well as a counter underflow that will result in
permanent IO ghost pressure being reported.

Fix this bug by making cgroup_move_task() use task->psi_flags instead
of looking at the potentially mismatching scheduler state.

[ We used the scheduler state historically in order to not rely on
task->psi_flags for anything but debugging. But that ship has sailed
anyway, and this is simpler and more robust.

We previously already batched TSK_ONCPU clearing with the
TSK_RUNNING update inside the deactivation call from schedule(). But
that ordering was safe and didn't result in TSK_ONCPU corruption:
unlike most places in the scheduler, cgroup_move_task() only checked
task_current() and handled TSK_ONCPU if the task was still queued. ]

Fixes: 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups")
Signed-off-by: Johannes Weiner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Cc: Oleksandr Natalenko <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/sched/psi.c | 36 ++++++++++++++++++++++++++----------
1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 651218ded981..ef37acd28e4a 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -965,7 +965,7 @@ void psi_cgroup_free(struct cgroup *cgroup)
*/
void cgroup_move_task(struct task_struct *task, struct css_set *to)
{
- unsigned int task_flags = 0;
+ unsigned int task_flags;
struct rq_flags rf;
struct rq *rq;

@@ -980,15 +980,31 @@ void cgroup_move_task(struct task_struct *task, struct css_set *to)

rq = task_rq_lock(task, &rf);

- if (task_on_rq_queued(task)) {
- task_flags = TSK_RUNNING;
- if (task_current(rq, task))
- task_flags |= TSK_ONCPU;
- } else if (task->in_iowait)
- task_flags = TSK_IOWAIT;
-
- if (task->in_memstall)
- task_flags |= TSK_MEMSTALL;
+ /*
+ * We may race with schedule() dropping the rq lock between
+ * deactivating prev and switching to next. Because the psi
+ * updates from the deactivation are deferred to the switch
+ * callback to save cgroup tree updates, the task's scheduling
+ * state here is not coherent with its psi state:
+ *
+ * schedule() cgroup_move_task()
+ * rq_lock()
+ * deactivate_task()
+ * p->on_rq = 0
+ * psi_dequeue() // defers TSK_RUNNING & TSK_IOWAIT updates
+ * pick_next_task()
+ * rq_unlock()
+ * rq_lock()
+ * psi_task_change() // old cgroup
+ * task->cgroups = to
+ * psi_task_change() // new cgroup
+ * rq_unlock()
+ * rq_lock()
+ * psi_sched_switch() // does deferred updates in new cgroup
+ *
+ * Don't rely on the scheduling state. Use psi_flags instead.
+ */
+ task_flags = task->psi_flags;

if (task_flags)
psi_task_change(task, task_flags, 0);
--
2.30.2

2021-06-28 14:21:06

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 002/110] Revert "drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue."

From: Yifan Zhang <[email protected]>

commit ee5468b9f1d3bf48082eed351dace14598e8ca39 upstream.

This reverts commit 4cbbe34807938e6e494e535a68d5ff64edac3f20.

Reason for revert: side effect of enlarging CP_MEC_DOORBELL_RANGE may
cause some APUs fail to enter gfxoff in certain user cases.

Signed-off-by: Yifan Zhang <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 1fdfb7783404..d2c020a91c0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3623,12 +3623,8 @@ static int gfx_v9_0_kiq_init_register(struct amdgpu_ring *ring)
if (ring->use_doorbell) {
WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER,
(adev->doorbell_index.kiq * 2) << 2);
- /* If GC has entered CGPG, ringing doorbell > first page doesn't
- * wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround
- * this issue.
- */
WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER,
- (adev->doorbell.size - 4));
+ (adev->doorbell_index.userqueue_end * 2) << 2);
}

WREG32_SOC15_RLC(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL,
--
2.30.2

2021-06-28 14:21:09

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 015/110] x86/entry: Fix noinstr fail in __do_fast_syscall_32()

From: Peter Zijlstra <[email protected]>

[ Upstream commit 240001d4e3041832e8a2654adc3ccf1683132b92 ]

Fix:

vmlinux.o: warning: objtool: __do_fast_syscall_32()+0xf5: call to trace_hardirqs_off() leaves .noinstr.text section

Fixes: 5d5675df792f ("x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/entry/common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 4efd39aacb9f..cbe19c87e6be 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -127,8 +127,8 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
/* User code screwed up. */
regs->ax = -EFAULT;

- instrumentation_end();
local_irq_disable();
+ instrumentation_end();
irqentry_exit_to_user_mode(regs);
return false;
}
--
2.30.2

2021-06-28 14:21:19

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 023/110] dmaengine: stm32-mdma: fix PM reference leak in stm32_mdma_alloc_chan_resourc()

From: Yu Kuai <[email protected]>

[ Upstream commit 83eb4868d325b86e18509d0874e911497667cb54 ]

pm_runtime_get_sync will increment pm usage counter even it failed.
Forgetting to putting operation will result in reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Yu Kuai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/dma/stm32-mdma.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/stm32-mdma.c b/drivers/dma/stm32-mdma.c
index 36ba8b43e78d..18cbd1e43c2e 100644
--- a/drivers/dma/stm32-mdma.c
+++ b/drivers/dma/stm32-mdma.c
@@ -1452,7 +1452,7 @@ static int stm32_mdma_alloc_chan_resources(struct dma_chan *c)
return -ENOMEM;
}

- ret = pm_runtime_get_sync(dmadev->ddev.dev);
+ ret = pm_runtime_resume_and_get(dmadev->ddev.dev);
if (ret < 0)
return ret;

@@ -1718,7 +1718,7 @@ static int stm32_mdma_pm_suspend(struct device *dev)
u32 ccr, id;
int ret;

- ret = pm_runtime_get_sync(dev);
+ ret = pm_runtime_resume_and_get(dev);
if (ret < 0)
return ret;

--
2.30.2

2021-06-28 14:21:22

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 006/110] drm/radeon: wait for moving fence after pinning

From: Christian König <[email protected]>

commit 4b41726aae563273bb4b4a9462ba51ce4d372f78 upstream.

We actually need to wait for the moving fence after pinning
the BO to make sure that the pin is completed.

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
References: https://lore.kernel.org/dri-devel/[email protected]/
CC: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/radeon/radeon_prime.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_prime.c b/drivers/gpu/drm/radeon/radeon_prime.c
index 42a87948e28c..4a90807351e7 100644
--- a/drivers/gpu/drm/radeon/radeon_prime.c
+++ b/drivers/gpu/drm/radeon/radeon_prime.c
@@ -77,9 +77,19 @@ int radeon_gem_prime_pin(struct drm_gem_object *obj)

/* pin buffer into GTT */
ret = radeon_bo_pin(bo, RADEON_GEM_DOMAIN_GTT, NULL);
- if (likely(ret == 0))
- bo->prime_shared_count++;
-
+ if (unlikely(ret))
+ goto error;
+
+ if (bo->tbo.moving) {
+ ret = dma_fence_wait(bo->tbo.moving, false);
+ if (unlikely(ret)) {
+ radeon_bo_unpin(bo);
+ goto error;
+ }
+ }
+
+ bo->prime_shared_count++;
+error:
radeon_bo_unreserve(bo);
return ret;
}
--
2.30.2

2021-06-28 14:21:40

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 013/110] drm/vc4: hdmi: Move the HSM clock enable to runtime_pm

From: Maxime Ripard <[email protected]>

[ Upstream commit 411efa18e4b03840553ff58ad9b4621b82a30c04 ]

In order to access the HDMI controller, we need to make sure the HSM
clock is enabled. If we were to access it with the clock disabled, the
CPU would completely hang, resulting in an hard crash.

Since we have different code path that would require it, let's move that
clock enable / disable to runtime_pm that will take care of the
reference counting for us.

Fixes: 4f6e3d66ac52 ("drm/vc4: Add runtime PM support to the HDMI encoder driver")
Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Dave Stevenson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/gpu/drm/vc4/vc4_hdmi.c | 40 +++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index 1fda574579af..84e218365045 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -473,7 +473,6 @@ static void vc4_hdmi_encoder_post_crtc_powerdown(struct drm_encoder *encoder,
HDMI_READ(HDMI_VID_CTL) & ~VC4_HD_VID_CTL_ENABLE);

clk_disable_unprepare(vc4_hdmi->pixel_bvb_clock);
- clk_disable_unprepare(vc4_hdmi->hsm_clock);
clk_disable_unprepare(vc4_hdmi->pixel_clock);

ret = pm_runtime_put(&vc4_hdmi->pdev->dev);
@@ -784,13 +783,6 @@ static void vc4_hdmi_encoder_pre_crtc_configure(struct drm_encoder *encoder,
return;
}

- ret = clk_prepare_enable(vc4_hdmi->hsm_clock);
- if (ret) {
- DRM_ERROR("Failed to turn on HSM clock: %d\n", ret);
- clk_disable_unprepare(vc4_hdmi->pixel_clock);
- return;
- }
-
vc4_hdmi_cec_update_clk_div(vc4_hdmi);

/*
@@ -801,7 +793,6 @@ static void vc4_hdmi_encoder_pre_crtc_configure(struct drm_encoder *encoder,
(hsm_rate > VC4_HSM_MID_CLOCK ? 150000000 : 75000000));
if (ret) {
DRM_ERROR("Failed to set pixel bvb clock rate: %d\n", ret);
- clk_disable_unprepare(vc4_hdmi->hsm_clock);
clk_disable_unprepare(vc4_hdmi->pixel_clock);
return;
}
@@ -809,7 +800,6 @@ static void vc4_hdmi_encoder_pre_crtc_configure(struct drm_encoder *encoder,
ret = clk_prepare_enable(vc4_hdmi->pixel_bvb_clock);
if (ret) {
DRM_ERROR("Failed to turn on pixel bvb clock: %d\n", ret);
- clk_disable_unprepare(vc4_hdmi->hsm_clock);
clk_disable_unprepare(vc4_hdmi->pixel_clock);
return;
}
@@ -1929,6 +1919,29 @@ static int vc5_hdmi_init_resources(struct vc4_hdmi *vc4_hdmi)
return 0;
}

+#ifdef CONFIG_PM
+static int vc4_hdmi_runtime_suspend(struct device *dev)
+{
+ struct vc4_hdmi *vc4_hdmi = dev_get_drvdata(dev);
+
+ clk_disable_unprepare(vc4_hdmi->hsm_clock);
+
+ return 0;
+}
+
+static int vc4_hdmi_runtime_resume(struct device *dev)
+{
+ struct vc4_hdmi *vc4_hdmi = dev_get_drvdata(dev);
+ int ret;
+
+ ret = clk_prepare_enable(vc4_hdmi->hsm_clock);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+#endif
+
static int vc4_hdmi_bind(struct device *dev, struct device *master, void *data)
{
const struct vc4_hdmi_variant *variant = of_device_get_match_data(dev);
@@ -2165,11 +2178,18 @@ static const struct of_device_id vc4_hdmi_dt_match[] = {
{}
};

+static const struct dev_pm_ops vc4_hdmi_pm_ops = {
+ SET_RUNTIME_PM_OPS(vc4_hdmi_runtime_suspend,
+ vc4_hdmi_runtime_resume,
+ NULL)
+};
+
struct platform_driver vc4_hdmi_driver = {
.probe = vc4_hdmi_dev_probe,
.remove = vc4_hdmi_dev_remove,
.driver = {
.name = "vc4_hdmi",
.of_match_table = vc4_hdmi_dt_match,
+ .pm = &vc4_hdmi_pm_ops,
},
};
--
2.30.2

2021-06-28 14:21:42

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 021/110] perf/x86/intel/lbr: Zero the xstate buffer on allocation

From: Thomas Gleixner <[email protected]>

[ Upstream commit 7f049fbdd57f6ea71dc741d903c19c73b2f70950 ]

XRSTORS requires a valid xstate buffer to work correctly. XSAVES does not
guarantee to write a fully valid buffer according to the SDM:

"XSAVES does not write to any parts of the XSAVE header other than the
XSTATE_BV and XCOMP_BV fields."

XRSTORS triggers a #GP:

"If bytes 63:16 of the XSAVE header are not all zero."

It's dubious at best how this can work at all when the buffer is not zeroed
before use.

Allocate the buffers with __GFP_ZERO to prevent XRSTORS failure.

Fixes: ce711ea3cab9 ("perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch")
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/events/intel/lbr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 22d0e40a1920..991715886246 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -730,7 +730,8 @@ void reserve_lbr_buffers(void)
if (!kmem_cache || cpuc->lbr_xsave)
continue;

- cpuc->lbr_xsave = kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
+ cpuc->lbr_xsave = kmem_cache_alloc_node(kmem_cache,
+ GFP_KERNEL | __GFP_ZERO,
cpu_to_node(cpu));
}
}
--
2.30.2

2021-06-28 14:21:50

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 025/110] dmaengine: xilinx: dpdma: Limit descriptor IDs to 16 bits

From: Laurent Pinchart <[email protected]>

[ Upstream commit 9f007e7b6643799e2a6538a5fe04f51c371c6657 ]

While the descriptor ID is stored in a 32-bit field in the hardware
descriptor, only 16 bits are used by the hardware and are reported
through the XILINX_DPDMA_CH_DESC_ID register. Failure to handle the
wrap-around results in a descriptor ID mismatch after 65536 frames. Fix
it.

Signed-off-by: Laurent Pinchart <[email protected]>
Tested-by: Jianqiang Chen <[email protected]>
Reviewed-by: Jianqiang Chen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/dma/xilinx/xilinx_dpdma.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/xilinx/xilinx_dpdma.c b/drivers/dma/xilinx/xilinx_dpdma.c
index ff7dfb3fdeb4..6c709803203a 100644
--- a/drivers/dma/xilinx/xilinx_dpdma.c
+++ b/drivers/dma/xilinx/xilinx_dpdma.c
@@ -113,6 +113,7 @@
#define XILINX_DPDMA_CH_VDO 0x020
#define XILINX_DPDMA_CH_PYLD_SZ 0x024
#define XILINX_DPDMA_CH_DESC_ID 0x028
+#define XILINX_DPDMA_CH_DESC_ID_MASK GENMASK(15, 0)

/* DPDMA descriptor fields */
#define XILINX_DPDMA_DESC_CONTROL_PREEMBLE 0xa5
@@ -866,7 +867,8 @@ static void xilinx_dpdma_chan_queue_transfer(struct xilinx_dpdma_chan *chan)
* will be used, but it should be enough.
*/
list_for_each_entry(sw_desc, &desc->descriptors, node)
- sw_desc->hw.desc_id = desc->vdesc.tx.cookie;
+ sw_desc->hw.desc_id = desc->vdesc.tx.cookie
+ & XILINX_DPDMA_CH_DESC_ID_MASK;

sw_desc = list_first_entry(&desc->descriptors,
struct xilinx_dpdma_sw_desc, node);
@@ -1086,7 +1088,8 @@ static void xilinx_dpdma_chan_vsync_irq(struct xilinx_dpdma_chan *chan)
if (!chan->running || !pending)
goto out;

- desc_id = dpdma_read(chan->reg, XILINX_DPDMA_CH_DESC_ID);
+ desc_id = dpdma_read(chan->reg, XILINX_DPDMA_CH_DESC_ID)
+ & XILINX_DPDMA_CH_DESC_ID_MASK;

/* If the retrigger raced with vsync, retry at the next frame. */
sw_desc = list_first_entry(&pending->descriptors,
--
2.30.2

2021-06-28 14:21:57

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 014/110] drm/vc4: hdmi: Make sure the controller is powered in detect

From: Maxime Ripard <[email protected]>

[ Upstream commit 9984d6664ce9dcbbc713962539eaf7636ea246c2 ]

If the HPD GPIO is not available and drm_probe_ddc fails, we end up
reading the HDMI_HOTPLUG register, but the controller might be powered
off resulting in a CPU hang. Make sure we have the power domain and the
HSM clock powered during the detect cycle to prevent the hang from
happening.

Fixes: 4f6e3d66ac52 ("drm/vc4: Add runtime PM support to the HDMI encoder driver")
Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Dave Stevenson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/gpu/drm/vc4/vc4_hdmi.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index 84e218365045..8106b5634fe1 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -159,6 +159,8 @@ vc4_hdmi_connector_detect(struct drm_connector *connector, bool force)
struct vc4_hdmi *vc4_hdmi = connector_to_vc4_hdmi(connector);
bool connected = false;

+ WARN_ON(pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev));
+
if (vc4_hdmi->hpd_gpio) {
if (gpio_get_value_cansleep(vc4_hdmi->hpd_gpio) ^
vc4_hdmi->hpd_active_low)
@@ -180,10 +182,12 @@ vc4_hdmi_connector_detect(struct drm_connector *connector, bool force)
}
}

+ pm_runtime_put(&vc4_hdmi->pdev->dev);
return connector_status_connected;
}

cec_phys_addr_invalidate(vc4_hdmi->cec_adap);
+ pm_runtime_put(&vc4_hdmi->pdev->dev);
return connector_status_disconnected;
}

--
2.30.2

2021-06-28 14:21:59

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 024/110] dmaengine: xilinx: dpdma: Add missing dependencies to Kconfig

From: Laurent Pinchart <[email protected]>

[ Upstream commit 32828b82fb875b06511918b139d3a3cd93d34262 ]

The driver depends on both OF and IOMEM support, express those
dependencies in Kconfig. This fixes a build failure on S390 reported by
the 0day bot.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Laurent Pinchart <[email protected]>
Tested-by: Jianqiang Chen <[email protected]>
Reviewed-by: Jianqiang Chen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/dma/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 03b1b0334947..c42b17b76640 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -690,6 +690,7 @@ config XILINX_ZYNQMP_DMA

config XILINX_ZYNQMP_DPDMA
tristate "Xilinx DPDMA Engine"
+ depends on HAS_IOMEM && OF
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
--
2.30.2

2021-06-28 14:22:03

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 012/110] Revert "PCI: PM: Do not read power state in pci_enable_device_flags()"

From: "Rafael J. Wysocki" <[email protected]>

[ Upstream commit 4d6035f9bf4ea12776322746a216e856dfe46698 ]

Revert commit 4514d991d992 ("PCI: PM: Do not read power state in
pci_enable_device_flags()") that is reported to cause PCI device
initialization issues on some systems.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=213481
Link: https://lore.kernel.org/linux-acpi/[email protected]
Reported-by: Michael <[email protected]>
Reported-by: Salvatore Bonaccorso <[email protected]>
Fixes: 4514d991d992 ("PCI: PM: Do not read power state in pci_enable_device_flags()")
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/pci/pci.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e4d4e399004b..16a17215f633 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1870,11 +1870,21 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
int err;
int i, bars = 0;

- if (atomic_inc_return(&dev->enable_cnt) > 1) {
- pci_update_current_state(dev, dev->current_state);
- return 0; /* already enabled */
+ /*
+ * Power state could be unknown at this point, either due to a fresh
+ * boot or a device removal call. So get the current power state
+ * so that things like MSI message writing will behave as expected
+ * (e.g. if the device really is in D0 at enable time).
+ */
+ if (dev->pm_cap) {
+ u16 pmcsr;
+ pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+ dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
}

+ if (atomic_inc_return(&dev->enable_cnt) > 1)
+ return 0; /* already enabled */
+
bridge = pci_upstream_bridge(dev);
if (bridge)
pci_enable_bridge(bridge);
--
2.30.2

2021-06-28 14:22:11

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 028/110] cfg80211: call cfg80211_leave_ocb when switching away from OCB

From: Du Cheng <[email protected]>

[ Upstream commit a64b6a25dd9f984ed05fade603a00e2eae787d2f ]

If the userland switches back-and-forth between NL80211_IFTYPE_OCB and
NL80211_IFTYPE_ADHOC via send_msg(NL80211_CMD_SET_INTERFACE), there is a
chance where the cleanup cfg80211_leave_ocb() is not called. This leads
to initialization of in-use memory (e.g. init u.ibss while in-use by
u.ocb) due to a shared struct/union within ieee80211_sub_if_data:

struct ieee80211_sub_if_data {
...
union {
struct ieee80211_if_ap ap;
struct ieee80211_if_vlan vlan;
struct ieee80211_if_managed mgd;
struct ieee80211_if_ibss ibss; // <- shares address
struct ieee80211_if_mesh mesh;
struct ieee80211_if_ocb ocb; // <- shares address
struct ieee80211_if_mntr mntr;
struct ieee80211_if_nan nan;
} u;
...
}

Therefore add handling of otype == NL80211_IFTYPE_OCB, during
cfg80211_change_iface() to perform cleanup when leaving OCB mode.

link to syzkaller bug:
https://syzkaller.appspot.com/bug?id=0612dbfa595bf4b9b680ff7b4948257b8e3732d5

Reported-by: [email protected]
Signed-off-by: Du Cheng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/wireless/util.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/wireless/util.c b/net/wireless/util.c
index f342b6147675..726e7d2342bd 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -1059,6 +1059,9 @@ int cfg80211_change_iface(struct cfg80211_registered_device *rdev,
case NL80211_IFTYPE_MESH_POINT:
/* mesh should be handled? */
break;
+ case NL80211_IFTYPE_OCB:
+ cfg80211_leave_ocb(rdev, dev);
+ break;
default:
break;
}
--
2.30.2

2021-06-28 14:22:31

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 029/110] dmaengine: idxd: Fix missing error code in idxd_cdev_open()

From: Jiapeng Chong <[email protected]>

[ Upstream commit 99b18e88a1cf737ae924123d63b46d9a3d17b1af ]

The error code is missing in this code scenario, add the error code
'-EINVAL' to the return value 'rc'.

Eliminate the follow smatch warning:

drivers/dma/idxd/cdev.c:113 idxd_cdev_open() warn: missing error code
'rc'.

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Acked-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/1622628446-87909-1-git-send-email-jiapeng.chong@linux.alibaba.com
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/dma/idxd/cdev.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 1d8a3876b745..5ba8e8bc609f 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -110,6 +110,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
pasid = iommu_sva_get_pasid(sva);
if (pasid == IOMMU_PASID_INVALID) {
iommu_sva_unbind_device(sva);
+ rc = -EINVAL;
goto failed;
}

--
2.30.2

2021-06-28 14:22:35

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 031/110] dmaengine: mediatek: free the proper desc in desc_free handler

From: Guillaume Ranquet <[email protected]>

[ Upstream commit 0a2ff58f9f8f95526ecb0ccd7517fefceb96f661 ]

The desc_free handler assumed that the desc we want to free was always
the current one associated with the channel.

This is seldom the case and this is causing use after free crashes in
multiple places (tx/rx/terminate...).

BUG: KASAN: use-after-free in mtk_uart_apdma_rx_handler+0x120/0x304

Call trace:
dump_backtrace+0x0/0x1b0
show_stack+0x24/0x34
dump_stack+0xe0/0x150
print_address_description+0x8c/0x55c
__kasan_report+0x1b8/0x218
kasan_report+0x14/0x20
__asan_load4+0x98/0x9c
mtk_uart_apdma_rx_handler+0x120/0x304
mtk_uart_apdma_irq_handler+0x50/0x80
__handle_irq_event_percpu+0xe0/0x210
handle_irq_event+0x8c/0x184
handle_fasteoi_irq+0x1d8/0x3ac
__handle_domain_irq+0xb0/0x110
gic_handle_irq+0x50/0xb8
el0_irq_naked+0x60/0x6c

Allocated by task 3541:
__kasan_kmalloc+0xf0/0x1b0
kasan_kmalloc+0x10/0x1c
kmem_cache_alloc_trace+0x90/0x2dc
mtk_uart_apdma_prep_slave_sg+0x6c/0x1a0
mtk8250_dma_rx_complete+0x220/0x2e4
vchan_complete+0x290/0x340
tasklet_action_common+0x220/0x298
tasklet_action+0x28/0x34
__do_softirq+0x158/0x35c

Freed by task 3541:
__kasan_slab_free+0x154/0x224
kasan_slab_free+0x14/0x24
slab_free_freelist_hook+0xf8/0x15c
kfree+0xb4/0x278
mtk_uart_apdma_desc_free+0x34/0x44
vchan_complete+0x1bc/0x340
tasklet_action_common+0x220/0x298
tasklet_action+0x28/0x34
__do_softirq+0x158/0x35c

The buggy address belongs to the object at ffff000063606800
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 176 bytes inside of
256-byte region [ffff000063606800, ffff000063606900)
The buggy address belongs to the page:
page:fffffe00016d8180 refcount:1 mapcount:0 mapping:ffff00000302f600 index:0x0 compound_mapcount: 0
flags: 0xffff00000010200(slab|head)
raw: 0ffff00000010200 dead000000000100 dead000000000122 ffff00000302f600
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Signed-off-by: Guillaume Ranquet <[email protected]>

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/dma/mediatek/mtk-uart-apdma.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/dma/mediatek/mtk-uart-apdma.c b/drivers/dma/mediatek/mtk-uart-apdma.c
index 27c07350971d..e38b67fc0c0c 100644
--- a/drivers/dma/mediatek/mtk-uart-apdma.c
+++ b/drivers/dma/mediatek/mtk-uart-apdma.c
@@ -131,10 +131,7 @@ static unsigned int mtk_uart_apdma_read(struct mtk_chan *c, unsigned int reg)

static void mtk_uart_apdma_desc_free(struct virt_dma_desc *vd)
{
- struct dma_chan *chan = vd->tx.chan;
- struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
-
- kfree(c->desc);
+ kfree(container_of(vd, struct mtk_uart_apdma_desc, vd));
}

static void mtk_uart_apdma_start_tx(struct mtk_chan *c)
--
2.30.2

2021-06-28 14:22:35

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 032/110] dmaengine: mediatek: do not issue a new desc if one is still current

From: Guillaume Ranquet <[email protected]>

[ Upstream commit 2537b40b0a4f61d2c83900744fe89b09076be9c6 ]

Avoid issuing a new desc if one is still being processed as this can
lead to some desc never being marked as completed.

Signed-off-by: Guillaume Ranquet <[email protected]>

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/dma/mediatek/mtk-uart-apdma.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/mediatek/mtk-uart-apdma.c b/drivers/dma/mediatek/mtk-uart-apdma.c
index e38b67fc0c0c..a09ab2dd3b46 100644
--- a/drivers/dma/mediatek/mtk-uart-apdma.c
+++ b/drivers/dma/mediatek/mtk-uart-apdma.c
@@ -204,14 +204,9 @@ static void mtk_uart_apdma_start_rx(struct mtk_chan *c)

static void mtk_uart_apdma_tx_handler(struct mtk_chan *c)
{
- struct mtk_uart_apdma_desc *d = c->desc;
-
mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_TX_INT_CLR_B);
mtk_uart_apdma_write(c, VFF_INT_EN, VFF_INT_EN_CLR_B);
mtk_uart_apdma_write(c, VFF_EN, VFF_EN_CLR_B);
-
- list_del(&d->vd.node);
- vchan_cookie_complete(&d->vd);
}

static void mtk_uart_apdma_rx_handler(struct mtk_chan *c)
@@ -242,9 +237,17 @@ static void mtk_uart_apdma_rx_handler(struct mtk_chan *c)

c->rx_status = d->avail_len - cnt;
mtk_uart_apdma_write(c, VFF_RPT, wg);
+}

- list_del(&d->vd.node);
- vchan_cookie_complete(&d->vd);
+static void mtk_uart_apdma_chan_complete_handler(struct mtk_chan *c)
+{
+ struct mtk_uart_apdma_desc *d = c->desc;
+
+ if (d) {
+ list_del(&d->vd.node);
+ vchan_cookie_complete(&d->vd);
+ c->desc = NULL;
+ }
}

static irqreturn_t mtk_uart_apdma_irq_handler(int irq, void *dev_id)
@@ -258,6 +261,7 @@ static irqreturn_t mtk_uart_apdma_irq_handler(int irq, void *dev_id)
mtk_uart_apdma_rx_handler(c);
else if (c->dir == DMA_MEM_TO_DEV)
mtk_uart_apdma_tx_handler(c);
+ mtk_uart_apdma_chan_complete_handler(c);
spin_unlock_irqrestore(&c->vc.lock, flags);

return IRQ_HANDLED;
@@ -363,7 +367,7 @@ static void mtk_uart_apdma_issue_pending(struct dma_chan *chan)
unsigned long flags;

spin_lock_irqsave(&c->vc.lock, flags);
- if (vchan_issue_pending(&c->vc)) {
+ if (vchan_issue_pending(&c->vc) && !c->desc) {
vd = vchan_next_desc(&c->vc);
c->desc = to_mtk_uart_apdma_desc(&vd->tx);

--
2.30.2

2021-06-28 14:22:35

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 033/110] dmaengine: mediatek: use GFP_NOWAIT instead of GFP_ATOMIC in prep_dma

From: Guillaume Ranquet <[email protected]>

[ Upstream commit 9041575348b21ade1fb74d790f1aac85d68198c7 ]

As recommended by the doc in:
Documentation/drivers-api/dmaengine/provider.rst

Use GFP_NOWAIT to not deplete the emergency pool.

Signed-off-by: Guillaume Ranquet <[email protected]>

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/dma/mediatek/mtk-uart-apdma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/mediatek/mtk-uart-apdma.c b/drivers/dma/mediatek/mtk-uart-apdma.c
index a09ab2dd3b46..375e7e647df6 100644
--- a/drivers/dma/mediatek/mtk-uart-apdma.c
+++ b/drivers/dma/mediatek/mtk-uart-apdma.c
@@ -349,7 +349,7 @@ static struct dma_async_tx_descriptor *mtk_uart_apdma_prep_slave_sg
return NULL;

/* Now allocate and setup the descriptor */
- d = kzalloc(sizeof(*d), GFP_ATOMIC);
+ d = kzalloc(sizeof(*d), GFP_NOWAIT);
if (!d)
return NULL;

--
2.30.2

2021-06-28 14:22:40

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 020/110] perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context

From: Like Xu <[email protected]>

[ Upstream commit 488e13a489e9707a7e81e1991fdd1f20c0f04689 ]

If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional
might_sleep_if() deep in kmem_cache_alloc() will generate the following
trace, and potentially cause a deadlock when another LBR event is added:

[] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196
[] Call Trace:
[] kmem_cache_alloc+0x36/0x250
[] intel_pmu_lbr_add+0x152/0x170
[] x86_pmu_add+0x83/0xd0

Make it symmetric with the release_lbr_buffers() call and mirror the
existing DS buffers.

Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read")
Signed-off-by: Like Xu <[email protected]>
[peterz: simplified]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Kan Liang <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/events/core.c | 6 ++++--
arch/x86/events/intel/lbr.c | 26 ++++++++++++++++++++------
arch/x86/events/perf_event.h | 6 ++++++
3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18df17129695..10cadd73a8ac 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -380,10 +380,12 @@ int x86_reserve_hardware(void)
if (!atomic_inc_not_zero(&pmc_refcount)) {
mutex_lock(&pmc_reserve_mutex);
if (atomic_read(&pmc_refcount) == 0) {
- if (!reserve_pmc_hardware())
+ if (!reserve_pmc_hardware()) {
err = -EBUSY;
- else
+ } else {
reserve_ds_buffers();
+ reserve_lbr_buffers();
+ }
}
if (!err)
atomic_inc(&pmc_refcount);
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 21890dacfcfe..22d0e40a1920 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -658,7 +658,6 @@ static inline bool branch_user_callstack(unsigned br_sel)

void intel_pmu_lbr_add(struct perf_event *event)
{
- struct kmem_cache *kmem_cache = event->pmu->task_ctx_cache;
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

if (!x86_pmu.lbr_nr)
@@ -696,11 +695,6 @@ void intel_pmu_lbr_add(struct perf_event *event)
perf_sched_cb_inc(event->ctx->pmu);
if (!cpuc->lbr_users++ && !event->total_time_running)
intel_pmu_lbr_reset();
-
- if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
- kmem_cache && !cpuc->lbr_xsave &&
- (cpuc->lbr_users != cpuc->lbr_pebs_users))
- cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
}

void release_lbr_buffers(void)
@@ -721,6 +715,26 @@ void release_lbr_buffers(void)
}
}

+void reserve_lbr_buffers(void)
+{
+ struct kmem_cache *kmem_cache;
+ struct cpu_hw_events *cpuc;
+ int cpu;
+
+ if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
+ return;
+
+ for_each_possible_cpu(cpu) {
+ cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+ kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
+ if (!kmem_cache || cpuc->lbr_xsave)
+ continue;
+
+ cpuc->lbr_xsave = kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
+ cpu_to_node(cpu));
+ }
+}
+
void intel_pmu_lbr_del(struct perf_event *event)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 53b2b5fc23bc..7888266c76cd 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1135,6 +1135,8 @@ void reserve_ds_buffers(void);

void release_lbr_buffers(void);

+void reserve_lbr_buffers(void);
+
extern struct event_constraint bts_constraint;
extern struct event_constraint vlbr_constraint;

@@ -1282,6 +1284,10 @@ static inline void release_lbr_buffers(void)
{
}

+static inline void reserve_lbr_buffers(void)
+{
+}
+
static inline int intel_pmu_init(void)
{
return 0;
--
2.30.2

2021-06-28 14:22:57

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 037/110] inet: annotate data race in inet_send_prepare() and inet_dgram_connect()

From: Eric Dumazet <[email protected]>

[ Upstream commit dcd01eeac14486b56a790f5cce9b823440ba5b34 ]

Both functions are known to be racy when reading inet_num
as we do not want to grab locks for the common case the socket
has been bound already. The race is resolved in inet_autobind()
by reading again inet_num under the socket lock.

syzbot reported:
BUG: KCSAN: data-race in inet_send_prepare / udp_lib_get_port

write to 0xffff88812cba150e of 2 bytes by task 24135 on cpu 0:
udp_lib_get_port+0x4b2/0xe20 net/ipv4/udp.c:308
udp_v6_get_port+0x5e/0x70 net/ipv6/udp.c:89
inet_autobind net/ipv4/af_inet.c:183 [inline]
inet_send_prepare+0xd0/0x210 net/ipv4/af_inet.c:807
inet6_sendmsg+0x29/0x80 net/ipv6/af_inet6.c:639
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
___sys_sendmsg net/socket.c:2404 [inline]
__sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
__do_sys_sendmmsg net/socket.c:2519 [inline]
__se_sys_sendmmsg net/socket.c:2516 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff88812cba150e of 2 bytes by task 24132 on cpu 1:
inet_send_prepare+0x21/0x210 net/ipv4/af_inet.c:806
inet6_sendmsg+0x29/0x80 net/ipv6/af_inet6.c:639
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
___sys_sendmsg net/socket.c:2404 [inline]
__sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
__do_sys_sendmmsg net/socket.c:2519 [inline]
__se_sys_sendmmsg net/socket.c:2516 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x0000 -> 0x9db4

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24132 Comm: syz-executor.2 Not tainted 5.13.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/af_inet.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 1355e6c0d567..faa7856c7fb0 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -575,7 +575,7 @@ int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr,
return err;
}

- if (!inet_sk(sk)->inet_num && inet_autobind(sk))
+ if (data_race(!inet_sk(sk)->inet_num) && inet_autobind(sk))
return -EAGAIN;
return sk->sk_prot->connect(sk, uaddr, addr_len);
}
@@ -803,7 +803,7 @@ int inet_send_prepare(struct sock *sk)
sock_rps_record_flow(sk);

/* We may need to bind the socket. */
- if (!inet_sk(sk)->inet_num && !sk->sk_prot->no_autobind &&
+ if (data_race(!inet_sk(sk)->inet_num) && !sk->sk_prot->no_autobind &&
inet_autobind(sk))
return -EAGAIN;

--
2.30.2

2021-06-28 14:22:59

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 026/110] mac80211: remove warning in ieee80211_get_sband()

From: Johannes Berg <[email protected]>

[ Upstream commit 0ee4d55534f82a0624701d0bb9fc2304d4529086 ]

Syzbot reports that it's possible to hit this from userspace,
by trying to add a station before any other connection setup
has been done. Instead of trying to catch this in some other
way simply remove the warning, that will appropriately reject
the call from userspace.

Reported-by: [email protected]
Link: https://lore.kernel.org/r/20210517164715.f537da276d17.Id05f40ec8761d6a8cc2df87f1aa09c651988a586@changeid
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/mac80211/ieee80211_i.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 02e818d740f6..5ec437e8e713 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1442,7 +1442,7 @@ ieee80211_get_sband(struct ieee80211_sub_if_data *sdata)
rcu_read_lock();
chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf);

- if (WARN_ON_ONCE(!chanctx_conf)) {
+ if (!chanctx_conf) {
rcu_read_unlock();
return NULL;
}
--
2.30.2

2021-06-28 14:22:59

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 035/110] mac80211: drop multicast fragments

From: Johannes Berg <[email protected]>

[ Upstream commit a9799541ca34652d9996e45f80e8e03144c12949 ]

These are not permitted by the spec, just drop them.

Link: https://lore.kernel.org/r/20210609161305.23def022b750.Ibd6dd3cdce573dae262fcdc47f8ac52b883a9c50@changeid
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/mac80211/rx.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 59de7a86599d..cb5cbf02dbac 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2239,17 +2239,15 @@ ieee80211_rx_h_defragment(struct ieee80211_rx_data *rx)
sc = le16_to_cpu(hdr->seq_ctrl);
frag = sc & IEEE80211_SCTL_FRAG;

- if (is_multicast_ether_addr(hdr->addr1)) {
- I802_DEBUG_INC(rx->local->dot11MulticastReceivedFrameCount);
- goto out_no_led;
- }
-
if (rx->sta)
cache = &rx->sta->frags;

if (likely(!ieee80211_has_morefrags(fc) && frag == 0))
goto out;

+ if (is_multicast_ether_addr(hdr->addr1))
+ return RX_DROP_MONITOR;
+
I802_DEBUG_INC(rx->local->rx_handlers_fragments);

if (skb_linearize(rx->skb))
@@ -2375,7 +2373,6 @@ ieee80211_rx_h_defragment(struct ieee80211_rx_data *rx)

out:
ieee80211_led_rx(rx->local);
- out_no_led:
if (rx->sta)
rx->sta->rx_stats.packets++;
return RX_CONTINUE;
--
2.30.2

2021-06-28 14:23:02

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 039/110] net: annotate data race in sock_error()

From: Eric Dumazet <[email protected]>

[ Upstream commit f13ef10059ccf5f4ed201cd050176df62ec25bb8 ]

sock_error() is known to be racy. The code avoids
an atomic operation is sk_err is zero, and this field
could be changed under us, this is fine.

Sysbot reported:

BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock

write to 0xffff888131855630 of 4 bytes by task 9365 on cpu 1:
unix_release_sock+0x2e9/0x6e0 net/unix/af_unix.c:550
unix_release+0x2f/0x50 net/unix/af_unix.c:859
__sock_release net/socket.c:599 [inline]
sock_close+0x6c/0x150 net/socket.c:1258
__fput+0x25b/0x4e0 fs/file_table.c:280
____fput+0x11/0x20 fs/file_table.c:313
task_work_run+0xae/0x130 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:208
__syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301
do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888131855630 of 4 bytes by task 9385 on cpu 0:
sock_error include/net/sock.h:2269 [inline]
sock_alloc_send_pskb+0xe4/0x4e0 net/core/sock.c:2336
unix_dgram_sendmsg+0x478/0x1610 net/unix/af_unix.c:1671
unix_seqpacket_sendmsg+0xc2/0x100 net/unix/af_unix.c:2055
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
__sys_sendmsg_sock+0x25/0x30 net/socket.c:2416
io_sendmsg fs/io_uring.c:4367 [inline]
io_issue_sqe+0x231a/0x6750 fs/io_uring.c:6135
__io_queue_sqe+0xe9/0x360 fs/io_uring.c:6414
__io_req_task_submit fs/io_uring.c:2039 [inline]
io_async_task_func+0x312/0x590 fs/io_uring.c:5074
__tctx_task_work fs/io_uring.c:1910 [inline]
tctx_task_work+0x1d4/0x3d0 fs/io_uring.c:1924
task_work_run+0xae/0x130 kernel/task_work.c:164
tracehook_notify_signal include/linux/tracehook.h:212 [inline]
handle_signal_work kernel/entry/common.c:145 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0xf8/0x190 kernel/entry/common.c:208
__syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301
do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00000000 -> 0x00000068

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 9385 Comm: syz-executor.3 Not tainted 5.13.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/sock.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 62e3811e95a7..b98c80a7c7ae 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2260,8 +2260,13 @@ struct sk_buff *sock_dequeue_err_skb(struct sock *sk);
static inline int sock_error(struct sock *sk)
{
int err;
- if (likely(!sk->sk_err))
+
+ /* Avoid an atomic operation for the common case.
+ * This is racy since another cpu/thread can change sk_err under us.
+ */
+ if (likely(data_race(!sk->sk_err)))
return 0;
+
err = xchg(&sk->sk_err, 0);
return -err;
}
--
2.30.2

2021-06-28 14:23:06

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 040/110] inet: annotate date races around sk->sk_txhash

From: Eric Dumazet <[email protected]>

[ Upstream commit b71eaed8c04f72a919a9c44e83e4ee254e69e7f3 ]

UDP sendmsg() path can be lockless, it is possible for another
thread to re-connect an change sk->sk_txhash under us.

There is no serious impact, but we can use READ_ONCE()/WRITE_ONCE()
pair to document the race.

BUG: KCSAN: data-race in __ip4_datagram_connect / skb_set_owner_w

write to 0xffff88813397920c of 4 bytes by task 30997 on cpu 1:
sk_set_txhash include/net/sock.h:1937 [inline]
__ip4_datagram_connect+0x69e/0x710 net/ipv4/datagram.c:75
__ip6_datagram_connect+0x551/0x840 net/ipv6/datagram.c:189
ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272
inet_dgram_connect+0xfd/0x180 net/ipv4/af_inet.c:580
__sys_connect_file net/socket.c:1837 [inline]
__sys_connect+0x245/0x280 net/socket.c:1854
__do_sys_connect net/socket.c:1864 [inline]
__se_sys_connect net/socket.c:1861 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:1861
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff88813397920c of 4 bytes by task 31039 on cpu 0:
skb_set_hash_from_sk include/net/sock.h:2211 [inline]
skb_set_owner_w+0x118/0x220 net/core/sock.c:2101
sock_alloc_send_pskb+0x452/0x4e0 net/core/sock.c:2359
sock_alloc_send_skb+0x2d/0x40 net/core/sock.c:2373
__ip6_append_data+0x1743/0x21a0 net/ipv6/ip6_output.c:1621
ip6_make_skb+0x258/0x420 net/ipv6/ip6_output.c:1983
udpv6_sendmsg+0x160a/0x16b0 net/ipv6/udp.c:1527
inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:642
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
___sys_sendmsg net/socket.c:2404 [inline]
__sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
__do_sys_sendmmsg net/socket.c:2519 [inline]
__se_sys_sendmmsg net/socket.c:2516 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0xbca3c43d -> 0xfdb309e0

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 31039 Comm: syz-executor.2 Not tainted 5.13.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/sock.h | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index b98c80a7c7ae..b9bdeca1d784 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1928,7 +1928,8 @@ static inline u32 net_tx_rndhash(void)

static inline void sk_set_txhash(struct sock *sk)
{
- sk->sk_txhash = net_tx_rndhash();
+ /* This pairs with READ_ONCE() in skb_set_hash_from_sk() */
+ WRITE_ONCE(sk->sk_txhash, net_tx_rndhash());
}

static inline bool sk_rethink_txhash(struct sock *sk)
@@ -2200,9 +2201,12 @@ static inline void sock_poll_wait(struct file *filp, struct socket *sock,

static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
{
- if (sk->sk_txhash) {
+ /* This pairs with WRITE_ONCE() in sk_set_txhash() */
+ u32 txhash = READ_ONCE(sk->sk_txhash);
+
+ if (txhash) {
skb->l4_hash = 1;
- skb->hash = sk->sk_txhash;
+ skb->hash = txhash;
}
}

--
2.30.2

2021-06-28 14:23:08

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 034/110] net: ipv4: Remove unneed BUG() function

From: Zheng Yongjun <[email protected]>

[ Upstream commit 5ac6b198d7e312bd10ebe7d58c64690dc59cc49a ]

When 'nla_parse_nested_deprecated' failed, it's no need to
BUG() here, return -EINVAL is ok.

Signed-off-by: Zheng Yongjun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/devinet.c | 2 +-
net/ipv6/addrconf.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 2e35f68da40a..1c6429c353a9 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1989,7 +1989,7 @@ static int inet_set_link_af(struct net_device *dev, const struct nlattr *nla,
return -EAFNOSUPPORT;

if (nla_parse_nested_deprecated(tb, IFLA_INET_MAX, nla, NULL, NULL) < 0)
- BUG();
+ return -EINVAL;

if (tb[IFLA_INET_CONF]) {
nla_for_each_nested(a, tb[IFLA_INET_CONF], rem)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a9e53f5942fa..eab0a46983c0 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5822,7 +5822,7 @@ static int inet6_set_link_af(struct net_device *dev, const struct nlattr *nla,
return -EAFNOSUPPORT;

if (nla_parse_nested_deprecated(tb, IFLA_INET6_MAX, nla, NULL, NULL) < 0)
- BUG();
+ return -EINVAL;

if (tb[IFLA_INET6_TOKEN]) {
err = inet6_set_iftoken(idev, nla_data(tb[IFLA_INET6_TOKEN]),
--
2.30.2

2021-06-28 14:23:11

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 027/110] mac80211_hwsim: drop pending frames on stop

From: Johannes Berg <[email protected]>

[ Upstream commit bd18de517923903a177508fc8813f44e717b1c00 ]

Syzbot reports that we may be able to get into a situation where
mac80211 has pending ACK frames on shutdown with hwsim. It appears
that the reason for this is that syzbot uses the wmediumd hooks to
intercept/injection frames, and may shut down hwsim, removing the
radio(s), while frames are pending in the air simulation.

Clean out the pending queue when the interface is stopped, after
this the frames can't be reported back to mac80211 properly anyway.

Reported-by: [email protected]
Link: https://lore.kernel.org/r/20210517170429.b0f85ab0eda1.Ie42a6ec6b940c971f3441286aeaaae2fe368e29a@changeid
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/wireless/mac80211_hwsim.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index fa7d4c20dc13..30b39cb4056a 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -1693,8 +1693,13 @@ static int mac80211_hwsim_start(struct ieee80211_hw *hw)
static void mac80211_hwsim_stop(struct ieee80211_hw *hw)
{
struct mac80211_hwsim_data *data = hw->priv;
+
data->started = false;
hrtimer_cancel(&data->beacon_timer);
+
+ while (!skb_queue_empty(&data->pending))
+ ieee80211_free_txskb(hw, skb_dequeue(&data->pending));
+
wiphy_dbg(hw->wiphy, "%s\n", __func__);
}

--
2.30.2

2021-06-28 14:23:14

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 038/110] ping: Check return value of function 'ping_queue_rcv_skb'

From: Zheng Yongjun <[email protected]>

[ Upstream commit 9d44fa3e50cc91691896934d106c86e4027e61ca ]

Function 'ping_queue_rcv_skb' not always return success, which will
also return fail. If not check the wrong return value of it, lead to function
`ping_rcv` return success.

Signed-off-by: Zheng Yongjun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/ping.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 8b943f85fff9..ea22768f76b8 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -952,6 +952,7 @@ bool ping_rcv(struct sk_buff *skb)
struct sock *sk;
struct net *net = dev_net(skb->dev);
struct icmphdr *icmph = icmp_hdr(skb);
+ bool rc = false;

/* We assume the packet has already been checked by icmp_rcv */

@@ -966,14 +967,15 @@ bool ping_rcv(struct sk_buff *skb)
struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);

pr_debug("rcv on socket %p\n", sk);
- if (skb2)
- ping_queue_rcv_skb(sk, skb2);
+ if (skb2 && !ping_queue_rcv_skb(sk, skb2))
+ rc = true;
sock_put(sk);
- return true;
}
- pr_debug("no socket, dropping\n");

- return false;
+ if (!rc)
+ pr_debug("no socket, dropping\n");
+
+ return rc;
}
EXPORT_SYMBOL_GPL(ping_rcv);

--
2.30.2

2021-06-28 14:23:37

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 046/110] qmi_wwan: Do not call netif_rx from rx_fixup

From: Kristian Evensen <[email protected]>

[ Upstream commit 057d49334c02a79af81c30a8d240e641bd6f1741 ]

When the QMI_WWAN_FLAG_PASS_THROUGH is set, netif_rx() is called from
qmi_wwan_rx_fixup(). When the call to netif_rx() is successful (which is
most of the time), usbnet_skb_return() is called (from rx_process()).
usbnet_skb_return() will then call netif_rx() a second time for the same
skb.

Simplify the code and avoid the redundant netif_rx() call by changing
qmi_wwan_rx_fixup() to always return 1 when QMI_WWAN_FLAG_PASS_THROUGH
is set. We then leave it up to the existing infrastructure to call
netif_rx().

Suggested-by: Bjørn Mork <[email protected]>
Signed-off-by: Kristian Evensen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/usb/qmi_wwan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 6700f1970b24..bc55ec739af9 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -575,7 +575,7 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)

if (info->flags & QMI_WWAN_FLAG_PASS_THROUGH) {
skb->protocol = htons(ETH_P_MAP);
- return (netif_rx(skb) == NET_RX_SUCCESS);
+ return 1;
}

switch (skb->data[0] & 0xf0) {
--
2.30.2

2021-06-28 14:23:44

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 048/110] net/packet: annotate accesses to po->ifindex

From: Eric Dumazet <[email protected]>

[ Upstream commit e032f7c9c7cefffcfb79b9fc16c53011d2d9d11f ]

Like prior patch, we need to annotate lockless accesses to po->ifindex
For instance, packet_getname() is reading po->ifindex (twice) while
another thread is able to change po->ifindex.

KCSAN reported:

BUG: KCSAN: data-race in packet_do_bind / packet_getname

write to 0xffff888143ce3cbc of 4 bytes by task 25573 on cpu 1:
packet_do_bind+0x420/0x7e0 net/packet/af_packet.c:3191
packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
__sys_bind+0x200/0x290 net/socket.c:1637
__do_sys_bind net/socket.c:1648 [inline]
__se_sys_bind net/socket.c:1646 [inline]
__x64_sys_bind+0x3d/0x50 net/socket.c:1646
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888143ce3cbc of 4 bytes by task 25578 on cpu 0:
packet_getname+0x5b/0x1a0 net/packet/af_packet.c:3525
__sys_getsockname+0x10e/0x1a0 net/socket.c:1887
__do_sys_getsockname net/socket.c:1902 [inline]
__se_sys_getsockname net/socket.c:1899 [inline]
__x64_sys_getsockname+0x3e/0x50 net/socket.c:1899
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00000000 -> 0x00000001

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 25578 Comm: syz-executor.5 Not tainted 5.13.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/packet/af_packet.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8e07341a98af..68a4dd251242 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3187,11 +3187,11 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex,
if (unlikely(unlisted)) {
dev_put(dev);
po->prot_hook.dev = NULL;
- po->ifindex = -1;
+ WRITE_ONCE(po->ifindex, -1);
packet_cached_dev_reset(po);
} else {
po->prot_hook.dev = dev;
- po->ifindex = dev ? dev->ifindex : 0;
+ WRITE_ONCE(po->ifindex, dev ? dev->ifindex : 0);
packet_cached_dev_assign(po, dev);
}
}
@@ -3505,7 +3505,7 @@ static int packet_getname_spkt(struct socket *sock, struct sockaddr *uaddr,
uaddr->sa_family = AF_PACKET;
memset(uaddr->sa_data, 0, sizeof(uaddr->sa_data));
rcu_read_lock();
- dev = dev_get_by_index_rcu(sock_net(sk), pkt_sk(sk)->ifindex);
+ dev = dev_get_by_index_rcu(sock_net(sk), READ_ONCE(pkt_sk(sk)->ifindex));
if (dev)
strlcpy(uaddr->sa_data, dev->name, sizeof(uaddr->sa_data));
rcu_read_unlock();
@@ -3520,16 +3520,18 @@ static int packet_getname(struct socket *sock, struct sockaddr *uaddr,
struct sock *sk = sock->sk;
struct packet_sock *po = pkt_sk(sk);
DECLARE_SOCKADDR(struct sockaddr_ll *, sll, uaddr);
+ int ifindex;

if (peer)
return -EOPNOTSUPP;

+ ifindex = READ_ONCE(po->ifindex);
sll->sll_family = AF_PACKET;
- sll->sll_ifindex = po->ifindex;
+ sll->sll_ifindex = ifindex;
sll->sll_protocol = READ_ONCE(po->num);
sll->sll_pkttype = 0;
rcu_read_lock();
- dev = dev_get_by_index_rcu(sock_net(sk), po->ifindex);
+ dev = dev_get_by_index_rcu(sock_net(sk), ifindex);
if (dev) {
sll->sll_hatype = dev->type;
sll->sll_halen = dev->addr_len;
@@ -4108,7 +4110,7 @@ static int packet_notifier(struct notifier_block *this,
}
if (msg == NETDEV_UNREGISTER) {
packet_cached_dev_reset(po);
- po->ifindex = -1;
+ WRITE_ONCE(po->ifindex, -1);
if (po->prot_hook.dev)
dev_put(po->prot_hook.dev);
po->prot_hook.dev = NULL;
@@ -4620,7 +4622,7 @@ static int packet_seq_show(struct seq_file *seq, void *v)
refcount_read(&s->sk_refcnt),
s->sk_type,
ntohs(READ_ONCE(po->num)),
- po->ifindex,
+ READ_ONCE(po->ifindex),
po->running,
atomic_read(&s->sk_rmem_alloc),
from_kuid_munged(seq_user_ns(seq), sock_i_uid(s)),
--
2.30.2

2021-06-28 14:23:53

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 043/110] riscv32: Use medany C model for modules

From: Khem Raj <[email protected]>

[ Upstream commit 5d2388dbf84adebeb6d9742164be8d32728e4269 ]

When CONFIG_CMODEL_MEDLOW is used it ends up generating riscv_hi20_rela
relocations in modules which are not resolved during runtime and
following errors would be seen

[ 4.802714] virtio_input: target 00000000c1539090 can not be addressed by the 32-bit offset from PC = 39148b7b
[ 4.854800] virtio_input: target 00000000c1539090 can not be addressed by the 32-bit offset from PC = 9774456d

Signed-off-by: Khem Raj <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/riscv/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 5243bf2327c0..a5ee34117321 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -16,7 +16,7 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
CC_FLAGS_FTRACE := -fpatchable-function-entry=8
endif

-ifeq ($(CONFIG_64BIT)$(CONFIG_CMODEL_MEDLOW),yy)
+ifeq ($(CONFIG_CMODEL_MEDLOW),y)
KBUILD_CFLAGS_MODULE += -mcmodel=medany
endif

--
2.30.2

2021-06-28 14:23:59

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 051/110] r8169: Avoid memcpy() over-reading of ETH_SS_STATS

From: Kees Cook <[email protected]>

[ Upstream commit da5ac772cfe2a03058b0accfac03fad60c46c24d ]

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally reading across neighboring array fields.

The memcpy() is copying the entire structure, not just the first array.
Adjust the source argument so the compiler can do appropriate bounds
checking.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/realtek/r8169_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 1df2c002c9f6..f7a56e05ec8a 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -1673,7 +1673,7 @@ static void rtl8169_get_strings(struct net_device *dev, u32 stringset, u8 *data)
{
switch(stringset) {
case ETH_SS_STATS:
- memcpy(data, *rtl8169_gstrings, sizeof(rtl8169_gstrings));
+ memcpy(data, rtl8169_gstrings, sizeof(rtl8169_gstrings));
break;
}
}
--
2.30.2

2021-06-28 14:24:02

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 036/110] net: ethtool: clear heap allocations for ethtool function

From: Austin Kim <[email protected]>

[ Upstream commit 80ec82e3d2c1fab42eeb730aaa7985494a963d3f ]

Several ethtool functions leave heap uncleared (potentially) by
drivers. This will leave the unused portion of heap unchanged and
might copy the full contents back to userspace.

Signed-off-by: Austin Kim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ethtool/ioctl.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 2603966da904..e910890a868c 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -1421,7 +1421,7 @@ static int ethtool_get_any_eeprom(struct net_device *dev, void __user *useraddr,
if (eeprom.offset + eeprom.len > total_len)
return -EINVAL;

- data = kmalloc(PAGE_SIZE, GFP_USER);
+ data = kzalloc(PAGE_SIZE, GFP_USER);
if (!data)
return -ENOMEM;

@@ -1486,7 +1486,7 @@ static int ethtool_set_eeprom(struct net_device *dev, void __user *useraddr)
if (eeprom.offset + eeprom.len > ops->get_eeprom_len(dev))
return -EINVAL;

- data = kmalloc(PAGE_SIZE, GFP_USER);
+ data = kzalloc(PAGE_SIZE, GFP_USER);
if (!data)
return -ENOMEM;

@@ -1765,7 +1765,7 @@ static int ethtool_self_test(struct net_device *dev, char __user *useraddr)
return -EFAULT;

test.len = test_len;
- data = kmalloc_array(test_len, sizeof(u64), GFP_USER);
+ data = kcalloc(test_len, sizeof(u64), GFP_USER);
if (!data)
return -ENOMEM;

@@ -2281,7 +2281,7 @@ static int ethtool_get_tunable(struct net_device *dev, void __user *useraddr)
ret = ethtool_tunable_valid(&tuna);
if (ret)
return ret;
- data = kmalloc(tuna.len, GFP_USER);
+ data = kzalloc(tuna.len, GFP_USER);
if (!data)
return -ENOMEM;
ret = ops->get_tunable(dev, &tuna, data);
@@ -2473,7 +2473,7 @@ static int get_phy_tunable(struct net_device *dev, void __user *useraddr)
ret = ethtool_phy_tunable_valid(&tuna);
if (ret)
return ret;
- data = kmalloc(tuna.len, GFP_USER);
+ data = kzalloc(tuna.len, GFP_USER);
if (!data)
return -ENOMEM;
if (phy_drv_tunable) {
--
2.30.2

2021-06-28 14:24:06

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 049/110] r8152: Avoid memcpy() over-reading of ETH_SS_STATS

From: Kees Cook <[email protected]>

[ Upstream commit 99718abdc00e86e4f286dd836408e2834886c16e ]

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally reading across neighboring array fields.

The memcpy() is copying the entire structure, not just the first array.
Adjust the source argument so the compiler can do appropriate bounds
checking.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/usb/r8152.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 20fb5638ac65..23fae943a119 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -6078,7 +6078,7 @@ static void rtl8152_get_strings(struct net_device *dev, u32 stringset, u8 *data)
{
switch (stringset) {
case ETH_SS_STATS:
- memcpy(data, *rtl8152_gstrings, sizeof(rtl8152_gstrings));
+ memcpy(data, rtl8152_gstrings, sizeof(rtl8152_gstrings));
break;
}
}
--
2.30.2

2021-06-28 14:24:17

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 055/110] mac80211: handle various extensible elements correctly

From: Johannes Berg <[email protected]>

[ Upstream commit 652e8363bbc7d149fa194a5cbf30b1001c0274b0 ]

Various elements are parsed with a requirement to have an
exact size, when really we should only check that they have
the minimum size that we need. Check only that and therefore
ignore any additional data that they might carry.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>
Link: https://lore.kernel.org/r/iwlwifi.20210618133832.cd101f8040a4.Iadf0e9b37b100c6c6e79c7b298cc657c2be9151a@changeid
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/mac80211/util.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 53755a05f73b..06342693799e 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -955,7 +955,7 @@ static void ieee80211_parse_extension_element(u32 *crc,

switch (elem->data[0]) {
case WLAN_EID_EXT_HE_MU_EDCA:
- if (len == sizeof(*elems->mu_edca_param_set)) {
+ if (len >= sizeof(*elems->mu_edca_param_set)) {
elems->mu_edca_param_set = data;
if (crc)
*crc = crc32_be(*crc, (void *)elem,
@@ -976,7 +976,7 @@ static void ieee80211_parse_extension_element(u32 *crc,
}
break;
case WLAN_EID_EXT_UORA:
- if (len == 1)
+ if (len >= 1)
elems->uora_element = data;
break;
case WLAN_EID_EXT_MAX_CHANNEL_SWITCH_TIME:
@@ -984,7 +984,7 @@ static void ieee80211_parse_extension_element(u32 *crc,
elems->max_channel_switch_time = data;
break;
case WLAN_EID_EXT_MULTIPLE_BSSID_CONFIGURATION:
- if (len == sizeof(*elems->mbssid_config_ie))
+ if (len >= sizeof(*elems->mbssid_config_ie))
elems->mbssid_config_ie = data;
break;
case WLAN_EID_EXT_HE_SPR:
@@ -993,7 +993,7 @@ static void ieee80211_parse_extension_element(u32 *crc,
elems->he_spr = data;
break;
case WLAN_EID_EXT_HE_6GHZ_CAPA:
- if (len == sizeof(*elems->he_6ghz_capa))
+ if (len >= sizeof(*elems->he_6ghz_capa))
elems->he_6ghz_capa = data;
break;
}
@@ -1082,14 +1082,14 @@ _ieee802_11_parse_elems_crc(const u8 *start, size_t len, bool action,

switch (id) {
case WLAN_EID_LINK_ID:
- if (elen + 2 != sizeof(struct ieee80211_tdls_lnkie)) {
+ if (elen + 2 < sizeof(struct ieee80211_tdls_lnkie)) {
elem_parse_failed = true;
break;
}
elems->lnk_id = (void *)(pos - 2);
break;
case WLAN_EID_CHAN_SWITCH_TIMING:
- if (elen != sizeof(struct ieee80211_ch_switch_timing)) {
+ if (elen < sizeof(struct ieee80211_ch_switch_timing)) {
elem_parse_failed = true;
break;
}
@@ -1252,7 +1252,7 @@ _ieee802_11_parse_elems_crc(const u8 *start, size_t len, bool action,
elems->sec_chan_offs = (void *)pos;
break;
case WLAN_EID_CHAN_SWITCH_PARAM:
- if (elen !=
+ if (elen <
sizeof(*elems->mesh_chansw_params_ie)) {
elem_parse_failed = true;
break;
@@ -1261,7 +1261,7 @@ _ieee802_11_parse_elems_crc(const u8 *start, size_t len, bool action,
break;
case WLAN_EID_WIDE_BW_CHANNEL_SWITCH:
if (!action ||
- elen != sizeof(*elems->wide_bw_chansw_ie)) {
+ elen < sizeof(*elems->wide_bw_chansw_ie)) {
elem_parse_failed = true;
break;
}
@@ -1280,7 +1280,7 @@ _ieee802_11_parse_elems_crc(const u8 *start, size_t len, bool action,
ie = cfg80211_find_ie(WLAN_EID_WIDE_BW_CHANNEL_SWITCH,
pos, elen);
if (ie) {
- if (ie[1] == sizeof(*elems->wide_bw_chansw_ie))
+ if (ie[1] >= sizeof(*elems->wide_bw_chansw_ie))
elems->wide_bw_chansw_ie =
(void *)(ie + 2);
else
@@ -1324,7 +1324,7 @@ _ieee802_11_parse_elems_crc(const u8 *start, size_t len, bool action,
elems->cisco_dtpc_elem = pos;
break;
case WLAN_EID_ADDBA_EXT:
- if (elen != sizeof(struct ieee80211_addba_ext_ie)) {
+ if (elen < sizeof(struct ieee80211_addba_ext_ie)) {
elem_parse_failed = true;
break;
}
@@ -1350,7 +1350,7 @@ _ieee802_11_parse_elems_crc(const u8 *start, size_t len, bool action,
elem, elems);
break;
case WLAN_EID_S1G_CAPABILITIES:
- if (elen == sizeof(*elems->s1g_capab))
+ if (elen >= sizeof(*elems->s1g_capab))
elems->s1g_capab = (void *)pos;
else
elem_parse_failed = true;
--
2.30.2

2021-06-28 14:24:31

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 053/110] net: qed: Fix memcpy() overflow of qed_dcbx_params()

From: Kees Cook <[email protected]>

[ Upstream commit 1c200f832e14420fa770193f9871f4ce2df00d07 ]

The source (&dcbx_info->operational.params) and dest
(&p_hwfn->p_dcbx_info->set.config.params) are both struct qed_dcbx_params
(560 bytes), not struct qed_dcbx_admin_params (564 bytes), which is used
as the memcpy() size.

However it seems that struct qed_dcbx_operational_params
(dcbx_info->operational)'s layout matches struct qed_dcbx_admin_params
(p_hwfn->p_dcbx_info->set.config)'s 4 byte difference (3 padding, 1 byte
for "valid").

On the assumption that the size is wrong (rather than the source structure
type), adjust the memcpy() size argument to be 4 bytes smaller and add
a BUILD_BUG_ON() to validate any changes to the structure sizes.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index 17d5b649eb36..e81dd34a3cac 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@ -1266,9 +1266,11 @@ int qed_dcbx_get_config_params(struct qed_hwfn *p_hwfn,
p_hwfn->p_dcbx_info->set.ver_num |= DCBX_CONFIG_VERSION_STATIC;

p_hwfn->p_dcbx_info->set.enabled = dcbx_info->operational.enabled;
+ BUILD_BUG_ON(sizeof(dcbx_info->operational.params) !=
+ sizeof(p_hwfn->p_dcbx_info->set.config.params));
memcpy(&p_hwfn->p_dcbx_info->set.config.params,
&dcbx_info->operational.params,
- sizeof(struct qed_dcbx_admin_params));
+ sizeof(p_hwfn->p_dcbx_info->set.config.params));
p_hwfn->p_dcbx_info->set.config.valid = true;

memcpy(params, &p_hwfn->p_dcbx_info->set, sizeof(struct qed_dcbx_set));
--
2.30.2

2021-06-28 14:24:43

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 047/110] net/packet: annotate accesses to po->bind

From: Eric Dumazet <[email protected]>

[ Upstream commit c7d2ef5dd4b03ed0ee1d13bc0c55f9cf62d49bd6 ]

tpacket_snd(), packet_snd(), packet_getname() and packet_seq_show()
can read po->num without holding a lock. This means other threads
can change po->num at the same time.

KCSAN complained about this known fact [1]
Add READ_ONCE()/WRITE_ONCE() to address the issue.

[1] BUG: KCSAN: data-race in packet_do_bind / packet_sendmsg

write to 0xffff888131a0dcc0 of 2 bytes by task 24714 on cpu 0:
packet_do_bind+0x3ab/0x7e0 net/packet/af_packet.c:3181
packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
__sys_bind+0x200/0x290 net/socket.c:1637
__do_sys_bind net/socket.c:1648 [inline]
__se_sys_bind net/socket.c:1646 [inline]
__x64_sys_bind+0x3d/0x50 net/socket.c:1646
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888131a0dcc0 of 2 bytes by task 24719 on cpu 1:
packet_snd net/packet/af_packet.c:2899 [inline]
packet_sendmsg+0x317/0x3570 net/packet/af_packet.c:3040
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
___sys_sendmsg net/socket.c:2404 [inline]
__sys_sendmsg+0x1ed/0x270 net/socket.c:2433
__do_sys_sendmsg net/socket.c:2442 [inline]
__se_sys_sendmsg net/socket.c:2440 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2440
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x0000 -> 0x1200

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24719 Comm: syz-executor.5 Not tainted 5.13.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/packet/af_packet.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 84d8921391c3..8e07341a98af 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2683,7 +2683,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
}
if (likely(saddr == NULL)) {
dev = packet_cached_dev_get(po);
- proto = po->num;
+ proto = READ_ONCE(po->num);
} else {
err = -EINVAL;
if (msg->msg_namelen < sizeof(struct sockaddr_ll))
@@ -2896,7 +2896,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)

if (likely(saddr == NULL)) {
dev = packet_cached_dev_get(po);
- proto = po->num;
+ proto = READ_ONCE(po->num);
} else {
err = -EINVAL;
if (msg->msg_namelen < sizeof(struct sockaddr_ll))
@@ -3171,7 +3171,7 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex,
/* prevents packet_notifier() from calling
* register_prot_hook()
*/
- po->num = 0;
+ WRITE_ONCE(po->num, 0);
__unregister_prot_hook(sk, true);
rcu_read_lock();
dev_curr = po->prot_hook.dev;
@@ -3181,7 +3181,7 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex,
}

BUG_ON(po->running);
- po->num = proto;
+ WRITE_ONCE(po->num, proto);
po->prot_hook.type = proto;

if (unlikely(unlisted)) {
@@ -3526,7 +3526,7 @@ static int packet_getname(struct socket *sock, struct sockaddr *uaddr,

sll->sll_family = AF_PACKET;
sll->sll_ifindex = po->ifindex;
- sll->sll_protocol = po->num;
+ sll->sll_protocol = READ_ONCE(po->num);
sll->sll_pkttype = 0;
rcu_read_lock();
dev = dev_get_by_index_rcu(sock_net(sk), po->ifindex);
@@ -4414,7 +4414,7 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
was_running = po->running;
num = po->num;
if (was_running) {
- po->num = 0;
+ WRITE_ONCE(po->num, 0);
__unregister_prot_hook(sk, false);
}
spin_unlock(&po->bind_lock);
@@ -4449,7 +4449,7 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,

spin_lock(&po->bind_lock);
if (was_running) {
- po->num = num;
+ WRITE_ONCE(po->num, num);
register_prot_hook(sk);
}
spin_unlock(&po->bind_lock);
@@ -4619,7 +4619,7 @@ static int packet_seq_show(struct seq_file *seq, void *v)
s,
refcount_read(&s->sk_refcnt),
s->sk_type,
- ntohs(po->num),
+ ntohs(READ_ONCE(po->num)),
po->ifindex,
po->running,
atomic_read(&s->sk_rmem_alloc),
--
2.30.2

2021-06-28 14:25:07

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 062/110] pinctrl: microchip-sgpio: Put fwnode in error case during ->probe()

From: Andy Shevchenko <[email protected]>

[ Upstream commit 76b7f8fae30a9249f820e019f1e62eca992751a2 ]

device_for_each_child_node() bumps a reference counting of a returned variable.
We have to balance it whenever we return to the caller.

Fixes: 7e5ea974e61c ("pinctrl: pinctrl-microchip-sgpio: Add pinctrl driver for Microsemi Serial GPIO")
Cc: Lars Povlsen <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/pinctrl/pinctrl-microchip-sgpio.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pinctrl/pinctrl-microchip-sgpio.c b/drivers/pinctrl/pinctrl-microchip-sgpio.c
index c12fa57ebd12..165cb7a59715 100644
--- a/drivers/pinctrl/pinctrl-microchip-sgpio.c
+++ b/drivers/pinctrl/pinctrl-microchip-sgpio.c
@@ -845,8 +845,10 @@ static int microchip_sgpio_probe(struct platform_device *pdev)
i = 0;
device_for_each_child_node(dev, fwnode) {
ret = microchip_sgpio_register_bank(dev, priv, fwnode, i++);
- if (ret)
+ if (ret) {
+ fwnode_handle_put(fwnode);
return ret;
+ }
}

if (priv->in.gpio.ngpio != priv->out.gpio.ngpio) {
--
2.30.2

2021-06-28 14:25:16

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 064/110] i2c: i801: Ensure that SMBHSTSTS_INUSE_STS is cleared when leaving i801_access

From: Heiner Kallweit <[email protected]>

[ Upstream commit 065b6211a87746e196b56759a70c7851418dd741 ]

As explained in [0] currently we may leave SMBHSTSTS_INUSE_STS set,
thus potentially breaking ACPI/BIOS usage of the SMBUS device.

Seems patch [0] needs a little bit more of review effort, therefore
I'd suggest to apply a part of it as quick win. Just clearing
SMBHSTSTS_INUSE_STS when leaving i801_access() should fix the
referenced issue and leaves more time for discussing a more
sophisticated locking handling.

[0] https://www.spinics.net/lists/linux-i2c/msg51558.html

Fixes: 01590f361e94 ("i2c: i801: Instantiate SPD EEPROMs automatically")
Suggested-by: Hector Martin <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Hector Martin <[email protected]>
Reviewed-by: Jean Delvare <[email protected]>
Tested-by: Jean Delvare <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/i2c/busses/i2c-i801.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
index f9e1c2ceaac0..04a1e38f2a6f 100644
--- a/drivers/i2c/busses/i2c-i801.c
+++ b/drivers/i2c/busses/i2c-i801.c
@@ -978,6 +978,9 @@ static s32 i801_access(struct i2c_adapter *adap, u16 addr,
}

out:
+ /* Unlock the SMBus device for use by BIOS/ACPI */
+ outb_p(SMBHSTSTS_INUSE_STS, SMBHSTSTS(priv));
+
pm_runtime_mark_last_busy(&priv->pci_dev->dev);
pm_runtime_put_autosuspend(&priv->pci_dev->dev);
mutex_unlock(&priv->acpi_lock);
--
2.30.2

2021-06-28 14:25:34

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 057/110] PCI: Add AMD RS690 quirk to enable 64-bit DMA

From: Mikel Rychliski <[email protected]>

[ Upstream commit cacf994a91d3a55c0c2f853d6429cd7b86113915 ]

Although the AMD RS690 chipset has 64-bit DMA support, BIOS implementations
sometimes fail to configure the memory limit registers correctly.

The Acer F690GVM mainboard uses this chipset and a Marvell 88E8056 NIC. The
sky2 driver programs the NIC to use 64-bit DMA, which will not work:

sky2 0000:02:00.0: error interrupt status=0x8
sky2 0000:02:00.0 eth0: tx timeout
sky2 0000:02:00.0 eth0: transmit ring 0 .. 22 report=0 done=0

Other drivers required by this mainboard either don't support 64-bit DMA,
or have it disabled using driver specific quirks. For example, the ahci
driver has quirks to enable or disable 64-bit DMA depending on the BIOS
version (see ahci_sb600_enable_64bit() in ahci.c). This ahci quirk matches
against the SB600 SATA controller, but the real issue is almost certainly
with the RS690 PCI host that it was commonly attached to.

To avoid this issue in all drivers with 64-bit DMA support, fix the
configuration of the PCI host. If the kernel is aware of physical memory
above 4GB, but the BIOS never configured the PCI host with this
information, update the registers with our values.

[bhelgaas: drop PCI_DEVICE_ID_ATI_RS690 definition]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mikel Rychliski <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/pci/fixup.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index 0a0e168be1cb..9b0e771302ce 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -779,4 +779,48 @@ DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_AMD, 0x1571, pci_amd_enable_64bit_bar);
DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_AMD, 0x15b1, pci_amd_enable_64bit_bar);
DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_AMD, 0x1601, pci_amd_enable_64bit_bar);

+#define RS690_LOWER_TOP_OF_DRAM2 0x30
+#define RS690_LOWER_TOP_OF_DRAM2_VALID 0x1
+#define RS690_UPPER_TOP_OF_DRAM2 0x31
+#define RS690_HTIU_NB_INDEX 0xA8
+#define RS690_HTIU_NB_INDEX_WR_ENABLE 0x100
+#define RS690_HTIU_NB_DATA 0xAC
+
+/*
+ * Some BIOS implementations support RAM above 4GB, but do not configure the
+ * PCI host to respond to bus master accesses for these addresses. These
+ * implementations set the TOP_OF_DRAM_SLOT1 register correctly, so PCI DMA
+ * works as expected for addresses below 4GB.
+ *
+ * Reference: "AMD RS690 ASIC Family Register Reference Guide" (pg. 2-57)
+ * https://www.amd.com/system/files/TechDocs/43372_rs690_rrg_3.00o.pdf
+ */
+static void rs690_fix_64bit_dma(struct pci_dev *pdev)
+{
+ u32 val = 0;
+ phys_addr_t top_of_dram = __pa(high_memory - 1) + 1;
+
+ if (top_of_dram <= (1ULL << 32))
+ return;
+
+ pci_write_config_dword(pdev, RS690_HTIU_NB_INDEX,
+ RS690_LOWER_TOP_OF_DRAM2);
+ pci_read_config_dword(pdev, RS690_HTIU_NB_DATA, &val);
+
+ if (val)
+ return;
+
+ pci_info(pdev, "Adjusting top of DRAM to %pa for 64-bit DMA support\n", &top_of_dram);
+
+ pci_write_config_dword(pdev, RS690_HTIU_NB_INDEX,
+ RS690_UPPER_TOP_OF_DRAM2 | RS690_HTIU_NB_INDEX_WR_ENABLE);
+ pci_write_config_dword(pdev, RS690_HTIU_NB_DATA, top_of_dram >> 32);
+
+ pci_write_config_dword(pdev, RS690_HTIU_NB_INDEX,
+ RS690_LOWER_TOP_OF_DRAM2 | RS690_HTIU_NB_INDEX_WR_ENABLE);
+ pci_write_config_dword(pdev, RS690_HTIU_NB_DATA,
+ top_of_dram | RS690_LOWER_TOP_OF_DRAM2_VALID);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7910, rs690_fix_64bit_dma);
+
#endif
--
2.30.2

2021-06-28 14:25:40

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 071/110] s390: fix system call restart with multiple signals

From: Sven Schnelle <[email protected]>

commit fc66127dc3396338f287c3b494dfbf102547e770 upstream.

glibc complained with "The futex facility returned an unexpected error
code.". It turned out that the futex syscall returned -ERESTARTSYS because
a signal is pending. arch_do_signal_or_restart() restored the syscall
parameters (nameley regs->gprs[2]) and set PIF_SYSCALL_RESTART. When
another signal is made pending later in the exit loop
arch_do_signal_or_restart() is called again. This function clears
PIF_SYSCALL_RESTART and checks the return code which is set in
regs->gprs[2]. However, regs->gprs[2] was restored in the previous run
and no longer contains -ERESTARTSYS, so PIF_SYSCALL_RESTART isn't set
again and the syscall is skipped.

Fix this by not clearing PIF_SYSCALL_RESTART - it is already cleared in
__do_syscall() when the syscall is restarted.

Reported-by: Bjoern Walk <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Fixes: 56e62a737028 ("s390: convert to generic entry")
Cc: <[email protected]> # 5.12
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/kernel/signal.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/s390/kernel/signal.c b/arch/s390/kernel/signal.c
index 90163e6184f5..080e7aed181f 100644
--- a/arch/s390/kernel/signal.c
+++ b/arch/s390/kernel/signal.c
@@ -512,7 +512,6 @@ void arch_do_signal_or_restart(struct pt_regs *regs, bool has_signal)

/* No handlers present - check for system call restart */
clear_pt_regs_flag(regs, PIF_SYSCALL);
- clear_pt_regs_flag(regs, PIF_SYSCALL_RESTART);
if (current->thread.system_call) {
regs->int_code = current->thread.system_call;
switch (regs->gprs[2]) {
--
2.30.2

2021-06-28 14:25:57

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 060/110] riscv: dts: fu740: fix cache-controller interrupts

From: David Abdurachmanov <[email protected]>

[ Upstream commit 7ede12b01b59dc67bef2e2035297dd2da5bfe427 ]

The order of interrupt numbers is incorrect.

The order for FU740 is: DirError, DataError, DataFail, DirFail

From SiFive FU740-C000 Manual:
19 - L2 Cache DirError
20 - L2 Cache DirFail
21 - L2 Cache DataError
22 - L2 Cache DataFail

Signed-off-by: David Abdurachmanov <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/riscv/boot/dts/sifive/fu740-c000.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
index eeb4f8c3e0e7..d0d206cdb999 100644
--- a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
+++ b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
@@ -272,7 +272,7 @@
cache-size = <2097152>;
cache-unified;
interrupt-parent = <&plic0>;
- interrupts = <19 20 21 22>;
+ interrupts = <19 21 22 20>;
reg = <0x0 0x2010000 0x0 0x1000>;
};
gpio: gpio@10060000 {
--
2.30.2

2021-06-28 14:26:02

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 061/110] perf/x86: Track pmu in per-CPU cpu_hw_events

From: Kan Liang <[email protected]>

[ Upstream commit 61e76d53c39bb768ad264d379837cfc56b9e35b4 ]

Some platforms, e.g. Alder Lake, have hybrid architecture. In the same
package, there may be more than one type of CPU. The PMU capabilities
are different among different types of CPU. Perf will register a
dedicated PMU for each type of CPU.

Add a 'pmu' variable in the struct cpu_hw_events to track the dedicated
PMU of the current CPU.

Current x86_get_pmu() use the global 'pmu', which will be broken on a
hybrid platform. Modify it to apply the 'pmu' of the specific CPU.

Initialize the per-CPU 'pmu' variable with the global 'pmu'. There is
nothing changed for the non-hybrid platforms.

The is_x86_event() will be updated in the later patch ("perf/x86:
Register hybrid PMUs") for hybrid platforms. For the non-hybrid
platforms, nothing is changed here.

Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/events/core.c | 17 +++++++++++++----
arch/x86/events/intel/core.c | 2 +-
arch/x86/events/intel/ds.c | 4 ++--
arch/x86/events/intel/lbr.c | 9 +++++----
arch/x86/events/perf_event.h | 4 +++-
5 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 10cadd73a8ac..7050a9ebd73f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -45,9 +45,11 @@
#include "perf_event.h"

struct x86_pmu x86_pmu __read_mostly;
+static struct pmu pmu;

DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
.enabled = 1,
+ .pmu = &pmu,
};

DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
@@ -726,16 +728,23 @@ void x86_pmu_enable_all(int added)
}
}

-static struct pmu pmu;
-
static inline int is_x86_event(struct perf_event *event)
{
return event->pmu == &pmu;
}

-struct pmu *x86_get_pmu(void)
+struct pmu *x86_get_pmu(unsigned int cpu)
{
- return &pmu;
+ struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+
+ /*
+ * All CPUs of the hybrid type have been offline.
+ * The x86_get_pmu() should not be invoked.
+ */
+ if (WARN_ON_ONCE(!cpuc->pmu))
+ return &pmu;
+
+ return cpuc->pmu;
}
/*
* Event scheduler state:
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4c18e7fb58f5..77fe4fece679 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4879,7 +4879,7 @@ static void update_tfa_sched(void *ignored)
* and if so force schedule out for all event types all contexts
*/
if (test_bit(3, cpuc->active_mask))
- perf_pmu_resched(x86_get_pmu());
+ perf_pmu_resched(x86_get_pmu(smp_processor_id()));
}

static ssize_t show_sysctl_tfa(struct device *cdev,
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index d32b302719fe..72df2f392c86 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2192,7 +2192,7 @@ void __init intel_ds_init(void)
PERF_SAMPLE_TIME;
x86_pmu.flags |= PMU_FL_PEBS_ALL;
pebs_qual = "-baseline";
- x86_get_pmu()->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
+ x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
} else {
/* Only basic record supported */
x86_pmu.large_pebs_flags &=
@@ -2207,7 +2207,7 @@ void __init intel_ds_init(void)

if (x86_pmu.intel_cap.pebs_output_pt_available) {
pr_cont("PEBS-via-PT, ");
- x86_get_pmu()->capabilities |= PERF_PMU_CAP_AUX_OUTPUT;
+ x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_AUX_OUTPUT;
}

break;
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 991715886246..c9cd6ce0fa2a 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -699,7 +699,7 @@ void intel_pmu_lbr_add(struct perf_event *event)

void release_lbr_buffers(void)
{
- struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
+ struct kmem_cache *kmem_cache;
struct cpu_hw_events *cpuc;
int cpu;

@@ -708,6 +708,7 @@ void release_lbr_buffers(void)

for_each_possible_cpu(cpu) {
cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+ kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
if (kmem_cache && cpuc->lbr_xsave) {
kmem_cache_free(kmem_cache, cpuc->lbr_xsave);
cpuc->lbr_xsave = NULL;
@@ -1624,7 +1625,7 @@ void intel_pmu_lbr_init_hsw(void)
x86_pmu.lbr_sel_mask = LBR_SEL_MASK;
x86_pmu.lbr_sel_map = hsw_lbr_sel_map;

- x86_get_pmu()->task_ctx_cache = create_lbr_kmem_cache(size, 0);
+ x86_get_pmu(smp_processor_id())->task_ctx_cache = create_lbr_kmem_cache(size, 0);

if (lbr_from_signext_quirk_needed())
static_branch_enable(&lbr_from_quirk_key);
@@ -1644,7 +1645,7 @@ __init void intel_pmu_lbr_init_skl(void)
x86_pmu.lbr_sel_mask = LBR_SEL_MASK;
x86_pmu.lbr_sel_map = hsw_lbr_sel_map;

- x86_get_pmu()->task_ctx_cache = create_lbr_kmem_cache(size, 0);
+ x86_get_pmu(smp_processor_id())->task_ctx_cache = create_lbr_kmem_cache(size, 0);

/*
* SW branch filter usage:
@@ -1741,7 +1742,7 @@ static bool is_arch_lbr_xsave_available(void)

void __init intel_pmu_arch_lbr_init(void)
{
- struct pmu *pmu = x86_get_pmu();
+ struct pmu *pmu = x86_get_pmu(smp_processor_id());
union cpuid28_eax eax;
union cpuid28_ebx ebx;
union cpuid28_ecx ecx;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 7888266c76cd..35cdece5644f 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -327,6 +327,8 @@ struct cpu_hw_events {
int n_pair; /* Large increment events */

void *kfree_on_online[X86_PERF_KFREE_MAX];
+
+ struct pmu *pmu;
};

#define __EVENT_CONSTRAINT_RANGE(c, e, n, m, w, o, f) { \
@@ -905,7 +907,7 @@ static struct perf_pmu_events_ht_attr event_attr_##v = { \
.event_str_ht = ht, \
}

-struct pmu *x86_get_pmu(void);
+struct pmu *x86_get_pmu(unsigned int cpu);
extern struct x86_pmu x86_pmu __read_mostly;

static __always_inline struct x86_perf_task_context_opt *task_context_opt(void *ctx)
--
2.30.2

2021-06-28 14:26:11

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 066/110] scsi: sd: Call sd_revalidate_disk() for ioctl(BLKRRPART)

From: Christoph Hellwig <[email protected]>

[ Upstream commit d1b7f92035c6fb42529ada531e2cbf3534544c82 ]

While the disk state has nothing to do with partitions, BLKRRPART is used
to force a full revalidate after things like a disk format for historical
reasons. Restore that behavior.

Link: https://lore.kernel.org/r/[email protected]
Fixes: 471bd0af544b ("sd: use bdev_check_media_change")
Reported-by: Xiang Chen <[email protected]>
Tested-by: Xiang Chen <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/scsi/sd.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ed0b1bb99f08..a0356f3707b8 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1387,6 +1387,22 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
}
}

+static bool sd_need_revalidate(struct block_device *bdev,
+ struct scsi_disk *sdkp)
+{
+ if (sdkp->device->removable || sdkp->write_prot) {
+ if (bdev_check_media_change(bdev))
+ return true;
+ }
+
+ /*
+ * Force a full rescan after ioctl(BLKRRPART). While the disk state has
+ * nothing to do with partitions, BLKRRPART is used to force a full
+ * revalidate after things like a format for historical reasons.
+ */
+ return test_bit(GD_NEED_PART_SCAN, &bdev->bd_disk->state);
+}
+
/**
* sd_open - open a scsi disk device
* @bdev: Block device of the scsi disk to open
@@ -1423,10 +1439,8 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
if (!scsi_block_when_processing_errors(sdev))
goto error_out;

- if (sdev->removable || sdkp->write_prot) {
- if (bdev_check_media_change(bdev))
- sd_revalidate_disk(bdev->bd_disk);
- }
+ if (sd_need_revalidate(bdev, sdkp))
+ sd_revalidate_disk(bdev->bd_disk);

/*
* If the drive is empty, just let the open fail.
--
2.30.2

2021-06-28 14:26:42

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 084/110] mm/thp: make is_huge_zero_pmd() safe and quicker

From: Hugh Dickins <[email protected]>

commit 3b77e8c8cde581dadab9a0f1543a347e24315f11 upstream.

Most callers of is_huge_zero_pmd() supply a pmd already verified
present; but a few (notably zap_huge_pmd()) do not - it might be a pmd
migration entry, in which the pfn is encoded differently from a present
pmd: which might pass the is_huge_zero_pmd() test (though not on x86,
since L1TF forced us to protect against that); or perhaps even crash in
pmd_page() applied to a swap-like entry.

Make it safe by adding pmd_present() check into is_huge_zero_pmd()
itself; and make it quicker by saving huge_zero_pfn, so that
is_huge_zero_pmd() will not need to do that pmd_page() lookup each time.

__split_huge_pmd_locked() checked pmd_trans_huge() before: that worked,
but is unnecessary now that is_huge_zero_pmd() checks present.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/huge_mm.h | 8 +++++++-
mm/huge_memory.c | 5 ++++-
2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index ba973efcd369..6686a0baa91d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -289,6 +289,7 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t orig_pmd);

extern struct page *huge_zero_page;
+extern unsigned long huge_zero_pfn;

static inline bool is_huge_zero_page(struct page *page)
{
@@ -297,7 +298,7 @@ static inline bool is_huge_zero_page(struct page *page)

static inline bool is_huge_zero_pmd(pmd_t pmd)
{
- return is_huge_zero_page(pmd_page(pmd));
+ return READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd) && pmd_present(pmd);
}

static inline bool is_huge_zero_pud(pud_t pud)
@@ -443,6 +444,11 @@ static inline bool is_huge_zero_page(struct page *page)
return false;
}

+static inline bool is_huge_zero_pmd(pmd_t pmd)
+{
+ return false;
+}
+
static inline bool is_huge_zero_pud(pud_t pud)
{
return false;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cd37a0829881..e1ad01e68aa3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -61,6 +61,7 @@ static struct shrinker deferred_split_shrinker;

static atomic_t huge_zero_refcount;
struct page *huge_zero_page __read_mostly;
+unsigned long huge_zero_pfn __read_mostly = ~0UL;

bool transparent_hugepage_enabled(struct vm_area_struct *vma)
{
@@ -97,6 +98,7 @@ retry:
__free_pages(zero_page, compound_order(zero_page));
goto retry;
}
+ WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));

/* We take additional reference here. It will be put back by shrinker */
atomic_set(&huge_zero_refcount, 2);
@@ -146,6 +148,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
if (atomic_cmpxchg(&huge_zero_refcount, 1, 0) == 1) {
struct page *zero_page = xchg(&huge_zero_page, NULL);
BUG_ON(zero_page == NULL);
+ WRITE_ONCE(huge_zero_pfn, ~0UL);
__free_pages(zero_page, compound_order(zero_page));
return HPAGE_PMD_NR;
}
@@ -2073,7 +2076,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
return;
}

- if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
+ if (is_huge_zero_pmd(*pmd)) {
/*
* FIXME: Do we want to invalidate secondary mmu by calling
* mmu_notifier_invalidate_range() see comments below inside
--
2.30.2

2021-06-28 14:26:47

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 073/110] KVM: do not allow mapping valid but non-reference-counted pages

From: Nicholas Piggin <[email protected]>

commit f8be156be163a052a067306417cd0ff679068c97 upstream.

It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by gfn_to_page, gfn_to_pfn, etc., family
of APIs, which take a reference to the page, which takes it from 0 to 1.
When the reference is dropped, this will free the page incorrectly.

Fix this by only taking a reference on valid pages if it was non-zero,
which indicates it is participating in normal refcounting (and can be
released with put_page).

This addresses CVE-2021-22543.

Signed-off-by: Nicholas Piggin <[email protected]>
Tested-by: Paolo Bonzini <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
virt/kvm/kvm_main.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5cabc6c748db..4cce5735271e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1919,6 +1919,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
return true;
}

+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+ if (kvm_is_reserved_pfn(pfn))
+ return 1;
+ return get_page_unless_zero(pfn_to_page(pfn));
+}
+
static int hva_to_pfn_remapped(struct vm_area_struct *vma,
unsigned long addr, bool *async,
bool write_fault, bool *writable,
@@ -1968,13 +1975,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
* Whoever called remap_pfn_range is also going to call e.g.
* unmap_mapping_range before the underlying pages are freed,
* causing a call to our MMU notifier.
+ *
+ * Certain IO or PFNMAP mappings can be backed with valid
+ * struct pages, but be allocated without refcounting e.g.,
+ * tail pages of non-compound higher order allocations, which
+ * would then underflow the refcount when the caller does the
+ * required put_page. Don't allow those pages here.
*/
- kvm_get_pfn(pfn);
+ if (!kvm_try_get_pfn(pfn))
+ r = -EFAULT;

out:
pte_unmap_unlock(ptep, ptl);
*p_pfn = pfn;
- return 0;
+
+ return r;
}

/*
--
2.30.2

2021-06-28 14:26:48

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 086/110] mm/thp: fix vma_address() if virtual address below file offset

From: Hugh Dickins <[email protected]>

commit 494334e43c16d63b878536a26505397fce6ff3a2 upstream.

Running certain tests with a DEBUG_VM kernel would crash within hours,
on the total_mapcount BUG() in split_huge_page_to_list(), while trying
to free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which,
on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

When that BUG() was changed to a WARN(), it would later crash on the
VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in
mm/internal.h:vma_address(), used by rmap_walk_file() for
try_to_unmap().

vma_address() is usually correct, but there's a wraparound case when the
vm_start address is unusually low, but vm_pgoff not so low:
vma_address() chooses max(start, vma->vm_start), but that decides on the
wrong address, because start has become almost ULONG_MAX.

Rewrite vma_address() to be more careful about vm_pgoff; move the
VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can
be safely used from page_mapped_in_vma() and page_address_in_vma() too.

Add vma_address_end() to apply similar care to end address calculation,
in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one();
though it raises a question of whether callers would do better to supply
pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.

An irritation is that their apparent generality breaks down on KSM
pages, which cannot be located by the page->index that page_to_pgoff()
uses: as commit 4b0ece6fa016 ("mm: migrate: fix remove_migration_pte()
for ksm pages") once discovered. I dithered over the best thing to do
about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both
vma_address() and vma_address_end(); though the only place in danger of
using it on them was try_to_unmap_one().

Sidenote: vma_address() and vma_address_end() now use compound_nr() on a
head page, instead of thp_size(): to make the right calculation on a
hugetlbfs page, whether or not THPs are configured. try_to_unmap() is
used on hugetlbfs pages, but perhaps the wrong calculation never
mattered.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: a8fa41ad2f6f ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/internal.h | 53 ++++++++++++++++++++++++++++++++------------
mm/page_vma_mapped.c | 16 +++++--------
mm/rmap.c | 16 ++++++-------
3 files changed, 53 insertions(+), 32 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 1432feec62df..08323e622bbd 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -379,27 +379,52 @@ static inline void mlock_migrate_page(struct page *newpage, struct page *page)
extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);

/*
- * At what user virtual address is page expected in @vma?
+ * At what user virtual address is page expected in vma?
+ * Returns -EFAULT if all of the page is outside the range of vma.
+ * If page is a compound head, the entire compound page is considered.
*/
static inline unsigned long
-__vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address(struct page *page, struct vm_area_struct *vma)
{
- pgoff_t pgoff = page_to_pgoff(page);
- return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ pgoff_t pgoff;
+ unsigned long address;
+
+ VM_BUG_ON_PAGE(PageKsm(page), page); /* KSM page->index unusable */
+ pgoff = page_to_pgoff(page);
+ if (pgoff >= vma->vm_pgoff) {
+ address = vma->vm_start +
+ ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ /* Check for address beyond vma (or wrapped through 0?) */
+ if (address < vma->vm_start || address >= vma->vm_end)
+ address = -EFAULT;
+ } else if (PageHead(page) &&
+ pgoff + compound_nr(page) - 1 >= vma->vm_pgoff) {
+ /* Test above avoids possibility of wrap to 0 on 32-bit */
+ address = vma->vm_start;
+ } else {
+ address = -EFAULT;
+ }
+ return address;
}

+/*
+ * Then at what user virtual address will none of the page be found in vma?
+ * Assumes that vma_address() already returned a good starting address.
+ * If page is a compound head, the entire compound page is considered.
+ */
static inline unsigned long
-vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address_end(struct page *page, struct vm_area_struct *vma)
{
- unsigned long start, end;
-
- start = __vma_address(page, vma);
- end = start + thp_size(page) - PAGE_SIZE;
-
- /* page should be within @vma mapping range */
- VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
-
- return max(start, vma->vm_start);
+ pgoff_t pgoff;
+ unsigned long address;
+
+ VM_BUG_ON_PAGE(PageKsm(page), page); /* KSM page->index unusable */
+ pgoff = page_to_pgoff(page) + compound_nr(page);
+ address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ /* Check for address beyond vma (or wrapped through 0?) */
+ if (address < vma->vm_start || address > vma->vm_end)
+ address = vma->vm_end;
+ return address;
}

static inline struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf,
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index da6744015abd..f34c065629aa 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -228,18 +228,18 @@ restart:
if (!map_pte(pvmw))
goto next_pte;
while (1) {
+ unsigned long end;
+
if (check_pte(pvmw))
return true;
next_pte:
/* Seek to next pte only makes sense for THP */
if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
return not_found(pvmw);
+ end = vma_address_end(pvmw->page, pvmw->vma);
do {
pvmw->address += PAGE_SIZE;
- if (pvmw->address >= pvmw->vma->vm_end ||
- pvmw->address >=
- __vma_address(pvmw->page, pvmw->vma) +
- thp_size(pvmw->page))
+ if (pvmw->address >= end)
return not_found(pvmw);
/* Did we cross page table boundary? */
if (pvmw->address % PMD_SIZE == 0) {
@@ -277,14 +277,10 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
.vma = vma,
.flags = PVMW_SYNC,
};
- unsigned long start, end;
-
- start = __vma_address(page, vma);
- end = start + thp_size(page) - PAGE_SIZE;

- if (unlikely(end < vma->vm_start || start >= vma->vm_end))
+ pvmw.address = vma_address(page, vma);
+ if (pvmw.address == -EFAULT)
return 0;
- pvmw.address = max(start, vma->vm_start);
if (!page_vma_mapped_walk(&pvmw))
return 0;
page_vma_mapped_walk_done(&pvmw);
diff --git a/mm/rmap.c b/mm/rmap.c
index 735a1b514f83..cc8208b20373 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -707,7 +707,6 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
*/
unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
{
- unsigned long address;
if (PageAnon(page)) {
struct anon_vma *page__anon_vma = page_anon_vma(page);
/*
@@ -722,10 +721,8 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
return -EFAULT;
} else
return -EFAULT;
- address = __vma_address(page, vma);
- if (unlikely(address < vma->vm_start || address >= vma->vm_end))
- return -EFAULT;
- return address;
+
+ return vma_address(page, vma);
}

pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
@@ -919,7 +916,7 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
*/
mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
0, vma, vma->vm_mm, address,
- min(vma->vm_end, address + page_size(page)));
+ vma_address_end(page, vma));
mmu_notifier_invalidate_range_start(&range);

while (page_vma_mapped_walk(&pvmw)) {
@@ -1435,9 +1432,10 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
* Note that the page can not be free in this function as call of
* try_to_unmap() must hold a reference on the page.
*/
+ range.end = PageKsm(page) ?
+ address + PAGE_SIZE : vma_address_end(page, vma);
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
- address,
- min(vma->vm_end, address + page_size(page)));
+ address, range.end);
if (PageHuge(page)) {
/*
* If sharing is possible, start and end will be adjusted
@@ -1889,6 +1887,7 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
struct vm_area_struct *vma = avc->vma;
unsigned long address = vma_address(page, vma);

+ VM_BUG_ON_VMA(address == -EFAULT, vma);
cond_resched();

if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
@@ -1943,6 +1942,7 @@ static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc,
pgoff_start, pgoff_end) {
unsigned long address = vma_address(page, vma);

+ VM_BUG_ON_VMA(address == -EFAULT, vma);
cond_resched();

if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
--
2.30.2

2021-06-28 14:27:04

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 089/110] mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split

From: Yang Shi <[email protected]>

commit 504e070dc08f757bccaed6d05c0f53ecbfac8a23 upstream.

When debugging the bug reported by Wang Yugui [1], try_to_unmap() may
fail, but the first VM_BUG_ON_PAGE() just checks page_mapcount() however
it may miss the failure when head page is unmapped but other subpage is
mapped. Then the second DEBUG_VM BUG() that check total mapcount would
catch it. This may incur some confusion.

As this is not a fatal issue, so consolidate the two DEBUG_VM checks
into one VM_WARN_ON_ONCE_PAGE().

[1] https://lore.kernel.org/linux-mm/[email protected]/

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Shi <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/huge_memory.c | 24 +++++++-----------------
1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9c71a61e4c59..44c455dbbd63 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2360,15 +2360,15 @@ static void unmap_page(struct page *page)
{
enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
- bool unmap_success;

VM_BUG_ON_PAGE(!PageHead(page), page);

if (PageAnon(page))
ttu_flags |= TTU_SPLIT_FREEZE;

- unmap_success = try_to_unmap(page, ttu_flags);
- VM_BUG_ON_PAGE(!unmap_success, page);
+ try_to_unmap(page, ttu_flags);
+
+ VM_WARN_ON_ONCE_PAGE(page_mapped(page), page);
}

static void remap_page(struct page *page, unsigned int nr)
@@ -2679,7 +2679,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
struct deferred_split *ds_queue = get_deferred_split_queue(head);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
- int count, mapcount, extra_pins, ret;
+ int extra_pins, ret;
pgoff_t end;

VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2738,7 +2738,6 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
}

unmap_page(head);
- VM_BUG_ON_PAGE(compound_mapcount(head), head);

/* block interrupt reentry in xa_lock and spinlock */
local_irq_disable();
@@ -2756,9 +2755,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)

/* Prevent deferred_split_scan() touching ->_refcount */
spin_lock(&ds_queue->split_queue_lock);
- count = page_count(head);
- mapcount = total_mapcount(head);
- if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
+ if (page_ref_freeze(head, 1 + extra_pins)) {
if (!list_empty(page_deferred_list(head))) {
ds_queue->split_queue_len--;
list_del(page_deferred_list(head));
@@ -2778,16 +2775,9 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
__split_huge_page(page, list, end);
ret = 0;
} else {
- if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
- pr_alert("total_mapcount: %u, page_count(): %u\n",
- mapcount, count);
- if (PageTail(page))
- dump_page(head, NULL);
- dump_page(page, "total_mapcount(head) > 0");
- BUG();
- }
spin_unlock(&ds_queue->split_queue_lock);
-fail: if (mapping)
+fail:
+ if (mapping)
xa_unlock(&mapping->i_pages);
local_irq_enable();
remap_page(head, thp_nr_pages(head));
--
2.30.2

2021-06-28 14:28:39

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 091/110] mm: page_vma_mapped_walk(): settle PageHuge on entry

From: Hugh Dickins <[email protected]>

commit 6d0fd5987657cb0c9756ce684e3a74c0f6351728 upstream.

page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case out of
the way at the start, so no need to worry about it later.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_vma_mapped.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ba24ac8e2377..9085d09c5bc3 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -153,10 +153,11 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (pvmw->pmd && !pvmw->pte)
return not_found(pvmw);

- if (pvmw->pte)
- goto next_pte;
-
if (unlikely(PageHuge(page))) {
+ /* The only possible mapping was handled on last iteration */
+ if (pvmw->pte)
+ return not_found(pvmw);
+
/* when pud is not present, pte will be NULL */
pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
if (!pvmw->pte)
@@ -168,6 +169,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return not_found(pvmw);
return true;
}
+
+ if (pvmw->pte)
+ goto next_pte;
restart:
pgd = pgd_offset(mm, pvmw->address);
if (!pgd_present(*pgd))
@@ -233,7 +237,7 @@ restart:
return true;
next_pte:
/* Seek to next pte only makes sense for THP */
- if (!PageTransHuge(page) || PageHuge(page))
+ if (!PageTransHuge(page))
return not_found(pvmw);
end = vma_address_end(page, pvmw->vma);
do {
--
2.30.2

2021-06-28 14:29:07

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 099/110] mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()

From: Hugh Dickins <[email protected]>

commit a7a69d8ba88d8dcee7ef00e91d413a4bd003a814 upstream.

Aha! Shouldn't that quick scan over pte_none()s make sure that it holds
ptlock in the PVMW_SYNC case? That too might have been responsible for
BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though
I've never seen any.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/linux-mm/[email protected]/
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Tested-by: Wang Yugui <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_vma_mapped.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 16f139f83ebf..3350faeb199a 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -276,6 +276,10 @@ next_pte:
goto restart;
}
pvmw->pte++;
+ if ((pvmw->flags & PVMW_SYNC) && !pvmw->ptl) {
+ pvmw->ptl = pte_lockptr(mm, pvmw->pmd);
+ spin_lock(pvmw->ptl);
+ }
} while (pte_none(*pvmw->pte));

if (!pvmw->ptl) {
--
2.30.2

2021-06-28 14:29:17

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 104/110] mm/hwpoison: do not lock page again when me_huge_page() successfully recovers

From: Naoya Horiguchi <[email protected]>

commit ea6d0630100b285f059d0a8d8e86f38a46407536 upstream.

Currently me_huge_page() temporary unlocks page to perform some actions
then locks it again later. My testcase (which calls hard-offline on
some tail page in a hugetlb, then accesses the address of the hugetlb
range) showed that page allocation code detects this page lock on buddy
page and printed out "BUG: Bad page state" message.

check_new_page_bad() does not consider a page with __PG_HWPOISON as bad
page, so this flag works as kind of filter, but this filtering doesn't
work in this case because the "bad page" is not the actual hwpoisoned
page. So stop locking page again. Actions to be taken depend on the
page type of the error, so page unlocking should be done in ->action()
callbacks. So let's make it assumed and change all existing callbacks
that way.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: commit 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/memory-failure.c | 44 ++++++++++++++++++++++++++++++--------------
1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2dd60ad81216..4db6f95e55be 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -658,6 +658,7 @@ static int truncate_error_page(struct page *p, unsigned long pfn,
*/
static int me_kernel(struct page *p, unsigned long pfn)
{
+ unlock_page(p);
return MF_IGNORED;
}

@@ -667,6 +668,7 @@ static int me_kernel(struct page *p, unsigned long pfn)
static int me_unknown(struct page *p, unsigned long pfn)
{
pr_err("Memory failure: %#lx: Unknown page state\n", pfn);
+ unlock_page(p);
return MF_FAILED;
}

@@ -675,6 +677,7 @@ static int me_unknown(struct page *p, unsigned long pfn)
*/
static int me_pagecache_clean(struct page *p, unsigned long pfn)
{
+ int ret;
struct address_space *mapping;

delete_from_lru_cache(p);
@@ -683,8 +686,10 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
* For anonymous pages we're done the only reference left
* should be the one m_f() holds.
*/
- if (PageAnon(p))
- return MF_RECOVERED;
+ if (PageAnon(p)) {
+ ret = MF_RECOVERED;
+ goto out;
+ }

/*
* Now truncate the page in the page cache. This is really
@@ -698,7 +703,8 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
/*
* Page has been teared down in the meanwhile
*/
- return MF_FAILED;
+ ret = MF_FAILED;
+ goto out;
}

/*
@@ -706,7 +712,10 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
*
* Open: to take i_mutex or not for this? Right now we don't.
*/
- return truncate_error_page(p, pfn, mapping);
+ ret = truncate_error_page(p, pfn, mapping);
+out:
+ unlock_page(p);
+ return ret;
}

/*
@@ -782,24 +791,26 @@ static int me_pagecache_dirty(struct page *p, unsigned long pfn)
*/
static int me_swapcache_dirty(struct page *p, unsigned long pfn)
{
+ int ret;
+
ClearPageDirty(p);
/* Trigger EIO in shmem: */
ClearPageUptodate(p);

- if (!delete_from_lru_cache(p))
- return MF_DELAYED;
- else
- return MF_FAILED;
+ ret = delete_from_lru_cache(p) ? MF_FAILED : MF_DELAYED;
+ unlock_page(p);
+ return ret;
}

static int me_swapcache_clean(struct page *p, unsigned long pfn)
{
+ int ret;
+
delete_from_swap_cache(p);

- if (!delete_from_lru_cache(p))
- return MF_RECOVERED;
- else
- return MF_FAILED;
+ ret = delete_from_lru_cache(p) ? MF_FAILED : MF_RECOVERED;
+ unlock_page(p);
+ return ret;
}

/*
@@ -820,6 +831,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, pfn, mapping);
+ unlock_page(hpage);
} else {
res = MF_FAILED;
unlock_page(hpage);
@@ -834,7 +846,6 @@ static int me_huge_page(struct page *p, unsigned long pfn)
page_ref_inc(p);
res = MF_RECOVERED;
}
- lock_page(hpage);
}

return res;
@@ -866,6 +877,8 @@ static struct page_state {
unsigned long mask;
unsigned long res;
enum mf_action_page_type type;
+
+ /* Callback ->action() has to unlock the relevant page inside it. */
int (*action)(struct page *p, unsigned long pfn);
} error_states[] = {
{ reserved, reserved, MF_MSG_KERNEL, me_kernel },
@@ -929,6 +942,7 @@ static int page_action(struct page_state *ps, struct page *p,
int result;
int count;

+ /* page p should be unlocked after returning from ps->action(). */
result = ps->action(p, pfn);

count = page_count(p) - 1;
@@ -1313,7 +1327,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
goto out;
}

- res = identify_page_state(pfn, p, page_flags);
+ return identify_page_state(pfn, p, page_flags);
out:
unlock_page(head);
return res;
@@ -1595,6 +1609,8 @@ try_again:

identify_page_state:
res = identify_page_state(pfn, p, page_flags);
+ mutex_unlock(&mf_mutex);
+ return res;
unlock_page:
unlock_page(p);
unlock_mutex:
--
2.30.2

2021-06-28 14:29:17

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 105/110] Revert "drm: add a locked version of drm_is_current_master"

From: Daniel Vetter <[email protected]>

commit f54b3ca7ea1e5e02f481cf4ca54568e57bd66086 upstream.

This reverts commit 1815d9c86e3090477fbde066ff314a7e9721ee0f.

Unfortunately this inverts the locking hierarchy, so back to the
drawing board. Full lockdep splat below:

======================================================
WARNING: possible circular locking dependency detected
5.13.0-rc7-CI-CI_DRM_10254+ #1 Not tainted
------------------------------------------------------
kms_frontbuffer/1087 is trying to acquire lock:
ffff88810dcd01a8 (&dev->master_mutex){+.+.}-{3:3}, at: drm_is_current_master+0x1b/0x40
but task is already holding lock:
ffff88810dcd0488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_mode_getconnector+0x1c6/0x4a0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&dev->mode_config.mutex){+.+.}-{3:3}:
__mutex_lock+0xab/0x970
drm_client_modeset_probe+0x22e/0xca0
__drm_fb_helper_initial_config_and_unlock+0x42/0x540
intel_fbdev_initial_config+0xf/0x20 [i915]
async_run_entry_fn+0x28/0x130
process_one_work+0x26d/0x5c0
worker_thread+0x37/0x380
kthread+0x144/0x170
ret_from_fork+0x1f/0x30
-> #1 (&client->modeset_mutex){+.+.}-{3:3}:
__mutex_lock+0xab/0x970
drm_client_modeset_commit_locked+0x1c/0x180
drm_client_modeset_commit+0x1c/0x40
__drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
drm_fb_helper_set_par+0x34/0x40
intel_fbdev_set_par+0x11/0x40 [i915]
fbcon_init+0x270/0x4f0
visual_init+0xc6/0x130
do_bind_con_driver+0x1e5/0x2d0
do_take_over_console+0x10e/0x180
do_fbcon_takeover+0x53/0xb0
register_framebuffer+0x22d/0x310
__drm_fb_helper_initial_config_and_unlock+0x36c/0x540
intel_fbdev_initial_config+0xf/0x20 [i915]
async_run_entry_fn+0x28/0x130
process_one_work+0x26d/0x5c0
worker_thread+0x37/0x380
kthread+0x144/0x170
ret_from_fork+0x1f/0x30
-> #0 (&dev->master_mutex){+.+.}-{3:3}:
__lock_acquire+0x151e/0x2590
lock_acquire+0xd1/0x3d0
__mutex_lock+0xab/0x970
drm_is_current_master+0x1b/0x40
drm_mode_getconnector+0x37e/0x4a0
drm_ioctl_kernel+0xa8/0xf0
drm_ioctl+0x1e8/0x390
__x64_sys_ioctl+0x6a/0xa0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
other info that might help us debug this:
Chain exists of: &dev->master_mutex --> &client->modeset_mutex --> &dev->mode_config.mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&dev->mode_config.mutex);
lock(&client->modeset_mutex);
lock(&dev->mode_config.mutex);
lock(&dev->master_mutex);
---
drivers/gpu/drm/drm_auth.c | 51 ++++++++++++++------------------------
1 file changed, 19 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index 86d4b72e95cb..232abbba3686 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -61,35 +61,6 @@
* trusted clients.
*/

-static bool drm_is_current_master_locked(struct drm_file *fpriv)
-{
- lockdep_assert_held_once(&fpriv->master->dev->master_mutex);
-
- return fpriv->is_master && drm_lease_owner(fpriv->master) == fpriv->minor->dev->master;
-}
-
-/**
- * drm_is_current_master - checks whether @priv is the current master
- * @fpriv: DRM file private
- *
- * Checks whether @fpriv is current master on its device. This decides whether a
- * client is allowed to run DRM_MASTER IOCTLs.
- *
- * Most of the modern IOCTL which require DRM_MASTER are for kernel modesetting
- * - the current master is assumed to own the non-shareable display hardware.
- */
-bool drm_is_current_master(struct drm_file *fpriv)
-{
- bool ret;
-
- mutex_lock(&fpriv->master->dev->master_mutex);
- ret = drm_is_current_master_locked(fpriv);
- mutex_unlock(&fpriv->master->dev->master_mutex);
-
- return ret;
-}
-EXPORT_SYMBOL(drm_is_current_master);
-
int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
{
struct drm_auth *auth = data;
@@ -252,7 +223,7 @@ int drm_setmaster_ioctl(struct drm_device *dev, void *data,
if (ret)
goto out_unlock;

- if (drm_is_current_master_locked(file_priv))
+ if (drm_is_current_master(file_priv))
goto out_unlock;

if (dev->master) {
@@ -301,7 +272,7 @@ int drm_dropmaster_ioctl(struct drm_device *dev, void *data,
if (ret)
goto out_unlock;

- if (!drm_is_current_master_locked(file_priv)) {
+ if (!drm_is_current_master(file_priv)) {
ret = -EINVAL;
goto out_unlock;
}
@@ -350,7 +321,7 @@ void drm_master_release(struct drm_file *file_priv)
if (file_priv->magic)
idr_remove(&file_priv->master->magic_map, file_priv->magic);

- if (!drm_is_current_master_locked(file_priv))
+ if (!drm_is_current_master(file_priv))
goto out;

drm_legacy_lock_master_cleanup(dev, master);
@@ -371,6 +342,22 @@ out:
mutex_unlock(&dev->master_mutex);
}

+/**
+ * drm_is_current_master - checks whether @priv is the current master
+ * @fpriv: DRM file private
+ *
+ * Checks whether @fpriv is current master on its device. This decides whether a
+ * client is allowed to run DRM_MASTER IOCTLs.
+ *
+ * Most of the modern IOCTL which require DRM_MASTER are for kernel modesetting
+ * - the current master is assumed to own the non-shareable display hardware.
+ */
+bool drm_is_current_master(struct drm_file *fpriv)
+{
+ return fpriv->is_master && drm_lease_owner(fpriv->master) == fpriv->minor->dev->master;
+}
+EXPORT_SYMBOL(drm_is_current_master);
+
/**
* drm_master_get - reference a master pointer
* @master: &struct drm_master
--
2.30.2

2021-06-28 14:29:26

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 088/110] mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()

From: Hugh Dickins <[email protected]>

commit 22061a1ffabdb9c3385de159c5db7aac3a4df1cc upstream.

There is a race between THP unmapping and truncation, when truncate sees
pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared
it, but before its page_remove_rmap() gets to decrement
compound_mapcount: generating false "BUG: Bad page cache" reports that
the page is still mapped when deleted. This commit fixes that, but not
in the way I hoped.

The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK)
instead of unmap_mapping_range() in truncate_cleanup_page(): it has
often been an annoyance that we usually call unmap_mapping_range() with
no pages locked, but there apply it to a single locked page.
try_to_unmap() looks more suitable for a single locked page.

However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page):
it is used to insert THP migration entries, but not used to unmap THPs.
Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB
needs are different, I'm too ignorant of the DAX cases, and couldn't
decide how far to go for anon+swap. Set that aside.

The second attempt took a different tack: make no change in truncate.c,
but modify zap_huge_pmd() to insert an invalidated huge pmd instead of
clearing it initially, then pmd_clear() between page_remove_rmap() and
unlocking at the end. Nice. But powerpc blows that approach out of the
water, with its serialize_against_pte_lookup(), and interesting pgtable
usage. It would need serious help to get working on powerpc (with a
minor optimization issue on s390 too). Set that aside.

Just add an "if (page_mapped(page)) synchronize_rcu();" or other such
delay, after unmapping in truncate_cleanup_page()? Perhaps, but though
that's likely to reduce or eliminate the number of incidents, it would
give less assurance of whether we had identified the problem correctly.

This successful iteration introduces "unmap_mapping_page(page)" instead
of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route,
with an addition to details. Then zap_pmd_range() watches for this
case, and does spin_unlock(pmd_lock) if so - just like
page_vma_mapped_walk() now does in the PVMW_SYNC case. Not pretty, but
safe.

Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to
assert its interface; but currently that's only used to make sure that
page->mapping is stable, and zap_pmd_range() doesn't care if the page is
locked or not. Along these lines, in invalidate_inode_pages2_range()
move the initial unmap_mapping_range() out from under page lock, before
then calling unmap_mapping_page() under page lock if still mapped.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: fc127da085c2 ("truncate: handle file thp")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/mm.h | 3 +++
mm/memory.c | 41 +++++++++++++++++++++++++++++++++++++++++
mm/truncate.c | 43 +++++++++++++++++++------------------------
3 files changed, 63 insertions(+), 24 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6c1b29bb3563..cfb0842a7fb9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1680,6 +1680,7 @@ struct zap_details {
struct address_space *check_mapping; /* Check page->mapping if set */
pgoff_t first_index; /* Lowest page->index to unmap */
pgoff_t last_index; /* Highest page->index to unmap */
+ struct page *single_page; /* Locked page to be unmapped */
};

struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
@@ -1727,6 +1728,7 @@ extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
extern int fixup_user_fault(struct mm_struct *mm,
unsigned long address, unsigned int fault_flags,
bool *unlocked);
+void unmap_mapping_page(struct page *page);
void unmap_mapping_pages(struct address_space *mapping,
pgoff_t start, pgoff_t nr, bool even_cows);
void unmap_mapping_range(struct address_space *mapping,
@@ -1747,6 +1749,7 @@ static inline int fixup_user_fault(struct mm_struct *mm, unsigned long address,
BUG();
return -EFAULT;
}
+static inline void unmap_mapping_page(struct page *page) { }
static inline void unmap_mapping_pages(struct address_space *mapping,
pgoff_t start, pgoff_t nr, bool even_cows) { }
static inline void unmap_mapping_range(struct address_space *mapping,
diff --git a/mm/memory.c b/mm/memory.c
index 14a6c66b3748..36624986130b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1361,7 +1361,18 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
else if (zap_huge_pmd(tlb, vma, pmd, addr))
goto next;
/* fall through */
+ } else if (details && details->single_page &&
+ PageTransCompound(details->single_page) &&
+ next - addr == HPAGE_PMD_SIZE && pmd_none(*pmd)) {
+ spinlock_t *ptl = pmd_lock(tlb->mm, pmd);
+ /*
+ * Take and drop THP pmd lock so that we cannot return
+ * prematurely, while zap_huge_pmd() has cleared *pmd,
+ * but not yet decremented compound_mapcount().
+ */
+ spin_unlock(ptl);
}
+
/*
* Here there can be other concurrent MADV_DONTNEED or
* trans huge page faults running, and if the pmd is
@@ -3193,6 +3204,36 @@ static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
}
}

+/**
+ * unmap_mapping_page() - Unmap single page from processes.
+ * @page: The locked page to be unmapped.
+ *
+ * Unmap this page from any userspace process which still has it mmaped.
+ * Typically, for efficiency, the range of nearby pages has already been
+ * unmapped by unmap_mapping_pages() or unmap_mapping_range(). But once
+ * truncation or invalidation holds the lock on a page, it may find that
+ * the page has been remapped again: and then uses unmap_mapping_page()
+ * to unmap it finally.
+ */
+void unmap_mapping_page(struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+ struct zap_details details = { };
+
+ VM_BUG_ON(!PageLocked(page));
+ VM_BUG_ON(PageTail(page));
+
+ details.check_mapping = mapping;
+ details.first_index = page->index;
+ details.last_index = page->index + thp_nr_pages(page) - 1;
+ details.single_page = page;
+
+ i_mmap_lock_write(mapping);
+ if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
+ unmap_mapping_range_tree(&mapping->i_mmap, &details);
+ i_mmap_unlock_write(mapping);
+}
+
/**
* unmap_mapping_pages() - Unmap pages from processes.
* @mapping: The address space containing pages to be unmapped.
diff --git a/mm/truncate.c b/mm/truncate.c
index 455944264663..bf092be0a6f0 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -168,13 +168,10 @@ void do_invalidatepage(struct page *page, unsigned int offset,
* its lock, b) when a concurrent invalidate_mapping_pages got there first and
* c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
*/
-static void
-truncate_cleanup_page(struct address_space *mapping, struct page *page)
+static void truncate_cleanup_page(struct page *page)
{
- if (page_mapped(page)) {
- unsigned int nr = thp_nr_pages(page);
- unmap_mapping_pages(mapping, page->index, nr, false);
- }
+ if (page_mapped(page))
+ unmap_mapping_page(page);

if (page_has_private(page))
do_invalidatepage(page, 0, thp_size(page));
@@ -219,7 +216,7 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
if (page->mapping != mapping)
return -EIO;

- truncate_cleanup_page(mapping, page);
+ truncate_cleanup_page(page);
delete_from_page_cache(page);
return 0;
}
@@ -326,7 +323,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
index = indices[pagevec_count(&pvec) - 1] + 1;
truncate_exceptional_pvec_entries(mapping, &pvec, indices);
for (i = 0; i < pagevec_count(&pvec); i++)
- truncate_cleanup_page(mapping, pvec.pages[i]);
+ truncate_cleanup_page(pvec.pages[i]);
delete_from_page_cache_batch(mapping, &pvec);
for (i = 0; i < pagevec_count(&pvec); i++)
unlock_page(pvec.pages[i]);
@@ -652,6 +649,16 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
continue;
}

+ if (!did_range_unmap && page_mapped(page)) {
+ /*
+ * If page is mapped, before taking its lock,
+ * zap the rest of the file in one hit.
+ */
+ unmap_mapping_pages(mapping, index,
+ (1 + end - index), false);
+ did_range_unmap = 1;
+ }
+
lock_page(page);
WARN_ON(page_to_index(page) != index);
if (page->mapping != mapping) {
@@ -659,23 +666,11 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
continue;
}
wait_on_page_writeback(page);
- if (page_mapped(page)) {
- if (!did_range_unmap) {
- /*
- * Zap the rest of the file in one hit.
- */
- unmap_mapping_pages(mapping, index,
- (1 + end - index), false);
- did_range_unmap = 1;
- } else {
- /*
- * Just zap this page
- */
- unmap_mapping_pages(mapping, index,
- 1, false);
- }
- }
+
+ if (page_mapped(page))
+ unmap_mapping_page(page);
BUG_ON(page_mapped(page));
+
ret2 = do_launder_page(mapping, page);
if (ret2 == 0) {
if (!invalidate_complete_page2(mapping, page))
--
2.30.2

2021-06-28 14:29:41

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 096/110] mm: page_vma_mapped_walk(): use goto instead of while (1)

From: Hugh Dickins <[email protected]>

commit 474466301dfd8b39a10c01db740645f3f7ae9a28 upstream.

page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte,
and use "goto this_pte", in place of the "while (1)" loop at the end.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_vma_mapped.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 4a8a069c5e87..8803dd93c049 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -144,6 +144,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
{
struct mm_struct *mm = pvmw->vma->vm_mm;
struct page *page = pvmw->page;
+ unsigned long end;
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
@@ -233,10 +234,7 @@ restart:
}
if (!map_pte(pvmw))
goto next_pte;
- }
- while (1) {
- unsigned long end;
-
+this_pte:
if (check_pte(pvmw))
return true;
next_pte:
@@ -265,6 +263,7 @@ next_pte:
pvmw->ptl = pte_lockptr(mm, pvmw->pmd);
spin_lock(pvmw->ptl);
}
+ goto this_pte;
}
}

--
2.30.2

2021-06-28 14:30:32

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 110/110] Linux 5.12.14-rc1

Signed-off-by: Sasha Levin <[email protected]>
---
Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index d2fe36db78ae..a4b63b0f262b 100644
--- a/Makefile
+++ b/Makefile
@@ -1,8 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
VERSION = 5
PATCHLEVEL = 12
-SUBLEVEL = 13
-EXTRAVERSION =
+SUBLEVEL = 14
+EXTRAVERSION = -rc1
NAME = Frozen Wasteland

# *DOCUMENTATION*
--
2.30.2

2021-06-28 14:30:45

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 101/110] KVM: SVM: Call SEV Guest Decommission if ASID binding fails

From: Alper Gun <[email protected]>

commit 934002cd660b035b926438244b4294e647507e13 upstream.

Send SEV_CMD_DECOMMISSION command to PSP firmware if ASID binding
fails. If a failure happens after a successful LAUNCH_START command,
a decommission command should be executed. Otherwise, guest context
will be unfreed inside the AMD SP. After the firmware will not have
memory to allocate more SEV guest context, LAUNCH_START command will
begin to fail with SEV_RET_RESOURCE_LIMIT error.

The existing code calls decommission inside sev_unbind_asid, but it is
not called if a failure happens before guest activation succeeds. If
sev_bind_asid fails, decommission is never called. PSP firmware has a
limit for the number of guests. If sev_asid_binding fails many times,
PSP firmware will not have resources to create another guest context.

Cc: [email protected]
Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command")
Reported-by: Peter Gonda <[email protected]>
Signed-off-by: Alper Gun <[email protected]>
Reviewed-by: Marc Orr <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kvm/svm/sev.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index dbc6214d69de..8f3b438f6fd3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -143,9 +143,25 @@ static void sev_asid_free(int asid)
mutex_unlock(&sev_bitmap_lock);
}

-static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+static void sev_decommission(unsigned int handle)
{
struct sev_data_decommission *decommission;
+
+ if (!handle)
+ return;
+
+ decommission = kzalloc(sizeof(*decommission), GFP_KERNEL);
+ if (!decommission)
+ return;
+
+ decommission->handle = handle;
+ sev_guest_decommission(decommission, NULL);
+
+ kfree(decommission);
+}
+
+static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+{
struct sev_data_deactivate *data;

if (!handle)
@@ -165,15 +181,7 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)

kfree(data);

- decommission = kzalloc(sizeof(*decommission), GFP_KERNEL);
- if (!decommission)
- return;
-
- /* decommission handle */
- decommission->handle = handle;
- sev_guest_decommission(decommission, NULL);
-
- kfree(decommission);
+ sev_decommission(handle);
}

static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
@@ -303,8 +311,10 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)

/* Bind ASID to this guest */
ret = sev_bind_asid(kvm, start->handle, error);
- if (ret)
+ if (ret) {
+ sev_decommission(start->handle);
goto e_free_session;
+ }

/* return handle to userspace */
params.handle = start->handle;
--
2.30.2

2021-06-28 15:34:56

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 5.12 004/110] drm: add a locked version of drm_is_current_master

On Mon, Jun 28, 2021 at 04:01:46PM +0100, Emil Velikov wrote:
> Hi Sasha, Greg,
>
> On Mon, 28 Jun 2021 at 15:18, Sasha Levin <[email protected]> wrote:
> >
> > From: Desmond Cheong Zhi Xi <[email protected]>
> >
> > commit 1815d9c86e3090477fbde066ff314a7e9721ee0f upstream.
> >
>
> Please drop this patch from all stable trees. See following drm-misc
> revert for details:
> https://cgit.freedesktop.org/drm/drm-misc/commit/?h=drm-misc-fixes&id=f54b3ca7ea1e5e02f481cf4ca54568e57bd66086

The revert is later in the series already :)

thanks,

greg k-h

2021-06-28 15:47:07

by Emil Velikov

[permalink] [raw]
Subject: Re: [PATCH 5.12 004/110] drm: add a locked version of drm_is_current_master

Hi Sasha, Greg,

On Mon, 28 Jun 2021 at 15:18, Sasha Levin <[email protected]> wrote:
>
> From: Desmond Cheong Zhi Xi <[email protected]>
>
> commit 1815d9c86e3090477fbde066ff314a7e9721ee0f upstream.
>

Please drop this patch from all stable trees. See following drm-misc
revert for details:
https://cgit.freedesktop.org/drm/drm-misc/commit/?h=drm-misc-fixes&id=f54b3ca7ea1e5e02f481cf4ca54568e57bd66086

-Emil

2021-06-28 16:32:07

by Chrisanthus, Anitha

[permalink] [raw]
Subject: RE: [PATCH 5.12 019/110] drm/kmb: Fix error return code in kmb_hw_init()

This patch is already pushed to drm-misc-fixes. Please check for existing patches in the dri-devel mail list before sending patches.

Thanks,
Anitha

> -----Original Message-----
> From: Sasha Levin <[email protected]>
> Sent: Monday, June 28, 2021 7:17 AM
> To: [email protected]; [email protected]
> Cc: Zhen Lei <[email protected]>; Hulk Robot
> <[email protected]>; Chrisanthus, Anitha <[email protected]>;
> Sasha Levin <[email protected]>
> Subject: [PATCH 5.12 019/110] drm/kmb: Fix error return code in
> kmb_hw_init()
>
> From: Zhen Lei <[email protected]>
>
> [ Upstream commit 6fd8f323b3e4e5290d02174559308669507c00dd ]
>
> When the call to platform_get_irq() to obtain the IRQ of the lcd fails, the
> returned error code should be propagated. However, we currently do not
> explicitly assign this error code to 'ret'. As a result, 0 was incorrectly
> returned.
>
> Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
> Reported-by: Hulk Robot <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Signed-off-by: Anitha Chrisanthus <[email protected]>
> Link: https://patchwork.freedesktop.org/patch/msgid/20210513134639.6541-
> [email protected]
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> drivers/gpu/drm/kmb/kmb_drv.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/kmb/kmb_drv.c
> b/drivers/gpu/drm/kmb/kmb_drv.c
> index f64e06e1067d..96ea1a2c11dd 100644
> --- a/drivers/gpu/drm/kmb/kmb_drv.c
> +++ b/drivers/gpu/drm/kmb/kmb_drv.c
> @@ -137,6 +137,7 @@ static int kmb_hw_init(struct drm_device *drm,
> unsigned long flags)
> /* Allocate LCD interrupt resources */
> irq_lcd = platform_get_irq(pdev, 0);
> if (irq_lcd < 0) {
> + ret = irq_lcd;
> drm_err(&kmb->drm, "irq_lcd not found");
> goto setup_fail;
> }
> --
> 2.30.2

2021-06-28 17:08:28

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 022/110] dmaengine: zynqmp_dma: Fix PM reference leak in zynqmp_dma_alloc_chan_resourc()

From: Yu Kuai <[email protected]>

[ Upstream commit 8982d48af36d2562c0f904736b0fc80efc9f2532 ]

pm_runtime_get_sync will increment pm usage counter even it failed.
Forgetting to putting operation will result in reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Yu Kuai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/dma/xilinx/zynqmp_dma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/xilinx/zynqmp_dma.c b/drivers/dma/xilinx/zynqmp_dma.c
index d8419565b92c..5fecf5aa6e85 100644
--- a/drivers/dma/xilinx/zynqmp_dma.c
+++ b/drivers/dma/xilinx/zynqmp_dma.c
@@ -468,7 +468,7 @@ static int zynqmp_dma_alloc_chan_resources(struct dma_chan *dchan)
struct zynqmp_dma_desc_sw *desc;
int i, ret;

- ret = pm_runtime_get_sync(chan->dev);
+ ret = pm_runtime_resume_and_get(chan->dev);
if (ret < 0)
return ret;

--
2.30.2

2021-06-28 17:21:58

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 030/110] dmaengine: rcar-dmac: Fix PM reference leak in rcar_dmac_probe()

From: Zou Wei <[email protected]>

[ Upstream commit dea8464ddf553803382efb753b6727dbf3931d06 ]

pm_runtime_get_sync will increment pm usage counter even it failed.
Forgetting to putting operation will result in reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zou Wei <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/dma/sh/rcar-dmac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/sh/rcar-dmac.c b/drivers/dma/sh/rcar-dmac.c
index d530c1bf11d9..6885b3dcd7a9 100644
--- a/drivers/dma/sh/rcar-dmac.c
+++ b/drivers/dma/sh/rcar-dmac.c
@@ -1913,7 +1913,7 @@ static int rcar_dmac_probe(struct platform_device *pdev)

/* Enable runtime PM and initialize the device. */
pm_runtime_enable(&pdev->dev);
- ret = pm_runtime_get_sync(&pdev->dev);
+ ret = pm_runtime_resume_and_get(&pdev->dev);
if (ret < 0) {
dev_err(&pdev->dev, "runtime PM get sync failed (%d)\n", ret);
return ret;
--
2.30.2

2021-06-28 17:29:06

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 041/110] net/packet: annotate data race in packet_sendmsg()

From: Eric Dumazet <[email protected]>

[ Upstream commit d1b5bee4c8be01585033be9b3a8878789285285f ]

There is a known race in packet_sendmsg(), addressed
in commit 32d3182cd2cd ("net/packet: fix race in tpacket_snd()")

Now we have data_race(), we can use it to avoid a future KCSAN warning,
as syzbot loves stressing af_packet sockets :)

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/packet/af_packet.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c52557ec7fb3..84d8921391c3 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3034,10 +3034,13 @@ static int packet_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
struct sock *sk = sock->sk;
struct packet_sock *po = pkt_sk(sk);

- if (po->tx_ring.pg_vec)
+ /* Reading tx_ring.pg_vec without holding pg_vec_lock is racy.
+ * tpacket_snd() will redo the check safely.
+ */
+ if (data_race(po->tx_ring.pg_vec))
return tpacket_snd(po, msg);
- else
- return packet_snd(sock, msg, len);
+
+ return packet_snd(sock, msg, len);
}

/*
--
2.30.2

2021-06-28 17:29:07

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 042/110] net: phy: dp83867: perform soft reset and retain established link

From: Praneeth Bajjuri <[email protected]>

[ Upstream commit da9ef50f545f86ffe6ff786174d26500c4db737a ]

Current logic is performing hard reset and causing the programmed
registers to be wiped out.

as per datasheet: https://www.ti.com/lit/ds/symlink/dp83867cr.pdf
8.6.26 Control Register (CTRL)

do SW_RESTART to perform a reset not including the registers,
If performed when link is already present,
it will drop the link and trigger re-auto negotiation.

Signed-off-by: Praneeth Bajjuri <[email protected]>
Signed-off-by: Geet Modi <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/phy/dp83867.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index 9bd9a5c0b1db..6bbc81ad295f 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -826,16 +826,12 @@ static int dp83867_phy_reset(struct phy_device *phydev)
{
int err;

- err = phy_write(phydev, DP83867_CTRL, DP83867_SW_RESET);
+ err = phy_write(phydev, DP83867_CTRL, DP83867_SW_RESTART);
if (err < 0)
return err;

usleep_range(10, 20);

- /* After reset FORCE_LINK_GOOD bit is set. Although the
- * default value should be unset. Disable FORCE_LINK_GOOD
- * for the phy to work properly.
- */
return phy_modify(phydev, MII_DP83867_PHYCTRL,
DP83867_PHYCR_FORCE_LINK_GOOD, 0);
}
--
2.30.2

2021-06-28 17:29:36

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 045/110] bpf, selftests: Adjust few selftest outcomes wrt unreachable code

From: Daniel Borkmann <[email protected]>

[ Upstream commit 973377ffe8148180b2651825b92ae91988141b05 ]

In almost all cases from test_verifier that have been changed in here, we've
had an unreachable path with a load from a register which has an invalid
address on purpose. This was basically to make sure that we never walk this
path and to have the verifier complain if it would otherwise. Change it to
match on the right error for unprivileged given we now test these paths
under speculative execution.

There's one case where we match on exact # of insns_processed. Due to the
extra path, this will of course mismatch on unprivileged. Thus, restrict the
test->insn_processed check to privileged-only.

In one other case, we result in a 'pointer comparison prohibited' error. This
is similarly due to verifying an 'invalid' branch where we end up with a value
pointer on one side of the comparison.

Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: John Fastabend <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
tools/testing/selftests/bpf/test_verifier.c | 2 +-
tools/testing/selftests/bpf/verifier/and.c | 2 ++
tools/testing/selftests/bpf/verifier/bounds.c | 14 ++++++++++++
.../selftests/bpf/verifier/dead_code.c | 2 ++
tools/testing/selftests/bpf/verifier/jmp32.c | 22 +++++++++++++++++++
tools/testing/selftests/bpf/verifier/jset.c | 10 +++++----
tools/testing/selftests/bpf/verifier/unpriv.c | 2 ++
.../selftests/bpf/verifier/value_ptr_arith.c | 7 +++---
8 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 58b5a349d3ba..ea3158b0d551 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -1147,7 +1147,7 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
}
}

- if (test->insn_processed) {
+ if (!unpriv && test->insn_processed) {
uint32_t insn_processed;
char *proc;

diff --git a/tools/testing/selftests/bpf/verifier/and.c b/tools/testing/selftests/bpf/verifier/and.c
index ca8fdb1b3f01..7d7ebee5cc7a 100644
--- a/tools/testing/selftests/bpf/verifier/and.c
+++ b/tools/testing/selftests/bpf/verifier/and.c
@@ -61,6 +61,8 @@
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R1 !read_ok",
+ .result_unpriv = REJECT,
.result = ACCEPT,
.retval = 0
},
diff --git a/tools/testing/selftests/bpf/verifier/bounds.c b/tools/testing/selftests/bpf/verifier/bounds.c
index 8a1caf46ffbc..e061e8799ce2 100644
--- a/tools/testing/selftests/bpf/verifier/bounds.c
+++ b/tools/testing/selftests/bpf/verifier/bounds.c
@@ -508,6 +508,8 @@
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, -1),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 invalid mem access 'inv'",
+ .result_unpriv = REJECT,
.result = ACCEPT
},
{
@@ -528,6 +530,8 @@
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, -1),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 invalid mem access 'inv'",
+ .result_unpriv = REJECT,
.result = ACCEPT
},
{
@@ -569,6 +573,8 @@
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 min value is outside of the allowed memory range",
+ .result_unpriv = REJECT,
.fixup_map_hash_8b = { 3 },
.result = ACCEPT,
},
@@ -589,6 +595,8 @@
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 min value is outside of the allowed memory range",
+ .result_unpriv = REJECT,
.fixup_map_hash_8b = { 3 },
.result = ACCEPT,
},
@@ -609,6 +617,8 @@
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 min value is outside of the allowed memory range",
+ .result_unpriv = REJECT,
.fixup_map_hash_8b = { 3 },
.result = ACCEPT,
},
@@ -674,6 +684,8 @@
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 min value is outside of the allowed memory range",
+ .result_unpriv = REJECT,
.fixup_map_hash_8b = { 3 },
.result = ACCEPT,
},
@@ -695,6 +707,8 @@
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 min value is outside of the allowed memory range",
+ .result_unpriv = REJECT,
.fixup_map_hash_8b = { 3 },
.result = ACCEPT,
},
diff --git a/tools/testing/selftests/bpf/verifier/dead_code.c b/tools/testing/selftests/bpf/verifier/dead_code.c
index 5cf361d8eb1c..721ec9391be5 100644
--- a/tools/testing/selftests/bpf/verifier/dead_code.c
+++ b/tools/testing/selftests/bpf/verifier/dead_code.c
@@ -8,6 +8,8 @@
BPF_JMP_IMM(BPF_JGE, BPF_REG_0, 10, -4),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R9 !read_ok",
+ .result_unpriv = REJECT,
.result = ACCEPT,
.retval = 7,
},
diff --git a/tools/testing/selftests/bpf/verifier/jmp32.c b/tools/testing/selftests/bpf/verifier/jmp32.c
index bd5cae4a7f73..1c857b2fbdf0 100644
--- a/tools/testing/selftests/bpf/verifier/jmp32.c
+++ b/tools/testing/selftests/bpf/verifier/jmp32.c
@@ -87,6 +87,8 @@
BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R9 !read_ok",
+ .result_unpriv = REJECT,
.result = ACCEPT,
},
{
@@ -150,6 +152,8 @@
BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R9 !read_ok",
+ .result_unpriv = REJECT,
.result = ACCEPT,
},
{
@@ -213,6 +217,8 @@
BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R9 !read_ok",
+ .result_unpriv = REJECT,
.result = ACCEPT,
},
{
@@ -280,6 +286,8 @@
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 invalid mem access 'inv'",
+ .result_unpriv = REJECT,
.result = ACCEPT,
.retval = 2,
},
@@ -348,6 +356,8 @@
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 invalid mem access 'inv'",
+ .result_unpriv = REJECT,
.result = ACCEPT,
.retval = 2,
},
@@ -416,6 +426,8 @@
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 invalid mem access 'inv'",
+ .result_unpriv = REJECT,
.result = ACCEPT,
.retval = 2,
},
@@ -484,6 +496,8 @@
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 invalid mem access 'inv'",
+ .result_unpriv = REJECT,
.result = ACCEPT,
.retval = 2,
},
@@ -552,6 +566,8 @@
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 invalid mem access 'inv'",
+ .result_unpriv = REJECT,
.result = ACCEPT,
.retval = 2,
},
@@ -620,6 +636,8 @@
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 invalid mem access 'inv'",
+ .result_unpriv = REJECT,
.result = ACCEPT,
.retval = 2,
},
@@ -688,6 +706,8 @@
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 invalid mem access 'inv'",
+ .result_unpriv = REJECT,
.result = ACCEPT,
.retval = 2,
},
@@ -756,6 +776,8 @@
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R0 invalid mem access 'inv'",
+ .result_unpriv = REJECT,
.result = ACCEPT,
.retval = 2,
},
diff --git a/tools/testing/selftests/bpf/verifier/jset.c b/tools/testing/selftests/bpf/verifier/jset.c
index 8dcd4e0383d5..11fc68da735e 100644
--- a/tools/testing/selftests/bpf/verifier/jset.c
+++ b/tools/testing/selftests/bpf/verifier/jset.c
@@ -82,8 +82,8 @@
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
- .retval_unpriv = 1,
- .result_unpriv = ACCEPT,
+ .errstr_unpriv = "R9 !read_ok",
+ .result_unpriv = REJECT,
.retval = 1,
.result = ACCEPT,
},
@@ -141,7 +141,8 @@
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
- .result_unpriv = ACCEPT,
+ .errstr_unpriv = "R9 !read_ok",
+ .result_unpriv = REJECT,
.result = ACCEPT,
},
{
@@ -162,6 +163,7 @@
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
- .result_unpriv = ACCEPT,
+ .errstr_unpriv = "R9 !read_ok",
+ .result_unpriv = REJECT,
.result = ACCEPT,
},
diff --git a/tools/testing/selftests/bpf/verifier/unpriv.c b/tools/testing/selftests/bpf/verifier/unpriv.c
index bd436df5cc32..111801aea5e3 100644
--- a/tools/testing/selftests/bpf/verifier/unpriv.c
+++ b/tools/testing/selftests/bpf/verifier/unpriv.c
@@ -420,6 +420,8 @@
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_7, 0),
BPF_EXIT_INSN(),
},
+ .errstr_unpriv = "R7 invalid mem access 'inv'",
+ .result_unpriv = REJECT,
.result = ACCEPT,
.retval = 0,
},
diff --git a/tools/testing/selftests/bpf/verifier/value_ptr_arith.c b/tools/testing/selftests/bpf/verifier/value_ptr_arith.c
index 7ae2859d495c..a3e593ddfafc 100644
--- a/tools/testing/selftests/bpf/verifier/value_ptr_arith.c
+++ b/tools/testing/selftests/bpf/verifier/value_ptr_arith.c
@@ -120,7 +120,7 @@
.fixup_map_array_48b = { 1 },
.result = ACCEPT,
.result_unpriv = REJECT,
- .errstr_unpriv = "R2 tried to add from different maps, paths or scalars",
+ .errstr_unpriv = "R2 pointer comparison prohibited",
.retval = 0,
},
{
@@ -159,7 +159,8 @@
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
// fake-dead code; targeted from branch A to
- // prevent dead code sanitization
+ // prevent dead code sanitization, rejected
+ // via branch B however
BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
@@ -167,7 +168,7 @@
.fixup_map_array_48b = { 1 },
.result = ACCEPT,
.result_unpriv = REJECT,
- .errstr_unpriv = "R2 tried to add from different maps, paths or scalars",
+ .errstr_unpriv = "R0 invalid mem access 'inv'",
.retval = 0,
},
{
--
2.30.2

2021-06-28 17:32:04

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 050/110] sh_eth: Avoid memcpy() over-reading of ETH_SS_STATS

From: Kees Cook <[email protected]>

[ Upstream commit 224004fbb033600715dbd626bceec10bfd9c58bc ]

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally reading across neighboring array fields.

The memcpy() is copying the entire structure, not just the first array.
Adjust the source argument so the compiler can do appropriate bounds
checking.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/renesas/sh_eth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index f029c7c03804..393cf99856ed 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -2287,7 +2287,7 @@ static void sh_eth_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
{
switch (stringset) {
case ETH_SS_STATS:
- memcpy(data, *sh_eth_gstrings_stats,
+ memcpy(data, sh_eth_gstrings_stats,
sizeof(sh_eth_gstrings_stats));
break;
}
--
2.30.2

2021-06-28 17:32:29

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 052/110] KVM: selftests: Fix kvm_check_cap() assertion

From: Fuad Tabba <[email protected]>

[ Upstream commit d8ac05ea13d789d5491a5920d70a05659015441d ]

KVM_CHECK_EXTENSION ioctl can return any negative value on error,
and not necessarily -1. Change the assertion to reflect that.

Signed-off-by: Fuad Tabba <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 2f0e4365f61b..8b90256bca96 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -58,7 +58,7 @@ int kvm_check_cap(long cap)
exit(KSFT_SKIP);

ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);
- TEST_ASSERT(ret != -1, "KVM_CHECK_EXTENSION IOCTL failed,\n"
+ TEST_ASSERT(ret >= 0, "KVM_CHECK_EXTENSION IOCTL failed,\n"
" rc: %i errno: %i", ret, errno);

close(kvm_fd);
--
2.30.2

2021-06-28 17:34:44

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 058/110] net: ll_temac: Add memory-barriers for TX BD access

From: Esben Haabendal <[email protected]>

[ Upstream commit 28d9fab458b16bcd83f9dd07ede3d585c3e1a69e ]

Add a couple of memory-barriers to ensure correct ordering of read/write
access to TX BDs.

In xmit_done, we should ensure that reading the additional BD fields are
only done after STS_CTRL_APP0_CMPLT bit is set.

When xmit_done marks the BD as free by setting APP0=0, we need to ensure
that the other BD fields are reset first, so we avoid racing with the xmit
path, which writes to the same fields.

Finally, making sure to read APP0 of next BD after the current BD, ensures
that we see all available buffers.

Signed-off-by: Esben Haabendal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/xilinx/ll_temac_main.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/xilinx/ll_temac_main.c b/drivers/net/ethernet/xilinx/ll_temac_main.c
index 01bb36e7cff0..b105e1d35d15 100644
--- a/drivers/net/ethernet/xilinx/ll_temac_main.c
+++ b/drivers/net/ethernet/xilinx/ll_temac_main.c
@@ -774,12 +774,15 @@ static void temac_start_xmit_done(struct net_device *ndev)
stat = be32_to_cpu(cur_p->app0);

while (stat & STS_CTRL_APP0_CMPLT) {
+ /* Make sure that the other fields are read after bd is
+ * released by dma
+ */
+ rmb();
dma_unmap_single(ndev->dev.parent, be32_to_cpu(cur_p->phys),
be32_to_cpu(cur_p->len), DMA_TO_DEVICE);
skb = (struct sk_buff *)ptr_from_txbd(cur_p);
if (skb)
dev_consume_skb_irq(skb);
- cur_p->app0 = 0;
cur_p->app1 = 0;
cur_p->app2 = 0;
cur_p->app3 = 0;
@@ -788,6 +791,12 @@ static void temac_start_xmit_done(struct net_device *ndev)
ndev->stats.tx_packets++;
ndev->stats.tx_bytes += be32_to_cpu(cur_p->len);

+ /* app0 must be visible last, as it is used to flag
+ * availability of the bd
+ */
+ smp_mb();
+ cur_p->app0 = 0;
+
lp->tx_bd_ci++;
if (lp->tx_bd_ci >= lp->tx_bd_num)
lp->tx_bd_ci = 0;
@@ -814,6 +823,9 @@ static inline int temac_check_tx_bd_space(struct temac_local *lp, int num_frag)
if (cur_p->app0)
return NETDEV_TX_BUSY;

+ /* Make sure to read next bd app0 after this one */
+ rmb();
+
tail++;
if (tail >= lp->tx_bd_num)
tail = 0;
--
2.30.2

2021-06-28 17:36:25

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 068/110] nilfs2: fix memory leak in nilfs_sysfs_delete_device_group

From: Pavel Skripkin <[email protected]>

[ Upstream commit 8fd0c1b0647a6bda4067ee0cd61e8395954b6f28 ]

My local syzbot instance hit memory leak in nilfs2. The problem was in
missing kobject_put() in nilfs_sysfs_delete_device_group().

kobject_del() does not call kobject_cleanup() for passed kobject and it
leads to leaking duped kobject name if kobject_put() was not called.

Fail log:

BUG: memory leak
unreferenced object 0xffff8880596171e0 (size 8):
comm "syz-executor379", pid 8381, jiffies 4294980258 (age 21.100s)
hex dump (first 8 bytes):
6c 6f 6f 70 30 00 00 00 loop0...
backtrace:
kstrdup+0x36/0x70 mm/util.c:60
kstrdup_const+0x53/0x80 mm/util.c:83
kvasprintf_const+0x108/0x190 lib/kasprintf.c:48
kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289
kobject_add_varg lib/kobject.c:384 [inline]
kobject_init_and_add+0xc9/0x160 lib/kobject.c:473
nilfs_sysfs_create_device_group+0x150/0x800 fs/nilfs2/sysfs.c:999
init_nilfs+0xe26/0x12b0 fs/nilfs2/the_nilfs.c:637

Link: https://lkml.kernel.org/r/[email protected]
Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group")
Signed-off-by: Pavel Skripkin <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]>
Cc: Michael L. Semon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/nilfs2/sysfs.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c
index 303d71430bdd..9c6c0e2e5880 100644
--- a/fs/nilfs2/sysfs.c
+++ b/fs/nilfs2/sysfs.c
@@ -1053,6 +1053,7 @@ void nilfs_sysfs_delete_device_group(struct the_nilfs *nilfs)
nilfs_sysfs_delete_superblock_group(nilfs);
nilfs_sysfs_delete_segctor_group(nilfs);
kobject_del(&nilfs->ns_dev_kobj);
+ kobject_put(&nilfs->ns_dev_kobj);
kfree(nilfs->ns_dev_subgroups);
}

--
2.30.2

2021-06-28 17:39:36

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 044/110] net: caif: fix memory leak in ldisc_open

From: Pavel Skripkin <[email protected]>

[ Upstream commit 58af3d3d54e87bfc1f936e16c04ade3369d34011 ]

Syzbot reported memory leak in tty_init_dev().
The problem was in unputted tty in ldisc_open()

static int ldisc_open(struct tty_struct *tty)
{
...
ser->tty = tty_kref_get(tty);
...
result = register_netdevice(dev);
if (result) {
rtnl_unlock();
free_netdev(dev);
return -ENODEV;
}
...
}

Ser pointer is netdev private_data, so after free_netdev()
this pointer goes away with unputted tty reference. So, fix
it by adding tty_kref_put() before freeing netdev.

Reported-and-tested-by: [email protected]
Signed-off-by: Pavel Skripkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/caif/caif_serial.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/caif/caif_serial.c b/drivers/net/caif/caif_serial.c
index 9f30748da4ab..8c38f224becb 100644
--- a/drivers/net/caif/caif_serial.c
+++ b/drivers/net/caif/caif_serial.c
@@ -350,6 +350,7 @@ static int ldisc_open(struct tty_struct *tty)
rtnl_lock();
result = register_netdevice(dev);
if (result) {
+ tty_kref_put(tty);
rtnl_unlock();
free_netdev(dev);
return -ENODEV;
--
2.30.2

2021-06-28 17:40:44

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 054/110] mac80211: reset profile_periodicity/ema_ap

From: Johannes Berg <[email protected]>

[ Upstream commit bbc6f03ff26e7b71d6135a7b78ce40e7dee3d86a ]

Apparently we never clear these values, so they'll remain set
since the setting of them is conditional. Clear the values in
the relevant other cases.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>
Link: https://lore.kernel.org/r/iwlwifi.20210618133832.316e32d136a9.I2a12e51814258e1e1b526103894f4b9f19a91c8d@changeid
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/mac80211/mlme.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 0fe91dc9817e..437d88822d8f 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -4062,10 +4062,14 @@ static void ieee80211_rx_mgmt_beacon(struct ieee80211_sub_if_data *sdata,
if (elems.mbssid_config_ie)
bss_conf->profile_periodicity =
elems.mbssid_config_ie->profile_periodicity;
+ else
+ bss_conf->profile_periodicity = 0;

if (elems.ext_capab_len >= 11 &&
(elems.ext_capab[10] & WLAN_EXT_CAPA11_EMA_SUPPORT))
bss_conf->ema_ap = true;
+ else
+ bss_conf->ema_ap = false;

/* continue assoc process */
ifmgd->assoc_data->timeout = jiffies;
@@ -5802,12 +5806,16 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
beacon_ies->data, beacon_ies->len);
if (elem && elem->datalen >= 3)
sdata->vif.bss_conf.profile_periodicity = elem->data[2];
+ else
+ sdata->vif.bss_conf.profile_periodicity = 0;

elem = cfg80211_find_elem(WLAN_EID_EXT_CAPABILITY,
beacon_ies->data, beacon_ies->len);
if (elem && elem->datalen >= 11 &&
(elem->data[10] & WLAN_EXT_CAPA11_EMA_SUPPORT))
sdata->vif.bss_conf.ema_ap = true;
+ else
+ sdata->vif.bss_conf.ema_ap = false;
} else {
assoc_data->timeout = jiffies;
assoc_data->timeout_started = true;
--
2.30.2

2021-06-28 17:41:15

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 056/110] recordmcount: Correct st_shndx handling

From: Peter Zijlstra <[email protected]>

[ Upstream commit fb780761e7bd9f2e94f5b9a296ead6b35b944206 ]

One should only use st_shndx when >SHN_UNDEF and <SHN_LORESERVE. When
SHN_XINDEX, then use .symtab_shndx. Otherwise use 0.

This handles the case: st_shndx >= SHN_LORESERVE && st_shndx != SHN_XINDEX.

Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]

Reported-by: Mark-PK Tsai <[email protected]>
Tested-by: Mark-PK Tsai <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
[handle endianness of sym->st_shndx]
Signed-off-by: Mark-PK Tsai <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
scripts/recordmcount.h | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/scripts/recordmcount.h b/scripts/recordmcount.h
index f9b19524da11..1e9baa5c4fc6 100644
--- a/scripts/recordmcount.h
+++ b/scripts/recordmcount.h
@@ -192,15 +192,20 @@ static unsigned int get_symindex(Elf_Sym const *sym, Elf32_Word const *symtab,
Elf32_Word const *symtab_shndx)
{
unsigned long offset;
+ unsigned short shndx = w2(sym->st_shndx);
int index;

- if (sym->st_shndx != SHN_XINDEX)
- return w2(sym->st_shndx);
+ if (shndx > SHN_UNDEF && shndx < SHN_LORESERVE)
+ return shndx;

- offset = (unsigned long)sym - (unsigned long)symtab;
- index = offset / sizeof(*sym);
+ if (shndx == SHN_XINDEX) {
+ offset = (unsigned long)sym - (unsigned long)symtab;
+ index = offset / sizeof(*sym);

- return w(symtab_shndx[index]);
+ return w(symtab_shndx[index]);
+ }
+
+ return 0;
}

static unsigned int get_shnum(Elf_Ehdr const *ehdr, Elf_Shdr const *shdr0)
--
2.30.2

2021-06-28 17:41:18

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 059/110] net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY

From: Esben Haabendal <[email protected]>

[ Upstream commit f6396341194234e9b01cd7538bc2c6ac4501ab14 ]

As documented in Documentation/networking/driver.rst, the ndo_start_xmit
method must not return NETDEV_TX_BUSY under any normal circumstances, and
as recommended, we simply stop the tx queue in advance, when there is a
risk that the next xmit would cause a NETDEV_TX_BUSY return.

Signed-off-by: Esben Haabendal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/xilinx/ll_temac_main.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/xilinx/ll_temac_main.c b/drivers/net/ethernet/xilinx/ll_temac_main.c
index b105e1d35d15..6bd3a389d389 100644
--- a/drivers/net/ethernet/xilinx/ll_temac_main.c
+++ b/drivers/net/ethernet/xilinx/ll_temac_main.c
@@ -942,6 +942,11 @@ temac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
wmb();
lp->dma_out(lp, TX_TAILDESC_PTR, tail_p); /* DMA start */

+ if (temac_check_tx_bd_space(lp, MAX_SKB_FRAGS + 1)) {
+ netdev_info(ndev, "%s -> netif_stop_queue\n", __func__);
+ netif_stop_queue(ndev);
+ }
+
return NETDEV_TX_OK;
}

--
2.30.2

2021-06-28 17:41:24

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 065/110] gpiolib: cdev: zero padding during conversion to gpioline_info_changed

From: Gabriel Knezek <[email protected]>

[ Upstream commit cb8f63b8cbf39845244f3ccae43bb7e63bd70543 ]

When userspace requests a GPIO v1 line info changed event,
lineinfo_watch_read() populates and returns the gpioline_info_changed
structure. It contains 5 words of padding at the end which are not
initialized before being returned to userspace.

Zero the structure in gpio_v2_line_info_change_to_v1() before populating
its contents.

Fixes: aad955842d1c ("gpiolib: cdev: support GPIO_V2_GET_LINEINFO_IOCTL and GPIO_V2_GET_LINEINFO_WATCH_IOCTL")
Signed-off-by: Gabriel Knezek <[email protected]>
Reviewed-by: Kent Gibson <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/gpio/gpiolib-cdev.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index 1631727bf0da..c7b5446d01fd 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -1880,6 +1880,7 @@ static void gpio_v2_line_info_changed_to_v1(
struct gpio_v2_line_info_changed *lic_v2,
struct gpioline_info_changed *lic_v1)
{
+ memset(lic_v1, 0, sizeof(*lic_v1));
gpio_v2_line_info_to_v1(&lic_v2->info, &lic_v1->info);
lic_v1->timestamp = lic_v2->timestamp_ns;
lic_v1->event_type = lic_v2->event_type;
--
2.30.2

2021-06-28 17:41:26

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 063/110] pinctrl: stm32: fix the reported number of GPIO lines per bank

From: Fabien Dessenne <[email protected]>

[ Upstream commit 67e2996f72c71ebe4ac2fcbcf77e54479bb7aa11 ]

Each GPIO bank supports a variable number of lines which is usually 16, but
is less in some cases : this is specified by the last argument of the
"gpio-ranges" bank node property.
Report to the framework, the actual number of lines, so the libgpiod
gpioinfo command lists the actually existing GPIO lines.

Fixes: 1dc9d289154b ("pinctrl: stm32: add possibility to use gpio-ranges to declare bank range")
Signed-off-by: Fabien Dessenne <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/pinctrl/stm32/pinctrl-stm32.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pinctrl/stm32/pinctrl-stm32.c b/drivers/pinctrl/stm32/pinctrl-stm32.c
index 7d9bdedcd71b..3af4430543dc 100644
--- a/drivers/pinctrl/stm32/pinctrl-stm32.c
+++ b/drivers/pinctrl/stm32/pinctrl-stm32.c
@@ -1229,7 +1229,7 @@ static int stm32_gpiolib_register_bank(struct stm32_pinctrl *pctl,
struct device *dev = pctl->dev;
struct resource res;
int npins = STM32_GPIO_PINS_PER_BANK;
- int bank_nr, err;
+ int bank_nr, err, i = 0;

if (!IS_ERR(bank->rstc))
reset_control_deassert(bank->rstc);
@@ -1251,9 +1251,14 @@ static int stm32_gpiolib_register_bank(struct stm32_pinctrl *pctl,

of_property_read_string(np, "st,bank-name", &bank->gpio_chip.label);

- if (!of_parse_phandle_with_fixed_args(np, "gpio-ranges", 3, 0, &args)) {
+ if (!of_parse_phandle_with_fixed_args(np, "gpio-ranges", 3, i, &args)) {
bank_nr = args.args[1] / STM32_GPIO_PINS_PER_BANK;
bank->gpio_chip.base = args.args[1];
+
+ npins = args.args[2];
+ while (!of_parse_phandle_with_fixed_args(np, "gpio-ranges", 3,
+ ++i, &args))
+ npins += args.args[2];
} else {
bank_nr = pctl->nbanks;
bank->gpio_chip.base = bank_nr * STM32_GPIO_PINS_PER_BANK;
--
2.30.2

2021-06-28 17:41:31

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 067/110] software node: Handle software node injection to an existing device properly

From: Heikki Krogerus <[email protected]>

[ Upstream commit 5dca69e26fe97f17d4a6cbd6872103c868577b14 ]

The function software_node_notify() - the function that creates
and removes the symlinks between the node and the device - was
called unconditionally in device_add_software_node() and
device_remove_software_node(), but it needs to be called in
those functions only in the special case where the node is
added to a device that has already been registered.

This fixes NULL pointer dereference that happens if
device_remove_software_node() is used with device that was
never registered.

Fixes: b622b24519f5 ("software node: Allow node addition to already existing device")
Reported-and-tested-by: Dominik Brodowski <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Heikki Krogerus <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/base/swnode.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index 88310ac9ce90..62c536f9d925 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -1032,7 +1032,15 @@ int device_add_software_node(struct device *dev, const struct software_node *nod
}

set_secondary_fwnode(dev, &swnode->fwnode);
- software_node_notify(dev, KOBJ_ADD);
+
+ /*
+ * If the device has been fully registered by the time this function is
+ * called, software_node_notify() must be called separately so that the
+ * symlinks get created and the reference count of the node is kept in
+ * balance.
+ */
+ if (device_is_registered(dev))
+ software_node_notify(dev, KOBJ_ADD);

return 0;
}
@@ -1052,7 +1060,8 @@ void device_remove_software_node(struct device *dev)
if (!swnode)
return;

- software_node_notify(dev, KOBJ_REMOVE);
+ if (device_is_registered(dev))
+ software_node_notify(dev, KOBJ_REMOVE);
set_secondary_fwnode(dev, NULL);
kobject_put(&swnode->kobj);
}
@@ -1106,8 +1115,7 @@ int software_node_notify(struct device *dev, unsigned long action)

switch (action) {
case KOBJ_ADD:
- ret = sysfs_create_link_nowarn(&dev->kobj, &swnode->kobj,
- "software_node");
+ ret = sysfs_create_link(&dev->kobj, &swnode->kobj, "software_node");
if (ret)
break;

--
2.30.2

2021-06-28 17:41:33

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 069/110] s390/topology: clear thread/group maps for offline cpus

From: Sven Schnelle <[email protected]>

commit 9e3d62d55bf455d4f9fdf2ede5c8756410c64102 upstream.

The current code doesn't clear the thread/group maps for offline
CPUs. This may cause kernel crashes like the one bewlow in common
code that assumes if a CPU has sibblings it is online.

Unable to handle kernel pointer dereference in virtual kernel address space

Call Trace:
[<000000013a4b8c3c>] blk_mq_map_swqueue+0x10c/0x388
([<000000013a4b8bcc>] blk_mq_map_swqueue+0x9c/0x388)
[<000000013a4b9300>] blk_mq_init_allocated_queue+0x448/0x478
[<000000013a4b9416>] blk_mq_init_queue+0x4e/0x90
[<000003ff8019d3e6>] loop_add+0x106/0x278 [loop]
[<000003ff801b8148>] loop_init+0x148/0x1000 [loop]
[<0000000139de4924>] do_one_initcall+0x3c/0x1e0
[<0000000139ef449a>] do_init_module+0x6a/0x2a0
[<0000000139ef61bc>] __do_sys_finit_module+0xa4/0xc0
[<0000000139de9e6e>] do_syscall+0x7e/0xd0
[<000000013a8e0aec>] __do_syscall+0xbc/0x110
[<000000013a8ee2e8>] system_call+0x78/0xa0

Fixes: 52aeda7accb6 ("s390/topology: remove offline CPUs from CPU topology masks")
Cc: <[email protected]> # 5.7+
Reported-by: Marius Hillenbrand <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/kernel/topology.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index bfcc327acc6b..26aa2614ee35 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -66,7 +66,10 @@ static void cpu_group_map(cpumask_t *dst, struct mask_info *info, unsigned int c
{
static cpumask_t mask;

- cpumask_copy(&mask, cpumask_of(cpu));
+ cpumask_clear(&mask);
+ if (!cpu_online(cpu))
+ goto out;
+ cpumask_set_cpu(cpu, &mask);
switch (topology_mode) {
case TOPOLOGY_MODE_HW:
while (info) {
@@ -83,10 +86,10 @@ static void cpu_group_map(cpumask_t *dst, struct mask_info *info, unsigned int c
default:
fallthrough;
case TOPOLOGY_MODE_SINGLE:
- cpumask_copy(&mask, cpumask_of(cpu));
break;
}
cpumask_and(&mask, &mask, cpu_online_mask);
+out:
cpumask_copy(dst, &mask);
}

@@ -95,7 +98,10 @@ static void cpu_thread_map(cpumask_t *dst, unsigned int cpu)
static cpumask_t mask;
int i;

- cpumask_copy(&mask, cpumask_of(cpu));
+ cpumask_clear(&mask);
+ if (!cpu_online(cpu))
+ goto out;
+ cpumask_set_cpu(cpu, &mask);
if (topology_mode != TOPOLOGY_MODE_HW)
goto out;
cpu -= cpu % (smp_cpu_mtid + 1);
--
2.30.2

2021-06-28 17:42:22

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 072/110] s390: clear pt_regs::flags on irq entry

From: Sven Schnelle <[email protected]>

commit ca1f4d702d534387aa1f16379edb3b03cdb6ceda upstream.

The current irq entry code doesn't initialize pt_regs::flags. On exit to
user mode arch_do_signal_or_restart() tests whether PIF_SYSCALL is set,
which might yield wrong results.

Fix this by clearing pt_regs::flags in the entry.S irq handler
code.

Reported-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Fixes: 56e62a737028 ("s390: convert to generic entry")
Cc: <[email protected]> # 5.12
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/kernel/entry.S | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 9cc71ca9a88f..e84f495e7eb2 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -418,6 +418,7 @@ ENTRY(\name)
xgr %r6,%r6
xgr %r7,%r7
xgr %r10,%r10
+ xc __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
stmg %r8,%r9,__PT_PSW(%r11)
tm %r8,0x0001 # coming from user space?
--
2.30.2

2021-06-28 17:42:29

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 070/110] s390/stack: fix possible register corruption with stack switch helper

From: Heiko Carstens <[email protected]>

commit 67147e96a332b56c7206238162771d82467f86c0 upstream.

The CALL_ON_STACK macro is used to call a C function from inline
assembly, and therefore must consider the C ABI, which says that only
registers 6-13, and 15 are non-volatile (restored by the called
function).

The inline assembly incorrectly marks all registers used to pass
parameters to the called function as read-only input operands, instead
of operands that are read and written to. This might result in
register corruption depending on usage, compiler, and compile options.

Fix this by marking all operands used to pass parameters as read/write
operands. To keep the code simple even register 6, if used, is marked
as read-write operand.

Fixes: ff340d2472ec ("s390: add stack switch helper")
Cc: <[email protected]> # 4.20
Reviewed-by: Vasily Gorbik <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/include/asm/stacktrace.h | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/stacktrace.h b/arch/s390/include/asm/stacktrace.h
index 2b543163d90a..76c6034428be 100644
--- a/arch/s390/include/asm/stacktrace.h
+++ b/arch/s390/include/asm/stacktrace.h
@@ -91,12 +91,16 @@ struct stack_frame {
CALL_ARGS_4(arg1, arg2, arg3, arg4); \
register unsigned long r4 asm("6") = (unsigned long)(arg5)

-#define CALL_FMT_0 "=&d" (r2) :
-#define CALL_FMT_1 "+&d" (r2) :
-#define CALL_FMT_2 CALL_FMT_1 "d" (r3),
-#define CALL_FMT_3 CALL_FMT_2 "d" (r4),
-#define CALL_FMT_4 CALL_FMT_3 "d" (r5),
-#define CALL_FMT_5 CALL_FMT_4 "d" (r6),
+/*
+ * To keep this simple mark register 2-6 as being changed (volatile)
+ * by the called function, even though register 6 is saved/nonvolatile.
+ */
+#define CALL_FMT_0 "=&d" (r2)
+#define CALL_FMT_1 "+&d" (r2)
+#define CALL_FMT_2 CALL_FMT_1, "+&d" (r3)
+#define CALL_FMT_3 CALL_FMT_2, "+&d" (r4)
+#define CALL_FMT_4 CALL_FMT_3, "+&d" (r5)
+#define CALL_FMT_5 CALL_FMT_4, "+&d" (r6)

#define CALL_CLOBBER_5 "0", "1", "14", "cc", "memory"
#define CALL_CLOBBER_4 CALL_CLOBBER_5
@@ -118,7 +122,7 @@ struct stack_frame {
" brasl 14,%[_fn]\n" \
" la 15,0(%[_prev])\n" \
: [_prev] "=&a" (prev), CALL_FMT_##nr \
- [_stack] "R" (stack), \
+ : [_stack] "R" (stack), \
[_bc] "i" (offsetof(struct stack_frame, back_chain)), \
[_frame] "d" (frame), \
[_fn] "X" (fn) : CALL_CLOBBER_##nr); \
--
2.30.2

2021-06-28 17:43:09

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 074/110] i2c: robotfuzz-osif: fix control-request directions

From: Johan Hovold <[email protected]>

commit 4ca070ef0dd885616ef294d269a9bf8e3b258e1a upstream.

The direction of the pipe argument must match the request-type direction
bit or control requests may fail depending on the host-controller-driver
implementation.

Control transfers without a data stage are treated as OUT requests by
the USB stack and should be using usb_sndctrlpipe(). Failing to do so
will now trigger a warning.

Fix the OSIFI2C_SET_BIT_RATE and OSIFI2C_STOP requests which erroneously
used the osif_usb_read() helper and set the IN direction bit.

Reported-by: [email protected]
Fixes: 83e53a8f120f ("i2c: Add bus driver for for OSIF USB i2c device.")
Cc: [email protected] # 3.14
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/i2c/busses/i2c-robotfuzz-osif.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-robotfuzz-osif.c b/drivers/i2c/busses/i2c-robotfuzz-osif.c
index a39f7d092797..66dfa211e736 100644
--- a/drivers/i2c/busses/i2c-robotfuzz-osif.c
+++ b/drivers/i2c/busses/i2c-robotfuzz-osif.c
@@ -83,7 +83,7 @@ static int osif_xfer(struct i2c_adapter *adapter, struct i2c_msg *msgs,
}
}

- ret = osif_usb_read(adapter, OSIFI2C_STOP, 0, 0, NULL, 0);
+ ret = osif_usb_write(adapter, OSIFI2C_STOP, 0, 0, NULL, 0);
if (ret) {
dev_err(&adapter->dev, "failure sending STOP\n");
return -EREMOTEIO;
@@ -153,7 +153,7 @@ static int osif_probe(struct usb_interface *interface,
* Set bus frequency. The frequency is:
* 120,000,000 / ( 16 + 2 * div * 4^prescale).
* Using dev = 52, prescale = 0 give 100KHz */
- ret = osif_usb_read(&priv->adapter, OSIFI2C_SET_BIT_RATE, 52, 0,
+ ret = osif_usb_write(&priv->adapter, OSIFI2C_SET_BIT_RATE, 52, 0,
NULL, 0);
if (ret) {
dev_err(&interface->dev, "failure sending bit rate");
--
2.30.2

2021-06-28 17:43:26

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 078/110] kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()

From: Petr Mladek <[email protected]>

commit 5fa54346caf67b4b1b10b1f390316ae466da4d53 upstream.

The system might hang with the following backtrace:

schedule+0x80/0x100
schedule_timeout+0x48/0x138
wait_for_common+0xa4/0x134
wait_for_completion+0x1c/0x2c
kthread_flush_work+0x114/0x1cc
kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
kthread_cancel_delayed_work_sync+0x18/0x2c
xxxx_pm_notify+0xb0/0xd8
blocking_notifier_call_chain_robust+0x80/0x194
pm_notifier_call_chain_robust+0x28/0x4c
suspend_prepare+0x40/0x260
enter_state+0x80/0x3f4
pm_suspend+0x60/0xdc
state_store+0x108/0x144
kobj_attr_store+0x38/0x88
sysfs_kf_write+0x64/0xc0
kernfs_fop_write_iter+0x108/0x1d0
vfs_write+0x2f4/0x368
ksys_write+0x7c/0xec

It is caused by the following race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync():

CPU0 CPU1

Context: Thread A Context: Thread B

kthread_mod_delayed_work()
spin_lock()
__kthread_cancel_work()
spin_unlock()
del_timer_sync()
kthread_cancel_delayed_work_sync()
spin_lock()
__kthread_cancel_work()
spin_unlock()
del_timer_sync()
spin_lock()

work->canceling++
spin_unlock
spin_lock()
queue_delayed_work()
// dwork is put into the worker->delayed_work_list

spin_unlock()

kthread_flush_work()
// flush_work is put at the tail of the dwork

wait_for_completion()

Context: IRQ

kthread_delayed_work_timer_fn()
spin_lock()
list_del_init(&work->node);
spin_unlock()

BANG: flush_work is not longer linked and will never get proceed.

The problem is that kthread_mod_delayed_work() checks work->canceling
flag before canceling the timer.

A simple solution is to (re)check work->canceling after
__kthread_cancel_work(). But then it is not clear what should be
returned when __kthread_cancel_work() removed the work from the queue
(list) and it can't queue it again with the new @delay.

The return value might be used for reference counting. The caller has
to know whether a new work has been queued or an existing one was
replaced.

The proper solution is that kthread_mod_delayed_work() will remove the
work from the queue (list) _only_ when work->canceling is not set. The
flag must be checked after the timer is stopped and the remaining
operations can be done under worker->lock.

Note that kthread_mod_delayed_work() could remove the timer and then
bail out. It is fine. The other canceling caller needs to cancel the
timer as well. The important thing is that the queue (list)
manipulation is done atomically under worker->lock.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9a6b06c8d9a220860468a ("kthread: allow to modify delayed kthread work")
Signed-off-by: Petr Mladek <[email protected]>
Reported-by: Martin Liu <[email protected]>
Cc: <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/kthread.c | 35 ++++++++++++++++++++++++-----------
1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index fc7536079f7e..4fdf2bd9b558 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1119,8 +1119,11 @@ static void kthread_cancel_delayed_work_timer(struct kthread_work *work,
}

/*
- * This function removes the work from the worker queue. Also it makes sure
- * that it won't get queued later via the delayed work's timer.
+ * This function removes the work from the worker queue.
+ *
+ * It is called under worker->lock. The caller must make sure that
+ * the timer used by delayed work is not running, e.g. by calling
+ * kthread_cancel_delayed_work_timer().
*
* The work might still be in use when this function finishes. See the
* current_work proceed by the worker.
@@ -1128,13 +1131,8 @@ static void kthread_cancel_delayed_work_timer(struct kthread_work *work,
* Return: %true if @work was pending and successfully canceled,
* %false if @work was not pending
*/
-static bool __kthread_cancel_work(struct kthread_work *work, bool is_dwork,
- unsigned long *flags)
+static bool __kthread_cancel_work(struct kthread_work *work)
{
- /* Try to cancel the timer if exists. */
- if (is_dwork)
- kthread_cancel_delayed_work_timer(work, flags);
-
/*
* Try to remove the work from a worker list. It might either
* be from worker->work_list or from worker->delayed_work_list.
@@ -1187,11 +1185,23 @@ bool kthread_mod_delayed_work(struct kthread_worker *worker,
/* Work must not be used with >1 worker, see kthread_queue_work() */
WARN_ON_ONCE(work->worker != worker);

- /* Do not fight with another command that is canceling this work. */
+ /*
+ * Temporary cancel the work but do not fight with another command
+ * that is canceling the work as well.
+ *
+ * It is a bit tricky because of possible races with another
+ * mod_delayed_work() and cancel_delayed_work() callers.
+ *
+ * The timer must be canceled first because worker->lock is released
+ * when doing so. But the work can be removed from the queue (list)
+ * only when it can be queued again so that the return value can
+ * be used for reference counting.
+ */
+ kthread_cancel_delayed_work_timer(work, &flags);
if (work->canceling)
goto out;
+ ret = __kthread_cancel_work(work);

- ret = __kthread_cancel_work(work, true, &flags);
fast_queue:
__kthread_queue_delayed_work(worker, dwork, delay);
out:
@@ -1213,7 +1223,10 @@ static bool __kthread_cancel_work_sync(struct kthread_work *work, bool is_dwork)
/* Work must not be used with >1 worker, see kthread_queue_work(). */
WARN_ON_ONCE(work->worker != worker);

- ret = __kthread_cancel_work(work, is_dwork, &flags);
+ if (is_dwork)
+ kthread_cancel_delayed_work_timer(work, &flags);
+
+ ret = __kthread_cancel_work(work);

if (worker->current_work != work)
goto out_fast;
--
2.30.2

2021-06-28 17:44:09

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 076/110] xen/events: reset active flag for lateeoi events later

From: Juergen Gross <[email protected]>

commit 3de218ff39b9e3f0d453fe3154f12a174de44b25 upstream.

In order to avoid a race condition for user events when changing
cpu affinity reset the active flag only when EOI-ing the event.

This is working fine as all user events are lateeoi events. Note that
lateeoi_ack_mask_dynirq() is not modified as there is no explicit call
to xen_irq_lateeoi() expected later.

Cc: [email protected]
Reported-by: Julien Grall <[email protected]>
Fixes: b6622798bc50b62 ("xen/events: avoid handling the same event on two cpus at the same time")
Tested-by: Julien Grall <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/xen/events/events_base.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 7bbfd58958bc..d7e361fb0548 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -642,6 +642,9 @@ static void xen_irq_lateeoi_locked(struct irq_info *info, bool spurious)
}

info->eoi_time = 0;
+
+ /* is_active hasn't been reset yet, do it now. */
+ smp_store_release(&info->is_active, 0);
do_unmask(info, EVT_MASK_REASON_EOI_PENDING);
}

@@ -811,6 +814,7 @@ static void xen_evtchn_close(evtchn_port_t port)
BUG();
}

+/* Not called for lateeoi events. */
static void event_handler_exit(struct irq_info *info)
{
smp_store_release(&info->is_active, 0);
@@ -1883,7 +1887,12 @@ static void lateeoi_ack_dynirq(struct irq_data *data)

if (VALID_EVTCHN(evtchn)) {
do_mask(info, EVT_MASK_REASON_EOI_PENDING);
- event_handler_exit(info);
+ /*
+ * Don't call event_handler_exit().
+ * Need to keep is_active non-zero in order to ignore re-raised
+ * events after cpu affinity changes while a lateeoi is pending.
+ */
+ clear_evtchn(evtchn);
}
}

--
2.30.2

2021-06-28 17:44:58

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 079/110] x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate()

From: Thomas Gleixner <[email protected]>

commit 9301982c424a003c0095bf157154a85bf5322bd0 upstream.

sanitize_restored_user_xstate() preserves the supervisor states only
when the fx_only argument is zero, which allows unprivileged user space
to put supervisor states back into init state.

Preserve them unconditionally.

[ bp: Fix a typo or two in the text. ]

Fixes: 5d6b6a6f9b5c ("x86/fpu/xstate: Update sanitize_restored_xstate() for supervisor xstates")
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/fpu/signal.c | 26 ++++++++------------------
1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index ec3ae3054792..b7b92cdf3add 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -221,28 +221,18 @@ sanitize_restored_user_xstate(union fpregs_state *state,

if (use_xsave()) {
/*
- * Note: we don't need to zero the reserved bits in the
- * xstate_header here because we either didn't copy them at all,
- * or we checked earlier that they aren't set.
+ * Clear all feature bits which are not set in
+ * user_xfeatures and clear all extended features
+ * for fx_only mode.
*/
+ u64 mask = fx_only ? XFEATURE_MASK_FPSSE : user_xfeatures;

/*
- * 'user_xfeatures' might have bits clear which are
- * set in header->xfeatures. This represents features that
- * were in init state prior to a signal delivery, and need
- * to be reset back to the init state. Clear any user
- * feature bits which are set in the kernel buffer to get
- * them back to the init state.
- *
- * Supervisor state is unchanged by input from userspace.
- * Ensure supervisor state bits stay set and supervisor
- * state is not modified.
+ * Supervisor state has to be preserved. The sigframe
+ * restore can only modify user features, i.e. @mask
+ * cannot contain them.
*/
- if (fx_only)
- header->xfeatures = XFEATURE_MASK_FPSSE;
- else
- header->xfeatures &= user_xfeatures |
- xfeatures_mask_supervisor();
+ header->xfeatures &= mask | xfeatures_mask_supervisor();
}

if (use_fxsr()) {
--
2.30.2

2021-06-28 17:46:06

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 080/110] x86/fpu: Make init_fpstate correct with optimized XSAVE

From: Thomas Gleixner <[email protected]>

commit f9dfb5e390fab2df9f7944bb91e7705aba14cd26 upstream.

The XSAVE init code initializes all enabled and supported components with
XRSTOR(S) to init state. Then it XSAVEs the state of the components back
into init_fpstate which is used in several places to fill in the init state
of components.

This works correctly with XSAVE, but not with XSAVEOPT and XSAVES because
those use the init optimization and skip writing state of components which
are in init state. So init_fpstate.xsave still contains all zeroes after
this operation.

There are two ways to solve that:

1) Use XSAVE unconditionally, but that requires to reshuffle the buffer when
XSAVES is enabled because XSAVES uses compacted format.

2) Save the components which are known to have a non-zero init state by other
means.

Looking deeper, #2 is the right thing to do because all components the
kernel supports have all-zeroes init state except the legacy features (FP,
SSE). Those cannot be hard coded because the states are not identical on all
CPUs, but they can be saved with FXSAVE which avoids all conditionals.

Use FXSAVE to save the legacy FP/SSE components in init_fpstate along with
a BUILD_BUG_ON() which reminds developers to validate that a newly added
component has all zeroes init state. As a bonus remove the now unused
copy_xregs_to_kernel_booting() crutch.

The XSAVE and reshuffle method can still be implemented in the unlikely
case that components are added which have a non-zero init state and no
other means to save them. For now, FXSAVE is just simple and good enough.

[ bp: Fix a typo or two in the text. ]

Fixes: 6bad06b76892 ("x86, xsave: Use xsaveopt in context-switch path when supported")
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 30 ++++++---------------
arch/x86/kernel/fpu/xstate.c | 41 ++++++++++++++++++++++++++---
2 files changed, 46 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index fdee23ea4e17..16bf4d4a8159 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -204,6 +204,14 @@ static inline void copy_fxregs_to_kernel(struct fpu *fpu)
asm volatile("fxsaveq %[fx]" : [fx] "=m" (fpu->state.fxsave));
}

+static inline void fxsave(struct fxregs_state *fx)
+{
+ if (IS_ENABLED(CONFIG_X86_32))
+ asm volatile( "fxsave %[fx]" : [fx] "=m" (*fx));
+ else
+ asm volatile("fxsaveq %[fx]" : [fx] "=m" (*fx));
+}
+
/* These macros all use (%edi)/(%rdi) as the single memory argument. */
#define XSAVE ".byte " REX_PREFIX "0x0f,0xae,0x27"
#define XSAVEOPT ".byte " REX_PREFIX "0x0f,0xae,0x37"
@@ -268,28 +276,6 @@ static inline void copy_fxregs_to_kernel(struct fpu *fpu)
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")

-/*
- * This function is called only during boot time when x86 caps are not set
- * up and alternative can not be used yet.
- */
-static inline void copy_xregs_to_kernel_booting(struct xregs_state *xstate)
-{
- u64 mask = xfeatures_mask_all;
- u32 lmask = mask;
- u32 hmask = mask >> 32;
- int err;
-
- WARN_ON(system_state != SYSTEM_BOOTING);
-
- if (boot_cpu_has(X86_FEATURE_XSAVES))
- XSTATE_OP(XSAVES, xstate, lmask, hmask, err);
- else
- XSTATE_OP(XSAVE, xstate, lmask, hmask, err);
-
- /* We should never fault when copying to a kernel buffer: */
- WARN_ON_FPU(err);
-}
-
/*
* This function is called only during boot time when x86 caps are not set
* up and alternative can not be used yet.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2ad57cc14b83..451435d7ff41 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -440,6 +440,25 @@ static void __init print_xstate_offset_size(void)
}
}

+/*
+ * All supported features have either init state all zeros or are
+ * handled in setup_init_fpu() individually. This is an explicit
+ * feature list and does not use XFEATURE_MASK*SUPPORTED to catch
+ * newly added supported features at build time and make people
+ * actually look at the init state for the new feature.
+ */
+#define XFEATURES_INIT_FPSTATE_HANDLED \
+ (XFEATURE_MASK_FP | \
+ XFEATURE_MASK_SSE | \
+ XFEATURE_MASK_YMM | \
+ XFEATURE_MASK_OPMASK | \
+ XFEATURE_MASK_ZMM_Hi256 | \
+ XFEATURE_MASK_Hi16_ZMM | \
+ XFEATURE_MASK_PKRU | \
+ XFEATURE_MASK_BNDREGS | \
+ XFEATURE_MASK_BNDCSR | \
+ XFEATURE_MASK_PASID)
+
/*
* setup the xstate image representing the init state
*/
@@ -447,6 +466,10 @@ static void __init setup_init_fpu_buf(void)
{
static int on_boot_cpu __initdata = 1;

+ BUILD_BUG_ON((XFEATURE_MASK_USER_SUPPORTED |
+ XFEATURE_MASK_SUPERVISOR_SUPPORTED) !=
+ XFEATURES_INIT_FPSTATE_HANDLED);
+
WARN_ON_FPU(!on_boot_cpu);
on_boot_cpu = 0;

@@ -466,10 +489,22 @@ static void __init setup_init_fpu_buf(void)
copy_kernel_to_xregs_booting(&init_fpstate.xsave);

/*
- * Dump the init state again. This is to identify the init state
- * of any feature which is not represented by all zero's.
+ * All components are now in init state. Read the state back so
+ * that init_fpstate contains all non-zero init state. This only
+ * works with XSAVE, but not with XSAVEOPT and XSAVES because
+ * those use the init optimization which skips writing data for
+ * components in init state.
+ *
+ * XSAVE could be used, but that would require to reshuffle the
+ * data when XSAVES is available because XSAVES uses xstate
+ * compaction. But doing so is a pointless exercise because most
+ * components have an all zeros init state except for the legacy
+ * ones (FP and SSE). Those can be saved with FXSAVE into the
+ * legacy area. Adding new features requires to ensure that init
+ * state is all zeroes or if not to add the necessary handling
+ * here.
*/
- copy_xregs_to_kernel_booting(&init_fpstate.xsave);
+ fxsave(&init_fpstate.fxsave);
}

static int xfeature_uncompacted_offset(int xfeature_nr)
--
2.30.2

2021-06-28 17:46:15

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 081/110] mm/memory-failure: use a mutex to avoid memory_failure() races

From: Tony Luck <[email protected]>

commit 171936ddaf97e6f4e1264f4128bb5cf15691339c upstream.

Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5.

I wrote this patchset to materialize what I think is the current
allowable solution mentioned by the previous discussion [1]. I simply
borrowed Tony's mutex patch and Aili's return code patch, then I queued
another one to find error virtual address in the best effort manner. I
know that this is not a perfect solution, but should work for some
typical case.

[1]: https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/

This patch (of 2):

There can be races when multiple CPUs consume poison from the same page.
The first into memory_failure() atomically sets the HWPoison page flag
and begins hunting for tasks that map this page. Eventually it
invalidates those mappings and may send a SIGBUS to the affected tasks.

But while all that work is going on, other CPUs see a "success" return
code from memory_failure() and so they believe the error has been
handled and continue executing.

Fix by wrapping most of the internal parts of memory_failure() in a
mutex.

[[email protected]: make mf_mutex local to memory_failure()]

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Aili Yao <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/memory-failure.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 704d05057d8c..2dd60ad81216 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1429,9 +1429,10 @@ int memory_failure(unsigned long pfn, int flags)
struct page *hpage;
struct page *orig_head;
struct dev_pagemap *pgmap;
- int res;
+ int res = 0;
unsigned long page_flags;
bool retry = true;
+ static DEFINE_MUTEX(mf_mutex);

if (!sysctl_memory_failure_recovery)
panic("Memory failure on page %lx", pfn);
@@ -1449,13 +1450,18 @@ int memory_failure(unsigned long pfn, int flags)
return -ENXIO;
}

+ mutex_lock(&mf_mutex);
+
try_again:
- if (PageHuge(p))
- return memory_failure_hugetlb(pfn, flags);
+ if (PageHuge(p)) {
+ res = memory_failure_hugetlb(pfn, flags);
+ goto unlock_mutex;
+ }
+
if (TestSetPageHWPoison(p)) {
pr_err("Memory failure: %#lx: already hardware poisoned\n",
pfn);
- return 0;
+ goto unlock_mutex;
}

orig_head = hpage = compound_head(p);
@@ -1488,17 +1494,19 @@ try_again:
res = MF_FAILED;
}
action_result(pfn, MF_MSG_BUDDY, res);
- return res == MF_RECOVERED ? 0 : -EBUSY;
+ res = res == MF_RECOVERED ? 0 : -EBUSY;
} else {
action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
- return -EBUSY;
+ res = -EBUSY;
}
+ goto unlock_mutex;
}

if (PageTransHuge(hpage)) {
if (try_to_split_thp_page(p, "Memory Failure") < 0) {
action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
- return -EBUSY;
+ res = -EBUSY;
+ goto unlock_mutex;
}
VM_BUG_ON_PAGE(!page_count(p), p);
}
@@ -1522,7 +1530,7 @@ try_again:
if (PageCompound(p) && compound_head(p) != orig_head) {
action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
res = -EBUSY;
- goto out;
+ goto unlock_page;
}

/*
@@ -1542,14 +1550,14 @@ try_again:
num_poisoned_pages_dec();
unlock_page(p);
put_page(p);
- return 0;
+ goto unlock_mutex;
}
if (hwpoison_filter(p)) {
if (TestClearPageHWPoison(p))
num_poisoned_pages_dec();
unlock_page(p);
put_page(p);
- return 0;
+ goto unlock_mutex;
}

/*
@@ -1573,7 +1581,7 @@ try_again:
if (!hwpoison_user_mappings(p, pfn, flags, &p)) {
action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
res = -EBUSY;
- goto out;
+ goto unlock_page;
}

/*
@@ -1582,13 +1590,15 @@ try_again:
if (PageLRU(p) && !PageSwapCache(p) && p->mapping == NULL) {
action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
res = -EBUSY;
- goto out;
+ goto unlock_page;
}

identify_page_state:
res = identify_page_state(pfn, p, page_flags);
-out:
+unlock_page:
unlock_page(p);
+unlock_mutex:
+ mutex_unlock(&mf_mutex);
return res;
}
EXPORT_SYMBOL_GPL(memory_failure);
--
2.30.2

2021-06-28 17:47:55

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 082/110] mm, thp: use head page in __migration_entry_wait()

From: Xu Yu <[email protected]>

commit ffc90cbb2970ab88b66ea51dd580469eede57b67 upstream.

We notice that hung task happens in a corner but practical scenario when
CONFIG_PREEMPT_NONE is enabled, as follows.

Process 0 Process 1 Process 2..Inf
split_huge_page_to_list
unmap_page
split_huge_pmd_address
__migration_entry_wait(head)
__migration_entry_wait(tail)
remap_page (roll back)
remove_migration_ptes
rmap_walk_anon
cond_resched

Where __migration_entry_wait(tail) is occurred in kernel space, e.g.,
copy_to_user in fstat, which will immediately fault again without
rescheduling, and thus occupy the cpu fully.

When there are too many processes performing __migration_entry_wait on
tail page, remap_page will never be done after cond_resched.

This makes __migration_entry_wait operate on the compound head page,
thus waits for remap_page to complete, whether the THP is split
successfully or roll back.

Note that put_and_wait_on_page_locked helps to drop the page reference
acquired with get_page_unless_zero, as soon as the page is on the wait
queue, before actually waiting. So splitting the THP is only prevented
for a brief interval.

Link: https://lkml.kernel.org/r/b9836c1dd522e903891760af9f0c86a2cce987eb.1623144009.git.xuyu@linux.alibaba.com
Fixes: ba98828088ad ("thp: add option to setup migration entries during PMD split")
Suggested-by: Hugh Dickins <[email protected]>
Signed-off-by: Gang Deng <[email protected]>
Signed-off-by: Xu Yu <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/migrate.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 773622cffe77..40455e753c5b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -322,6 +322,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
goto out;

page = migration_entry_to_page(entry);
+ page = compound_head(page);

/*
* Once page cache replacement of page migration started, page_count
--
2.30.2

2021-06-28 17:49:36

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 083/110] mm/thp: fix __split_huge_pmd_locked() on shmem migration entry

From: Hugh Dickins <[email protected]>

commit 99fa8a48203d62b3743d866fc48ef6abaee682be upstream.

Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10.

Here is v2 batch of long-standing THP bug fixes that I had not got
around to sending before, but prompted now by Wang Yugui's report
https://lore.kernel.org/linux-mm/[email protected]/

Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and
they have done no harm, but have *not* fixed that issue: something more
is needed and I have no idea of what.

This patch (of 7):

Stressing huge tmpfs page migration racing hole punch often crashed on
the VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y
kernel; or shortly afterwards, on a bad dereference in
__split_huge_pmd_locked() when DEBUG_VM=n. They forgot to allow for pmd
migration entries in the non-anonymous case.

Full disclosure: those particular experiments were on a kernel with more
relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the
vanilla kernel: it is conceivable that stricter locking happens to avoid
those cases, or makes them less likely; but __split_huge_pmd_locked()
already allowed for pmd migration entries when handling anonymous THPs,
so this commit brings the shmem and file THP handling into line.

And while there: use old_pmd rather than _pmd, as in the following
blocks; and make it clearer to the eye that the !vma_is_anonymous()
block is self-contained, making an early return after accounting for
unmapping.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/huge_memory.c | 27 ++++++++++++++++++---------
mm/pgtable-generic.c | 5 ++---
2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ae907a9c2050..cd37a0829881 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2046,7 +2046,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
count_vm_event(THP_SPLIT_PMD);

if (!vma_is_anonymous(vma)) {
- _pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
+ old_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
/*
* We are going to unmap this huge page. So
* just go ahead and zap it
@@ -2055,16 +2055,25 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
zap_deposited_table(mm, pmd);
if (vma_is_special_huge(vma))
return;
- page = pmd_page(_pmd);
- if (!PageDirty(page) && pmd_dirty(_pmd))
- set_page_dirty(page);
- if (!PageReferenced(page) && pmd_young(_pmd))
- SetPageReferenced(page);
- page_remove_rmap(page, true);
- put_page(page);
+ if (unlikely(is_pmd_migration_entry(old_pmd))) {
+ swp_entry_t entry;
+
+ entry = pmd_to_swp_entry(old_pmd);
+ page = migration_entry_to_page(entry);
+ } else {
+ page = pmd_page(old_pmd);
+ if (!PageDirty(page) && pmd_dirty(old_pmd))
+ set_page_dirty(page);
+ if (!PageReferenced(page) && pmd_young(old_pmd))
+ SetPageReferenced(page);
+ page_remove_rmap(page, true);
+ put_page(page);
+ }
add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
return;
- } else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
+ }
+
+ if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
/*
* FIXME: Do we want to invalidate secondary mmu by calling
* mmu_notifier_invalidate_range() see comments below inside
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index c2210e1cdb51..4e640baf9794 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -135,9 +135,8 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
{
pmd_t pmd;
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
- VM_BUG_ON(!pmd_present(*pmdp));
- /* Below assumes pmd_present() is true */
- VM_BUG_ON(!pmd_trans_huge(*pmdp) && !pmd_devmap(*pmdp));
+ VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
+ !pmd_devmap(*pmdp));
pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return pmd;
--
2.30.2

2021-06-28 17:53:22

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 087/110] mm/thp: fix page_address_in_vma() on file THP tails

From: Jue Wang <[email protected]>

commit 31657170deaf1d8d2f6a1955fbc6fa9d228be036 upstream.

Anon THP tails were already supported, but memory-failure may need to
use page_address_in_vma() on file THP tails, which its page->mapping
check did not permit: fix it.

hughd adds: no current usage is known to hit the issue, but this does
fix a subtle trap in a general helper: best fixed in stable sooner than
later.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Jue Wang <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/rmap.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index cc8208b20373..3665d062cc9c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -716,11 +716,11 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
if (!vma->anon_vma || !page__anon_vma ||
vma->anon_vma->root != page__anon_vma->root)
return -EFAULT;
- } else if (page->mapping) {
- if (!vma->vm_file || vma->vm_file->f_mapping != page->mapping)
- return -EFAULT;
- } else
+ } else if (!vma->vm_file) {
+ return -EFAULT;
+ } else if (vma->vm_file->f_mapping != compound_head(page)->mapping) {
return -EFAULT;
+ }

return vma_address(page, vma);
}
--
2.30.2

2021-06-28 18:12:38

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 075/110] ceph: must hold snap_rwsem when filling inode for async create

From: Jeff Layton <[email protected]>

commit 27171ae6a0fdc75571e5bf3d0961631a1e4fb765 upstream.

...and add a lockdep assertion for it to ceph_fill_inode().

Cc: [email protected] # v5.7+
Fixes: 9a8d03ca2e2c3 ("ceph: attempt to do async create when possible")
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/ceph/file.c | 3 +++
fs/ceph/inode.c | 2 ++
2 files changed, 5 insertions(+)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 209535d5b8d3..3d2e3dd4ee01 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -578,6 +578,7 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
struct ceph_inode_info *ci = ceph_inode(dir);
struct inode *inode;
struct timespec64 now;
+ struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
struct ceph_vino vino = { .ino = req->r_deleg_ino,
.snap = CEPH_NOSNAP };

@@ -615,8 +616,10 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,

ceph_file_layout_to_legacy(lo, &in.layout);

+ down_read(&mdsc->snap_rwsem);
ret = ceph_fill_inode(inode, NULL, &iinfo, NULL, req->r_session,
req->r_fmode, NULL);
+ up_read(&mdsc->snap_rwsem);
if (ret) {
dout("%s failed to fill inode: %d\n", __func__, ret);
ceph_dir_clear_complete(dir);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 179d2ef69a24..7ee6023adb36 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -762,6 +762,8 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
bool new_version = false;
bool fill_inline = false;

+ lockdep_assert_held(&mdsc->snap_rwsem);
+
dout("%s %p ino %llx.%llx v %llu had %llu\n", __func__,
inode, ceph_vinop(inode), le64_to_cpu(info->version),
ci->i_version);
--
2.30.2

2021-06-28 18:13:13

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 077/110] kthread_worker: split code for canceling the delayed work timer

From: Petr Mladek <[email protected]>

commit 34b3d5344719d14fd2185b2d9459b3abcb8cf9d8 upstream.

Patch series "kthread_worker: Fix race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync()".

This patchset fixes the race between kthread_mod_delayed_work() and
kthread_cancel_delayed_work_sync() including proper return value
handling.

This patch (of 2):

Simple code refactoring as a preparation step for fixing a race between
kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync().

It does not modify the existing behavior.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Cc: <[email protected]>
Cc: Martin Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/kthread.c | 46 +++++++++++++++++++++++++++++-----------------
1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 6d3c488a0f82..fc7536079f7e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1091,6 +1091,33 @@ void kthread_flush_work(struct kthread_work *work)
}
EXPORT_SYMBOL_GPL(kthread_flush_work);

+/*
+ * Make sure that the timer is neither set nor running and could
+ * not manipulate the work list_head any longer.
+ *
+ * The function is called under worker->lock. The lock is temporary
+ * released but the timer can't be set again in the meantime.
+ */
+static void kthread_cancel_delayed_work_timer(struct kthread_work *work,
+ unsigned long *flags)
+{
+ struct kthread_delayed_work *dwork =
+ container_of(work, struct kthread_delayed_work, work);
+ struct kthread_worker *worker = work->worker;
+
+ /*
+ * del_timer_sync() must be called to make sure that the timer
+ * callback is not running. The lock must be temporary released
+ * to avoid a deadlock with the callback. In the meantime,
+ * any queuing is blocked by setting the canceling counter.
+ */
+ work->canceling++;
+ raw_spin_unlock_irqrestore(&worker->lock, *flags);
+ del_timer_sync(&dwork->timer);
+ raw_spin_lock_irqsave(&worker->lock, *flags);
+ work->canceling--;
+}
+
/*
* This function removes the work from the worker queue. Also it makes sure
* that it won't get queued later via the delayed work's timer.
@@ -1105,23 +1132,8 @@ static bool __kthread_cancel_work(struct kthread_work *work, bool is_dwork,
unsigned long *flags)
{
/* Try to cancel the timer if exists. */
- if (is_dwork) {
- struct kthread_delayed_work *dwork =
- container_of(work, struct kthread_delayed_work, work);
- struct kthread_worker *worker = work->worker;
-
- /*
- * del_timer_sync() must be called to make sure that the timer
- * callback is not running. The lock must be temporary released
- * to avoid a deadlock with the callback. In the meantime,
- * any queuing is blocked by setting the canceling counter.
- */
- work->canceling++;
- raw_spin_unlock_irqrestore(&worker->lock, *flags);
- del_timer_sync(&dwork->timer);
- raw_spin_lock_irqsave(&worker->lock, *flags);
- work->canceling--;
- }
+ if (is_dwork)
+ kthread_cancel_delayed_work_timer(work, flags);

/*
* Try to remove the work from a worker list. It might either
--
2.30.2

2021-06-28 18:17:57

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 085/110] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting

From: Hugh Dickins <[email protected]>

commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 upstream.

Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
(!unmap_success): with dump_page() showing mapcount:1, but then its raw
struct page output showing _mapcount ffffffff i.e. mapcount 0.

And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
all indicative of some mapcount difficulty in development here perhaps.
But the !CONFIG_DEBUG_VM path handles the failures correctly and
silently.

I believe the problem is that once a racing unmap has cleared pte or
pmd, try_to_unmap_one() may skip taking the page table lock, and emerge
from try_to_unmap() before the racing task has reached decrementing
mapcount.

Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding
TTU_SYNC to the options, and passing that from unmap_page().

When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
for both: the slight overhead added should rarely matter, except perhaps
if splitting sparsely-populated multiply-mapped shmem. Once confident
that bugs are fixed, TTU_SYNC here can be removed, and the race
tolerated.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/rmap.h | 1 +
mm/huge_memory.c | 2 +-
mm/page_vma_mapped.c | 11 +++++++++++
mm/rmap.c | 17 ++++++++++++++++-
4 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index def5c62c93b3..8d04e7deedc6 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -91,6 +91,7 @@ enum ttu_flags {

TTU_SPLIT_HUGE_PMD = 0x4, /* split huge PMD if any */
TTU_IGNORE_MLOCK = 0x8, /* ignore mlock */
+ TTU_SYNC = 0x10, /* avoid racy checks with PVMW_SYNC */
TTU_IGNORE_HWPOISON = 0x20, /* corrupted page is recoverable */
TTU_BATCH_FLUSH = 0x40, /* Batch TLB flushes where possible
* and caller guarantees they will
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e1ad01e68aa3..9c71a61e4c59 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2358,7 +2358,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,

static void unmap_page(struct page *page)
{
- enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
+ enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
bool unmap_success;

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 86e3a3688d59..da6744015abd 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -212,6 +212,17 @@ restart:
pvmw->ptl = NULL;
}
} else if (!pmd_present(pmde)) {
+ /*
+ * If PVMW_SYNC, take and drop THP pmd lock so that we
+ * cannot return prematurely, while zap_huge_pmd() has
+ * cleared *pmd but not decremented compound_mapcount().
+ */
+ if ((pvmw->flags & PVMW_SYNC) &&
+ PageTransCompound(pvmw->page)) {
+ spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+
+ spin_unlock(ptl);
+ }
return false;
}
if (!map_pte(pvmw))
diff --git a/mm/rmap.c b/mm/rmap.c
index b0fc27e77d6d..735a1b514f83 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1405,6 +1405,15 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
struct mmu_notifier_range range;
enum ttu_flags flags = (enum ttu_flags)(long)arg;

+ /*
+ * When racing against e.g. zap_pte_range() on another cpu,
+ * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * try_to_unmap() may return false when it is about to become true,
+ * if page table locking is skipped: use TTU_SYNC to wait for that.
+ */
+ if (flags & TTU_SYNC)
+ pvmw.flags = PVMW_SYNC;
+
/* munlock has nothing to gain from examining un-locked vmas */
if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
return true;
@@ -1777,7 +1786,13 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
else
rmap_walk(page, &rwc);

- return !page_mapcount(page) ? true : false;
+ /*
+ * When racing against e.g. zap_pte_range() on another cpu,
+ * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * try_to_unmap() may return false when it is about to become true,
+ * if page table locking is skipped: use TTU_SYNC to wait for that.
+ */
+ return !page_mapcount(page);
}

/**
--
2.30.2

2021-06-28 18:18:07

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 090/110] mm: page_vma_mapped_walk(): use page for pvmw->page

From: Hugh Dickins <[email protected]>

commit f003c03bd29e6f46fef1b9a8e8d636ac732286d5 upstream.

Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes".

I've marked all of these for stable: many are merely cleanups, but I
think they are much better before the main fix than after.

This patch (of 11):

page_vma_mapped_walk() cleanup: sometimes the local copy of pvwm->page
was used, sometimes pvmw->page itself: use the local copy "page"
throughout.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_vma_mapped.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index f34c065629aa..ba24ac8e2377 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -156,7 +156,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (pvmw->pte)
goto next_pte;

- if (unlikely(PageHuge(pvmw->page))) {
+ if (unlikely(PageHuge(page))) {
/* when pud is not present, pte will be NULL */
pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
if (!pvmw->pte)
@@ -217,8 +217,7 @@ restart:
* cannot return prematurely, while zap_huge_pmd() has
* cleared *pmd but not decremented compound_mapcount().
*/
- if ((pvmw->flags & PVMW_SYNC) &&
- PageTransCompound(pvmw->page)) {
+ if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) {
spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);

spin_unlock(ptl);
@@ -234,9 +233,9 @@ restart:
return true;
next_pte:
/* Seek to next pte only makes sense for THP */
- if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
+ if (!PageTransHuge(page) || PageHuge(page))
return not_found(pvmw);
- end = vma_address_end(pvmw->page, pvmw->vma);
+ end = vma_address_end(page, pvmw->vma);
do {
pvmw->address += PAGE_SIZE;
if (pvmw->address >= end)
--
2.30.2

2021-06-28 18:18:12

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 092/110] mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd

From: Hugh Dickins <[email protected]>

commit 3306d3119ceacc43ea8b141a73e21fea68eec30c upstream.

page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then
use it in subsequent tests, instead of repeatedly dereferencing pointer.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_vma_mapped.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 9085d09c5bc3..825c97a479e6 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -191,18 +191,19 @@ restart:
pmde = READ_ONCE(*pvmw->pmd);
if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
- if (likely(pmd_trans_huge(*pvmw->pmd))) {
+ pmde = *pvmw->pmd;
+ if (likely(pmd_trans_huge(pmde))) {
if (pvmw->flags & PVMW_MIGRATION)
return not_found(pvmw);
- if (pmd_page(*pvmw->pmd) != page)
+ if (pmd_page(pmde) != page)
return not_found(pvmw);
return true;
- } else if (!pmd_present(*pvmw->pmd)) {
+ } else if (!pmd_present(pmde)) {
if (thp_migration_supported()) {
if (!(pvmw->flags & PVMW_MIGRATION))
return not_found(pvmw);
- if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
- swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd);
+ if (is_migration_entry(pmd_to_swp_entry(pmde))) {
+ swp_entry_t entry = pmd_to_swp_entry(pmde);

if (migration_entry_to_page(entry) != page)
return not_found(pvmw);
--
2.30.2

2021-06-28 18:18:56

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 097/110] mm: page_vma_mapped_walk(): get vma_address_end() earlier

From: Hugh Dickins <[email protected]>

commit a765c417d876cc635f628365ec9aa6f09470069a upstream.

page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the
start, rather than later at next_pte.

It's a little unnecessary overhead on the first call, but makes for a
simpler loop in the following commit.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_vma_mapped.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 8803dd93c049..b4c731ca2752 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -171,6 +171,15 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return true;
}

+ /*
+ * Seek to next pte only makes sense for THP.
+ * But more important than that optimization, is to filter out
+ * any PageKsm page: whose page->index misleads vma_address()
+ * and vma_address_end() to disaster.
+ */
+ end = PageTransCompound(page) ?
+ vma_address_end(page, pvmw->vma) :
+ pvmw->address + PAGE_SIZE;
if (pvmw->pte)
goto next_pte;
restart:
@@ -238,10 +247,6 @@ this_pte:
if (check_pte(pvmw))
return true;
next_pte:
- /* Seek to next pte only makes sense for THP */
- if (!PageTransHuge(page))
- return not_found(pvmw);
- end = vma_address_end(page, pvmw->vma);
do {
pvmw->address += PAGE_SIZE;
if (pvmw->address >= end)
--
2.30.2

2021-06-28 18:18:56

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 094/110] mm: page_vma_mapped_walk(): crossing page table boundary

From: Hugh Dickins <[email protected]>

commit 448282487483d6fa5b2eeeafaa0acc681e544a9c upstream.

page_vma_mapped_walk() cleanup: adjust the test for crossing page table
boundary - I believe pvmw->address is always page-aligned, but nothing
else here assumed that; and remember to reset pvmw->pte to NULL after
unmapping the page table, though I never saw any bug from that.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_vma_mapped.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 96cef824f52f..aad116532140 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -244,16 +244,16 @@ next_pte:
if (pvmw->address >= end)
return not_found(pvmw);
/* Did we cross page table boundary? */
- if (pvmw->address % PMD_SIZE == 0) {
- pte_unmap(pvmw->pte);
+ if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
if (pvmw->ptl) {
spin_unlock(pvmw->ptl);
pvmw->ptl = NULL;
}
+ pte_unmap(pvmw->pte);
+ pvmw->pte = NULL;
goto restart;
- } else {
- pvmw->pte++;
}
+ pvmw->pte++;
} while (pte_none(*pvmw->pte));

if (!pvmw->ptl) {
--
2.30.2

2021-06-28 18:18:58

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 095/110] mm: page_vma_mapped_walk(): add a level of indentation

From: Hugh Dickins <[email protected]>

commit b3807a91aca7d21c05d5790612e49969117a72b9 upstream.

page_vma_mapped_walk() cleanup: add a level of indentation to much of
the body, making no functional change in this commit, but reducing the
later diff when this is all converted to a loop.

[[email protected]: : page_vma_mapped_walk(): add a level of indentation fix]
Link: https://lkml.kernel.org/r/[email protected]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_vma_mapped.c | 105 ++++++++++++++++++++++---------------------
1 file changed, 55 insertions(+), 50 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index aad116532140..4a8a069c5e87 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -173,62 +173,67 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (pvmw->pte)
goto next_pte;
restart:
- pgd = pgd_offset(mm, pvmw->address);
- if (!pgd_present(*pgd))
- return false;
- p4d = p4d_offset(pgd, pvmw->address);
- if (!p4d_present(*p4d))
- return false;
- pud = pud_offset(p4d, pvmw->address);
- if (!pud_present(*pud))
- return false;
- pvmw->pmd = pmd_offset(pud, pvmw->address);
- /*
- * Make sure the pmd value isn't cached in a register by the
- * compiler and used as a stale value after we've observed a
- * subsequent update.
- */
- pmde = READ_ONCE(*pvmw->pmd);
- if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
- pvmw->ptl = pmd_lock(mm, pvmw->pmd);
- pmde = *pvmw->pmd;
- if (likely(pmd_trans_huge(pmde))) {
- if (pvmw->flags & PVMW_MIGRATION)
- return not_found(pvmw);
- if (pmd_page(pmde) != page)
- return not_found(pvmw);
- return true;
- }
- if (!pmd_present(pmde)) {
- swp_entry_t entry;
+ {
+ pgd = pgd_offset(mm, pvmw->address);
+ if (!pgd_present(*pgd))
+ return false;
+ p4d = p4d_offset(pgd, pvmw->address);
+ if (!p4d_present(*p4d))
+ return false;
+ pud = pud_offset(p4d, pvmw->address);
+ if (!pud_present(*pud))
+ return false;

- if (!thp_migration_supported() ||
- !(pvmw->flags & PVMW_MIGRATION))
- return not_found(pvmw);
- entry = pmd_to_swp_entry(pmde);
- if (!is_migration_entry(entry) ||
- migration_entry_to_page(entry) != page)
- return not_found(pvmw);
- return true;
- }
- /* THP pmd was split under us: handle on pte level */
- spin_unlock(pvmw->ptl);
- pvmw->ptl = NULL;
- } else if (!pmd_present(pmde)) {
+ pvmw->pmd = pmd_offset(pud, pvmw->address);
/*
- * If PVMW_SYNC, take and drop THP pmd lock so that we
- * cannot return prematurely, while zap_huge_pmd() has
- * cleared *pmd but not decremented compound_mapcount().
+ * Make sure the pmd value isn't cached in a register by the
+ * compiler and used as a stale value after we've observed a
+ * subsequent update.
*/
- if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) {
- spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+ pmde = READ_ONCE(*pvmw->pmd);

- spin_unlock(ptl);
+ if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
+ pvmw->ptl = pmd_lock(mm, pvmw->pmd);
+ pmde = *pvmw->pmd;
+ if (likely(pmd_trans_huge(pmde))) {
+ if (pvmw->flags & PVMW_MIGRATION)
+ return not_found(pvmw);
+ if (pmd_page(pmde) != page)
+ return not_found(pvmw);
+ return true;
+ }
+ if (!pmd_present(pmde)) {
+ swp_entry_t entry;
+
+ if (!thp_migration_supported() ||
+ !(pvmw->flags & PVMW_MIGRATION))
+ return not_found(pvmw);
+ entry = pmd_to_swp_entry(pmde);
+ if (!is_migration_entry(entry) ||
+ migration_entry_to_page(entry) != page)
+ return not_found(pvmw);
+ return true;
+ }
+ /* THP pmd was split under us: handle on pte level */
+ spin_unlock(pvmw->ptl);
+ pvmw->ptl = NULL;
+ } else if (!pmd_present(pmde)) {
+ /*
+ * If PVMW_SYNC, take and drop THP pmd lock so that we
+ * cannot return prematurely, while zap_huge_pmd() has
+ * cleared *pmd but not decremented compound_mapcount().
+ */
+ if ((pvmw->flags & PVMW_SYNC) &&
+ PageTransCompound(page)) {
+ spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+
+ spin_unlock(ptl);
+ }
+ return false;
}
- return false;
+ if (!map_pte(pvmw))
+ goto next_pte;
}
- if (!map_pte(pvmw))
- goto next_pte;
while (1) {
unsigned long end;

--
2.30.2

2021-06-28 18:19:05

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 102/110] swiotlb: manipulate orig_addr when tlb_addr has offset

From: Bumyong Lee <[email protected]>

commit 5f89468e2f060031cd89fd4287298e0eaf246bf6 upstream.

in case of driver wants to sync part of ranges with offset,
swiotlb_tbl_sync_single() copies from orig_addr base to tlb_addr with
offset and ends up with data mismatch.

It was removed from
"swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single",
but said logic has to be added back in.

From Linus's email:
"That commit which the removed the offset calculation entirely, because the old

(unsigned long)tlb_addr & (IO_TLB_SIZE - 1)

was wrong, but instead of removing it, I think it should have just
fixed it to be

(tlb_addr - mem->start) & (IO_TLB_SIZE - 1);

instead. That way the slot offset always matches the slot index calculation."

(Unfortunatly that broke NVMe).

The use-case that drivers are hitting is as follow:

1. Get dma_addr_t from dma_map_single()

dma_addr_t tlb_addr = dma_map_single(dev, vaddr, vsize, DMA_TO_DEVICE);

|<---------------vsize------------->|
+-----------------------------------+
| | original buffer
+-----------------------------------+
vaddr

swiotlb_align_offset
|<----->|<---------------vsize------------->|
+-------+-----------------------------------+
| | | swiotlb buffer
+-------+-----------------------------------+
tlb_addr

2. Do something
3. Sync dma_addr_t through dma_sync_single_for_device(..)

dma_sync_single_for_device(dev, tlb_addr + offset, size, DMA_TO_DEVICE);

Error case.
Copy data to original buffer but it is from base addr (instead of
base addr + offset) in original buffer:

swiotlb_align_offset
|<----->|<- offset ->|<- size ->|
+-------+-----------------------------------+
| | |##########| | swiotlb buffer
+-------+-----------------------------------+
tlb_addr

|<- size ->|
+-----------------------------------+
|##########| | original buffer
+-----------------------------------+
vaddr

The fix is to copy the data to the original buffer and take into
account the offset, like so:

swiotlb_align_offset
|<----->|<- offset ->|<- size ->|
+-------+-----------------------------------+
| | |##########| | swiotlb buffer
+-------+-----------------------------------+
tlb_addr

|<- offset ->|<- size ->|
+-----------------------------------+
| |##########| | original buffer
+-----------------------------------+
vaddr

[One fix which was Linus's that made more sense to as it created a
symmetry would break NVMe. The reason for that is the:
unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);

would come up with the proper offset, but it would lose the
alignment (which this patch contains).]

Fixes: 16fc3cef33a0 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
Signed-off-by: Bumyong Lee <[email protected]>
Signed-off-by: Chanho Park <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reported-by: Dominique MARTINET <[email protected]>
Reported-by: Horia Geantă <[email protected]>
Tested-by: Horia Geantă <[email protected]>
CC: [email protected]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/dma/swiotlb.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index fe4c01c14ab2..e96f3808e431 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -724,11 +724,17 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
size_t orig_size = io_tlb_orig_size[index];
phys_addr_t orig_addr = io_tlb_orig_addr[index];
+ unsigned int tlb_offset;

if (orig_addr == INVALID_PHYS_ADDR)
return;

- validate_sync_size_and_truncate(hwdev, orig_size, &size);
+ tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
+ swiotlb_align_offset(hwdev, orig_addr);
+
+ orig_addr += tlb_offset;
+
+ validate_sync_size_and_truncate(hwdev, orig_size - tlb_offset, &size);

switch (target) {
case SYNC_FOR_CPU:
--
2.30.2

2021-06-28 18:19:06

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 103/110] netfs: fix test for whether we can skip read when writing beyond EOF

From: Jeff Layton <[email protected]>

commit 827a746f405d25f79560c7868474aec5aee174e1 upstream.

It's not sufficient to skip reading when the pos is beyond the EOF.
There may be data at the head of the page that we need to fill in
before the write.

Add a new helper function that corrects and clarifies the logic of
when we can skip reads, and have it only zero out the part of the page
that won't have data copied in for the write.

Finally, don't set the page Uptodate after zeroing. It's not up to date
since the write data won't have been copied in yet.

[DH made the following changes:

- Prefixed the new function with "netfs_".

- Don't call zero_user_segments() for a full-page write.

- Altered the beyond-last-page check to avoid a DIV instruction and got
rid of then-redundant zero-length file check.
]

[ Note: this fix is commit 827a746f405d in mainline kernels. The
original bug was in ceph, but got lifted into the fs/netfs
library for v5.13. This backport should apply to stable
kernels v5.10 though v5.12. ]

Fixes: e1b1240c1ff5f ("netfs: Add write_begin helper")
Reported-by: Andrew W Elble <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/162367683365.460125.4467036947364047314.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/162391826758.1173366.11794946719301590013.stgit@warthog.procyon.org.uk/ # v2
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/ceph/addr.c | 54 ++++++++++++++++++++++++++++++++++++++------------
1 file changed, 41 insertions(+), 13 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 26e66436f005..c000fe338f7e 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1302,6 +1302,45 @@ ceph_find_incompatible(struct page *page)
return NULL;
}

+/**
+ * prep_noread_page - prep a page for writing without reading first
+ * @page: page being prepared
+ * @pos: starting position for the write
+ * @len: length of write
+ *
+ * In some cases, write_begin doesn't need to read at all:
+ * - full page write
+ * - file is currently zero-length
+ * - write that lies in a page that is completely beyond EOF
+ * - write that covers the the page from start to EOF or beyond it
+ *
+ * If any of these criteria are met, then zero out the unwritten parts
+ * of the page and return true. Otherwise, return false.
+ */
+static bool skip_page_read(struct page *page, loff_t pos, size_t len)
+{
+ struct inode *inode = page->mapping->host;
+ loff_t i_size = i_size_read(inode);
+ size_t offset = offset_in_page(pos);
+
+ /* Full page write */
+ if (offset == 0 && len >= PAGE_SIZE)
+ return true;
+
+ /* pos beyond last page in the file */
+ if (pos - offset >= i_size)
+ goto zero_out;
+
+ /* write that covers the whole page from start to EOF or beyond it */
+ if (offset == 0 && (pos + len) >= i_size)
+ goto zero_out;
+
+ return false;
+zero_out:
+ zero_user_segments(page, 0, offset, offset + len, PAGE_SIZE);
+ return true;
+}
+
/*
* We are only allowed to write into/dirty the page if the page is
* clean, or already dirty within the same snap context.
@@ -1315,7 +1354,6 @@ static int ceph_write_begin(struct file *file, struct address_space *mapping,
struct ceph_snap_context *snapc;
struct page *page = NULL;
pgoff_t index = pos >> PAGE_SHIFT;
- int pos_in_page = pos & ~PAGE_MASK;
int r = 0;

dout("write_begin file %p inode %p page %p %d~%d\n", file, inode, page, (int)pos, (int)len);
@@ -1350,19 +1388,9 @@ static int ceph_write_begin(struct file *file, struct address_space *mapping,
break;
}

- /*
- * In some cases we don't need to read at all:
- * - full page write
- * - write that lies completely beyond EOF
- * - write that covers the the page from start to EOF or beyond it
- */
- if ((pos_in_page == 0 && len == PAGE_SIZE) ||
- (pos >= i_size_read(inode)) ||
- (pos_in_page == 0 && (pos + len) >= i_size_read(inode))) {
- zero_user_segments(page, 0, pos_in_page,
- pos_in_page + len, PAGE_SIZE);
+ /* No need to read in some cases */
+ if (skip_page_read(page, pos, len))
break;
- }

/*
* We need to read it. If we get back -EINPROGRESS, then the page was
--
2.30.2

2021-06-28 18:19:12

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 106/110] certs: Add EFI_CERT_X509_GUID support for dbx entries

From: Eric Snowberg <[email protected]>

[ Upstream commit 56c5812623f95313f6a46fbf0beee7fa17c68bbf ]

This fixes CVE-2020-26541.

The Secure Boot Forbidden Signature Database, dbx, contains a list of now
revoked signatures and keys previously approved to boot with UEFI Secure
Boot enabled. The dbx is capable of containing any number of
EFI_CERT_X509_SHA256_GUID, EFI_CERT_SHA256_GUID, and EFI_CERT_X509_GUID
entries.

Currently when EFI_CERT_X509_GUID are contained in the dbx, the entries are
skipped.

Add support for EFI_CERT_X509_GUID dbx entries. When a EFI_CERT_X509_GUID
is found, it is added as an asymmetrical key to the .blacklist keyring.
Anytime the .platform keyring is used, the keys in the .blacklist keyring
are referenced, if a matching key is found, the key will be rejected.

[DH: Made the following changes:
- Added to have a config option to enable the facility. This allows a
Kconfig solution to make sure that pkcs7_validate_trust() is
enabled.[1][2]
- Moved the functions out from the middle of the blacklist functions.
- Added kerneldoc comments.]

Signed-off-by: Eric Snowberg <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
cc: Randy Dunlap <[email protected]>
cc: Mickaël Salaün <[email protected]>
cc: Arnd Bergmann <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # rfc
Link: https://lore.kernel.org/r/[email protected]/ # v2
Link: https://lore.kernel.org/r/[email protected]/ # v3
Link: https://lore.kernel.org/r/[email protected]/ # v4
Link: https://lore.kernel.org/r/[email protected]/ # v5
Link: https://lore.kernel.org/r/161428672051.677100.11064981943343605138.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161433310942.902181.4901864302675874242.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161529605075.163428.14625520893961300757.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/[email protected]/ [2]
Signed-off-by: Sasha Levin <[email protected]>
---
certs/Kconfig | 9 ++++
certs/blacklist.c | 43 +++++++++++++++++++
certs/blacklist.h | 2 +
certs/system_keyring.c | 6 +++
include/keys/system_keyring.h | 15 +++++++
.../platform_certs/keyring_handler.c | 11 +++++
6 files changed, 86 insertions(+)

diff --git a/certs/Kconfig b/certs/Kconfig
index c94e93d8bccf..76e469b56a77 100644
--- a/certs/Kconfig
+++ b/certs/Kconfig
@@ -83,4 +83,13 @@ config SYSTEM_BLACKLIST_HASH_LIST
wrapper to incorporate the list into the kernel. Each <hash> should
be a string of hex digits.

+config SYSTEM_REVOCATION_LIST
+ bool "Provide system-wide ring of revocation certificates"
+ depends on SYSTEM_BLACKLIST_KEYRING
+ depends on PKCS7_MESSAGE_PARSER=y
+ help
+ If set, this allows revocation certificates to be stored in the
+ blacklist keyring and implements a hook whereby a PKCS#7 message can
+ be checked to see if it matches such a certificate.
+
endmenu
diff --git a/certs/blacklist.c b/certs/blacklist.c
index bffe4c6f4a9e..2b8644123d5f 100644
--- a/certs/blacklist.c
+++ b/certs/blacklist.c
@@ -145,6 +145,49 @@ int is_binary_blacklisted(const u8 *hash, size_t hash_len)
}
EXPORT_SYMBOL_GPL(is_binary_blacklisted);

+#ifdef CONFIG_SYSTEM_REVOCATION_LIST
+/**
+ * add_key_to_revocation_list - Add a revocation certificate to the blacklist
+ * @data: The data blob containing the certificate
+ * @size: The size of data blob
+ */
+int add_key_to_revocation_list(const char *data, size_t size)
+{
+ key_ref_t key;
+
+ key = key_create_or_update(make_key_ref(blacklist_keyring, true),
+ "asymmetric",
+ NULL,
+ data,
+ size,
+ ((KEY_POS_ALL & ~KEY_POS_SETATTR) | KEY_USR_VIEW),
+ KEY_ALLOC_NOT_IN_QUOTA | KEY_ALLOC_BUILT_IN);
+
+ if (IS_ERR(key)) {
+ pr_err("Problem with revocation key (%ld)\n", PTR_ERR(key));
+ return PTR_ERR(key);
+ }
+
+ return 0;
+}
+
+/**
+ * is_key_on_revocation_list - Determine if the key for a PKCS#7 message is revoked
+ * @pkcs7: The PKCS#7 message to check
+ */
+int is_key_on_revocation_list(struct pkcs7_message *pkcs7)
+{
+ int ret;
+
+ ret = pkcs7_validate_trust(pkcs7, blacklist_keyring);
+
+ if (ret == 0)
+ return -EKEYREJECTED;
+
+ return -ENOKEY;
+}
+#endif
+
/*
* Initialise the blacklist
*/
diff --git a/certs/blacklist.h b/certs/blacklist.h
index 1efd6fa0dc60..51b320cf8574 100644
--- a/certs/blacklist.h
+++ b/certs/blacklist.h
@@ -1,3 +1,5 @@
#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <crypto/pkcs7.h>

extern const char __initconst *const blacklist_hashes[];
diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index 4b693da488f1..ed98754d5795 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -242,6 +242,12 @@ int verify_pkcs7_message_sig(const void *data, size_t len,
pr_devel("PKCS#7 platform keyring is not available\n");
goto error;
}
+
+ ret = is_key_on_revocation_list(pkcs7);
+ if (ret != -ENOKEY) {
+ pr_devel("PKCS#7 platform key is on revocation list\n");
+ goto error;
+ }
}
ret = pkcs7_validate_trust(pkcs7, trusted_keys);
if (ret < 0) {
diff --git a/include/keys/system_keyring.h b/include/keys/system_keyring.h
index fb8b07daa9d1..875e002a4180 100644
--- a/include/keys/system_keyring.h
+++ b/include/keys/system_keyring.h
@@ -31,6 +31,7 @@ extern int restrict_link_by_builtin_and_secondary_trusted(
#define restrict_link_by_builtin_and_secondary_trusted restrict_link_by_builtin_trusted
#endif

+extern struct pkcs7_message *pkcs7;
#ifdef CONFIG_SYSTEM_BLACKLIST_KEYRING
extern int mark_hash_blacklisted(const char *hash);
extern int is_hash_blacklisted(const u8 *hash, size_t hash_len,
@@ -49,6 +50,20 @@ static inline int is_binary_blacklisted(const u8 *hash, size_t hash_len)
}
#endif

+#ifdef CONFIG_SYSTEM_REVOCATION_LIST
+extern int add_key_to_revocation_list(const char *data, size_t size);
+extern int is_key_on_revocation_list(struct pkcs7_message *pkcs7);
+#else
+static inline int add_key_to_revocation_list(const char *data, size_t size)
+{
+ return 0;
+}
+static inline int is_key_on_revocation_list(struct pkcs7_message *pkcs7)
+{
+ return -ENOKEY;
+}
+#endif
+
#ifdef CONFIG_IMA_BLACKLIST_KEYRING
extern struct key *ima_blacklist_keyring;

diff --git a/security/integrity/platform_certs/keyring_handler.c b/security/integrity/platform_certs/keyring_handler.c
index c5ba695c10e3..5604bd57c990 100644
--- a/security/integrity/platform_certs/keyring_handler.c
+++ b/security/integrity/platform_certs/keyring_handler.c
@@ -55,6 +55,15 @@ static __init void uefi_blacklist_binary(const char *source,
uefi_blacklist_hash(source, data, len, "bin:", 4);
}

+/*
+ * Add an X509 cert to the revocation list.
+ */
+static __init void uefi_revocation_list_x509(const char *source,
+ const void *data, size_t len)
+{
+ add_key_to_revocation_list(data, len);
+}
+
/*
* Return the appropriate handler for particular signature list types found in
* the UEFI db and MokListRT tables.
@@ -76,5 +85,7 @@ __init efi_element_handler_t get_handler_for_dbx(const efi_guid_t *sig_type)
return uefi_blacklist_x509_tbs;
if (efi_guidcmp(*sig_type, efi_cert_sha256_guid) == 0)
return uefi_blacklist_binary;
+ if (efi_guidcmp(*sig_type, efi_cert_x509_guid) == 0)
+ return uefi_revocation_list_x509;
return 0;
}
--
2.30.2

2021-06-28 18:19:13

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 107/110] certs: Move load_system_certificate_list to a common function

From: Eric Snowberg <[email protected]>

[ Upstream commit 2565ca7f5ec1a98d51eea8860c4ab923f1ca2c85 ]

Move functionality within load_system_certificate_list to a common
function, so it can be reused in the future.

DH Changes:
- Added inclusion of common.h to common.c (Eric [1]).

Signed-off-by: Eric Snowberg <[email protected]>
Acked-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/ # v5
Link: https://lore.kernel.org/r/161428672825.677100.7545516389752262918.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161433311696.902181.3599366124784670368.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161529605850.163428.7786675680201528556.stgit@warthog.procyon.org.uk/ # v3
Signed-off-by: Sasha Levin <[email protected]>
---
certs/Makefile | 2 +-
certs/common.c | 57 ++++++++++++++++++++++++++++++++++++++++++
certs/common.h | 9 +++++++
certs/system_keyring.c | 49 +++---------------------------------
4 files changed, 70 insertions(+), 47 deletions(-)
create mode 100644 certs/common.c
create mode 100644 certs/common.h

diff --git a/certs/Makefile b/certs/Makefile
index f4c25b67aad9..f4b90bad8690 100644
--- a/certs/Makefile
+++ b/certs/Makefile
@@ -3,7 +3,7 @@
# Makefile for the linux kernel signature checking certificates.
#

-obj-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += system_keyring.o system_certificates.o
+obj-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += system_keyring.o system_certificates.o common.o
obj-$(CONFIG_SYSTEM_BLACKLIST_KEYRING) += blacklist.o
ifneq ($(CONFIG_SYSTEM_BLACKLIST_HASH_LIST),"")
obj-$(CONFIG_SYSTEM_BLACKLIST_KEYRING) += blacklist_hashes.o
diff --git a/certs/common.c b/certs/common.c
new file mode 100644
index 000000000000..16a220887a53
--- /dev/null
+++ b/certs/common.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/kernel.h>
+#include <linux/key.h>
+#include "common.h"
+
+int load_certificate_list(const u8 cert_list[],
+ const unsigned long list_size,
+ const struct key *keyring)
+{
+ key_ref_t key;
+ const u8 *p, *end;
+ size_t plen;
+
+ p = cert_list;
+ end = p + list_size;
+ while (p < end) {
+ /* Each cert begins with an ASN.1 SEQUENCE tag and must be more
+ * than 256 bytes in size.
+ */
+ if (end - p < 4)
+ goto dodgy_cert;
+ if (p[0] != 0x30 &&
+ p[1] != 0x82)
+ goto dodgy_cert;
+ plen = (p[2] << 8) | p[3];
+ plen += 4;
+ if (plen > end - p)
+ goto dodgy_cert;
+
+ key = key_create_or_update(make_key_ref(keyring, 1),
+ "asymmetric",
+ NULL,
+ p,
+ plen,
+ ((KEY_POS_ALL & ~KEY_POS_SETATTR) |
+ KEY_USR_VIEW | KEY_USR_READ),
+ KEY_ALLOC_NOT_IN_QUOTA |
+ KEY_ALLOC_BUILT_IN |
+ KEY_ALLOC_BYPASS_RESTRICTION);
+ if (IS_ERR(key)) {
+ pr_err("Problem loading in-kernel X.509 certificate (%ld)\n",
+ PTR_ERR(key));
+ } else {
+ pr_notice("Loaded X.509 cert '%s'\n",
+ key_ref_to_ptr(key)->description);
+ key_ref_put(key);
+ }
+ p += plen;
+ }
+
+ return 0;
+
+dodgy_cert:
+ pr_err("Problem parsing in-kernel X.509 certificate list\n");
+ return 0;
+}
diff --git a/certs/common.h b/certs/common.h
new file mode 100644
index 000000000000..abdb5795936b
--- /dev/null
+++ b/certs/common.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef _CERT_COMMON_H
+#define _CERT_COMMON_H
+
+int load_certificate_list(const u8 cert_list[], const unsigned long list_size,
+ const struct key *keyring);
+
+#endif
diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index ed98754d5795..0c9a4795e847 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -16,6 +16,7 @@
#include <keys/asymmetric-type.h>
#include <keys/system_keyring.h>
#include <crypto/pkcs7.h>
+#include "common.h"

static struct key *builtin_trusted_keys;
#ifdef CONFIG_SECONDARY_TRUSTED_KEYRING
@@ -137,54 +138,10 @@ device_initcall(system_trusted_keyring_init);
*/
static __init int load_system_certificate_list(void)
{
- key_ref_t key;
- const u8 *p, *end;
- size_t plen;
-
pr_notice("Loading compiled-in X.509 certificates\n");

- p = system_certificate_list;
- end = p + system_certificate_list_size;
- while (p < end) {
- /* Each cert begins with an ASN.1 SEQUENCE tag and must be more
- * than 256 bytes in size.
- */
- if (end - p < 4)
- goto dodgy_cert;
- if (p[0] != 0x30 &&
- p[1] != 0x82)
- goto dodgy_cert;
- plen = (p[2] << 8) | p[3];
- plen += 4;
- if (plen > end - p)
- goto dodgy_cert;
-
- key = key_create_or_update(make_key_ref(builtin_trusted_keys, 1),
- "asymmetric",
- NULL,
- p,
- plen,
- ((KEY_POS_ALL & ~KEY_POS_SETATTR) |
- KEY_USR_VIEW | KEY_USR_READ),
- KEY_ALLOC_NOT_IN_QUOTA |
- KEY_ALLOC_BUILT_IN |
- KEY_ALLOC_BYPASS_RESTRICTION);
- if (IS_ERR(key)) {
- pr_err("Problem loading in-kernel X.509 certificate (%ld)\n",
- PTR_ERR(key));
- } else {
- pr_notice("Loaded X.509 cert '%s'\n",
- key_ref_to_ptr(key)->description);
- key_ref_put(key);
- }
- p += plen;
- }
-
- return 0;
-
-dodgy_cert:
- pr_err("Problem parsing in-kernel X.509 certificate list\n");
- return 0;
+ return load_certificate_list(system_certificate_list, system_certificate_list_size,
+ builtin_trusted_keys);
}
late_initcall(load_system_certificate_list);

--
2.30.2

2021-06-28 18:19:13

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 108/110] certs: Add ability to preload revocation certs

From: Eric Snowberg <[email protected]>

[ Upstream commit d1f044103dad70c1cec0a8f3abdf00834fec8b98 ]

Add a new Kconfig option called SYSTEM_REVOCATION_KEYS. If set,
this option should be the filename of a PEM-formated file containing
X.509 certificates to be included in the default blacklist keyring.

DH Changes:
- Make the new Kconfig option depend on SYSTEM_REVOCATION_LIST.
- Fix SYSTEM_REVOCATION_KEYS=n, but CONFIG_SYSTEM_REVOCATION_LIST=y[1][2].
- Use CONFIG_SYSTEM_REVOCATION_LIST for extract-cert[3].
- Use CONFIG_SYSTEM_REVOCATION_LIST for revocation_certificates.o[3].

Signed-off-by: Eric Snowberg <[email protected]>
Acked-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Randy Dunlap <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/[email protected]/ [2]
Link: https://lore.kernel.org/r/[email protected]/ [3]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/ # v5
Link: https://lore.kernel.org/r/161428673564.677100.4112098280028451629.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161433312452.902181.4146169951896577982.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161529606657.163428.3340689182456495390.stgit@warthog.procyon.org.uk/ # v3
Signed-off-by: Sasha Levin <[email protected]>
---
certs/Kconfig | 8 ++++++++
certs/Makefile | 19 +++++++++++++++++--
certs/blacklist.c | 21 +++++++++++++++++++++
certs/revocation_certificates.S | 21 +++++++++++++++++++++
scripts/Makefile | 1 +
5 files changed, 68 insertions(+), 2 deletions(-)
create mode 100644 certs/revocation_certificates.S

diff --git a/certs/Kconfig b/certs/Kconfig
index 76e469b56a77..ab88d2a7f3c7 100644
--- a/certs/Kconfig
+++ b/certs/Kconfig
@@ -92,4 +92,12 @@ config SYSTEM_REVOCATION_LIST
blacklist keyring and implements a hook whereby a PKCS#7 message can
be checked to see if it matches such a certificate.

+config SYSTEM_REVOCATION_KEYS
+ string "X.509 certificates to be preloaded into the system blacklist keyring"
+ depends on SYSTEM_REVOCATION_LIST
+ help
+ If set, this option should be the filename of a PEM-formatted file
+ containing X.509 certificates to be included in the default blacklist
+ keyring.
+
endmenu
diff --git a/certs/Makefile b/certs/Makefile
index f4b90bad8690..b6db52ebf0be 100644
--- a/certs/Makefile
+++ b/certs/Makefile
@@ -4,7 +4,8 @@
#

obj-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += system_keyring.o system_certificates.o common.o
-obj-$(CONFIG_SYSTEM_BLACKLIST_KEYRING) += blacklist.o
+obj-$(CONFIG_SYSTEM_BLACKLIST_KEYRING) += blacklist.o common.o
+obj-$(CONFIG_SYSTEM_REVOCATION_LIST) += revocation_certificates.o
ifneq ($(CONFIG_SYSTEM_BLACKLIST_HASH_LIST),"")
obj-$(CONFIG_SYSTEM_BLACKLIST_KEYRING) += blacklist_hashes.o
else
@@ -29,7 +30,7 @@ $(obj)/x509_certificate_list: scripts/extract-cert $(SYSTEM_TRUSTED_KEYS_SRCPREF
$(call if_changed,extract_certs,$(SYSTEM_TRUSTED_KEYS_SRCPREFIX)$(CONFIG_SYSTEM_TRUSTED_KEYS))
endif # CONFIG_SYSTEM_TRUSTED_KEYRING

-clean-files := x509_certificate_list .x509.list
+clean-files := x509_certificate_list .x509.list x509_revocation_list

ifeq ($(CONFIG_MODULE_SIG),y)
###############################################################################
@@ -104,3 +105,17 @@ targets += signing_key.x509
$(obj)/signing_key.x509: scripts/extract-cert $(X509_DEP) FORCE
$(call if_changed,extract_certs,$(MODULE_SIG_KEY_SRCPREFIX)$(CONFIG_MODULE_SIG_KEY))
endif # CONFIG_MODULE_SIG
+
+ifeq ($(CONFIG_SYSTEM_REVOCATION_LIST),y)
+
+$(eval $(call config_filename,SYSTEM_REVOCATION_KEYS))
+
+$(obj)/revocation_certificates.o: $(obj)/x509_revocation_list
+
+quiet_cmd_extract_certs = EXTRACT_CERTS $(patsubst "%",%,$(2))
+ cmd_extract_certs = scripts/extract-cert $(2) $@
+
+targets += x509_revocation_list
+$(obj)/x509_revocation_list: scripts/extract-cert $(SYSTEM_REVOCATION_KEYS_SRCPREFIX)$(SYSTEM_REVOCATION_KEYS_FILENAME) FORCE
+ $(call if_changed,extract_certs,$(SYSTEM_REVOCATION_KEYS_SRCPREFIX)$(CONFIG_SYSTEM_REVOCATION_KEYS))
+endif
diff --git a/certs/blacklist.c b/certs/blacklist.c
index 2b8644123d5f..c9a435b15af4 100644
--- a/certs/blacklist.c
+++ b/certs/blacklist.c
@@ -17,9 +17,15 @@
#include <linux/uidgid.h>
#include <keys/system_keyring.h>
#include "blacklist.h"
+#include "common.h"

static struct key *blacklist_keyring;

+#ifdef CONFIG_SYSTEM_REVOCATION_LIST
+extern __initconst const u8 revocation_certificate_list[];
+extern __initconst const unsigned long revocation_certificate_list_size;
+#endif
+
/*
* The description must be a type prefix, a colon and then an even number of
* hex digits. The hash is kept in the description.
@@ -220,3 +226,18 @@ static int __init blacklist_init(void)
* Must be initialised before we try and load the keys into the keyring.
*/
device_initcall(blacklist_init);
+
+#ifdef CONFIG_SYSTEM_REVOCATION_LIST
+/*
+ * Load the compiled-in list of revocation X.509 certificates.
+ */
+static __init int load_revocation_certificate_list(void)
+{
+ if (revocation_certificate_list_size)
+ pr_notice("Loading compiled-in revocation X.509 certificates\n");
+
+ return load_certificate_list(revocation_certificate_list, revocation_certificate_list_size,
+ blacklist_keyring);
+}
+late_initcall(load_revocation_certificate_list);
+#endif
diff --git a/certs/revocation_certificates.S b/certs/revocation_certificates.S
new file mode 100644
index 000000000000..f21aae8a8f0e
--- /dev/null
+++ b/certs/revocation_certificates.S
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/export.h>
+#include <linux/init.h>
+
+ __INITRODATA
+
+ .align 8
+ .globl revocation_certificate_list
+revocation_certificate_list:
+__revocation_list_start:
+ .incbin "certs/x509_revocation_list"
+__revocation_list_end:
+
+ .align 8
+ .globl revocation_certificate_list_size
+revocation_certificate_list_size:
+#ifdef CONFIG_64BIT
+ .quad __revocation_list_end - __revocation_list_start
+#else
+ .long __revocation_list_end - __revocation_list_start
+#endif
diff --git a/scripts/Makefile b/scripts/Makefile
index c36106bce80e..9adb6d247818 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -14,6 +14,7 @@ hostprogs-always-$(CONFIG_ASN1) += asn1_compiler
hostprogs-always-$(CONFIG_MODULE_SIG_FORMAT) += sign-file
hostprogs-always-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += extract-cert
hostprogs-always-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert
+hostprogs-always-$(CONFIG_SYSTEM_REVOCATION_LIST) += extract-cert

HOSTCFLAGS_sorttable.o = -I$(srctree)/tools/include
HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
--
2.30.2

2021-06-28 18:19:17

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 109/110] integrity: Load mokx variables into the blacklist keyring

From: Eric Snowberg <[email protected]>

[ Upstream commit ebd9c2ae369a45bdd9f8615484db09be58fc242b ]

During boot the Secure Boot Forbidden Signature Database, dbx,
is loaded into the blacklist keyring. Systems booted with shim
have an equivalent Forbidden Signature Database called mokx.
Currently mokx is only used by shim and grub, the contents are
ignored by the kernel.

Add the ability to load mokx into the blacklist keyring during boot.

Signed-off-by: Eric Snowberg <[email protected]>
Suggested-by: James Bottomley <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/c33c8e3839a41e9654f41cc92c7231104931b1d7.camel@HansenPartnership.com/
Link: https://lore.kernel.org/r/[email protected]/ # v5
Link: https://lore.kernel.org/r/161428674320.677100.12637282414018170743.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161433313205.902181.2502803393898221637.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161529607422.163428.13530426573612578854.stgit@warthog.procyon.org.uk/ # v3
Signed-off-by: Sasha Levin <[email protected]>
---
security/integrity/platform_certs/load_uefi.c | 20 +++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/security/integrity/platform_certs/load_uefi.c b/security/integrity/platform_certs/load_uefi.c
index ee4b4c666854..f290f78c3f30 100644
--- a/security/integrity/platform_certs/load_uefi.c
+++ b/security/integrity/platform_certs/load_uefi.c
@@ -132,8 +132,9 @@ static int __init load_moklist_certs(void)
static int __init load_uefi_certs(void)
{
efi_guid_t secure_var = EFI_IMAGE_SECURITY_DATABASE_GUID;
- void *db = NULL, *dbx = NULL;
- unsigned long dbsize = 0, dbxsize = 0;
+ efi_guid_t mok_var = EFI_SHIM_LOCK_GUID;
+ void *db = NULL, *dbx = NULL, *mokx = NULL;
+ unsigned long dbsize = 0, dbxsize = 0, mokxsize = 0;
efi_status_t status;
int rc = 0;

@@ -175,6 +176,21 @@ static int __init load_uefi_certs(void)
kfree(dbx);
}

+ mokx = get_cert_list(L"MokListXRT", &mok_var, &mokxsize, &status);
+ if (!mokx) {
+ if (status == EFI_NOT_FOUND)
+ pr_debug("mokx variable wasn't found\n");
+ else
+ pr_info("Couldn't get mokx list\n");
+ } else {
+ rc = parse_efi_signature_list("UEFI:MokListXRT",
+ mokx, mokxsize,
+ get_handler_for_dbx);
+ if (rc)
+ pr_err("Couldn't parse mokx signatures %d\n", rc);
+ kfree(mokx);
+ }
+
/* Load the MokListRT certs */
rc = load_moklist_certs();

--
2.30.2

2021-06-28 18:20:42

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 093/110] mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block

From: Hugh Dickins <[email protected]>

commit e2e1d4076c77b3671cf8ce702535ae7dee3acf89 upstream.

page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to
follow the same "return not_found, return not_found, return true"
pattern as the block above it (note: returning not_found there is never
premature, since existence or prior existence of huge pmd guarantees
good alignment).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_vma_mapped.c | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 825c97a479e6..96cef824f52f 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -198,24 +198,22 @@ restart:
if (pmd_page(pmde) != page)
return not_found(pvmw);
return true;
- } else if (!pmd_present(pmde)) {
- if (thp_migration_supported()) {
- if (!(pvmw->flags & PVMW_MIGRATION))
- return not_found(pvmw);
- if (is_migration_entry(pmd_to_swp_entry(pmde))) {
- swp_entry_t entry = pmd_to_swp_entry(pmde);
+ }
+ if (!pmd_present(pmde)) {
+ swp_entry_t entry;

- if (migration_entry_to_page(entry) != page)
- return not_found(pvmw);
- return true;
- }
- }
- return not_found(pvmw);
- } else {
- /* THP pmd was split under us: handle on pte level */
- spin_unlock(pvmw->ptl);
- pvmw->ptl = NULL;
+ if (!thp_migration_supported() ||
+ !(pvmw->flags & PVMW_MIGRATION))
+ return not_found(pvmw);
+ entry = pmd_to_swp_entry(pmde);
+ if (!is_migration_entry(entry) ||
+ migration_entry_to_page(entry) != page)
+ return not_found(pvmw);
+ return true;
}
+ /* THP pmd was split under us: handle on pte level */
+ spin_unlock(pvmw->ptl);
+ pvmw->ptl = NULL;
} else if (!pmd_present(pmde)) {
/*
* If PVMW_SYNC, take and drop THP pmd lock so that we
--
2.30.2

2021-06-28 18:47:23

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 098/110] mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes

From: Hugh Dickins <[email protected]>

commit a9a7504d9beaf395481faa91e70e2fd08f7a3dde upstream.

Running certain tests with a DEBUG_VM kernel would crash within hours,
on the total_mapcount BUG() in split_huge_page_to_list(), while trying
to free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which,
on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

Crash dumps showed two tail pages of a shmem huge page remained mapped
by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
of a long unmapped range; and no page table had yet been allocated for
the head of the huge page to be mapped into.

Although designed to handle these odd misaligned huge-page-mapped-by-pte
cases, page_vma_mapped_walk() falls short by returning false prematurely
when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
are cases when a huge page may span the boundary, with ptes present in
the next.

Restructure page_vma_mapped_walk() as a loop to continue in these cases,
while keeping its layout much as before. Add a step_forward() helper to
advance pvmw->address across those boundaries: originally I tried to use
mm's standard p?d_addr_end() macros, but hit the same crash 512 times
less often: because of the way redundant levels are folded together, but
folded differently in different configurations, it was just too
difficult to use them correctly; and step_forward() is simpler anyway.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_vma_mapped.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index b4c731ca2752..16f139f83ebf 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -116,6 +116,13 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
return pfn_is_match(pvmw->page, pfn);
}

+static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size)
+{
+ pvmw->address = (pvmw->address + size) & ~(size - 1);
+ if (!pvmw->address)
+ pvmw->address = ULONG_MAX;
+}
+
/**
* page_vma_mapped_walk - check if @pvmw->page is mapped in @pvmw->vma at
* @pvmw->address
@@ -183,16 +190,22 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (pvmw->pte)
goto next_pte;
restart:
- {
+ do {
pgd = pgd_offset(mm, pvmw->address);
- if (!pgd_present(*pgd))
- return false;
+ if (!pgd_present(*pgd)) {
+ step_forward(pvmw, PGDIR_SIZE);
+ continue;
+ }
p4d = p4d_offset(pgd, pvmw->address);
- if (!p4d_present(*p4d))
- return false;
+ if (!p4d_present(*p4d)) {
+ step_forward(pvmw, P4D_SIZE);
+ continue;
+ }
pud = pud_offset(p4d, pvmw->address);
- if (!pud_present(*pud))
- return false;
+ if (!pud_present(*pud)) {
+ step_forward(pvmw, PUD_SIZE);
+ continue;
+ }

pvmw->pmd = pmd_offset(pud, pvmw->address);
/*
@@ -239,7 +252,8 @@ restart:

spin_unlock(ptl);
}
- return false;
+ step_forward(pvmw, PMD_SIZE);
+ continue;
}
if (!map_pte(pvmw))
goto next_pte;
@@ -269,7 +283,9 @@ next_pte:
spin_lock(pvmw->ptl);
}
goto this_pte;
- }
+ } while (pvmw->address < end);
+
+ return false;
}

/**
--
2.30.2

2021-06-28 18:47:33

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 5.12 100/110] mm, futex: fix shared futex pgoff on shmem huge page

From: Hugh Dickins <[email protected]>

commit fe19bd3dae3d15d2fbfdb3de8839a6ea0fe94264 upstream.

If more than one futex is placed on a shmem huge page, it can happen
that waking the second wakes the first instead, and leaves the second
waiting: the key's shared.pgoff is wrong.

When 3.11 commit 13d60f4b6ab5 ("futex: Take hugepages into account when
generating futex_key"), the only shared huge pages came from hugetlbfs,
and the code added to deal with its exceptional page->index was put into
hugetlb source. Then that was missed when 4.8 added shmem huge pages.

page_to_pgoff() is what others use for this nowadays: except that, as
currently written, it gives the right answer on hugetlbfs head, but
nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific
hugetlb_basepage_index() on PageHuge tails as well as on head.

Yes, it's unconventional to declare hugetlb_basepage_index() there in
pagemap.h, rather than in hugetlb.h; but I do not expect anything but
page_to_pgoff() ever to need it.

[[email protected]: give hugetlb_basepage_index() prototype the correct scope]

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Reported-by: Neel Natu <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Zhang Yi <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/hugetlb.h | 16 ----------------
include/linux/pagemap.h | 13 +++++++------
kernel/futex.c | 3 +--
mm/hugetlb.c | 5 +----
4 files changed, 9 insertions(+), 28 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 5dae4187210d..28fa3f9bbbfd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -728,17 +728,6 @@ static inline int hstate_index(struct hstate *h)
return h - hstates;
}

-pgoff_t __basepage_index(struct page *page);
-
-/* Return page->index in PAGE_SIZE units */
-static inline pgoff_t basepage_index(struct page *page)
-{
- if (!PageCompound(page))
- return page->index;
-
- return __basepage_index(page);
-}
-
extern int dissolve_free_huge_page(struct page *page);
extern int dissolve_free_huge_pages(unsigned long start_pfn,
unsigned long end_pfn);
@@ -969,11 +958,6 @@ static inline int hstate_index(struct hstate *h)
return 0;
}

-static inline pgoff_t basepage_index(struct page *page)
-{
- return page->index;
-}
-
static inline int dissolve_free_huge_page(struct page *page)
{
return 0;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8c9947fd62f3..e0023e5f9aa6 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -501,7 +501,7 @@ static inline struct page *read_mapping_page(struct address_space *mapping,
}

/*
- * Get index of the page with in radix-tree
+ * Get index of the page within radix-tree (but not for hugetlb pages).
* (TODO: remove once hugetlb pages will have ->index in PAGE_SIZE)
*/
static inline pgoff_t page_to_index(struct page *page)
@@ -520,15 +520,16 @@ static inline pgoff_t page_to_index(struct page *page)
return pgoff;
}

+extern pgoff_t hugetlb_basepage_index(struct page *page);
+
/*
- * Get the offset in PAGE_SIZE.
- * (TODO: hugepage should have ->index in PAGE_SIZE)
+ * Get the offset in PAGE_SIZE (even for hugetlb pages).
+ * (TODO: hugetlb pages should have ->index in PAGE_SIZE)
*/
static inline pgoff_t page_to_pgoff(struct page *page)
{
- if (unlikely(PageHeadHuge(page)))
- return page->index << compound_order(page);
-
+ if (unlikely(PageHuge(page)))
+ return hugetlb_basepage_index(page);
return page_to_index(page);
}

diff --git a/kernel/futex.c b/kernel/futex.c
index a8629b695d38..5aa6d0a6c767 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -35,7 +35,6 @@
#include <linux/jhash.h>
#include <linux/pagemap.h>
#include <linux/syscalls.h>
-#include <linux/hugetlb.h>
#include <linux/freezer.h>
#include <linux/memblock.h>
#include <linux/fault-inject.h>
@@ -650,7 +649,7 @@ again:

key->both.offset |= FUT_OFF_INODE; /* inode-based key */
key->shared.i_seq = get_inode_sequence_number(inode);
- key->shared.pgoff = basepage_index(tail);
+ key->shared.pgoff = page_to_pgoff(tail);
rcu_read_unlock();
}

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3da4817190f3..7ba7d9b20494 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1584,15 +1584,12 @@ struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage)
return NULL;
}

-pgoff_t __basepage_index(struct page *page)
+pgoff_t hugetlb_basepage_index(struct page *page)
{
struct page *page_head = compound_head(page);
pgoff_t index = page_index(page_head);
unsigned long compound_idx;

- if (!PageHuge(page_head))
- return page_index(page);
-
if (compound_order(page_head) >= MAX_ORDER)
compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
else
--
2.30.2

2021-06-28 23:41:14

by Fox Chen

[permalink] [raw]
Subject: RE: [PATCH 5.12 000/110] 5.12.14-rc1 review

On Mon, 28 Jun 2021 10:16:38 -0400, Sasha Levin <[email protected]> wrote:
>
> This is the start of the stable review cycle for the 5.12.14 release.
> There are 110 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed 30 Jun 2021 02:18:05 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.12.y&id2=v5.12.13
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.12.y
> and the diffstat can be found below.
>
> Thanks,
> Sasha
>

5.12.14-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) (bcm2711)

Tested-by: Fox Chen <[email protected]>

2021-06-29 00:32:51

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH 5.12 019/110] drm/kmb: Fix error return code in kmb_hw_init()

On Mon, Jun 28, 2021 at 04:29:37PM +0000, Chrisanthus, Anitha wrote:
>This patch is already pushed to drm-misc-fixes. Please check for existing patches in the dri-devel mail list before sending patches.

Indeed, it's even upstream at this point.

This mail is a review for the stable trees rather than Linus's tree.

--
Thanks,
Sasha

2021-06-29 00:38:41

by Justin Forbes

[permalink] [raw]
Subject: Re: [PATCH 5.12 000/110] 5.12.14-rc1 review

On Mon, Jun 28, 2021 at 10:16:38AM -0400, Sasha Levin wrote:
>
> This is the start of the stable review cycle for the 5.12.14 release.
> There are 110 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed 30 Jun 2021 02:18:05 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.12.y&id2=v5.12.13
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.12.y
> and the diffstat can be found below.
>
> Thanks,
> Sasha
>

Tested rc1 against the Fedora build system (aarch64, armv7, ppc64le,
s390x, x86_64), and boot tested x86_64. No regressions noted.

Tested-by: Justin M. Forbes <[email protected]>

2021-06-29 06:13:12

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [PATCH 5.12 000/110] 5.12.14-rc1 review

On Mon, 28 Jun 2021 at 19:48, Sasha Levin <[email protected]> wrote:
>
>
> This is the start of the stable review cycle for the 5.12.14 release.
> There are 110 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed 30 Jun 2021 02:18:05 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.12.y&id2=v5.12.13
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.12.y
> and the diffstat can be found below.
>
> Thanks,
> Sasha

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing <[email protected]>

## Build
* kernel: 5.12.14-rc1
* git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git branch: linux-5.12.y
* git commit: 5f4499a4d0cb6da10815c298938bce95915a3388
* git describe: v5.12.13-110-g5f4499a4d0cb
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.12.y/build/v5.12.13-110-g5f4499a4d0cb

## No regressions (compared to v5.12.13)

## No fixes (compared to v5.12.13)

## Test result summary
total: 83608, pass: 68828, fail: 2262, skip: 11832, xfail: 686,

## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 193 total, 193 passed, 0 failed
* arm64: 27 total, 27 passed, 0 failed
* dragonboard-410c: 1 total, 1 passed, 0 failed
* hi6220-hikey: 1 total, 1 passed, 0 failed
* i386: 26 total, 26 passed, 0 failed
* juno-r2: 1 total, 1 passed, 0 failed
* mips: 45 total, 45 passed, 0 failed
* parisc: 9 total, 9 passed, 0 failed
* powerpc: 27 total, 27 passed, 0 failed
* riscv: 21 total, 21 passed, 0 failed
* s390: 18 total, 18 passed, 0 failed
* sh: 18 total, 18 passed, 0 failed
* sparc: 9 total, 9 passed, 0 failed
* x15: 1 total, 0 passed, 1 failed
* x86: 1 total, 1 passed, 0 failed
* x86_64: 27 total, 27 passed, 0 failed

## Test suites summary
* fwts
* igt-gpu-tools
* install-android-platform-tools-r2600
* kselftest-
* kselftest-android
* kselftest-bpf
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-drivers
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-lkdtm
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-vm
* kselftest-vsyscall-mode-native-
* kselftest-vsyscall-mode-none-
* kselftest-x86
* kselftest-zram
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* packetdrill
* perf
* rcutorture
* ssuite
* v4l2-compliance

--
Linaro LKFT
https://lkft.linaro.org

2021-06-29 18:24:06

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 5.12 000/110] 5.12.14-rc1 review

On Mon, Jun 28, 2021 at 10:16:38AM -0400, Sasha Levin wrote:
>
> This is the start of the stable review cycle for the 5.12.14 release.
> There are 110 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed 30 Jun 2021 02:18:05 PM UTC.
> Anything received after that time might be too late.
>

Build results:
total: 151 pass: 151 fail: 0
Qemu test results:
total: 462 pass: 462 fail: 0

Tested-by: Guenter Roeck <[email protected]>

Guenter

2021-06-30 12:44:24

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH 5.12 000/110] 5.12.14-rc1 review

On Mon, Jun 28, 2021 at 11:21:02AM -0700, Fox Chen wrote:
>On Mon, 28 Jun 2021 10:16:38 -0400, Sasha Levin <[email protected]> wrote:
>>
>> This is the start of the stable review cycle for the 5.12.14 release.
>> There are 110 patches in this series, all will be posted as a response
>> to this one. If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Wed 30 Jun 2021 02:18:05 PM UTC.
>> Anything received after that time might be too late.
>>
>> The whole patch series can be found in one patch at:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.12.y&id2=v5.12.13
>> or in the git tree and branch at:
>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.12.y
>> and the diffstat can be found below.
>>
>> Thanks,
>> Sasha
>>
>
>5.12.14-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) (bcm2711)
>
>Tested-by: Fox Chen <[email protected]>

Thanks for testing Fox!

--
Thanks,
Sasha

2021-06-30 12:46:14

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH 5.12 000/110] 5.12.14-rc1 review

On Mon, Jun 28, 2021 at 05:53:54PM -0500, Justin Forbes wrote:
>On Mon, Jun 28, 2021 at 10:16:38AM -0400, Sasha Levin wrote:
>>
>> This is the start of the stable review cycle for the 5.12.14 release.
>> There are 110 patches in this series, all will be posted as a response
>> to this one. If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Wed 30 Jun 2021 02:18:05 PM UTC.
>> Anything received after that time might be too late.
>>
>> The whole patch series can be found in one patch at:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.12.y&id2=v5.12.13
>> or in the git tree and branch at:
>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.12.y
>> and the diffstat can be found below.
>>
>> Thanks,
>> Sasha
>>
>
>Tested rc1 against the Fedora build system (aarch64, armv7, ppc64le,
>s390x, x86_64), and boot tested x86_64. No regressions noted.
>
>Tested-by: Justin M. Forbes <[email protected]>

Thanks for testing Justin!

--
Thanks,
Sasha

2021-06-30 12:46:46

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH 5.12 000/110] 5.12.14-rc1 review

On Tue, Jun 29, 2021 at 11:21:02AM -0700, Guenter Roeck wrote:
>On Mon, Jun 28, 2021 at 10:16:38AM -0400, Sasha Levin wrote:
>>
>> This is the start of the stable review cycle for the 5.12.14 release.
>> There are 110 patches in this series, all will be posted as a response
>> to this one. If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Wed 30 Jun 2021 02:18:05 PM UTC.
>> Anything received after that time might be too late.
>>
>
>Build results:
> total: 151 pass: 151 fail: 0
>Qemu test results:
> total: 462 pass: 462 fail: 0
>
>Tested-by: Guenter Roeck <[email protected]>

Thanks for testing all of the kernels Guenter!

--
Thanks,
Sasha

2021-06-30 12:47:30

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH 5.12 000/110] 5.12.14-rc1 review

On Tue, Jun 29, 2021 at 11:40:47AM +0530, Naresh Kamboju wrote:
>On Mon, 28 Jun 2021 at 19:48, Sasha Levin <[email protected]> wrote:
>>
>>
>> This is the start of the stable review cycle for the 5.12.14 release.
>> There are 110 patches in this series, all will be posted as a response
>> to this one. If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Wed 30 Jun 2021 02:18:05 PM UTC.
>> Anything received after that time might be too late.
>>
>> The whole patch series can be found in one patch at:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.12.y&id2=v5.12.13
>> or in the git tree and branch at:
>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.12.y
>> and the diffstat can be found below.
>>
>> Thanks,
>> Sasha
>
>Results from Linaro’s test farm.
>No regressions on arm64, arm, x86_64, and i386.
>
>Tested-by: Linux Kernel Functional Testing <[email protected]>

Thanks for testing Naresh!

--
Thanks,
Sasha