2024-03-13 17:08:12

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 00/71] 6.1.82-rc1 review


This is the start of the stable review cycle for the 6.1.82 release.
There are 71 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.

Thanks,
Sasha

-------------
Pseudo-Shortlog of commits:

Breno Leitao (1):
blk-iocost: Pass gendisk to ioc_refresh_params

Christian Borntraeger (1):
KVM: s390: vsie: fix race during shadow creation

Christoph Hellwig (6):
blk-wbt: pass a gendisk to wbt_{enable,disable}_default
blk-wbt: pass a gendisk to wbt_init
blk-rq-qos: move rq_qos_add and rq_qos_del out of line
blk-rq-qos: make rq_qos_add and rq_qos_del more useful
blk-rq-qos: constify rq_qos_ops
blk-rq-qos: store a gendisk instead of request_queue in struct rq_qos

Edward Adam Davis (1):
net/rds: fix WARNING in rds_conn_connect_if_down

Eric Dumazet (2):
geneve: make sure to pull inner header in geneve_rx()
net/ipv6: avoid possible UAF in ip6_route_mpath_notify()

Fangzhi Zuo (1):
drm/amd/display: Fix MST Null Ptr for RV

Florian Kauer (1):
igc: avoid returning frame twice in XDP_REDIRECT

Florian Westphal (1):
netfilter: nft_ct: fix l3num expectations with inet pseudo family

Friedrich Vock (1):
drm/amdgpu: Reset IH OVERFLOW_CLEAR bit

Gao Xiang (1):
erofs: apply proper VMA alignment for memory mapped files on THP

Horatiu Vultur (1):
net: sparx5: Fix use after free inside sparx5_del_mact_entry

Hui Zhou (1):
nfp: flower: add hardware offload check for post ct entry

Jacob Keller (1):
ice: virtchnl: stop pretending to support RSS over AQ or registers

Jan Kara (2):
readahead: avoid multiple marked readahead pages
blk-wbt: Fix detection of dirty-throttled tasks

Jason Xing (12):
netrom: Fix a data-race around sysctl_netrom_default_path_quality
netrom: Fix a data-race around
sysctl_netrom_obsolescence_count_initialiser
netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
netrom: Fix a data-race around sysctl_netrom_transport_timeout
netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
netrom: Fix a data-race around
sysctl_netrom_transport_acknowledge_delay
netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
netrom: Fix a data-race around
sysctl_netrom_transport_requested_window_size
netrom: Fix a data-race around
sysctl_netrom_transport_no_activity_timeout
netrom: Fix a data-race around sysctl_netrom_routing_control
netrom: Fix a data-race around sysctl_netrom_link_fails_count
netrom: Fix data-races around sysctl_net_busy_read

Johan Hovold (1):
ASoC: codecs: wcd938x: fix headphones volume controls

Lena Wang (1):
netfilter: nf_conntrack_h323: Add protection for bmp length out of
range

Ma Hanghong (1):
drm/amd/display: Wrong colorimetry workaround

Maciej Fijalkowski (3):
ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
i40e: disable NAPI right after disabling irqs when handling xsk_pool
ice: reorder disabling IRQ and NAPI in ice_qp_dis

Mathias Nyman (1):
xhci: process isoc TD properly when there was a transaction error mid
TD.

Matthieu Baerts (NGI0) (1):
selftests: mptcp: decrease BW in simult flows

Michal Pecio (1):
xhci: handle isoc Babble and Buffer Overrun events properly

Muhammad Usama Anjum (1):
selftests/mm: switch to bash from sh

Nico Boehr (1):
KVM: s390: add stat counter for shadow gmap events

Nico Pache (1):
selftests: mm: fix map_hugetlb failure on 64K page size systems

Oleg Nesterov (7):
getrusage: add the "signal_struct *sig" local variable
getrusage: move thread_group_cputime_adjusted() outside of
lock_task_sighand()
getrusage: use __for_each_thread()
getrusage: use sig->stats_lock rather than lock_task_sighand()
fs/proc: do_task_stat: use __for_each_thread()
fs/proc: do_task_stat: use sig->stats_lock to gather the
threads/children stats
exit: wait_task_zombie: kill the no longer necessary
spin_lock_irq(siglock)

Oleksij Rempel (1):
net: lan78xx: fix runtime PM count underflow on link stop

Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests

Rand Deeb (1):
net: ice: Fix potential NULL pointer dereference in
ice_bridge_setlink()

Sasha Levin (1):
Linux 6.1.82-rc1

Srinivasan Shanmugam (1):
drm/amd/display: Fix uninitialized variable usage in core_link_
'read_dpcd() & write_dpcd()' functions

Steven Rostedt (Google) (1):
tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string

Tobias Jakobi (Compleo) (1):
net: dsa: microchip: fix register write order in ksz8_ind_write8()

Toke Høiland-Jørgensen (1):
cpumap: Zero-initialise xdp_rxq_info struct before running XDP program

Wentao Jia (1):
nfp: flower: add goto_chain_index for ct entry

Xiubo Li (1):
ceph: switch to corrected encoding of max_xattr_size in mdsmap

Yu Kuai (6):
blk-iocost: disable writeback throttling
elevator: remove redundant code in elv_unregister_queue()
blk-wbt: remove unnecessary check in wbt_enable_default()
elevator: add new field flags in struct elevator_queue
blk-wbt: don't enable throttling if default elevator is bfq
blk-wbt: fix that wbt can't be disabled by default

.../ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../hw-vuln/reg-file-data-sampling.rst | 104 ++++++++++++++++++
.../admin-guide/kernel-parameters.txt | 21 ++++
Makefile | 4 +-
arch/s390/include/asm/kvm_host.h | 7 ++
arch/s390/kvm/gaccess.c | 7 ++
arch/s390/kvm/kvm-s390.c | 9 +-
arch/s390/kvm/vsie.c | 6 +-
arch/s390/mm/gmap.c | 1 +
arch/x86/Kconfig | 11 ++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 ++++++-
arch/x86/kvm/x86.c | 5 +-
block/bfq-iosched.c | 6 +-
block/blk-iocost.c | 51 +++++----
block/blk-iolatency.c | 30 ++---
block/blk-mq-debugfs.c | 10 +-
block/blk-rq-qos.c | 67 +++++++++++
block/blk-rq-qos.h | 66 +----------
block/blk-sysfs.c | 4 +-
block/blk-wbt.c | 50 +++++----
block/blk-wbt.h | 12 +-
block/elevator.c | 8 +-
block/elevator.h | 5 +-
drivers/base/cpu.c | 8 ++
drivers/gpu/drm/amd/amdgpu/cik_ih.c | 6 +
drivers/gpu/drm/amd/amdgpu/cz_ih.c | 5 +
drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 5 +
drivers/gpu/drm/amd/amdgpu/ih_v6_0.c | 6 +
drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 +
drivers/gpu/drm/amd/amdgpu/si_ih.c | 6 +
drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 6 +
drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 +
drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 6 +
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 17 ++-
.../drm/amd/display/dc/core/dc_link_dpcd.c | 4 +-
.../gpu/drm/amd/display/dc/core/dc_resource.c | 6 +
.../amd/display/modules/inc/mod_info_packet.h | 3 +-
.../display/modules/info_packet/info_packet.c | 6 +-
drivers/net/dsa/microchip/ksz8795.c | 4 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/ice/ice_main.c | 2 +
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 9 +-
.../intel/ice/ice_virtchnl_allowlist.c | 2 -
drivers/net/ethernet/intel/ice/ice_xsk.c | 9 +-
drivers/net/ethernet/intel/igc/igc_main.c | 13 +--
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 ++++++++--
.../microchip/sparx5/sparx5_mactable.c | 4 +-
.../ethernet/netronome/nfp/flower/conntrack.c | 28 ++++-
.../ethernet/netronome/nfp/flower/conntrack.h | 2 +
drivers/net/geneve.c | 18 ++-
drivers/net/usb/lan78xx.c | 3 +-
drivers/usb/host/xhci-ring.c | 80 +++++++++++---
drivers/usb/host/xhci.h | 1 +
fs/ceph/mdsmap.c | 7 +-
fs/erofs/data.c | 1 +
fs/proc/array.c | 57 +++++-----
include/linux/backing-dev-defs.h | 7 +-
include/linux/ceph/mdsmap.h | 6 +-
include/linux/cpu.h | 2 +
include/trace/events/qdisc.h | 20 ++--
kernel/bpf/cpumap.c | 2 +-
kernel/exit.c | 10 +-
kernel/sys.c | 91 ++++++++-------
mm/backing-dev.c | 2 +-
mm/page-writeback.c | 2 +-
mm/readahead.c | 4 +-
net/ipv6/route.c | 21 ++--
net/netfilter/nf_conntrack_h323_asn1.c | 4 +
net/netfilter/nft_ct.c | 11 +-
net/netrom/af_netrom.c | 14 +--
net/netrom/nr_dev.c | 2 +-
net/netrom/nr_in.c | 6 +-
net/netrom/nr_out.c | 2 +-
net/netrom/nr_route.c | 8 +-
net/netrom/nr_subr.c | 5 +-
net/rds/rdma.c | 3 +
net/rds/send.c | 6 +-
sound/soc/codecs/wcd938x.c | 2 +-
.../selftests/net/mptcp/simult_flows.sh | 8 +-
.../selftests/vm/charge_reserved_hugetlb.sh | 2 +-
tools/testing/selftests/vm/map_hugetlb.c | 7 ++
.../selftests/vm/write_hugetlb_memory.sh | 2 +-
86 files changed, 903 insertions(+), 365 deletions(-)
create mode 100644 Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst

--
2.43.0



2024-03-13 17:08:38

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 20/71] netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser

From: Jason Xing <[email protected]>

[ Upstream commit cfd9f4a740f772298308b2e6070d2c744fb5cf79 ]

We need to protect the reader reading the sysctl value
because the value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/nr_route.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index 6f709fdffc11f..b8ddd8048f352 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -766,7 +766,7 @@ int nr_route_frame(struct sk_buff *skb, ax25_cb *ax25)
if (ax25 != NULL) {
ret = nr_add_node(nr_src, "", &ax25->dest_addr, ax25->digipeat,
ax25->ax25_dev->dev, 0,
- sysctl_netrom_obsolescence_count_initialiser);
+ READ_ONCE(sysctl_netrom_obsolescence_count_initialiser));
if (ret)
return ret;
}
--
2.43.0


2024-03-13 17:09:13

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 09/71] ice: virtchnl: stop pretending to support RSS over AQ or registers

From: Jacob Keller <[email protected]>

[ Upstream commit 2652b99e43403dc464f3648483ffb38e48872fe4 ]

The E800 series hardware uses the same iAVF driver as older devices,
including the virtchnl negotiation scheme.

This negotiation scheme includes a mechanism to determine what type of RSS
should be supported, including RSS over PF virtchnl messages, RSS over
firmware AdminQ messages, and RSS via direct register access.

The PF driver will always prefer VIRTCHNL_VF_OFFLOAD_RSS_PF if its
supported by the VF driver. However, if an older VF driver is loaded, it
may request only VIRTCHNL_VF_OFFLOAD_RSS_REG or VIRTCHNL_VF_OFFLOAD_RSS_AQ.

The ice driver happily agrees to support these methods. Unfortunately, the
underlying hardware does not support these mechanisms. The E800 series VFs
don't have the appropriate registers for RSS_REG. The mailbox queue used by
VFs for VF to PF communication blocks messages which do not have the
VF-to-PF opcode.

Stop lying to the VF that it could support RSS over AdminQ or registers, as
these interfaces do not work when the hardware is operating on an E800
series device.

In practice this is unlikely to be hit by any normal user. The iAVF driver
has supported RSS over PF virtchnl commands since 2016, and always defaults
to using RSS_PF if possible.

In principle, nothing actually stops the existing VF from attempting to
access the registers or send an AQ command. However a properly coded VF
will check the capability flags and will report a more useful error if it
detects a case where the driver does not support the RSS offloads that it
does.

Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support")
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Alan Brady <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 9 +--------
drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c | 2 --
2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index 6c03ebf81ffda..4b71392f60df1 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -440,7 +440,6 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
vf->driver_caps = *(u32 *)msg;
else
vf->driver_caps = VIRTCHNL_VF_OFFLOAD_L2 |
- VIRTCHNL_VF_OFFLOAD_RSS_REG |
VIRTCHNL_VF_OFFLOAD_VLAN;

vfres->vf_cap_flags = VIRTCHNL_VF_OFFLOAD_L2;
@@ -453,14 +452,8 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
vfres->vf_cap_flags |= ice_vc_get_vlan_caps(hw, vf, vsi,
vf->driver_caps);

- if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF) {
+ if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF)
vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_PF;
- } else {
- if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_AQ)
- vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_AQ;
- else
- vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_REG;
- }

if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_FDIR_PF)
vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_FDIR_PF;
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
index 5a82216e7d034..63e83e8b97e55 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
@@ -13,8 +13,6 @@
* - opcodes needed by VF when caps are activated
*
* Caps that don't use new opcodes (no opcodes should be allowed):
- * - VIRTCHNL_VF_OFFLOAD_RSS_AQ
- * - VIRTCHNL_VF_OFFLOAD_RSS_REG
* - VIRTCHNL_VF_OFFLOAD_WB_ON_ITR
* - VIRTCHNL_VF_OFFLOAD_CRC
* - VIRTCHNL_VF_OFFLOAD_RX_POLLING
--
2.43.0


2024-03-13 17:09:35

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 24/71] netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay

From: Jason Xing <[email protected]>

[ Upstream commit 806f462ba9029d41aadf8ec93f2f99c5305deada ]

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index be404ace98786..7428ea436e318 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -455,7 +455,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
nr->t1 =
msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_timeout));
nr->t2 =
- msecs_to_jiffies(sysctl_netrom_transport_acknowledge_delay);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_acknowledge_delay));
nr->n2 =
msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_maximum_tries));
nr->t4 =
--
2.43.0


2024-03-13 17:10:31

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 28/71] netrom: Fix a data-race around sysctl_netrom_routing_control

From: Jason Xing <[email protected]>

[ Upstream commit b5dffcb8f71bdd02a4e5799985b51b12f4eeaf76 ]

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/nr_route.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b8ddd8048f352..89e12e6eea2ef 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -780,7 +780,7 @@ int nr_route_frame(struct sk_buff *skb, ax25_cb *ax25)
return ret;
}

- if (!sysctl_netrom_routing_control && ax25 != NULL)
+ if (!READ_ONCE(sysctl_netrom_routing_control) && ax25 != NULL)
return 0;

/* Its Time-To-Live has expired */
--
2.43.0


2024-03-13 17:11:34

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 31/71] KVM: s390: add stat counter for shadow gmap events

From: Nico Boehr <[email protected]>

[ Upstream commit c3235e2dd6956448a562d6b1112205eeebc8ab43 ]

The shadow gmap tracks memory of nested guests (guest-3). In certain
scenarios, the shadow gmap needs to be rebuilt, which is a costly operation
since it involves a SIE exit into guest-1 for every entry in the respective
shadow level.

Add kvm stat counters when new shadow structures are created at various
levels. Also add a counter gmap_shadow_create when a completely fresh
shadow gmap is created as well as a counter gmap_shadow_reuse when an
existing gmap is being reused.

Note that when several levels are shadowed at once, counters on all
affected levels will be increased.

Also note that not all page table levels need to be present and a ASCE
can directly point to e.g. a segment table. In this case, a new segment
table will always be equivalent to a new shadow gmap and hence will be
counted as gmap_shadow_create and not as gmap_shadow_segment.

Signed-off-by: Nico Boehr <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Signed-off-by: Janosch Frank <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Message-Id: <[email protected]>
Stable-dep-of: fe752331d4b3 ("KVM: s390: vsie: fix race during shadow creation")
Signed-off-by: Sasha Levin <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 7 +++++++
arch/s390/kvm/gaccess.c | 7 +++++++
arch/s390/kvm/kvm-s390.c | 9 ++++++++-
arch/s390/kvm/vsie.c | 5 ++++-
4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index b1e98a9ed152b..09abf000359f8 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -777,6 +777,13 @@ struct kvm_vm_stat {
u64 inject_service_signal;
u64 inject_virtio;
u64 aen_forward;
+ u64 gmap_shadow_create;
+ u64 gmap_shadow_reuse;
+ u64 gmap_shadow_r1_entry;
+ u64 gmap_shadow_r2_entry;
+ u64 gmap_shadow_r3_entry;
+ u64 gmap_shadow_sg_entry;
+ u64 gmap_shadow_pg_entry;
};

struct kvm_arch_memory_slot {
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 0243b6e38d364..3beceff5f1c09 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -1273,6 +1273,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
unsigned long *pgt, int *dat_protection,
int *fake)
{
+ struct kvm *kvm;
struct gmap *parent;
union asce asce;
union vaddress vaddr;
@@ -1281,6 +1282,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,

*fake = 0;
*dat_protection = 0;
+ kvm = sg->private;
parent = sg->parent;
vaddr.addr = saddr;
asce.val = sg->orig_asce;
@@ -1341,6 +1343,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rc = gmap_shadow_r2t(sg, saddr, rfte.val, *fake);
if (rc)
return rc;
+ kvm->stat.gmap_shadow_r1_entry++;
}
fallthrough;
case ASCE_TYPE_REGION2: {
@@ -1369,6 +1372,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rc = gmap_shadow_r3t(sg, saddr, rste.val, *fake);
if (rc)
return rc;
+ kvm->stat.gmap_shadow_r2_entry++;
}
fallthrough;
case ASCE_TYPE_REGION3: {
@@ -1406,6 +1410,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rc = gmap_shadow_sgt(sg, saddr, rtte.val, *fake);
if (rc)
return rc;
+ kvm->stat.gmap_shadow_r3_entry++;
}
fallthrough;
case ASCE_TYPE_SEGMENT: {
@@ -1439,6 +1444,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rc = gmap_shadow_pgt(sg, saddr, ste.val, *fake);
if (rc)
return rc;
+ kvm->stat.gmap_shadow_sg_entry++;
}
}
/* Return the parent address of the page table */
@@ -1509,6 +1515,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
pte.p |= dat_protection;
if (!rc)
rc = gmap_shadow_page(sg, saddr, __pte(pte.val));
+ vcpu->kvm->stat.gmap_shadow_pg_entry++;
ipte_unlock(vcpu->kvm);
mmap_read_unlock(sg->mm);
return rc;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index f604946ab2c85..348d49268a7ec 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -66,7 +66,14 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
STATS_DESC_COUNTER(VM, inject_pfault_done),
STATS_DESC_COUNTER(VM, inject_service_signal),
STATS_DESC_COUNTER(VM, inject_virtio),
- STATS_DESC_COUNTER(VM, aen_forward)
+ STATS_DESC_COUNTER(VM, aen_forward),
+ STATS_DESC_COUNTER(VM, gmap_shadow_reuse),
+ STATS_DESC_COUNTER(VM, gmap_shadow_create),
+ STATS_DESC_COUNTER(VM, gmap_shadow_r1_entry),
+ STATS_DESC_COUNTER(VM, gmap_shadow_r2_entry),
+ STATS_DESC_COUNTER(VM, gmap_shadow_r3_entry),
+ STATS_DESC_COUNTER(VM, gmap_shadow_sg_entry),
+ STATS_DESC_COUNTER(VM, gmap_shadow_pg_entry),
};

const struct kvm_stats_header kvm_vm_stats_header = {
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 740f8b56e63f9..b2dbf08a961e5 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -1206,8 +1206,10 @@ static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
* we're holding has been unshadowed. If the gmap is still valid,
* we can safely reuse it.
*/
- if (vsie_page->gmap && gmap_shadow_valid(vsie_page->gmap, asce, edat))
+ if (vsie_page->gmap && gmap_shadow_valid(vsie_page->gmap, asce, edat)) {
+ vcpu->kvm->stat.gmap_shadow_reuse++;
return 0;
+ }

/* release the old shadow - if any, and mark the prefix as unmapped */
release_gmap_shadow(vsie_page);
@@ -1215,6 +1217,7 @@ static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
if (IS_ERR(gmap))
return PTR_ERR(gmap);
gmap->private = vcpu->kvm;
+ vcpu->kvm->stat.gmap_shadow_create++;
WRITE_ONCE(vsie_page->gmap, gmap);
return 0;
}
--
2.43.0


2024-03-13 17:12:08

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 35/71] nfp: flower: add goto_chain_index for ct entry

From: Wentao Jia <[email protected]>

[ Upstream commit 3e44d19934b92398785b3ffc2353b9eba264140e ]

The chain_index has different means in pre ct entry and post ct entry.
In pre ct entry, it means chain index, but in post ct entry, it means
goto chain index, it is confused.

chain_index and goto_chain_index may be present in one flow rule, It
cannot be distinguished by one field chain_index, both chain_index
and goto_chain_index are required in the follow-up patch to support
multiple ct zones

Another field goto_chain_index is added to record the goto chain index.
If no goto action in post ct entry, goto_chain_index is 0.

Signed-off-by: Wentao Jia <[email protected]>
Acked-by: Simon Horman <[email protected]>
Signed-off-by: Louis Peens <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Stable-dep-of: cefa98e806fd ("nfp: flower: add hardware offload check for post ct entry")
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/netronome/nfp/flower/conntrack.c | 8 ++++++--
drivers/net/ethernet/netronome/nfp/flower/conntrack.h | 2 ++
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
index 7af03b45555dd..da7a47416a208 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
@@ -1243,7 +1243,7 @@ static int nfp_ct_do_tc_merge(struct nfp_fl_ct_zone_entry *zt,
/* Checks that the chain_index of the filter matches the
* chain_index of the GOTO action.
*/
- if (post_ct_entry->chain_index != pre_ct_entry->chain_index)
+ if (post_ct_entry->chain_index != pre_ct_entry->goto_chain_index)
return -EINVAL;

err = nfp_ct_merge_check(pre_ct_entry, post_ct_entry);
@@ -1776,7 +1776,8 @@ int nfp_fl_ct_handle_pre_ct(struct nfp_flower_priv *priv,
if (IS_ERR(ct_entry))
return PTR_ERR(ct_entry);
ct_entry->type = CT_TYPE_PRE_CT;
- ct_entry->chain_index = ct_goto->chain_index;
+ ct_entry->chain_index = flow->common.chain_index;
+ ct_entry->goto_chain_index = ct_goto->chain_index;
list_add(&ct_entry->list_node, &zt->pre_ct_list);
zt->pre_ct_count++;

@@ -1799,6 +1800,7 @@ int nfp_fl_ct_handle_post_ct(struct nfp_flower_priv *priv,
struct nfp_fl_ct_zone_entry *zt;
bool wildcarded = false;
struct flow_match_ct ct;
+ struct flow_action_entry *ct_goto;

flow_rule_match_ct(rule, &ct);
if (!ct.mask->ct_zone) {
@@ -1823,6 +1825,8 @@ int nfp_fl_ct_handle_post_ct(struct nfp_flower_priv *priv,

ct_entry->type = CT_TYPE_POST_CT;
ct_entry->chain_index = flow->common.chain_index;
+ ct_goto = get_flow_act(flow->rule, FLOW_ACTION_GOTO);
+ ct_entry->goto_chain_index = ct_goto ? ct_goto->chain_index : 0;
list_add(&ct_entry->list_node, &zt->post_ct_list);
zt->post_ct_count++;

diff --git a/drivers/net/ethernet/netronome/nfp/flower/conntrack.h b/drivers/net/ethernet/netronome/nfp/flower/conntrack.h
index 762c0b36e269b..9440ab776ecea 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/conntrack.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/conntrack.h
@@ -112,6 +112,7 @@ enum nfp_nfp_layer_name {
* @cookie: Flow cookie, same as original TC flow, used as key
* @list_node: Used by the list
* @chain_index: Chain index of the original flow
+ * @goto_chain_index: goto chain index of the flow
* @netdev: netdev structure.
* @type: Type of pre-entry from enum ct_entry_type
* @zt: Reference to the zone table this belongs to
@@ -125,6 +126,7 @@ struct nfp_fl_ct_flow_entry {
unsigned long cookie;
struct list_head list_node;
u32 chain_index;
+ u32 goto_chain_index;
enum ct_entry_type type;
struct net_device *netdev;
struct nfp_fl_ct_zone_entry *zt;
--
2.43.0


2024-03-13 17:12:18

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 36/71] nfp: flower: add hardware offload check for post ct entry

From: Hui Zhou <[email protected]>

[ Upstream commit cefa98e806fd4e2a5e2047457a11ae5f17b8f621 ]

The nfp offload flow pay will not allocate a mask id when the out port
is openvswitch internal port. This is because these flows are used to
configure the pre_tun table and are never actually send to the firmware
as an add-flow message. When a tc rule which action contains ct and
the post ct entry's out port is openvswitch internal port, the merge
offload flow pay with the wrong mask id of 0 will be send to the
firmware. Actually, the nfp can not support hardware offload for this
situation, so return EOPNOTSUPP.

Fixes: bd0fe7f96a3c ("nfp: flower-ct: add zone table entry when handling pre/post_ct flows")
CC: [email protected] # 5.14+
Signed-off-by: Hui Zhou <[email protected]>
Signed-off-by: Louis Peens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
.../ethernet/netronome/nfp/flower/conntrack.c | 22 ++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
index da7a47416a208..497766ecdd91d 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
@@ -1797,10 +1797,30 @@ int nfp_fl_ct_handle_post_ct(struct nfp_flower_priv *priv,
{
struct flow_rule *rule = flow_cls_offload_flow_rule(flow);
struct nfp_fl_ct_flow_entry *ct_entry;
+ struct flow_action_entry *ct_goto;
struct nfp_fl_ct_zone_entry *zt;
+ struct flow_action_entry *act;
bool wildcarded = false;
struct flow_match_ct ct;
- struct flow_action_entry *ct_goto;
+ int i;
+
+ flow_action_for_each(i, act, &rule->action) {
+ switch (act->id) {
+ case FLOW_ACTION_REDIRECT:
+ case FLOW_ACTION_REDIRECT_INGRESS:
+ case FLOW_ACTION_MIRRED:
+ case FLOW_ACTION_MIRRED_INGRESS:
+ if (act->dev->rtnl_link_ops &&
+ !strcmp(act->dev->rtnl_link_ops->kind, "openvswitch")) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "unsupported offload: out port is openvswitch internal port");
+ return -EOPNOTSUPP;
+ }
+ break;
+ default:
+ break;
+ }
+ }

flow_rule_match_ct(rule, &ct);
if (!ct.mask->ct_zone) {
--
2.43.0


2024-03-13 17:12:32

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 37/71] readahead: avoid multiple marked readahead pages

From: Jan Kara <[email protected]>

[ Upstream commit ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ]

ra_alloc_folio() marks a page that should trigger next round of async
readahead. However it rounds up computed index to the order of page being
allocated. This can however lead to multiple consecutive pages being
marked with readahead flag. Consider situation with index == 1, mark ==
1, order == 0. We insert order 0 page at index 1 and mark it. Then we
bump order to 1, index to 2, mark (still == 1) is rounded up to 2 so page
at index 2 is marked as well. Then we bump order to 2, index is
incremented to 4, mark gets rounded to 4 so page at index 4 is marked as
well. The fact that multiple pages get marked within a single readahead
window confuses the readahead logic and results in readahead window being
trimmed back to 1. This situation is triggered in particular when maximum
readahead window size is not a power of two (in the observed case it was
768 KB) and as a result sequential read throughput suffers.

Fix the problem by rounding 'mark' down instead of up. Because the index
is naturally aligned to 'order', we are guaranteed 'rounded mark' == index
iff 'mark' is within the page we are allocating at 'index' and thus
exactly one page is marked with readahead flag as required by the
readahead code and sequential read performance is restored.

This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix
readahead with large folios"). The commit changed the rounding with the
rationale:

"... we were setting the readahead flag on the folio which contains the
last byte read from the block. This is wrong because we will trigger
readahead at the end of the read without waiting to see if a subsequent
read is going to use the pages we just read."

Although this is true, the fact is this was always the case with read
sizes not aligned to folio boundaries and large folios in the page cache
just make the situation more obvious (and frequent). Also for sequential
read workloads it is better to trigger the readahead earlier rather than
later. It is true that the difference in the rounding and thus earlier
triggering of the readahead can result in reading more for semi-random
workloads. However workloads really suffering from this seem to be rare.
In particular I have verified that the workload described in commit
b9ff43dd2743 ("mm/readahead: Fix readahead with large folios") of reading
random 100k blocks from a file like:

[reader]
bs=100k
rw=randread
numjobs=1
size=64g
runtime=60s

is not impacted by the rounding change and achieves ~70MB/s in both cases.

[[email protected]: fix one more place where mark rounding was done as well]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios")
Signed-off-by: Jan Kara <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Guo Xuenan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
mm/readahead.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index ba43428043a35..e4b772bb70e68 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -483,7 +483,7 @@ static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,

if (!folio)
return -ENOMEM;
- mark = round_up(mark, 1UL << order);
+ mark = round_down(mark, 1UL << order);
if (index == mark)
folio_set_readahead(folio);
err = filemap_add_folio(ractl->mapping, folio, index, gfp);
@@ -591,7 +591,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
* It's the expected callback index, assume sequential access.
* Ramp up sizes, and push forward the readahead window.
*/
- expected = round_up(ra->start + ra->size - ra->async_size,
+ expected = round_down(ra->start + ra->size - ra->async_size,
1UL << order);
if (index == expected || index == (ra->start + ra->size)) {
ra->start += ra->size;
--
2.43.0


2024-03-13 17:12:57

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 39/71] selftests: mm: fix map_hugetlb failure on 64K page size systems

From: Nico Pache <[email protected]>

[ Upstream commit 91b80cc5b39f00399e8e2d17527cad2c7fa535e2 ]

On systems with 64k page size and 512M huge page sizes, the allocation and
test succeeds but errors out at the munmap. As the comment states, munmap
will failure if its not HUGEPAGE aligned. This is due to the length of
the mapping being 1/2 the size of the hugepage causing the munmap to not
be hugepage aligned. Fix this by making the mapping length the full
hugepage if the hugepage is larger than the length of the mapping.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nico Pache <[email protected]>
Cc: Donet Tom <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
tools/testing/selftests/vm/map_hugetlb.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/tools/testing/selftests/vm/map_hugetlb.c b/tools/testing/selftests/vm/map_hugetlb.c
index 312889edb84ab..c65c55b7a789f 100644
--- a/tools/testing/selftests/vm/map_hugetlb.c
+++ b/tools/testing/selftests/vm/map_hugetlb.c
@@ -15,6 +15,7 @@
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
+#include "vm_util.h"

#define LENGTH (256UL*1024*1024)
#define PROTECTION (PROT_READ | PROT_WRITE)
@@ -70,10 +71,16 @@ int main(int argc, char **argv)
{
void *addr;
int ret;
+ size_t hugepage_size;
size_t length = LENGTH;
int flags = FLAGS;
int shift = 0;

+ hugepage_size = default_huge_page_size();
+ /* munmap with fail if the length is not page aligned */
+ if (hugepage_size > length)
+ length = hugepage_size;
+
if (argc > 1)
length = atol(argv[1]) << 20;
if (argc > 2) {
--
2.43.0


2024-03-13 17:14:21

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 45/71] x86/rfds: Mitigate Register File Data Sampling (RFDS)

From: Pawan Gupta <[email protected]>

commit 8076fcde016c9c0e0660543e67bff86cb48a7c9c upstream.

RFDS is a CPU vulnerability that may allow userspace to infer kernel
stale data previously used in floating point registers, vector registers
and integer registers. RFDS only affects certain Intel Atom processors.

Intel released a microcode update that uses VERW instruction to clear
the affected CPU buffers. Unlike MDS, none of the affected cores support
SMT.

Add RFDS bug infrastructure and enable the VERW based mitigation by
default, that clears the affected buffers just before exiting to
userspace. Also add sysfs reporting and cmdline parameter
"reg_file_data_sampling" to control the mitigation.

For details see:
Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst

Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
.../ABI/testing/sysfs-devices-system-cpu | 1 +
.../admin-guide/kernel-parameters.txt | 21 +++++
arch/x86/Kconfig | 11 +++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 78 ++++++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 ++++++++-
drivers/base/cpu.c | 8 ++
include/linux/cpu.h | 2 +
9 files changed, 162 insertions(+), 6 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 13c01b641dc70..78c26280c473b 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -519,6 +519,7 @@ What: /sys/devices/system/cpu/vulnerabilities
/sys/devices/system/cpu/vulnerabilities/mds
/sys/devices/system/cpu/vulnerabilities/meltdown
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
+ /sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling
/sys/devices/system/cpu/vulnerabilities/retbleed
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
/sys/devices/system/cpu/vulnerabilities/spectre_v1
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4ad60e127e048..2dfe75104e7de 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1107,6 +1107,26 @@
The filter can be disabled or changed to another
driver later using sysfs.

+ reg_file_data_sampling=
+ [X86] Controls mitigation for Register File Data
+ Sampling (RFDS) vulnerability. RFDS is a CPU
+ vulnerability which may allow userspace to infer
+ kernel data values previously stored in floating point
+ registers, vector registers, or integer registers.
+ RFDS only affects Intel Atom processors.
+
+ on: Turns ON the mitigation.
+ off: Turns OFF the mitigation.
+
+ This parameter overrides the compile time default set
+ by CONFIG_MITIGATION_RFDS. Mitigation cannot be
+ disabled when other VERW based mitigations (like MDS)
+ are enabled. In order to disable RFDS mitigation all
+ VERW based mitigations need to be disabled.
+
+ For details see:
+ Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
+
driver_async_probe= [KNL]
List of driver names to be probed asynchronously. *
matches with all driver names. If * is specified, the
@@ -3262,6 +3282,7 @@
nospectre_bhb [ARM64]
nospectre_v1 [X86,PPC]
nospectre_v2 [X86,PPC,S390,ARM64]
+ reg_file_data_sampling=off [X86]
retbleed=off [X86]
spec_store_bypass_disable=off [X86,PPC]
spectre_v2_user=off [X86]
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2f7af61b49b6c..5caa023e98397 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2565,6 +2565,17 @@ config GDS_FORCE_MITIGATION

If in doubt, say N.

+config MITIGATION_RFDS
+ bool "RFDS Mitigation"
+ depends on CPU_SUP_INTEL
+ default y
+ help
+ Enable mitigation for Register File Data Sampling (RFDS) by default.
+ RFDS is a hardware vulnerability which affects Intel Atom CPUs. It
+ allows unprivileged speculative access to stale data previously
+ stored in floating point, vector and integer registers.
+ See also <file:Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst>
+
endif

config ARCH_HAS_ADD_PAGES
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index b60f24b30cb90..b97a70aa4de90 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -477,4 +477,5 @@
/* BUG word 2 */
#define X86_BUG_SRSO X86_BUG(1*32 + 0) /* AMD SRSO bug */
#define X86_BUG_DIV0 X86_BUG(1*32 + 1) /* AMD DIV0 speculation bug */
+#define X86_BUG_RFDS X86_BUG(1*32 + 2) /* CPU is vulnerable to Register File Data Sampling */
#endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ec955ab2ff034..005e41dc7ee5a 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -168,6 +168,14 @@
* CPU is not vulnerable to Gather
* Data Sampling (GDS).
*/
+#define ARCH_CAP_RFDS_NO BIT(27) /*
+ * Not susceptible to Register
+ * File Data Sampling.
+ */
+#define ARCH_CAP_RFDS_CLEAR BIT(28) /*
+ * VERW clears CPU Register
+ * File.
+ */

#define ARCH_CAP_XAPIC_DISABLE BIT(21) /*
* IA32_XAPIC_DISABLE_STATUS MSR
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index c66f6eb40afb1..c68789fdc123b 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -479,6 +479,57 @@ static int __init mmio_stale_data_parse_cmdline(char *str)
}
early_param("mmio_stale_data", mmio_stale_data_parse_cmdline);

+#undef pr_fmt
+#define pr_fmt(fmt) "Register File Data Sampling: " fmt
+
+enum rfds_mitigations {
+ RFDS_MITIGATION_OFF,
+ RFDS_MITIGATION_VERW,
+ RFDS_MITIGATION_UCODE_NEEDED,
+};
+
+/* Default mitigation for Register File Data Sampling */
+static enum rfds_mitigations rfds_mitigation __ro_after_init =
+ IS_ENABLED(CONFIG_MITIGATION_RFDS) ? RFDS_MITIGATION_VERW : RFDS_MITIGATION_OFF;
+
+static const char * const rfds_strings[] = {
+ [RFDS_MITIGATION_OFF] = "Vulnerable",
+ [RFDS_MITIGATION_VERW] = "Mitigation: Clear Register File",
+ [RFDS_MITIGATION_UCODE_NEEDED] = "Vulnerable: No microcode",
+};
+
+static void __init rfds_select_mitigation(void)
+{
+ if (!boot_cpu_has_bug(X86_BUG_RFDS) || cpu_mitigations_off()) {
+ rfds_mitigation = RFDS_MITIGATION_OFF;
+ return;
+ }
+ if (rfds_mitigation == RFDS_MITIGATION_OFF)
+ return;
+
+ if (x86_read_arch_cap_msr() & ARCH_CAP_RFDS_CLEAR)
+ setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
+ else
+ rfds_mitigation = RFDS_MITIGATION_UCODE_NEEDED;
+}
+
+static __init int rfds_parse_cmdline(char *str)
+{
+ if (!str)
+ return -EINVAL;
+
+ if (!boot_cpu_has_bug(X86_BUG_RFDS))
+ return 0;
+
+ if (!strcmp(str, "off"))
+ rfds_mitigation = RFDS_MITIGATION_OFF;
+ else if (!strcmp(str, "on"))
+ rfds_mitigation = RFDS_MITIGATION_VERW;
+
+ return 0;
+}
+early_param("reg_file_data_sampling", rfds_parse_cmdline);
+
#undef pr_fmt
#define pr_fmt(fmt) "" fmt

@@ -512,6 +563,11 @@ static void __init md_clear_update_mitigation(void)
mmio_mitigation = MMIO_MITIGATION_VERW;
mmio_select_mitigation();
}
+ if (rfds_mitigation == RFDS_MITIGATION_OFF &&
+ boot_cpu_has_bug(X86_BUG_RFDS)) {
+ rfds_mitigation = RFDS_MITIGATION_VERW;
+ rfds_select_mitigation();
+ }
out:
if (boot_cpu_has_bug(X86_BUG_MDS))
pr_info("MDS: %s\n", mds_strings[mds_mitigation]);
@@ -521,6 +577,8 @@ static void __init md_clear_update_mitigation(void)
pr_info("MMIO Stale Data: %s\n", mmio_strings[mmio_mitigation]);
else if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN))
pr_info("MMIO Stale Data: Unknown: No mitigations\n");
+ if (boot_cpu_has_bug(X86_BUG_RFDS))
+ pr_info("Register File Data Sampling: %s\n", rfds_strings[rfds_mitigation]);
}

static void __init md_clear_select_mitigation(void)
@@ -528,11 +586,12 @@ static void __init md_clear_select_mitigation(void)
mds_select_mitigation();
taa_select_mitigation();
mmio_select_mitigation();
+ rfds_select_mitigation();

/*
- * As MDS, TAA and MMIO Stale Data mitigations are inter-related, update
- * and print their mitigation after MDS, TAA and MMIO Stale Data
- * mitigation selection is done.
+ * As these mitigations are inter-related and rely on VERW instruction
+ * to clear the microarchitural buffers, update and print their status
+ * after mitigation selection is done for each of these vulnerabilities.
*/
md_clear_update_mitigation();
}
@@ -2596,6 +2655,11 @@ static ssize_t mmio_stale_data_show_state(char *buf)
sched_smt_active() ? "vulnerable" : "disabled");
}

+static ssize_t rfds_show_state(char *buf)
+{
+ return sysfs_emit(buf, "%s\n", rfds_strings[rfds_mitigation]);
+}
+
static char *stibp_state(void)
{
if (spectre_v2_in_eibrs_mode(spectre_v2_enabled))
@@ -2757,6 +2821,9 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
case X86_BUG_SRSO:
return srso_show_state(buf);

+ case X86_BUG_RFDS:
+ return rfds_show_state(buf);
+
default:
break;
}
@@ -2831,4 +2898,9 @@ ssize_t cpu_show_spec_rstack_overflow(struct device *dev, struct device_attribut
{
return cpu_show_common(dev, attr, buf, X86_BUG_SRSO);
}
+
+ssize_t cpu_show_reg_file_data_sampling(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ return cpu_show_common(dev, attr, buf, X86_BUG_RFDS);
+}
#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 454cdf3418624..758938c94b41e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1248,6 +1248,8 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
#define SRSO BIT(5)
/* CPU is affected by GDS */
#define GDS BIT(6)
+/* CPU is affected by Register File Data Sampling */
+#define RFDS BIT(7)

static const struct x86_cpu_id cpu_vuln_blacklist[] __initconst = {
VULNBL_INTEL_STEPPINGS(IVYBRIDGE, X86_STEPPING_ANY, SRBDS),
@@ -1275,9 +1277,18 @@ static const struct x86_cpu_id cpu_vuln_blacklist[] __initconst = {
VULNBL_INTEL_STEPPINGS(TIGERLAKE, X86_STEPPING_ANY, GDS),
VULNBL_INTEL_STEPPINGS(LAKEFIELD, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RETBLEED),
VULNBL_INTEL_STEPPINGS(ROCKETLAKE, X86_STEPPING_ANY, MMIO | RETBLEED | GDS),
- VULNBL_INTEL_STEPPINGS(ATOM_TREMONT, X86_STEPPING_ANY, MMIO | MMIO_SBDS),
- VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_D, X86_STEPPING_ANY, MMIO),
- VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_L, X86_STEPPING_ANY, MMIO | MMIO_SBDS),
+ VULNBL_INTEL_STEPPINGS(ALDERLAKE, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(ALDERLAKE_L, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(RAPTORLAKE, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(RAPTORLAKE_P, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(RAPTORLAKE_S, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(ALDERLAKE_N, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_TREMONT, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_D, X86_STEPPING_ANY, MMIO | RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_L, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_GOLDMONT, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_GOLDMONT_D, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_GOLDMONT_PLUS, X86_STEPPING_ANY, RFDS),

VULNBL_AMD(0x15, RETBLEED),
VULNBL_AMD(0x16, RETBLEED),
@@ -1311,6 +1322,24 @@ static bool arch_cap_mmio_immune(u64 ia32_cap)
ia32_cap & ARCH_CAP_SBDR_SSDP_NO);
}

+static bool __init vulnerable_to_rfds(u64 ia32_cap)
+{
+ /* The "immunity" bit trumps everything else: */
+ if (ia32_cap & ARCH_CAP_RFDS_NO)
+ return false;
+
+ /*
+ * VMMs set ARCH_CAP_RFDS_CLEAR for processors not in the blacklist to
+ * indicate that mitigation is needed because guest is running on a
+ * vulnerable hardware or may migrate to such hardware:
+ */
+ if (ia32_cap & ARCH_CAP_RFDS_CLEAR)
+ return true;
+
+ /* Only consult the blacklist when there is no enumeration: */
+ return cpu_matches(cpu_vuln_blacklist, RFDS);
+}
+
static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
{
u64 ia32_cap = x86_read_arch_cap_msr();
@@ -1419,6 +1448,9 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
setup_force_cpu_bug(X86_BUG_SRSO);
}

+ if (vulnerable_to_rfds(ia32_cap))
+ setup_force_cpu_bug(X86_BUG_RFDS);
+
if (cpu_matches(cpu_vuln_whitelist, NO_MELTDOWN))
return;

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index dab70a65377c8..31da94afe4f3d 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -589,6 +589,12 @@ ssize_t __weak cpu_show_spec_rstack_overflow(struct device *dev,
return sysfs_emit(buf, "Not affected\n");
}

+ssize_t __weak cpu_show_reg_file_data_sampling(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "Not affected\n");
+}
+
static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
@@ -602,6 +608,7 @@ static DEVICE_ATTR(mmio_stale_data, 0444, cpu_show_mmio_stale_data, NULL);
static DEVICE_ATTR(retbleed, 0444, cpu_show_retbleed, NULL);
static DEVICE_ATTR(gather_data_sampling, 0444, cpu_show_gds, NULL);
static DEVICE_ATTR(spec_rstack_overflow, 0444, cpu_show_spec_rstack_overflow, NULL);
+static DEVICE_ATTR(reg_file_data_sampling, 0444, cpu_show_reg_file_data_sampling, NULL);

static struct attribute *cpu_root_vulnerabilities_attrs[] = {
&dev_attr_meltdown.attr,
@@ -617,6 +624,7 @@ static struct attribute *cpu_root_vulnerabilities_attrs[] = {
&dev_attr_retbleed.attr,
&dev_attr_gather_data_sampling.attr,
&dev_attr_spec_rstack_overflow.attr,
+ &dev_attr_reg_file_data_sampling.attr,
NULL
};

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 008bfa68cfabc..4b06b1f1e267a 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -74,6 +74,8 @@ extern ssize_t cpu_show_spec_rstack_overflow(struct device *dev,
struct device_attribute *attr, char *buf);
extern ssize_t cpu_show_gds(struct device *dev,
struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_reg_file_data_sampling(struct device *dev,
+ struct device_attribute *attr, char *buf);

extern __printf(4, 5)
struct device *cpu_device_create(struct device *parent, void *drvdata,
--
2.43.0


2024-03-13 17:14:24

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 46/71] KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests

From: Pawan Gupta <[email protected]>

commit 2a0180129d726a4b953232175857d442651b55a0 upstream.

Mitigation for RFDS requires RFDS_CLEAR capability which is enumerated
by MSR_IA32_ARCH_CAPABILITIES bit 27. If the host has it set, export it
to guests so that they can deploy the mitigation.

RFDS_NO indicates that the system is not vulnerable to RFDS, export it
to guests so that they don't deploy the mitigation unnecessarily. When
the host is not affected by X86_BUG_RFDS, but has RFDS_NO=0, synthesize
RFDS_NO to the guest.

Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kvm/x86.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7144e51668136..688bc7b72eb66 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1613,7 +1613,8 @@ static unsigned int num_msr_based_features;
ARCH_CAP_SKIP_VMENTRY_L1DFLUSH | ARCH_CAP_SSB_NO | ARCH_CAP_MDS_NO | \
ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
- ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO)
+ ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
+ ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR)

static u64 kvm_get_arch_capabilities(void)
{
@@ -1650,6 +1651,8 @@ static u64 kvm_get_arch_capabilities(void)
data |= ARCH_CAP_SSB_NO;
if (!boot_cpu_has_bug(X86_BUG_MDS))
data |= ARCH_CAP_MDS_NO;
+ if (!boot_cpu_has_bug(X86_BUG_RFDS))
+ data |= ARCH_CAP_RFDS_NO;

if (!boot_cpu_has(X86_FEATURE_RTM)) {
/*
--
2.43.0


2024-03-13 17:14:43

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 03/71] ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able

From: Maciej Fijalkowski <[email protected]>

[ Upstream commit cbf996f52c4e658b3fb4349a869a62fd2d4c3c1c ]

Currently routines that are supposed to toggle state of ring pair do not
take care of associated interrupt with queue vector that these rings
belong to. This causes funky issues such as dead interface due to irq
misconfiguration, as per Pavel's report from Closes: tag.

Add a function responsible for disabling single IRQ in EIMC register and
call this as a very first thing when disabling ring pair during xsk_pool
setup. For enable let's reuse ixgbe_irq_enable_queues(). Besides this,
disable/enable NAPI as first/last thing when dealing with closing or
opening ring pair that xsk_pool is being configured on.

Reported-by: Pavel Vazharov <[email protected]>
Closes: https://lore.kernel.org/netdev/CAJEV1ijxNyPTwASJER1bcZzS9nMoZJqfR86nu_3jFFVXzZQ4NA@mail.gmail.com/
Fixes: 024aa5800f32 ("ixgbe: added Rx/Tx ring disable/enable functions")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Acked-by: Magnus Karlsson <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 ++++++++++++++++---
1 file changed, 49 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 6dc554e810a17..086cc25730338 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2947,8 +2947,8 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter)
static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
u64 qmask)
{
- u32 mask;
struct ixgbe_hw *hw = &adapter->hw;
+ u32 mask;

switch (hw->mac.type) {
case ixgbe_mac_82598EB:
@@ -10543,6 +10543,44 @@ static void ixgbe_reset_rxr_stats(struct ixgbe_ring *rx_ring)
memset(&rx_ring->rx_stats, 0, sizeof(rx_ring->rx_stats));
}

+/**
+ * ixgbe_irq_disable_single - Disable single IRQ vector
+ * @adapter: adapter structure
+ * @ring: ring index
+ **/
+static void ixgbe_irq_disable_single(struct ixgbe_adapter *adapter, u32 ring)
+{
+ struct ixgbe_hw *hw = &adapter->hw;
+ u64 qmask = BIT_ULL(ring);
+ u32 mask;
+
+ switch (adapter->hw.mac.type) {
+ case ixgbe_mac_82598EB:
+ mask = qmask & IXGBE_EIMC_RTX_QUEUE;
+ IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC, mask);
+ break;
+ case ixgbe_mac_82599EB:
+ case ixgbe_mac_X540:
+ case ixgbe_mac_X550:
+ case ixgbe_mac_X550EM_x:
+ case ixgbe_mac_x550em_a:
+ mask = (qmask & 0xFFFFFFFF);
+ if (mask)
+ IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
+ mask = (qmask >> 32);
+ if (mask)
+ IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
+ break;
+ default:
+ break;
+ }
+ IXGBE_WRITE_FLUSH(&adapter->hw);
+ if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
+ synchronize_irq(adapter->msix_entries[ring].vector);
+ else
+ synchronize_irq(adapter->pdev->irq);
+}
+
/**
* ixgbe_txrx_ring_disable - Disable Rx/Tx/XDP Tx rings
* @adapter: adapter structure
@@ -10559,6 +10597,11 @@ void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
tx_ring = adapter->tx_ring[ring];
xdp_ring = adapter->xdp_ring[ring];

+ ixgbe_irq_disable_single(adapter, ring);
+
+ /* Rx/Tx/XDP Tx share the same napi context. */
+ napi_disable(&rx_ring->q_vector->napi);
+
ixgbe_disable_txr(adapter, tx_ring);
if (xdp_ring)
ixgbe_disable_txr(adapter, xdp_ring);
@@ -10567,9 +10610,6 @@ void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
if (xdp_ring)
synchronize_rcu();

- /* Rx/Tx/XDP Tx share the same napi context. */
- napi_disable(&rx_ring->q_vector->napi);
-
ixgbe_clean_tx_ring(tx_ring);
if (xdp_ring)
ixgbe_clean_tx_ring(xdp_ring);
@@ -10597,9 +10637,6 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
tx_ring = adapter->tx_ring[ring];
xdp_ring = adapter->xdp_ring[ring];

- /* Rx/Tx/XDP Tx share the same napi context. */
- napi_enable(&rx_ring->q_vector->napi);
-
ixgbe_configure_tx_ring(adapter, tx_ring);
if (xdp_ring)
ixgbe_configure_tx_ring(adapter, xdp_ring);
@@ -10608,6 +10645,11 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
clear_bit(__IXGBE_TX_DISABLED, &tx_ring->state);
if (xdp_ring)
clear_bit(__IXGBE_TX_DISABLED, &xdp_ring->state);
+
+ /* Rx/Tx/XDP Tx share the same napi context. */
+ napi_enable(&rx_ring->q_vector->napi);
+ ixgbe_irq_enable_queues(adapter, BIT_ULL(ring));
+ IXGBE_WRITE_FLUSH(&adapter->hw);
}

/**
--
2.43.0


2024-03-13 17:14:48

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 48/71] blk-iocost: disable writeback throttling

From: Yu Kuai <[email protected]>

[ Upstream commit 8796acbc9a0eceeddd99eaef833bdda1241d39b9 ]

Commit b5dc5d4d1f4f ("block,bfq: Disable writeback throttling") disable
wbt for bfq, because different write-throttling heuristics should not
work together.

For the same reason, wbt and iocost should not work together as well,
unless admin really want to do that, dispite that performance is
affected.

Signed-off-by: Yu Kuai <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
Signed-off-by: Sasha Levin <[email protected]>
---
block/blk-iocost.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index e6557024e3da8..3788774a7b729 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -3281,9 +3281,11 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
blk_stat_enable_accounting(disk->queue);
blk_queue_flag_set(QUEUE_FLAG_RQ_ALLOC_TIME, disk->queue);
ioc->enabled = true;
+ wbt_disable_default(disk->queue);
} else {
blk_queue_flag_clear(QUEUE_FLAG_RQ_ALLOC_TIME, disk->queue);
ioc->enabled = false;
+ wbt_enable_default(disk->queue);
}

if (user) {
--
2.43.0


2024-03-13 17:15:08

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 49/71] elevator: remove redundant code in elv_unregister_queue()

From: Yu Kuai <[email protected]>

[ Upstream commit 6d9f4cf125585ebf0718abcf5ce9ca898877c6d2 ]

"elevator_queue *e" is already declared and initialized in the beginning
of elv_unregister_queue().

Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
Signed-off-by: Sasha Levin <[email protected]>
---
block/elevator.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index bd71f0fc4e4b6..20e70fd3f77f9 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -524,8 +524,6 @@ void elv_unregister_queue(struct request_queue *q)
lockdep_assert_held(&q->sysfs_lock);

if (e && e->registered) {
- struct elevator_queue *e = q->elevator;
-
kobject_uevent(&e->kobj, KOBJ_REMOVE);
kobject_del(&e->kobj);

--
2.43.0


2024-03-13 17:15:32

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 51/71] elevator: add new field flags in struct elevator_queue

From: Yu Kuai <[email protected]>

[ Upstream commit 181d06637451b5348d746039478e71fa53dfbff6 ]

There are only one flag to indicate that elevator is registered currently,
prepare to add a flag to disable wbt if default elevator is bfq.

Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
Signed-off-by: Sasha Levin <[email protected]>
---
block/elevator.c | 6 ++----
block/elevator.h | 4 +++-
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 20e70fd3f77f9..9e12706e8d8cb 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -512,7 +512,7 @@ int elv_register_queue(struct request_queue *q, bool uevent)
if (uevent)
kobject_uevent(&e->kobj, KOBJ_ADD);

- e->registered = 1;
+ set_bit(ELEVATOR_FLAG_REGISTERED, &e->flags);
}
return error;
}
@@ -523,11 +523,9 @@ void elv_unregister_queue(struct request_queue *q)

lockdep_assert_held(&q->sysfs_lock);

- if (e && e->registered) {
+ if (e && test_and_clear_bit(ELEVATOR_FLAG_REGISTERED, &e->flags)) {
kobject_uevent(&e->kobj, KOBJ_REMOVE);
kobject_del(&e->kobj);
-
- e->registered = 0;
}
}

diff --git a/block/elevator.h b/block/elevator.h
index 3f0593b3bf9d3..ed574bf3e629e 100644
--- a/block/elevator.h
+++ b/block/elevator.h
@@ -100,10 +100,12 @@ struct elevator_queue
void *elevator_data;
struct kobject kobj;
struct mutex sysfs_lock;
- unsigned int registered:1;
+ unsigned long flags;
DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
};

+#define ELEVATOR_FLAG_REGISTERED 0
+
/*
* block elevator interface
*/
--
2.43.0


2024-03-13 17:15:46

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 52/71] blk-wbt: don't enable throttling if default elevator is bfq

From: Yu Kuai <[email protected]>

[ Upstream commit 671fae5e51297fc76b3758ca2edd514858734a6a ]

Commit b5dc5d4d1f4f ("block,bfq: Disable writeback throttling") tries to
disable wbt for bfq, it's done by calling wbt_disable_default() in
bfq_init_queue(). However, wbt is still enabled if default elevator is
bfq:

device_add_disk
elevator_init_mq
bfq_init_queue
wbt_disable_default -> done nothing

blk_register_queue
wbt_enable_default -> wbt is enabled

Fix the problem by adding a new flag ELEVATOR_FLAG_DISBALE_WBT, bfq
will set the flag in bfq_init_queue, and following wbt_enable_default()
won't enable wbt while the flag is set.

Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
Signed-off-by: Sasha Levin <[email protected]>
---
block/bfq-iosched.c | 2 ++
block/blk-wbt.c | 11 ++++++++---
block/elevator.h | 3 ++-
3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 52eb79d60a3f3..e4699291aee23 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -7059,6 +7059,7 @@ static void bfq_exit_queue(struct elevator_queue *e)
#endif

blk_stat_disable_accounting(bfqd->queue);
+ clear_bit(ELEVATOR_FLAG_DISABLE_WBT, &e->flags);
wbt_enable_default(bfqd->queue);

kfree(bfqd);
@@ -7204,6 +7205,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
/* We dispatch from request queue wide instead of hw queue */
blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);

+ set_bit(ELEVATOR_FLAG_DISABLE_WBT, &eq->flags);
wbt_disable_default(q);
blk_stat_enable_accounting(q);

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index c5a8c10028a08..afb1782b4255e 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -27,6 +27,7 @@

#include "blk-wbt.h"
#include "blk-rq-qos.h"
+#include "elevator.h"

#define CREATE_TRACE_POINTS
#include <trace/events/wbt.h>
@@ -638,11 +639,15 @@ void wbt_set_write_cache(struct request_queue *q, bool write_cache_on)
*/
void wbt_enable_default(struct request_queue *q)
{
- struct rq_qos *rqos = wbt_rq_qos(q);
+ struct rq_qos *rqos;
+ bool disable_flag = q->elevator &&
+ test_bit(ELEVATOR_FLAG_DISABLE_WBT, &q->elevator->flags);

/* Throttling already enabled? */
+ rqos = wbt_rq_qos(q);
if (rqos) {
- if (RQWB(rqos)->enable_state == WBT_STATE_OFF_DEFAULT)
+ if (!disable_flag &&
+ RQWB(rqos)->enable_state == WBT_STATE_OFF_DEFAULT)
RQWB(rqos)->enable_state = WBT_STATE_ON_DEFAULT;
return;
}
@@ -651,7 +656,7 @@ void wbt_enable_default(struct request_queue *q)
if (!blk_queue_registered(q))
return;

- if (queue_is_mq(q))
+ if (queue_is_mq(q) && !disable_flag)
wbt_init(q);
}
EXPORT_SYMBOL_GPL(wbt_enable_default);
diff --git a/block/elevator.h b/block/elevator.h
index ed574bf3e629e..75382471222d1 100644
--- a/block/elevator.h
+++ b/block/elevator.h
@@ -104,7 +104,8 @@ struct elevator_queue
DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
};

-#define ELEVATOR_FLAG_REGISTERED 0
+#define ELEVATOR_FLAG_REGISTERED 0
+#define ELEVATOR_FLAG_DISABLE_WBT 1

/*
* block elevator interface
--
2.43.0


2024-03-13 17:16:10

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 54/71] blk-wbt: pass a gendisk to wbt_init

From: Christoph Hellwig <[email protected]>

[ Upstream commit 958f29654747a54f2272eb478e493eb97f492e06 ]

Pass a gendisk to wbt_init to prepare for phasing out usage of the
request_queue in the blk-cgroup code.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
Signed-off-by: Sasha Levin <[email protected]>
---
block/blk-sysfs.c | 2 +-
block/blk-wbt.c | 5 +++--
block/blk-wbt.h | 4 ++--
3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index c59c4d3ee7a27..31f53ef01982d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -488,7 +488,7 @@ static ssize_t queue_wb_lat_store(struct request_queue *q, const char *page,

rqos = wbt_rq_qos(q);
if (!rqos) {
- ret = wbt_init(q);
+ ret = wbt_init(q->disk);
if (ret)
return ret;
}
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 8d4f075f13e2f..95bec9244e9f3 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -658,7 +658,7 @@ void wbt_enable_default(struct gendisk *disk)
return;

if (queue_is_mq(q) && !disable_flag)
- wbt_init(q);
+ wbt_init(disk);
}
EXPORT_SYMBOL_GPL(wbt_enable_default);

@@ -822,8 +822,9 @@ static struct rq_qos_ops wbt_rqos_ops = {
#endif
};

-int wbt_init(struct request_queue *q)
+int wbt_init(struct gendisk *disk)
{
+ struct request_queue *q = disk->queue;
struct rq_wb *rwb;
int i;
int ret;
diff --git a/block/blk-wbt.h b/block/blk-wbt.h
index 58c226fe33d48..8170439b89d6e 100644
--- a/block/blk-wbt.h
+++ b/block/blk-wbt.h
@@ -88,7 +88,7 @@ static inline unsigned int wbt_inflight(struct rq_wb *rwb)

#ifdef CONFIG_BLK_WBT

-int wbt_init(struct request_queue *);
+int wbt_init(struct gendisk *disk);
void wbt_disable_default(struct gendisk *disk);
void wbt_enable_default(struct gendisk *disk);

@@ -101,7 +101,7 @@ u64 wbt_default_latency_nsec(struct request_queue *);

#else

-static inline int wbt_init(struct request_queue *q)
+static inline int wbt_init(struct gendisk *disk)
{
return -EINVAL;
}
--
2.43.0


2024-03-13 17:16:10

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 08/71] net: sparx5: Fix use after free inside sparx5_del_mact_entry

From: Horatiu Vultur <[email protected]>

[ Upstream commit 89d72d4125e94aa3c2140fedd97ce07ba9e37674 ]

Based on the static analyzis of the code it looks like when an entry
from the MAC table was removed, the entry was still used after being
freed. More precise the vid of the mac_entry was used after calling
devm_kfree on the mac_entry.
The fix consists in first using the vid of the mac_entry to delete the
entry from the HW and after that to free it.

Fixes: b37a1bae742f ("net: sparx5: add mactable support")
Signed-off-by: Horatiu Vultur <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
index 4af285918ea2a..75868b3f548ec 100644
--- a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
+++ b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
@@ -347,10 +347,10 @@ int sparx5_del_mact_entry(struct sparx5 *sparx5,
list) {
if ((vid == 0 || mact_entry->vid == vid) &&
ether_addr_equal(addr, mact_entry->mac)) {
+ sparx5_mact_forget(sparx5, addr, mact_entry->vid);
+
list_del(&mact_entry->list);
devm_kfree(sparx5->dev, mact_entry);
-
- sparx5_mact_forget(sparx5, addr, mact_entry->vid);
}
}
mutex_unlock(&sparx5->mact_lock);
--
2.43.0


2024-03-13 17:16:27

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 55/71] blk-rq-qos: move rq_qos_add and rq_qos_del out of line

From: Christoph Hellwig <[email protected]>

[ Upstream commit b494f9c566ba5fe2cc8abe67fdeb0332c6b48d4b ]

These two functions are rather larger and not in a fast path, so move
them out of line.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
Signed-off-by: Sasha Levin <[email protected]>
---
block/blk-rq-qos.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
block/blk-rq-qos.h | 61 ++--------------------------------------------
2 files changed, 62 insertions(+), 59 deletions(-)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 88f0fe7dcf545..aae98dcb01ebe 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -294,3 +294,63 @@ void rq_qos_exit(struct request_queue *q)
rqos->ops->exit(rqos);
}
}
+
+int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
+{
+ /*
+ * No IO can be in-flight when adding rqos, so freeze queue, which
+ * is fine since we only support rq_qos for blk-mq queue.
+ *
+ * Reuse ->queue_lock for protecting against other concurrent
+ * rq_qos adding/deleting
+ */
+ blk_mq_freeze_queue(q);
+
+ spin_lock_irq(&q->queue_lock);
+ if (rq_qos_id(q, rqos->id))
+ goto ebusy;
+ rqos->next = q->rq_qos;
+ q->rq_qos = rqos;
+ spin_unlock_irq(&q->queue_lock);
+
+ blk_mq_unfreeze_queue(q);
+
+ if (rqos->ops->debugfs_attrs) {
+ mutex_lock(&q->debugfs_mutex);
+ blk_mq_debugfs_register_rqos(rqos);
+ mutex_unlock(&q->debugfs_mutex);
+ }
+
+ return 0;
+ebusy:
+ spin_unlock_irq(&q->queue_lock);
+ blk_mq_unfreeze_queue(q);
+ return -EBUSY;
+
+}
+
+void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
+{
+ struct rq_qos **cur;
+
+ /*
+ * See comment in rq_qos_add() about freezing queue & using
+ * ->queue_lock.
+ */
+ blk_mq_freeze_queue(q);
+
+ spin_lock_irq(&q->queue_lock);
+ for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) {
+ if (*cur == rqos) {
+ *cur = rqos->next;
+ break;
+ }
+ }
+ spin_unlock_irq(&q->queue_lock);
+
+ blk_mq_unfreeze_queue(q);
+
+ mutex_lock(&q->debugfs_mutex);
+ blk_mq_debugfs_unregister_rqos(rqos);
+ mutex_unlock(&q->debugfs_mutex);
+}
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 1ef1f7d4bc3cb..805eee8b031d0 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -85,65 +85,8 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
init_waitqueue_head(&rq_wait->wait);
}

-static inline int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
-{
- /*
- * No IO can be in-flight when adding rqos, so freeze queue, which
- * is fine since we only support rq_qos for blk-mq queue.
- *
- * Reuse ->queue_lock for protecting against other concurrent
- * rq_qos adding/deleting
- */
- blk_mq_freeze_queue(q);
-
- spin_lock_irq(&q->queue_lock);
- if (rq_qos_id(q, rqos->id))
- goto ebusy;
- rqos->next = q->rq_qos;
- q->rq_qos = rqos;
- spin_unlock_irq(&q->queue_lock);
-
- blk_mq_unfreeze_queue(q);
-
- if (rqos->ops->debugfs_attrs) {
- mutex_lock(&q->debugfs_mutex);
- blk_mq_debugfs_register_rqos(rqos);
- mutex_unlock(&q->debugfs_mutex);
- }
-
- return 0;
-ebusy:
- spin_unlock_irq(&q->queue_lock);
- blk_mq_unfreeze_queue(q);
- return -EBUSY;
-
-}
-
-static inline void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
-{
- struct rq_qos **cur;
-
- /*
- * See comment in rq_qos_add() about freezing queue & using
- * ->queue_lock.
- */
- blk_mq_freeze_queue(q);
-
- spin_lock_irq(&q->queue_lock);
- for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) {
- if (*cur == rqos) {
- *cur = rqos->next;
- break;
- }
- }
- spin_unlock_irq(&q->queue_lock);
-
- blk_mq_unfreeze_queue(q);
-
- mutex_lock(&q->debugfs_mutex);
- blk_mq_debugfs_unregister_rqos(rqos);
- mutex_unlock(&q->debugfs_mutex);
-}
+int rq_qos_add(struct request_queue *q, struct rq_qos *rqos);
+void rq_qos_del(struct request_queue *q, struct rq_qos *rqos);

typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
typedef void (cleanup_cb_t)(struct rq_wait *rqw, void *private_data);
--
2.43.0


2024-03-13 17:16:45

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 56/71] blk-rq-qos: make rq_qos_add and rq_qos_del more useful

From: Christoph Hellwig <[email protected]>

[ Upstream commit ce57b558604e68277d31ca5ce49ec4579a8618c5 ]

Switch to passing a gendisk, and make rq_qos_add initialize all required
fields and drop the not required q argument from rq_qos_del.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
Signed-off-by: Sasha Levin <[email protected]>
---
block/blk-iocost.c | 13 +++----------
block/blk-iolatency.c | 14 ++++----------
block/blk-rq-qos.c | 13 ++++++++++---
block/blk-rq-qos.h | 5 +++--
block/blk-wbt.c | 5 +----
5 files changed, 21 insertions(+), 29 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 72ca07f24b3c0..a8a7d2ce927b9 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2847,9 +2847,7 @@ static struct rq_qos_ops ioc_rqos_ops = {

static int blk_iocost_init(struct gendisk *disk)
{
- struct request_queue *q = disk->queue;
struct ioc *ioc;
- struct rq_qos *rqos;
int i, cpu, ret;

ioc = kzalloc(sizeof(*ioc), GFP_KERNEL);
@@ -2872,11 +2870,6 @@ static int blk_iocost_init(struct gendisk *disk)
local64_set(&ccs->rq_wait_ns, 0);
}

- rqos = &ioc->rqos;
- rqos->id = RQ_QOS_COST;
- rqos->ops = &ioc_rqos_ops;
- rqos->q = q;
-
spin_lock_init(&ioc->lock);
timer_setup(&ioc->timer, ioc_timer_fn, 0);
INIT_LIST_HEAD(&ioc->active_iocgs);
@@ -2900,17 +2893,17 @@ static int blk_iocost_init(struct gendisk *disk)
* called before policy activation completion, can't assume that the
* target bio has an iocg associated and need to test for NULL iocg.
*/
- ret = rq_qos_add(q, rqos);
+ ret = rq_qos_add(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
if (ret)
goto err_free_ioc;

- ret = blkcg_activate_policy(q, &blkcg_policy_iocost);
+ ret = blkcg_activate_policy(disk->queue, &blkcg_policy_iocost);
if (ret)
goto err_del_qos;
return 0;

err_del_qos:
- rq_qos_del(q, rqos);
+ rq_qos_del(&ioc->rqos);
err_free_ioc:
free_percpu(ioc->pcpu_stat);
kfree(ioc);
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 571fa95aafe96..c64cfec34ac37 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -758,24 +758,18 @@ static void blkiolatency_enable_work_fn(struct work_struct *work)

int blk_iolatency_init(struct gendisk *disk)
{
- struct request_queue *q = disk->queue;
struct blk_iolatency *blkiolat;
- struct rq_qos *rqos;
int ret;

blkiolat = kzalloc(sizeof(*blkiolat), GFP_KERNEL);
if (!blkiolat)
return -ENOMEM;

- rqos = &blkiolat->rqos;
- rqos->id = RQ_QOS_LATENCY;
- rqos->ops = &blkcg_iolatency_ops;
- rqos->q = q;
-
- ret = rq_qos_add(q, rqos);
+ ret = rq_qos_add(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
+ &blkcg_iolatency_ops);
if (ret)
goto err_free;
- ret = blkcg_activate_policy(q, &blkcg_policy_iolatency);
+ ret = blkcg_activate_policy(disk->queue, &blkcg_policy_iolatency);
if (ret)
goto err_qos_del;

@@ -785,7 +779,7 @@ int blk_iolatency_init(struct gendisk *disk)
return 0;

err_qos_del:
- rq_qos_del(q, rqos);
+ rq_qos_del(&blkiolat->rqos);
err_free:
kfree(blkiolat);
return ret;
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index aae98dcb01ebe..14bee1bd76136 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -295,8 +295,15 @@ void rq_qos_exit(struct request_queue *q)
}
}

-int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
+int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
+ struct rq_qos_ops *ops)
{
+ struct request_queue *q = disk->queue;
+
+ rqos->q = q;
+ rqos->id = id;
+ rqos->ops = ops;
+
/*
* No IO can be in-flight when adding rqos, so freeze queue, which
* is fine since we only support rq_qos for blk-mq queue.
@@ -326,11 +333,11 @@ int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
spin_unlock_irq(&q->queue_lock);
blk_mq_unfreeze_queue(q);
return -EBUSY;
-
}

-void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
+void rq_qos_del(struct rq_qos *rqos)
{
+ struct request_queue *q = rqos->q;
struct rq_qos **cur;

/*
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 805eee8b031d0..22552785aa31e 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -85,8 +85,9 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
init_waitqueue_head(&rq_wait->wait);
}

-int rq_qos_add(struct request_queue *q, struct rq_qos *rqos);
-void rq_qos_del(struct request_queue *q, struct rq_qos *rqos);
+int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
+ struct rq_qos_ops *ops);
+void rq_qos_del(struct rq_qos *rqos);

typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
typedef void (cleanup_cb_t)(struct rq_wait *rqw, void *private_data);
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 95bec9244e9f3..aec4e37c89c4a 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -842,9 +842,6 @@ int wbt_init(struct gendisk *disk)
for (i = 0; i < WBT_NUM_RWQ; i++)
rq_wait_init(&rwb->rq_wait[i]);

- rwb->rqos.id = RQ_QOS_WBT;
- rwb->rqos.ops = &wbt_rqos_ops;
- rwb->rqos.q = q;
rwb->last_comp = rwb->last_issue = jiffies;
rwb->win_nsec = RWB_WINDOW_NSEC;
rwb->enable_state = WBT_STATE_ON_DEFAULT;
@@ -857,7 +854,7 @@ int wbt_init(struct gendisk *disk)
/*
* Assign rwb and add the stats callback.
*/
- ret = rq_qos_add(q, &rwb->rqos);
+ ret = rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
if (ret)
goto err_free;

--
2.43.0


2024-03-13 17:16:57

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 57/71] blk-rq-qos: constify rq_qos_ops

From: Christoph Hellwig <[email protected]>

[ Upstream commit 3963d84df7974b6687cb34bce3b9e0b2686f839c ]

These op vectors are constant, so mark them const.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
Signed-off-by: Sasha Levin <[email protected]>
---
block/blk-iocost.c | 2 +-
block/blk-iolatency.c | 2 +-
block/blk-rq-qos.c | 2 +-
block/blk-rq-qos.h | 4 ++--
block/blk-wbt.c | 2 +-
5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index a8a7d2ce927b9..78958c5bece08 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2836,7 +2836,7 @@ static void ioc_rqos_exit(struct rq_qos *rqos)
kfree(ioc);
}

-static struct rq_qos_ops ioc_rqos_ops = {
+static const struct rq_qos_ops ioc_rqos_ops = {
.throttle = ioc_rqos_throttle,
.merge = ioc_rqos_merge,
.done_bio = ioc_rqos_done_bio,
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index c64cfec34ac37..b0f8550f87cd2 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -651,7 +651,7 @@ static void blkcg_iolatency_exit(struct rq_qos *rqos)
kfree(blkiolat);
}

-static struct rq_qos_ops blkcg_iolatency_ops = {
+static const struct rq_qos_ops blkcg_iolatency_ops = {
.throttle = blkcg_iolatency_throttle,
.done_bio = blkcg_iolatency_done_bio,
.exit = blkcg_iolatency_exit,
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 14bee1bd76136..8e83734cfe8db 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -296,7 +296,7 @@ void rq_qos_exit(struct request_queue *q)
}

int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
- struct rq_qos_ops *ops)
+ const struct rq_qos_ops *ops)
{
struct request_queue *q = disk->queue;

diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 22552785aa31e..2b7b668479f71 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -25,7 +25,7 @@ struct rq_wait {
};

struct rq_qos {
- struct rq_qos_ops *ops;
+ const struct rq_qos_ops *ops;
struct request_queue *q;
enum rq_qos_id id;
struct rq_qos *next;
@@ -86,7 +86,7 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
}

int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
- struct rq_qos_ops *ops);
+ const struct rq_qos_ops *ops);
void rq_qos_del(struct rq_qos *rqos);

typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index aec4e37c89c4a..d9398347b08d8 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -808,7 +808,7 @@ static const struct blk_mq_debugfs_attr wbt_debugfs_attrs[] = {
};
#endif

-static struct rq_qos_ops wbt_rqos_ops = {
+static const struct rq_qos_ops wbt_rqos_ops = {
.throttle = wbt_wait,
.issue = wbt_issue,
.track = wbt_track,
--
2.43.0


2024-03-13 17:17:13

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 12/71] net/ipv6: avoid possible UAF in ip6_route_mpath_notify()

From: Eric Dumazet <[email protected]>

[ Upstream commit 685f7d531264599b3f167f1e94bbd22f120e5fab ]

syzbot found another use-after-free in ip6_route_mpath_notify() [1]

Commit f7225172f25a ("net/ipv6: prevent use after free in
ip6_route_mpath_notify") was not able to fix the root cause.

We need to defer the fib6_info_release() calls after
ip6_route_mpath_notify(), in the cleanup phase.

[1]
BUG: KASAN: slab-use-after-free in rt6_fill_node+0x1460/0x1ac0
Read of size 4 at addr ffff88809a07fc64 by task syz-executor.2/23037

CPU: 0 PID: 23037 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-01035-gea7f3cfaa588 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x167/0x540 mm/kasan/report.c:488
kasan_report+0x142/0x180 mm/kasan/report.c:601
rt6_fill_node+0x1460/0x1ac0
inet6_rt_notify+0x13b/0x290 net/ipv6/route.c:6184
ip6_route_mpath_notify net/ipv6/route.c:5198 [inline]
ip6_route_multipath_add net/ipv6/route.c:5404 [inline]
inet6_rtm_newroute+0x1d0f/0x2300 net/ipv6/route.c:5517
rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_64+0xf9/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f73dd87dda9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f73de6550c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f73dd9ac050 RCX: 00007f73dd87dda9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
RBP: 00007f73dd8ca47a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f73dd9ac050 R15: 00007ffdbdeb7858
</TASK>

Allocated by task 23037:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:372 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:389
kasan_kmalloc include/linux/kasan.h:211 [inline]
__do_kmalloc_node mm/slub.c:3981 [inline]
__kmalloc+0x22e/0x490 mm/slub.c:3994
kmalloc include/linux/slab.h:594 [inline]
kzalloc include/linux/slab.h:711 [inline]
fib6_info_alloc+0x2e/0xf0 net/ipv6/ip6_fib.c:155
ip6_route_info_create+0x445/0x12b0 net/ipv6/route.c:3758
ip6_route_multipath_add net/ipv6/route.c:5298 [inline]
inet6_rtm_newroute+0x744/0x2300 net/ipv6/route.c:5517
rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_64+0xf9/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77

Freed by task 16:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x4e/0x60 mm/kasan/generic.c:640
poison_slab_object+0xa6/0xe0 mm/kasan/common.c:241
__kasan_slab_free+0x34/0x70 mm/kasan/common.c:257
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2121 [inline]
slab_free mm/slub.c:4299 [inline]
kfree+0x14a/0x380 mm/slub.c:4409
rcu_do_batch kernel/rcu/tree.c:2190 [inline]
rcu_core+0xd76/0x1810 kernel/rcu/tree.c:2465
__do_softirq+0x2bb/0x942 kernel/softirq.c:553

Last potentially related work creation:
kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
__kasan_record_aux_stack+0xae/0x100 mm/kasan/generic.c:586
__call_rcu_common kernel/rcu/tree.c:2715 [inline]
call_rcu+0x167/0xa80 kernel/rcu/tree.c:2829
fib6_info_release include/net/ip6_fib.h:341 [inline]
ip6_route_multipath_add net/ipv6/route.c:5344 [inline]
inet6_rtm_newroute+0x114d/0x2300 net/ipv6/route.c:5517
rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_64+0xf9/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77

The buggy address belongs to the object at ffff88809a07fc00
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 100 bytes inside of
freed 512-byte region [ffff88809a07fc00, ffff88809a07fe00)

The buggy address belongs to the physical page:
page:ffffea0002681f00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9a07c
head:ffffea0002681f00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000840 ffff888014c41c80 dead000000000122 0000000000000000
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 23028, tgid 23027 (syz-executor.4), ts 2340253595219, free_ts 2339107097036
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
prep_new_page mm/page_alloc.c:1540 [inline]
get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
__alloc_pages+0x255/0x680 mm/page_alloc.c:4567
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page+0x5f/0x160 mm/slub.c:2190
allocate_slab mm/slub.c:2354 [inline]
new_slab+0x84/0x2f0 mm/slub.c:2407
___slab_alloc+0xd17/0x13e0 mm/slub.c:3540
__slab_alloc mm/slub.c:3625 [inline]
__slab_alloc_node mm/slub.c:3678 [inline]
slab_alloc_node mm/slub.c:3850 [inline]
__do_kmalloc_node mm/slub.c:3980 [inline]
__kmalloc+0x2e0/0x490 mm/slub.c:3994
kmalloc include/linux/slab.h:594 [inline]
kzalloc include/linux/slab.h:711 [inline]
new_dir fs/proc/proc_sysctl.c:956 [inline]
get_subdir fs/proc/proc_sysctl.c:1000 [inline]
sysctl_mkdir_p fs/proc/proc_sysctl.c:1295 [inline]
__register_sysctl_table+0xb30/0x1440 fs/proc/proc_sysctl.c:1376
neigh_sysctl_register+0x416/0x500 net/core/neighbour.c:3859
devinet_sysctl_register+0xaf/0x1f0 net/ipv4/devinet.c:2644
inetdev_init+0x296/0x4d0 net/ipv4/devinet.c:286
inetdev_event+0x338/0x15c0 net/ipv4/devinet.c:1555
notifier_call_chain+0x18f/0x3b0 kernel/notifier.c:93
call_netdevice_notifiers_extack net/core/dev.c:1987 [inline]
call_netdevice_notifiers net/core/dev.c:2001 [inline]
register_netdevice+0x15b2/0x1a20 net/core/dev.c:10340
br_dev_newlink+0x27/0x100 net/bridge/br_netlink.c:1563
rtnl_newlink_create net/core/rtnetlink.c:3497 [inline]
__rtnl_newlink net/core/rtnetlink.c:3717 [inline]
rtnl_newlink+0x158f/0x20a0 net/core/rtnetlink.c:3730
page last free pid 11583 tgid 11583 stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1140 [inline]
free_unref_page_prepare+0x968/0xa90 mm/page_alloc.c:2346
free_unref_page+0x37/0x3f0 mm/page_alloc.c:2486
kasan_depopulate_vmalloc_pte+0x74/0x90 mm/kasan/shadow.c:415
apply_to_pte_range mm/memory.c:2619 [inline]
apply_to_pmd_range mm/memory.c:2663 [inline]
apply_to_pud_range mm/memory.c:2699 [inline]
apply_to_p4d_range mm/memory.c:2735 [inline]
__apply_to_page_range+0x8ec/0xe40 mm/memory.c:2769
kasan_release_vmalloc+0x9a/0xb0 mm/kasan/shadow.c:532
__purge_vmap_area_lazy+0x163f/0x1a10 mm/vmalloc.c:1770
drain_vmap_area_work+0x40/0xd0 mm/vmalloc.c:1804
process_one_work kernel/workqueue.c:2633 [inline]
process_scheduled_works+0x913/0x1420 kernel/workqueue.c:2706
worker_thread+0xa5f/0x1000 kernel/workqueue.c:2787
kthread+0x2ef/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242

Memory state around the buggy address:
ffff88809a07fb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88809a07fb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88809a07fc00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88809a07fc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88809a07fd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv6/route.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 7f65dc750feb8..887599d351b8d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -5335,19 +5335,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
err_nh = NULL;
list_for_each_entry(nh, &rt6_nh_list, next) {
err = __ip6_ins_rt(nh->fib6_info, info, extack);
- fib6_info_release(nh->fib6_info);
-
- if (!err) {
- /* save reference to last route successfully inserted */
- rt_last = nh->fib6_info;
-
- /* save reference to first route for notification */
- if (!rt_notif)
- rt_notif = nh->fib6_info;
- }

- /* nh->fib6_info is used or freed at this point, reset to NULL*/
- nh->fib6_info = NULL;
if (err) {
if (replace && nhn)
NL_SET_ERR_MSG_MOD(extack,
@@ -5355,6 +5343,12 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
err_nh = nh;
goto add_errout;
}
+ /* save reference to last route successfully inserted */
+ rt_last = nh->fib6_info;
+
+ /* save reference to first route for notification */
+ if (!rt_notif)
+ rt_notif = nh->fib6_info;

/* Because each route is added like a single route we remove
* these flags after the first nexthop: if there is a collision,
@@ -5415,8 +5409,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,

cleanup:
list_for_each_entry_safe(nh, nh_safe, &rt6_nh_list, next) {
- if (nh->fib6_info)
- fib6_info_release(nh->fib6_info);
+ fib6_info_release(nh->fib6_info);
list_del(&nh->next);
kfree(nh);
}
--
2.43.0


2024-03-13 17:17:26

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 25/71] netrom: Fix a data-race around sysctl_netrom_transport_busy_delay

From: Jason Xing <[email protected]>

[ Upstream commit 43547d8699439a67b78d6bb39015113f7aa360fd ]

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 7428ea436e318..ee6621c0d2e45 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -459,7 +459,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
nr->n2 =
msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_maximum_tries));
nr->t4 =
- msecs_to_jiffies(sysctl_netrom_transport_busy_delay);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_busy_delay));
nr->idle =
msecs_to_jiffies(sysctl_netrom_transport_no_activity_timeout);
nr->window = sysctl_netrom_transport_requested_window_size;
--
2.43.0


2024-03-13 17:17:34

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 18/71] erofs: apply proper VMA alignment for memory mapped files on THP

From: Gao Xiang <[email protected]>

[ Upstream commit 4127caee89612a84adedd78c9453089138cd5afe ]

There are mainly two reasons that thp_get_unmapped_area() should be
used for EROFS as other filesystems:

- It's needed to enable PMD mappings as a FSDAX filesystem, see
commit 74d2fad1334d ("thp, dax: add thp_get_unmapped_area for pmd
mappings");

- It's useful together with large folios and
CONFIG_READ_ONLY_THP_FOR_FS which enable THPs for mmapped files
(e.g. shared libraries) even without FSDAX. See commit 1854bc6e2420
("mm/readahead: Align file mappings for non-DAX").

Fixes: 06252e9ce05b ("erofs: dax support for non-tailpacking regular file")
Fixes: ce529cc25b18 ("erofs: enable large folios for iomap mode")
Fixes: e6687b89225e ("erofs: enable large folios for fscache mode")
Reviewed-by: Jingbo Xu <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
fs/erofs/data.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index b32801d716f89..9d20e5d23ae0b 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -440,4 +440,5 @@ const struct file_operations erofs_file_fops = {
.read_iter = erofs_file_read_iter,
.mmap = erofs_file_mmap,
.splice_read = generic_file_splice_read,
+ .get_unmapped_area = thp_get_unmapped_area,
};
--
2.43.0


2024-03-13 17:17:45

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 33/71] ASoC: codecs: wcd938x: fix headphones volume controls

From: Johan Hovold <[email protected]>

[ Upstream commit 4d0e8bdfa4a57099dc7230952a460903f2e2f8de ]

The lowest headphones volume setting does not mute so the leave the TLV
mute flag unset.

This is specifically needed to let the sound server use the lowest gain
setting.

Fixes: c03226ba15fe ("ASoC: codecs: wcd938x: fix dB range for HPHL and HPHR")
Cc: <[email protected]> # 6.5
Cc: Srinivas Kandagatla <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Link: https://msgid.link/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
sound/soc/codecs/wcd938x.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/codecs/wcd938x.c b/sound/soc/codecs/wcd938x.c
index e80be4e4fa8b4..555b74e7172d8 100644
--- a/sound/soc/codecs/wcd938x.c
+++ b/sound/soc/codecs/wcd938x.c
@@ -210,7 +210,7 @@ struct wcd938x_priv {
};

static const SNDRV_CTL_TLVD_DECLARE_DB_MINMAX(ear_pa_gain, 600, -1800);
-static const DECLARE_TLV_DB_SCALE(line_gain, -3000, 150, -3000);
+static const DECLARE_TLV_DB_SCALE(line_gain, -3000, 150, 0);
static const SNDRV_CTL_TLVD_DECLARE_DB_MINMAX(analog_gain, 0, 3000);

struct wcd938x_mbhc_zdet_param {
--
2.43.0


2024-03-13 17:18:04

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 61/71] drm/amd/display: Fix MST Null Ptr for RV

From: Fangzhi Zuo <[email protected]>

[ Upstream commit e6a7df96facdcf5b1f71eb3ec26f2f9f6ad61e57 ]

The change try to fix below error specific to RV platform:

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 917 Comm: sway Not tainted 6.3.9-arch1-1 #1 124dc55df4f5272ccb409f39ef4872fc2b3376a2
Hardware name: LENOVO 20NKS01Y00/20NKS01Y00, BIOS R12ET61W(1.31 ) 07/28/2022
RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper]
Code: 01 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 2e 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8>
RSP: 0018:ffff960cc2df77d8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8afb87e81280 RCX: 0000000000000224
RDX: ffff8afb9ee37c00 RSI: ffff8afb8da1a578 RDI: ffff8afb87e81280
RBP: ffff8afb83d67000 R08: 0000000000000001 R09: ffff8afb9652f850
R10: ffff960cc2df7908 R11: 0000000000000002 R12: 0000000000000000
R13: ffff8afb8d7688a0 R14: ffff8afb8da1a578 R15: 0000000000000224
FS: 00007f4dac35ce00(0000) GS:ffff8afe30b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000010ddc6000 CR4: 00000000003506e0
Call Trace:
<TASK>
? __die+0x23/0x70
? page_fault_oops+0x171/0x4e0
? plist_add+0xbe/0x100
? exc_page_fault+0x7c/0x180
? asm_exc_page_fault+0x26/0x30
? drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper 0e67723696438d8e02b741593dd50d80b44c2026]
? drm_dp_atomic_find_time_slots+0x28/0x260 [drm_display_helper 0e67723696438d8e02b741593dd50d80b44c2026]
compute_mst_dsc_configs_for_link+0x2ff/0xa40 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
? fill_plane_buffer_attributes+0x419/0x510 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
compute_mst_dsc_configs_for_state+0x1e1/0x250 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
amdgpu_dm_atomic_check+0xecd/0x1190 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
drm_atomic_check_only+0x5c5/0xa40
drm_mode_atomic_ioctl+0x76e/0xbc0
? _copy_to_user+0x25/0x30
? drm_ioctl+0x296/0x4b0
? __pfx_drm_mode_atomic_ioctl+0x10/0x10
drm_ioctl_kernel+0xcd/0x170
drm_ioctl+0x26d/0x4b0
? __pfx_drm_mode_atomic_ioctl+0x10/0x10
amdgpu_drm_ioctl+0x4e/0x90 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
__x64_sys_ioctl+0x94/0xd0
do_syscall_64+0x60/0x90
? do_syscall_64+0x6c/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f4dad17f76f
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c>
RSP: 002b:00007ffd9ae859f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000055e255a55900 RCX: 00007f4dad17f76f
RDX: 00007ffd9ae85a90 RSI: 00000000c03864bc RDI: 000000000000000b
RBP: 00007ffd9ae85a90 R08: 0000000000000003 R09: 0000000000000003
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c03864bc
R13: 000000000000000b R14: 000055e255a7fc60 R15: 000055e255a01eb0
</TASK>
Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg joydev mousedev bnep >
typec libphy k10temp ipmi_msghandler roles i2c_scmi acpi_cpufreq mac_hid nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_mas>
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper]
Code: 01 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 2e 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8>
RSP: 0018:ffff960cc2df77d8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8afb87e81280 RCX: 0000000000000224
RDX: ffff8afb9ee37c00 RSI: ffff8afb8da1a578 RDI: ffff8afb87e81280
RBP: ffff8afb83d67000 R08: 0000000000000001 R09: ffff8afb9652f850
R10: ffff960cc2df7908 R11: 0000000000000002 R12: 0000000000000000
R13: ffff8afb8d7688a0 R14: ffff8afb8da1a578 R15: 0000000000000224
FS: 00007f4dac35ce00(0000) GS:ffff8afe30b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000010ddc6000 CR4: 00000000003506e0

With a second DP monitor connected, drm_atomic_state in dm atomic check
sequence does not include the connector state for the old/existing/first
DP monitor. In such case, dsc determination policy would hit a null ptr
when it tries to iterate the old/existing stream that does not have a
valid connector state attached to it. When that happens, dm atomic check
should call drm_atomic_get_connector_state for a new connector state.
Existing dm has already done that, except for RV due to it does not have
official support of dsc where .num_dsc is not defined in dcn10 resource
cap, that prevent from getting drm_atomic_get_connector_state called.
So, skip dsc determination policy for ASICs that don't have DSC support.

Cc: [email protected] # 6.1+
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2314
Reviewed-by: Wayne Lin <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Fangzhi Zuo <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index bea49befdcacc..a6c6f286a5988 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -10123,11 +10123,13 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
}

#if defined(CONFIG_DRM_AMD_DC_DCN)
- ret = compute_mst_dsc_configs_for_state(state, dm_state->context, vars);
- if (ret) {
- DRM_DEBUG_DRIVER("compute_mst_dsc_configs_for_state() failed\n");
- ret = -EINVAL;
- goto fail;
+ if (dc_resource_is_dsc_encoding_supported(dc)) {
+ ret = compute_mst_dsc_configs_for_state(state, dm_state->context, vars);
+ if (ret) {
+ DRM_DEBUG_DRIVER("compute_mst_dsc_configs_for_state() failed\n");
+ ret = -EINVAL;
+ goto fail;
+ }
}

ret = dm_update_mst_vcpi_slots_for_dsc(state, dm_state->context, vars);
--
2.43.0


2024-03-13 17:18:07

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 13/71] cpumap: Zero-initialise xdp_rxq_info struct before running XDP program

From: Toke Høiland-Jørgensen <[email protected]>

[ Upstream commit 2487007aa3b9fafbd2cb14068f49791ce1d7ede5 ]

When running an XDP program that is attached to a cpumap entry, we don't
initialise the xdp_rxq_info data structure being used in the xdp_buff
that backs the XDP program invocation. Tobias noticed that this leads to
random values being returned as the xdp_md->rx_queue_index value for XDP
programs running in a cpumap.

This means we're basically returning the contents of the uninitialised
memory, which is bad. Fix this by zero-initialising the rxq data
structure before running the XDP program.

Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap")
Reported-by: Tobias Böhm <[email protected]>
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/bpf/cpumap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 08a8e81027289..0508937048137 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -222,7 +222,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
void **frames, int n,
struct xdp_cpumap_stats *stats)
{
- struct xdp_rxq_info rxq;
+ struct xdp_rxq_info rxq = {};
struct xdp_buff xdp;
int i, nframes = 0;

--
2.43.0


2024-03-13 17:18:07

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 17/71] netfilter: nf_conntrack_h323: Add protection for bmp length out of range

From: Lena Wang <[email protected]>

[ Upstream commit 767146637efc528b5e3d31297df115e85a2fd362 ]

UBSAN load reports an exception of BRK#5515 SHIFT_ISSUE:Bitwise shifts
that are out of bounds for their data type.

vmlinux get_bitmap(b=75) + 712
<net/netfilter/nf_conntrack_h323_asn1.c:0>
vmlinux decode_seq(bs=0xFFFFFFD008037000, f=0xFFFFFFD008037018, level=134443100) + 1956
<net/netfilter/nf_conntrack_h323_asn1.c:592>
vmlinux decode_choice(base=0xFFFFFFD0080370F0, level=23843636) + 1216
<net/netfilter/nf_conntrack_h323_asn1.c:814>
vmlinux decode_seq(f=0xFFFFFFD0080371A8, level=134443500) + 812
<net/netfilter/nf_conntrack_h323_asn1.c:576>
vmlinux decode_choice(base=0xFFFFFFD008037280, level=0) + 1216
<net/netfilter/nf_conntrack_h323_asn1.c:814>
vmlinux DecodeRasMessage() + 304
<net/netfilter/nf_conntrack_h323_asn1.c:833>
vmlinux ras_help() + 684
<net/netfilter/nf_conntrack_h323_main.c:1728>
vmlinux nf_confirm() + 188
<net/netfilter/nf_conntrack_proto.c:137>

Due to abnormal data in skb->data, the extension bitmap length
exceeds 32 when decoding ras message then uses the length to make
a shift operation. It will change into negative after several loop.
UBSAN load could detect a negative shift as an undefined behaviour
and reports exception.
So we add the protection to avoid the length exceeding 32. Or else
it will return out of range error and stop decoding.

Fixes: 5e35941d9901 ("[NETFILTER]: Add H.323 conntrack/NAT helper")
Signed-off-by: Lena Wang <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nf_conntrack_h323_asn1.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/net/netfilter/nf_conntrack_h323_asn1.c b/net/netfilter/nf_conntrack_h323_asn1.c
index e697a824b0018..540d97715bd23 100644
--- a/net/netfilter/nf_conntrack_h323_asn1.c
+++ b/net/netfilter/nf_conntrack_h323_asn1.c
@@ -533,6 +533,8 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
/* Get fields bitmap */
if (nf_h323_error_boundary(bs, 0, f->sz))
return H323_ERROR_BOUND;
+ if (f->sz > 32)
+ return H323_ERROR_RANGE;
bmp = get_bitmap(bs, f->sz);
if (base)
*(unsigned int *)base = bmp;
@@ -589,6 +591,8 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
bmp2_len = get_bits(bs, 7) + 1;
if (nf_h323_error_boundary(bs, 0, bmp2_len))
return H323_ERROR_BOUND;
+ if (bmp2_len > 32)
+ return H323_ERROR_RANGE;
bmp2 = get_bitmap(bs, bmp2_len);
bmp |= bmp2 >> f->sz;
if (base)
--
2.43.0


2024-03-13 17:18:13

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 27/71] netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout

From: Jason Xing <[email protected]>

[ Upstream commit f99b494b40431f0ca416859f2345746199398e2b ]

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 88941b66631fc..5472e79cde830 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -461,7 +461,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
nr->t4 =
msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_busy_delay));
nr->idle =
- msecs_to_jiffies(sysctl_netrom_transport_no_activity_timeout);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_no_activity_timeout));
nr->window = READ_ONCE(sysctl_netrom_transport_requested_window_size);

nr->bpqext = 1;
--
2.43.0


2024-03-13 17:18:22

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 19/71] netrom: Fix a data-race around sysctl_netrom_default_path_quality

From: Jason Xing <[email protected]>

[ Upstream commit 958d6145a6d9ba9e075c921aead8753fb91c9101 ]

We need to protect the reader reading sysctl_netrom_default_path_quality
because the value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/nr_route.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index baea3cbd76ca5..6f709fdffc11f 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -153,7 +153,7 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
nr_neigh->digipeat = NULL;
nr_neigh->ax25 = NULL;
nr_neigh->dev = dev;
- nr_neigh->quality = sysctl_netrom_default_path_quality;
+ nr_neigh->quality = READ_ONCE(sysctl_netrom_default_path_quality);
nr_neigh->locked = 0;
nr_neigh->count = 0;
nr_neigh->number = nr_neigh_no++;
--
2.43.0


2024-03-13 17:19:20

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 66/71] fs/proc: do_task_stat: use __for_each_thread()

From: Oleg Nesterov <[email protected]>

[ Upstream commit 7904e53ed5a20fc678c01d5d1b07ec486425bb6a ]

do/while_each_thread should be avoided when possible.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Stable-dep-of: 7601df8031fd ("fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats")
Signed-off-by: Sasha Levin <[email protected]>
---
fs/proc/array.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 1b0d78dfd20f9..bcb645627991e 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -526,12 +526,13 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,

/* add up live thread stats at the group level */
if (whole) {
- struct task_struct *t = task;
- do {
+ struct task_struct *t;
+
+ __for_each_thread(sig, t) {
min_flt += t->min_flt;
maj_flt += t->maj_flt;
gtime += task_gtime(t);
- } while_each_thread(task, t);
+ }

min_flt += sig->min_flt;
maj_flt += sig->maj_flt;
--
2.43.0


2024-03-13 17:19:30

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 22/71] netrom: Fix a data-race around sysctl_netrom_transport_timeout

From: Jason Xing <[email protected]>

[ Upstream commit 60a7a152abd494ed4f69098cf0f322e6bb140612 ]

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index ec5747969f964..3c6567af2ba47 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -453,7 +453,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
nr_init_timers(sk);

nr->t1 =
- msecs_to_jiffies(sysctl_netrom_transport_timeout);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_timeout));
nr->t2 =
msecs_to_jiffies(sysctl_netrom_transport_acknowledge_delay);
nr->n2 =
--
2.43.0


2024-03-13 17:19:30

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 32/71] KVM: s390: vsie: fix race during shadow creation

From: Christian Borntraeger <[email protected]>

[ Upstream commit fe752331d4b361d43cfd0b89534b4b2176057c32 ]

Right now it is possible to see gmap->private being zero in
kvm_s390_vsie_gmap_notifier resulting in a crash. This is due to the
fact that we add gmap->private == kvm after creation:

static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
struct vsie_page *vsie_page)
{
[...]
gmap = gmap_shadow(vcpu->arch.gmap, asce, edat);
if (IS_ERR(gmap))
return PTR_ERR(gmap);
gmap->private = vcpu->kvm;

Let children inherit the private field of the parent.

Reported-by: Marc Hartmayer <[email protected]>
Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization")
Cc: <[email protected]>
Cc: David Hildenbrand <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
arch/s390/kvm/vsie.c | 1 -
arch/s390/mm/gmap.c | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index b2dbf08a961e5..d90c818a9ae71 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -1216,7 +1216,6 @@ static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
gmap = gmap_shadow(vcpu->arch.gmap, asce, edat);
if (IS_ERR(gmap))
return PTR_ERR(gmap);
- gmap->private = vcpu->kvm;
vcpu->kvm->stat.gmap_shadow_create++;
WRITE_ONCE(vsie_page->gmap, gmap);
return 0;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 243f673fa6515..662cf23a1b44b 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -1675,6 +1675,7 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
return ERR_PTR(-ENOMEM);
new->mm = parent->mm;
new->parent = gmap_get(parent);
+ new->private = parent->private;
new->orig_asce = asce;
new->edat_level = edat_level;
new->initialized = false;
--
2.43.0


2024-03-13 17:19:34

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 23/71] netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries

From: Jason Xing <[email protected]>

[ Upstream commit e799299aafed417cc1f32adccb2a0e5268b3f6d5 ]

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 3c6567af2ba47..be404ace98786 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -457,7 +457,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
nr->t2 =
msecs_to_jiffies(sysctl_netrom_transport_acknowledge_delay);
nr->n2 =
- msecs_to_jiffies(sysctl_netrom_transport_maximum_tries);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_maximum_tries));
nr->t4 =
msecs_to_jiffies(sysctl_netrom_transport_busy_delay);
nr->idle =
--
2.43.0


2024-03-13 17:19:37

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 67/71] fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats

From: Oleg Nesterov <[email protected]>

[ Upstream commit 7601df8031fd67310af891897ef6cc0df4209305 ]

lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
do_task_stat() at the same time and the process has NR_THREADS, it will
spin with irqs disabled O(NR_CPUS * NR_THREADS) time.

Change do_task_stat() to use sig->stats_lock to gather the statistics
outside of ->siglock protected section, in the likely case this code will
run lockless.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Dylan Hatch <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/proc/array.c | 58 +++++++++++++++++++++++++++----------------------
1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index bcb645627991e..d210b2f8b7ed5 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -467,13 +467,13 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
int permitted;
struct mm_struct *mm;
unsigned long long start_time;
- unsigned long cmin_flt = 0, cmaj_flt = 0;
- unsigned long min_flt = 0, maj_flt = 0;
- u64 cutime, cstime, utime, stime;
- u64 cgtime, gtime;
+ unsigned long cmin_flt, cmaj_flt, min_flt, maj_flt;
+ u64 cutime, cstime, cgtime, utime, stime, gtime;
unsigned long rsslim = 0;
unsigned long flags;
int exit_code = task->exit_code;
+ struct signal_struct *sig = task->signal;
+ unsigned int seq = 1;

state = *get_task_state(task);
vsize = eip = esp = 0;
@@ -501,12 +501,8 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,

sigemptyset(&sigign);
sigemptyset(&sigcatch);
- cutime = cstime = 0;
- cgtime = gtime = 0;

if (lock_task_sighand(task, &flags)) {
- struct signal_struct *sig = task->signal;
-
if (sig->tty) {
struct pid *pgrp = tty_get_pgrp(sig->tty);
tty_pgrp = pid_nr_ns(pgrp, ns);
@@ -517,27 +513,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
num_threads = get_nr_threads(task);
collect_sigign_sigcatch(task, &sigign, &sigcatch);

- cmin_flt = sig->cmin_flt;
- cmaj_flt = sig->cmaj_flt;
- cutime = sig->cutime;
- cstime = sig->cstime;
- cgtime = sig->cgtime;
rsslim = READ_ONCE(sig->rlim[RLIMIT_RSS].rlim_cur);

- /* add up live thread stats at the group level */
if (whole) {
- struct task_struct *t;
-
- __for_each_thread(sig, t) {
- min_flt += t->min_flt;
- maj_flt += t->maj_flt;
- gtime += task_gtime(t);
- }
-
- min_flt += sig->min_flt;
- maj_flt += sig->maj_flt;
- gtime += sig->gtime;
-
if (sig->flags & (SIGNAL_GROUP_EXIT | SIGNAL_STOP_STOPPED))
exit_code = sig->group_exit_code;
}
@@ -552,6 +530,34 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
if (permitted && (!whole || num_threads < 2))
wchan = !task_is_running(task);

+ do {
+ seq++; /* 2 on the 1st/lockless path, otherwise odd */
+ flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
+
+ cmin_flt = sig->cmin_flt;
+ cmaj_flt = sig->cmaj_flt;
+ cutime = sig->cutime;
+ cstime = sig->cstime;
+ cgtime = sig->cgtime;
+
+ if (whole) {
+ struct task_struct *t;
+
+ min_flt = sig->min_flt;
+ maj_flt = sig->maj_flt;
+ gtime = sig->gtime;
+
+ rcu_read_lock();
+ __for_each_thread(sig, t) {
+ min_flt += t->min_flt;
+ maj_flt += t->maj_flt;
+ gtime += task_gtime(t);
+ }
+ rcu_read_unlock();
+ }
+ } while (need_seqretry(&sig->stats_lock, seq));
+ done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
+
if (whole) {
thread_group_cputime_adjusted(task, &utime, &stime);
} else {
--
2.43.0


2024-03-13 17:20:04

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 69/71] blk-wbt: fix that wbt can't be disabled by default

From: Yu Kuai <[email protected]>

[ Upstream commit 8a2b20a997a3779ae9fcae268f2959eb82ec05a1 ]

commit b11d31ae01e6 ("blk-wbt: remove unnecessary check in
wbt_enable_default()") removes the checking of CONFIG_BLK_WBT_MQ by
mistake, which is used to control enable or disable wbt by default.

Fix the problem by adding back the checking. This patch also do a litter
cleanup to make related code more readable.

Fixes: b11d31ae01e6 ("blk-wbt: remove unnecessary check in wbt_enable_default()")
Reported-by: Lukas Bulwahn <[email protected]>
Link: https://lore.kernel.org/lkml/CAKXUXMzfKq_J9nKHGyr5P5rvUETY4B-fxoQD4sO+NYjFOfVtZA@mail.gmail.com/t/
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
block/blk-wbt.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index fcacdff8af93b..526fb12c3e4cf 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -640,14 +640,16 @@ void wbt_enable_default(struct gendisk *disk)
{
struct request_queue *q = disk->queue;
struct rq_qos *rqos;
- bool disable_flag = q->elevator &&
- test_bit(ELEVATOR_FLAG_DISABLE_WBT, &q->elevator->flags);
+ bool enable = IS_ENABLED(CONFIG_BLK_WBT_MQ);
+
+ if (q->elevator &&
+ test_bit(ELEVATOR_FLAG_DISABLE_WBT, &q->elevator->flags))
+ enable = false;

/* Throttling already enabled? */
rqos = wbt_rq_qos(q);
if (rqos) {
- if (!disable_flag &&
- RQWB(rqos)->enable_state == WBT_STATE_OFF_DEFAULT)
+ if (enable && RQWB(rqos)->enable_state == WBT_STATE_OFF_DEFAULT)
RQWB(rqos)->enable_state = WBT_STATE_ON_DEFAULT;
return;
}
@@ -656,7 +658,7 @@ void wbt_enable_default(struct gendisk *disk)
if (!blk_queue_registered(q))
return;

- if (queue_is_mq(q) && !disable_flag)
+ if (queue_is_mq(q) && enable)
wbt_init(disk);
}
EXPORT_SYMBOL_GPL(wbt_enable_default);
--
2.43.0


2024-03-13 17:20:19

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 21/71] netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser

From: Jason Xing <[email protected]>

[ Upstream commit 119cae5ea3f9e35cdada8e572cc067f072fa825a ]

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/nr_dev.c | 2 +-
net/netrom/nr_out.c | 2 +-
net/netrom/nr_subr.c | 5 +++--
3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/netrom/nr_dev.c b/net/netrom/nr_dev.c
index 3aaac4a22b387..2c34389c3ce6f 100644
--- a/net/netrom/nr_dev.c
+++ b/net/netrom/nr_dev.c
@@ -81,7 +81,7 @@ static int nr_header(struct sk_buff *skb, struct net_device *dev,
buff[6] |= AX25_SSSID_SPARE;
buff += AX25_ADDR_LEN;

- *buff++ = sysctl_netrom_network_ttl_initialiser;
+ *buff++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);

*buff++ = NR_PROTO_IP;
*buff++ = NR_PROTO_IP;
diff --git a/net/netrom/nr_out.c b/net/netrom/nr_out.c
index 44929657f5b71..5e531394a724b 100644
--- a/net/netrom/nr_out.c
+++ b/net/netrom/nr_out.c
@@ -204,7 +204,7 @@ void nr_transmit_buffer(struct sock *sk, struct sk_buff *skb)
dptr[6] |= AX25_SSSID_SPARE;
dptr += AX25_ADDR_LEN;

- *dptr++ = sysctl_netrom_network_ttl_initialiser;
+ *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);

if (!nr_route_frame(skb, NULL)) {
kfree_skb(skb);
diff --git a/net/netrom/nr_subr.c b/net/netrom/nr_subr.c
index e2d2af924cff4..c3bbd5880850b 100644
--- a/net/netrom/nr_subr.c
+++ b/net/netrom/nr_subr.c
@@ -182,7 +182,8 @@ void nr_write_internal(struct sock *sk, int frametype)
*dptr++ = nr->my_id;
*dptr++ = frametype;
*dptr++ = nr->window;
- if (nr->bpqext) *dptr++ = sysctl_netrom_network_ttl_initialiser;
+ if (nr->bpqext)
+ *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
break;

case NR_DISCREQ:
@@ -236,7 +237,7 @@ void __nr_transmit_reply(struct sk_buff *skb, int mine, unsigned char cmdflags)
dptr[6] |= AX25_SSSID_SPARE;
dptr += AX25_ADDR_LEN;

- *dptr++ = sysctl_netrom_network_ttl_initialiser;
+ *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);

if (mine) {
*dptr++ = 0;
--
2.43.0


2024-03-13 17:21:28

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 60/71] drm/amd/display: Wrong colorimetry workaround

From: Ma Hanghong <[email protected]>

[ Upstream commit b1a98cf89a695d36c414653634ea7ba91b6e701f ]

[Why]
For FreeSync HDR, native color space flag in AMD VSIF(BT.709) should be
used when intepreting content and color space flag in VSC or AVI
infoFrame should be ignored. However, it turned out some userspace
application still use color flag in VSC or AVI infoFrame which is
incorrect.

[How]
Transfer function is used when building the VSC and AVI infoFrame. Set
colorimetry to BT.709 when all the following match:

1. Pixel format is YCbCr;
2. In FreeSync 2 HDR, color is COLOR_SPACE_2020_YCBCR;
3. Transfer function is TRANSFER_FUNC_GAMMA_22;

Tested-by: Mark Broadworth <[email protected]>
Reviewed-by: Krunoslav Kovac <[email protected]>
Acked-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Ma Hanghong <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Stable-dep-of: e6a7df96facd ("drm/amd/display: Fix MST Null Ptr for RV")
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 ++++-
drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 6 ++++++
drivers/gpu/drm/amd/display/modules/inc/mod_info_packet.h | 3 ++-
.../gpu/drm/amd/display/modules/info_packet/info_packet.c | 6 +++++-
4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index da16048bf1004..bea49befdcacc 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5938,6 +5938,7 @@ create_stream_for_sink(struct amdgpu_dm_connector *aconnector,
bool scale = dm_state ? (dm_state->scaling != RMX_OFF) : false;
int mode_refresh;
int preferred_refresh = 0;
+ enum color_transfer_func tf = TRANSFER_FUNC_UNKNOWN;
#if defined(CONFIG_DRM_AMD_DC_DCN)
struct dsc_dec_dpcd_caps dsc_caps;
#endif
@@ -6071,7 +6072,9 @@ create_stream_for_sink(struct amdgpu_dm_connector *aconnector,
if (stream->link->dpcd_caps.dprx_feature.bits.VSC_SDP_COLORIMETRY_SUPPORTED)
stream->use_vsc_sdp_for_colorimetry = true;
}
- mod_build_vsc_infopacket(stream, &stream->vsc_infopacket, stream->output_color_space);
+ if (stream->out_transfer_func->tf == TRANSFER_FUNCTION_GAMMA22)
+ tf = TRANSFER_FUNC_GAMMA_22;
+ mod_build_vsc_infopacket(stream, &stream->vsc_infopacket, stream->output_color_space, tf);
aconnector->psr_skip_count = AMDGPU_DM_PSR_ENTRY_DELAY;

}
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 66923f51037a3..e2f80cd0ca8cb 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -3038,6 +3038,12 @@ static void set_avi_info_frame(
hdmi_info.bits.C0_C1 = COLORIMETRY_EXTENDED;
}

+ if (pixel_encoding && color_space == COLOR_SPACE_2020_YCBCR &&
+ stream->out_transfer_func->tf == TRANSFER_FUNCTION_GAMMA22) {
+ hdmi_info.bits.EC0_EC2 = 0;
+ hdmi_info.bits.C0_C1 = COLORIMETRY_ITU709;
+ }
+
/* TODO: un-hardcode aspect ratio */
aspect = stream->timing.aspect_ratio;

diff --git a/drivers/gpu/drm/amd/display/modules/inc/mod_info_packet.h b/drivers/gpu/drm/amd/display/modules/inc/mod_info_packet.h
index 1d8b746b02f24..edf5845f6a1f7 100644
--- a/drivers/gpu/drm/amd/display/modules/inc/mod_info_packet.h
+++ b/drivers/gpu/drm/amd/display/modules/inc/mod_info_packet.h
@@ -35,7 +35,8 @@ struct mod_vrr_params;

void mod_build_vsc_infopacket(const struct dc_stream_state *stream,
struct dc_info_packet *info_packet,
- enum dc_color_space cs);
+ enum dc_color_space cs,
+ enum color_transfer_func tf);

void mod_build_hf_vsif_infopacket(const struct dc_stream_state *stream,
struct dc_info_packet *info_packet);
diff --git a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
index 27ceba9d6d658..69691058ab898 100644
--- a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
+++ b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
@@ -132,7 +132,8 @@ enum ColorimetryYCCDP {

void mod_build_vsc_infopacket(const struct dc_stream_state *stream,
struct dc_info_packet *info_packet,
- enum dc_color_space cs)
+ enum dc_color_space cs,
+ enum color_transfer_func tf)
{
unsigned int vsc_packet_revision = vsc_packet_undefined;
unsigned int i;
@@ -382,6 +383,9 @@ void mod_build_vsc_infopacket(const struct dc_stream_state *stream,
colorimetryFormat = ColorimetryYCC_DP_AdobeYCC;
else if (cs == COLOR_SPACE_2020_YCBCR)
colorimetryFormat = ColorimetryYCC_DP_ITU2020YCbCr;
+
+ if (cs == COLOR_SPACE_2020_YCBCR && tf == TRANSFER_FUNC_GAMMA_22)
+ colorimetryFormat = ColorimetryYCC_DP_ITU709;
break;

default:
--
2.43.0


2024-03-13 17:21:31

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 26/71] netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size

From: Jason Xing <[email protected]>

[ Upstream commit a2e706841488f474c06e9b33f71afc947fb3bf56 ]

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index ee6621c0d2e45..88941b66631fc 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -462,7 +462,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_busy_delay));
nr->idle =
msecs_to_jiffies(sysctl_netrom_transport_no_activity_timeout);
- nr->window = sysctl_netrom_transport_requested_window_size;
+ nr->window = READ_ONCE(sysctl_netrom_transport_requested_window_size);

nr->bpqext = 1;
nr->state = NR_STATE_0;
--
2.43.0


2024-03-13 17:21:32

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 29/71] netrom: Fix a data-race around sysctl_netrom_link_fails_count

From: Jason Xing <[email protected]>

[ Upstream commit bc76645ebdd01be9b9994dac39685a3d0f6f7985 ]

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/nr_route.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index 89e12e6eea2ef..70480869ad1c5 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -728,7 +728,7 @@ void nr_link_failed(ax25_cb *ax25, int reason)
nr_neigh->ax25 = NULL;
ax25_cb_put(ax25);

- if (++nr_neigh->failed < sysctl_netrom_link_fails_count) {
+ if (++nr_neigh->failed < READ_ONCE(sysctl_netrom_link_fails_count)) {
nr_neigh_put(nr_neigh);
return;
}
--
2.43.0


2024-03-13 17:21:51

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 64/71] getrusage: use __for_each_thread()

From: Oleg Nesterov <[email protected]>

[ Upstream commit 13b7bc60b5353371460a203df6c38ccd38ad7a3a ]

do/while_each_thread should be avoided when possible.

Plus this change allows to avoid lock_task_sighand(), we can use rcu
and/or sig->stats_lock instead.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Stable-dep-of: f7ec1cd5cc7e ("getrusage: use sig->stats_lock rather than lock_task_sighand()")
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/sys.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 2646047fe5513..04102538cf43f 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1822,10 +1822,8 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_oublock += sig->oublock;
if (maxrss < sig->maxrss)
maxrss = sig->maxrss;
- t = p;
- do {
+ __for_each_thread(sig, t)
accumulate_thread_rusage(t, r);
- } while_each_thread(p, t);
break;

default:
--
2.43.0


2024-03-13 17:21:51

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 30/71] netrom: Fix data-races around sysctl_net_busy_read

From: Jason Xing <[email protected]>

[ Upstream commit d380ce70058a4ccddc3e5f5c2063165dc07672c6 ]

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netrom/af_netrom.c | 2 +-
net/netrom/nr_in.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 5472e79cde830..f0879295de110 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -954,7 +954,7 @@ int nr_rx_frame(struct sk_buff *skb, struct net_device *dev)
* G8PZT's Xrouter which is sending packets with command type 7
* as an extension of the protocol.
*/
- if (sysctl_netrom_reset_circuit &&
+ if (READ_ONCE(sysctl_netrom_reset_circuit) &&
(frametype != NR_RESET || flags != 0))
nr_transmit_reset(skb, 1);

diff --git a/net/netrom/nr_in.c b/net/netrom/nr_in.c
index 2f084b6f69d7e..97944db6b5ac6 100644
--- a/net/netrom/nr_in.c
+++ b/net/netrom/nr_in.c
@@ -97,7 +97,7 @@ static int nr_state1_machine(struct sock *sk, struct sk_buff *skb,
break;

case NR_RESET:
- if (sysctl_netrom_reset_circuit)
+ if (READ_ONCE(sysctl_netrom_reset_circuit))
nr_disconnect(sk, ECONNRESET);
break;

@@ -128,7 +128,7 @@ static int nr_state2_machine(struct sock *sk, struct sk_buff *skb,
break;

case NR_RESET:
- if (sysctl_netrom_reset_circuit)
+ if (READ_ONCE(sysctl_netrom_reset_circuit))
nr_disconnect(sk, ECONNRESET);
break;

@@ -262,7 +262,7 @@ static int nr_state3_machine(struct sock *sk, struct sk_buff *skb, int frametype
break;

case NR_RESET:
- if (sysctl_netrom_reset_circuit)
+ if (READ_ONCE(sysctl_netrom_reset_circuit))
nr_disconnect(sk, ECONNRESET);
break;

--
2.43.0


2024-03-13 17:22:06

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 58/71] blk-rq-qos: store a gendisk instead of request_queue in struct rq_qos

From: Christoph Hellwig <[email protected]>

[ Upstream commit ba91c849fa50dbc6519cf7808177b3a9b7f6bc97 ]

This is what about half of the users already want, and it's only going to
grow more.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
Signed-off-by: Sasha Levin <[email protected]>
---
block/blk-iocost.c | 12 ++++++------
block/blk-iolatency.c | 14 +++++++-------
block/blk-mq-debugfs.c | 10 ++++------
block/blk-rq-qos.c | 4 ++--
block/blk-rq-qos.h | 2 +-
block/blk-wbt.c | 16 +++++++---------
6 files changed, 27 insertions(+), 31 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 78958c5bece08..ab5830ba23e0f 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -670,7 +670,7 @@ static struct ioc *q_to_ioc(struct request_queue *q)

static const char __maybe_unused *ioc_name(struct ioc *ioc)
{
- struct gendisk *disk = ioc->rqos.q->disk;
+ struct gendisk *disk = ioc->rqos.disk;

if (!disk)
return "<unknown>";
@@ -809,11 +809,11 @@ static int ioc_autop_idx(struct ioc *ioc)
u64 now_ns;

/* rotational? */
- if (!blk_queue_nonrot(ioc->rqos.q))
+ if (!blk_queue_nonrot(ioc->rqos.disk->queue))
return AUTOP_HDD;

/* handle SATA SSDs w/ broken NCQ */
- if (blk_queue_depth(ioc->rqos.q) == 1)
+ if (blk_queue_depth(ioc->rqos.disk->queue) == 1)
return AUTOP_SSD_QD1;

/* use one of the normal ssd sets */
@@ -2653,7 +2653,7 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
if (use_debt) {
iocg_incur_debt(iocg, abs_cost, &now);
if (iocg_kick_delay(iocg, &now))
- blkcg_schedule_throttle(rqos->q->disk,
+ blkcg_schedule_throttle(rqos->disk,
(bio->bi_opf & REQ_SWAP) == REQ_SWAP);
iocg_unlock(iocg, ioc_locked, &flags);
return;
@@ -2754,7 +2754,7 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq,
if (likely(!list_empty(&iocg->active_list))) {
iocg_incur_debt(iocg, abs_cost, &now);
if (iocg_kick_delay(iocg, &now))
- blkcg_schedule_throttle(rqos->q->disk,
+ blkcg_schedule_throttle(rqos->disk,
(bio->bi_opf & REQ_SWAP) == REQ_SWAP);
} else {
iocg_commit_bio(iocg, bio, abs_cost, cost);
@@ -2825,7 +2825,7 @@ static void ioc_rqos_exit(struct rq_qos *rqos)
{
struct ioc *ioc = rqos_to_ioc(rqos);

- blkcg_deactivate_policy(rqos->q, &blkcg_policy_iocost);
+ blkcg_deactivate_policy(rqos->disk->queue, &blkcg_policy_iocost);

spin_lock_irq(&ioc->lock);
ioc->running = IOC_STOP;
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index b0f8550f87cd2..268e6653b5a62 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -292,7 +292,7 @@ static void __blkcg_iolatency_throttle(struct rq_qos *rqos,
unsigned use_delay = atomic_read(&lat_to_blkg(iolat)->use_delay);

if (use_delay)
- blkcg_schedule_throttle(rqos->q->disk, use_memdelay);
+ blkcg_schedule_throttle(rqos->disk, use_memdelay);

/*
* To avoid priority inversions we want to just take a slot if we are
@@ -330,7 +330,7 @@ static void scale_cookie_change(struct blk_iolatency *blkiolat,
struct child_latency_info *lat_info,
bool up)
{
- unsigned long qd = blkiolat->rqos.q->nr_requests;
+ unsigned long qd = blkiolat->rqos.disk->queue->nr_requests;
unsigned long scale = scale_amount(qd, up);
unsigned long old = atomic_read(&lat_info->scale_cookie);
unsigned long max_scale = qd << 1;
@@ -370,7 +370,7 @@ static void scale_cookie_change(struct blk_iolatency *blkiolat,
*/
static void scale_change(struct iolatency_grp *iolat, bool up)
{
- unsigned long qd = iolat->blkiolat->rqos.q->nr_requests;
+ unsigned long qd = iolat->blkiolat->rqos.disk->queue->nr_requests;
unsigned long scale = scale_amount(qd, up);
unsigned long old = iolat->rq_depth.max_depth;

@@ -647,7 +647,7 @@ static void blkcg_iolatency_exit(struct rq_qos *rqos)

del_timer_sync(&blkiolat->timer);
flush_work(&blkiolat->enable_work);
- blkcg_deactivate_policy(rqos->q, &blkcg_policy_iolatency);
+ blkcg_deactivate_policy(rqos->disk->queue, &blkcg_policy_iolatency);
kfree(blkiolat);
}

@@ -666,7 +666,7 @@ static void blkiolatency_timer_fn(struct timer_list *t)

rcu_read_lock();
blkg_for_each_descendant_pre(blkg, pos_css,
- blkiolat->rqos.q->root_blkg) {
+ blkiolat->rqos.disk->queue->root_blkg) {
struct iolatency_grp *iolat;
struct child_latency_info *lat_info;
unsigned long flags;
@@ -750,9 +750,9 @@ static void blkiolatency_enable_work_fn(struct work_struct *work)
*/
enabled = atomic_read(&blkiolat->enable_cnt);
if (enabled != blkiolat->enabled) {
- blk_mq_freeze_queue(blkiolat->rqos.q);
+ blk_mq_freeze_queue(blkiolat->rqos.disk->queue);
blkiolat->enabled = enabled;
- blk_mq_unfreeze_queue(blkiolat->rqos.q);
+ blk_mq_unfreeze_queue(blkiolat->rqos.disk->queue);
}
}

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 7675e663df365..c152276736832 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -813,9 +813,9 @@ static const char *rq_qos_id_to_name(enum rq_qos_id id)

void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
{
- lockdep_assert_held(&rqos->q->debugfs_mutex);
+ lockdep_assert_held(&rqos->disk->queue->debugfs_mutex);

- if (!rqos->q->debugfs_dir)
+ if (!rqos->disk->queue->debugfs_dir)
return;
debugfs_remove_recursive(rqos->debugfs_dir);
rqos->debugfs_dir = NULL;
@@ -823,7 +823,7 @@ void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)

void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
{
- struct request_queue *q = rqos->q;
+ struct request_queue *q = rqos->disk->queue;
const char *dir_name = rq_qos_id_to_name(rqos->id);

lockdep_assert_held(&q->debugfs_mutex);
@@ -835,9 +835,7 @@ void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
q->rqos_debugfs_dir = debugfs_create_dir("rqos",
q->debugfs_dir);

- rqos->debugfs_dir = debugfs_create_dir(dir_name,
- rqos->q->rqos_debugfs_dir);
-
+ rqos->debugfs_dir = debugfs_create_dir(dir_name, q->rqos_debugfs_dir);
debugfs_create_files(rqos->debugfs_dir, rqos, rqos->ops->debugfs_attrs);
}

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 8e83734cfe8db..d8cc820a365e3 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -300,7 +300,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
{
struct request_queue *q = disk->queue;

- rqos->q = q;
+ rqos->disk = disk;
rqos->id = id;
rqos->ops = ops;

@@ -337,7 +337,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,

void rq_qos_del(struct rq_qos *rqos)
{
- struct request_queue *q = rqos->q;
+ struct request_queue *q = rqos->disk->queue;
struct rq_qos **cur;

/*
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 2b7b668479f71..b02a1a3d33a89 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -26,7 +26,7 @@ struct rq_wait {

struct rq_qos {
const struct rq_qos_ops *ops;
- struct request_queue *q;
+ struct gendisk *disk;
enum rq_qos_id id;
struct rq_qos *next;
#ifdef CONFIG_BLK_DEBUG_FS
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index d9398347b08d8..e9206b1406e76 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -98,7 +98,7 @@ static void wb_timestamp(struct rq_wb *rwb, unsigned long *var)
*/
static bool wb_recent_wait(struct rq_wb *rwb)
{
- struct bdi_writeback *wb = &rwb->rqos.q->disk->bdi->wb;
+ struct bdi_writeback *wb = &rwb->rqos.disk->bdi->wb;

return time_before(jiffies, wb->dirty_sleep + HZ);
}
@@ -235,7 +235,7 @@ enum {

static int latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
{
- struct backing_dev_info *bdi = rwb->rqos.q->disk->bdi;
+ struct backing_dev_info *bdi = rwb->rqos.disk->bdi;
struct rq_depth *rqd = &rwb->rq_depth;
u64 thislat;

@@ -288,7 +288,7 @@ static int latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)

static void rwb_trace_step(struct rq_wb *rwb, const char *msg)
{
- struct backing_dev_info *bdi = rwb->rqos.q->disk->bdi;
+ struct backing_dev_info *bdi = rwb->rqos.disk->bdi;
struct rq_depth *rqd = &rwb->rq_depth;

trace_wbt_step(bdi, msg, rqd->scale_step, rwb->cur_win_nsec,
@@ -358,13 +358,12 @@ static void wb_timer_fn(struct blk_stat_callback *cb)
unsigned int inflight = wbt_inflight(rwb);
int status;

- if (!rwb->rqos.q->disk)
+ if (!rwb->rqos.disk)
return;

status = latency_exceeded(rwb, cb->stat);

- trace_wbt_timer(rwb->rqos.q->disk->bdi, status, rqd->scale_step,
- inflight);
+ trace_wbt_timer(rwb->rqos.disk->bdi, status, rqd->scale_step, inflight);

/*
* If we exceeded the latency target, step down. If we did not,
@@ -689,16 +688,15 @@ static int wbt_data_dir(const struct request *rq)

static void wbt_queue_depth_changed(struct rq_qos *rqos)
{
- RQWB(rqos)->rq_depth.queue_depth = blk_queue_depth(rqos->q);
+ RQWB(rqos)->rq_depth.queue_depth = blk_queue_depth(rqos->disk->queue);
wbt_update_limits(RQWB(rqos));
}

static void wbt_exit(struct rq_qos *rqos)
{
struct rq_wb *rwb = RQWB(rqos);
- struct request_queue *q = rqos->q;

- blk_stat_remove_callback(q, rwb->cb);
+ blk_stat_remove_callback(rqos->disk->queue, rwb->cb);
blk_stat_free_callback(rwb->cb);
kfree(rwb);
}
--
2.43.0


2024-03-13 17:22:10

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 59/71] blk-wbt: Fix detection of dirty-throttled tasks

From: Jan Kara <[email protected]>

[ Upstream commit f814bdda774c183b0cc15ec8f3b6e7c6f4527ba5 ]

The detection of dirty-throttled tasks in blk-wbt has been subtly broken
since its beginning in 2016. Namely if we are doing cgroup writeback and
the throttled task is not in the root cgroup, balance_dirty_pages() will
set dirty_sleep for the non-root bdi_writeback structure. However
blk-wbt checks dirty_sleep only in the root cgroup bdi_writeback
structure. Thus detection of recently throttled tasks is not working in
this case (we noticed this when we switched to cgroup v2 and suddently
writeback was slow).

Since blk-wbt has no easy way to get to proper bdi_writeback and
furthermore its intention has always been to work on the whole device
rather than on individual cgroups, just move the dirty_sleep timestamp
from bdi_writeback to backing_dev_info. That fixes the checking for
recently throttled task and saves memory for everybody as a bonus.

CC: [email protected]
Fixes: b57d74aff9ab ("writeback: track if we're sleeping on progress in balance_dirty_pages()")
Signed-off-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[axboe: fixup indentation errors]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
block/blk-wbt.c | 4 ++--
include/linux/backing-dev-defs.h | 7 +++++--
mm/backing-dev.c | 2 +-
mm/page-writeback.c | 2 +-
4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index e9206b1406e76..fcacdff8af93b 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -98,9 +98,9 @@ static void wb_timestamp(struct rq_wb *rwb, unsigned long *var)
*/
static bool wb_recent_wait(struct rq_wb *rwb)
{
- struct bdi_writeback *wb = &rwb->rqos.disk->bdi->wb;
+ struct backing_dev_info *bdi = rwb->rqos.disk->bdi;

- return time_before(jiffies, wb->dirty_sleep + HZ);
+ return time_before(jiffies, bdi->last_bdp_sleep + HZ);
}

static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb,
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index ae12696ec492c..2ad261082bba5 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -141,8 +141,6 @@ struct bdi_writeback {
struct delayed_work dwork; /* work item used for writeback */
struct delayed_work bw_dwork; /* work item used for bandwidth estimate */

- unsigned long dirty_sleep; /* last wait */
-
struct list_head bdi_node; /* anchored at bdi->wb_list */

#ifdef CONFIG_CGROUP_WRITEBACK
@@ -179,6 +177,11 @@ struct backing_dev_info {
* any dirty wbs, which is depended upon by bdi_has_dirty().
*/
atomic_long_t tot_write_bandwidth;
+ /*
+ * Jiffies when last process was dirty throttled on this bdi. Used by
+ * blk-wbt.
+ */
+ unsigned long last_bdp_sleep;

struct bdi_writeback wb; /* the root writeback info for this bdi */
struct list_head wb_list; /* list of all wbs */
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index bf5525c2e561a..c070ff9ef9cf3 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -305,7 +305,6 @@ static int wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi,
INIT_LIST_HEAD(&wb->work_list);
INIT_DELAYED_WORK(&wb->dwork, wb_workfn);
INIT_DELAYED_WORK(&wb->bw_dwork, wb_update_bandwidth_workfn);
- wb->dirty_sleep = jiffies;

err = fprop_local_init_percpu(&wb->completions, gfp);
if (err)
@@ -793,6 +792,7 @@ int bdi_init(struct backing_dev_info *bdi)
INIT_LIST_HEAD(&bdi->bdi_list);
INIT_LIST_HEAD(&bdi->wb_list);
init_waitqueue_head(&bdi->wb_waitq);
+ bdi->last_bdp_sleep = jiffies;

return cgwb_bdi_init(bdi);
}
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d3e9d12860b9f..9046d1f1b408e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1809,7 +1809,7 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
break;
}
__set_current_state(TASK_KILLABLE);
- wb->dirty_sleep = now;
+ bdi->last_bdp_sleep = jiffies;
io_schedule_timeout(pause);

current->dirty_paused_when = now + pause;
--
2.43.0


2024-03-13 17:22:25

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 71/71] Linux 6.1.82-rc1

Signed-off-by: Sasha Levin <[email protected]>
---
Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index e13df565a1cb6..7b56c31cc79ba 100644
--- a/Makefile
+++ b/Makefile
@@ -1,8 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
VERSION = 6
PATCHLEVEL = 1
-SUBLEVEL = 81
-EXTRAVERSION =
+SUBLEVEL = 82
+EXTRAVERSION = -rc1
NAME = Curry Ramen

# *DOCUMENTATION*
--
2.43.0


2024-03-13 17:22:47

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 34/71] drm/amd/display: Fix uninitialized variable usage in core_link_ 'read_dpcd() & write_dpcd()' functions

From: Srinivasan Shanmugam <[email protected]>

[ Upstream commit a58371d632ebab9ea63f10893a6b6731196b6f8d ]

The 'status' variable in 'core_link_read_dpcd()' &
'core_link_write_dpcd()' was uninitialized.

Thus, initializing 'status' variable to 'DC_ERROR_UNEXPECTED' by default.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:226 core_link_read_dpcd() error: uninitialized symbol 'status'.
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:248 core_link_write_dpcd() error: uninitialized symbol 'status'.

Cc: [email protected]
Cc: Jerry Zuo <[email protected]>
Cc: Jun Lei <[email protected]>
Cc: Wayne Lin <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Hamza Mahfooz <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/gpu/drm/amd/display/dc/core/dc_link_dpcd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dpcd.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dpcd.c
index af110bf9470fa..aefca9756dbe8 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dpcd.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dpcd.c
@@ -202,7 +202,7 @@ enum dc_status core_link_read_dpcd(
uint32_t extended_size;
/* size of the remaining partitioned address space */
uint32_t size_left_to_read;
- enum dc_status status;
+ enum dc_status status = DC_ERROR_UNEXPECTED;
/* size of the next partition to be read from */
uint32_t partition_size;
uint32_t data_index = 0;
@@ -231,7 +231,7 @@ enum dc_status core_link_write_dpcd(
{
uint32_t partition_size;
uint32_t data_index = 0;
- enum dc_status status;
+ enum dc_status status = DC_ERROR_UNEXPECTED;

while (size) {
partition_size = dpcd_get_next_partition_size(address, size);
--
2.43.0


2024-03-13 17:23:15

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 50/71] blk-wbt: remove unnecessary check in wbt_enable_default()

From: Yu Kuai <[email protected]>

[ Upstream commit b11d31ae01e6b0762b28e645ad6718a12faa8d14 ]

If CONFIG_BLK_WBT_MQ is disabled, wbt_init() won't do anything.

Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
Signed-off-by: Sasha Levin <[email protected]>
---
block/blk-wbt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index c293e08b301ff..c5a8c10028a08 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -651,7 +651,7 @@ void wbt_enable_default(struct request_queue *q)
if (!blk_queue_registered(q))
return;

- if (queue_is_mq(q) && IS_ENABLED(CONFIG_BLK_WBT_MQ))
+ if (queue_is_mq(q))
wbt_init(q);
}
EXPORT_SYMBOL_GPL(wbt_enable_default);
--
2.43.0


2024-03-13 17:23:17

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 38/71] selftests/mm: switch to bash from sh

From: Muhammad Usama Anjum <[email protected]>

[ Upstream commit bc29036e1da1cf66e5f8312649aeec2d51ea3d86 ]

Running charge_reserved_hugetlb.sh generates errors if sh is set to
dash:

/charge_reserved_hugetlb.sh: 9: [[: not found
/charge_reserved_hugetlb.sh: 19: [[: not found
/charge_reserved_hugetlb.sh: 27: [[: not found
/charge_reserved_hugetlb.sh: 37: [[: not found
/charge_reserved_hugetlb.sh: 45: Syntax error: "(" unexpected

Switch to using /bin/bash instead of /bin/sh. Make the switch for
write_hugetlb_memory.sh as well which is called from
charge_reserved_hugetlb.sh.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Cc: Muhammad Usama Anjum <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: David Laight <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
tools/testing/selftests/vm/charge_reserved_hugetlb.sh | 2 +-
tools/testing/selftests/vm/write_hugetlb_memory.sh | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
index 0899019a7fcb4..e14bdd4455f2d 100644
--- a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0

# Kselftest framework requirement - SKIP code is 4.
diff --git a/tools/testing/selftests/vm/write_hugetlb_memory.sh b/tools/testing/selftests/vm/write_hugetlb_memory.sh
index 70a02301f4c27..3d2d2eb9d6fff 100644
--- a/tools/testing/selftests/vm/write_hugetlb_memory.sh
+++ b/tools/testing/selftests/vm/write_hugetlb_memory.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0

set -e
--
2.43.0


2024-03-13 17:23:21

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 68/71] exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)

From: Oleg Nesterov <[email protected]>

[ Upstream commit c1be35a16b2f1fe21f4f26f9de030ad6eaaf6a25 ]

After the recent changes nobody use siglock to read the values protected
by stats_lock, we can kill spin_lock_irq(&current->sighand->siglock) and
update the comment.

With this patch only __exit_signal() and thread_group_start_cputime() take
stats_lock under siglock.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Dylan Hatch <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/exit.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index bccfa4218356e..c95fffc625fcd 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1146,17 +1146,14 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
* and nobody can change them.
*
* psig->stats_lock also protects us from our sub-threads
- * which can reap other children at the same time. Until
- * we change k_getrusage()-like users to rely on this lock
- * we have to take ->siglock as well.
+ * which can reap other children at the same time.
*
* We use thread_group_cputime_adjusted() to get times for
* the thread group, which consolidates times for all threads
* in the group including the group leader.
*/
thread_group_cputime_adjusted(p, &tgutime, &tgstime);
- spin_lock_irq(&current->sighand->siglock);
- write_seqlock(&psig->stats_lock);
+ write_seqlock_irq(&psig->stats_lock);
psig->cutime += tgutime + sig->cutime;
psig->cstime += tgstime + sig->cstime;
psig->cgtime += task_gtime(p) + sig->gtime + sig->cgtime;
@@ -1179,8 +1176,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
psig->cmaxrss = maxrss;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
- write_sequnlock(&psig->stats_lock);
- spin_unlock_irq(&current->sighand->siglock);
+ write_sequnlock_irq(&psig->stats_lock);
}

if (wo->wo_rusage)
--
2.43.0


2024-03-13 17:23:39

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 70/71] blk-iocost: Pass gendisk to ioc_refresh_params

From: Breno Leitao <[email protected]>

[ Upstream commit e33b93650fc5364f773985a3e961e24349330d97 ]

Current kernel (d2980d8d826554fa6981d621e569a453787472f8) crashes
when blk_iocost_init for `nvme1` disk.

BUG: kernel NULL pointer dereference, address: 0000000000000050
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page

blk_iocost_init (include/asm-generic/qspinlock.h:128
include/linux/spinlock.h:203
include/linux/spinlock_api_smp.h:158
include/linux/spinlock.h:400
block/blk-iocost.c:2884)
ioc_qos_write (block/blk-iocost.c:3198)
? kretprobe_perf_func (kernel/trace/trace_kprobe.c:1566)
? kernfs_fop_write_iter (include/linux/slab.h:584 fs/kernfs/file.c:311)
? __kmem_cache_alloc_node (mm/slab.h:? mm/slub.c:3452 mm/slub.c:3491)
? _copy_from_iter (arch/x86/include/asm/uaccess_64.h:46
arch/x86/include/asm/uaccess_64.h:52
lib/iov_iter.c:183 lib/iov_iter.c:628)
? kretprobe_dispatcher (kernel/trace/trace_kprobe.c:1693)
cgroup_file_write (kernel/cgroup/cgroup.c:4061)
kernfs_fop_write_iter (fs/kernfs/file.c:334)
vfs_write (include/linux/fs.h:1849 fs/read_write.c:491
fs/read_write.c:584)
ksys_write (fs/read_write.c:637)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

This happens because ioc_refresh_params() is being called without
a properly initialized ioc->rqos, which is happening later in the callee
side.

ioc_refresh_params() -> ioc_autop_idx() tries to access
ioc->rqos.disk->queue but ioc->rqos.disk is NULL, causing the BUG above.

Create function, called ioc_refresh_params_disk(), that is similar to
ioc_refresh_params() but where the "struct gendisk" could be passed as
an explicit argument. This function will be called when ioc->rqos.disk
is not initialized.

Fixes: ce57b558604e ("blk-rq-qos: make rq_qos_add and rq_qos_del more useful")

Signed-off-by: Breno Leitao <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
block/blk-iocost.c | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index ab5830ba23e0f..0d4bc9d8f2cac 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -801,7 +801,11 @@ static void ioc_refresh_period_us(struct ioc *ioc)
ioc_refresh_margins(ioc);
}

-static int ioc_autop_idx(struct ioc *ioc)
+/*
+ * ioc->rqos.disk isn't initialized when this function is called from
+ * the init path.
+ */
+static int ioc_autop_idx(struct ioc *ioc, struct gendisk *disk)
{
int idx = ioc->autop_idx;
const struct ioc_params *p = &autop[idx];
@@ -809,11 +813,11 @@ static int ioc_autop_idx(struct ioc *ioc)
u64 now_ns;

/* rotational? */
- if (!blk_queue_nonrot(ioc->rqos.disk->queue))
+ if (!blk_queue_nonrot(disk->queue))
return AUTOP_HDD;

/* handle SATA SSDs w/ broken NCQ */
- if (blk_queue_depth(ioc->rqos.disk->queue) == 1)
+ if (blk_queue_depth(disk->queue) == 1)
return AUTOP_SSD_QD1;

/* use one of the normal ssd sets */
@@ -902,14 +906,19 @@ static void ioc_refresh_lcoefs(struct ioc *ioc)
&c[LCOEF_WPAGE], &c[LCOEF_WSEQIO], &c[LCOEF_WRANDIO]);
}

-static bool ioc_refresh_params(struct ioc *ioc, bool force)
+/*
+ * struct gendisk is required as an argument because ioc->rqos.disk
+ * is not properly initialized when called from the init path.
+ */
+static bool ioc_refresh_params_disk(struct ioc *ioc, bool force,
+ struct gendisk *disk)
{
const struct ioc_params *p;
int idx;

lockdep_assert_held(&ioc->lock);

- idx = ioc_autop_idx(ioc);
+ idx = ioc_autop_idx(ioc, disk);
p = &autop[idx];

if (idx == ioc->autop_idx && !force)
@@ -938,6 +947,11 @@ static bool ioc_refresh_params(struct ioc *ioc, bool force)
return true;
}

+static bool ioc_refresh_params(struct ioc *ioc, bool force)
+{
+ return ioc_refresh_params_disk(ioc, force, ioc->rqos.disk);
+}
+
/*
* When an iocg accumulates too much vtime or gets deactivated, we throw away
* some vtime, which lowers the overall device utilization. As the exact amount
@@ -2884,7 +2898,7 @@ static int blk_iocost_init(struct gendisk *disk)

spin_lock_irq(&ioc->lock);
ioc->autop_idx = AUTOP_INVALID;
- ioc_refresh_params(ioc, true);
+ ioc_refresh_params_disk(ioc, true, disk);
spin_unlock_irq(&ioc->lock);

/*
--
2.43.0


2024-03-13 17:23:42

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 41/71] xhci: handle isoc Babble and Buffer Overrun events properly

From: Michal Pecio <[email protected]>

[ Upstream commit 7c4650ded49e5b88929ecbbb631efb8b0838e811 ]

xHCI 4.9 explicitly forbids assuming that the xHC has released its
ownership of a multi-TRB TD when it reports an error on one of the
early TRBs. Yet the driver makes such assumption and releases the TD,
allowing the remaining TRBs to be freed or overwritten by new TDs.

The xHC should also report completion of the final TRB due to its IOC
flag being set by us, regardless of prior errors. This event cannot
be recognized if the TD has already been freed earlier, resulting in
"Transfer event TRB DMA ptr not part of current TD" error message.

Fix this by reusing the logic for processing isoc Transaction Errors.
This also handles hosts which fail to report the final completion.

Fix transfer length reporting on Babble errors. They may be caused by
device malfunction, no guarantee that the buffer has been filled.

Signed-off-by: Michal Pecio <[email protected]>
Cc: [email protected]
Signed-off-by: Mathias Nyman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/usb/host/xhci-ring.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e4441a71368e5..239b5edee3268 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2381,9 +2381,13 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
case COMP_BANDWIDTH_OVERRUN_ERROR:
frame->status = -ECOMM;
break;
- case COMP_ISOCH_BUFFER_OVERRUN:
case COMP_BABBLE_DETECTED_ERROR:
+ sum_trbs_for_length = true;
+ fallthrough;
+ case COMP_ISOCH_BUFFER_OVERRUN:
frame->status = -EOVERFLOW;
+ if (ep_trb != td->last_trb)
+ td->error_mid_td = true;
break;
case COMP_INCOMPATIBLE_DEVICE_ERROR:
case COMP_STALL_ERROR:
--
2.43.0


2024-03-13 17:23:44

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 47/71] selftests: mptcp: decrease BW in simult flows

From: "Matthieu Baerts (NGI0)" <[email protected]>

[ Upstream commit 5e2f3c65af47e527ccac54060cf909e3306652ff ]

When running the simult_flow selftest in slow environments -- e.g. QEmu
without KVM support --, the results can be unstable. This selftest
checks if the aggregated bandwidth is (almost) fully used as expected.

To help improving the stability while still keeping the same validation
in place, the BW and the delay are reduced to lower the pressure on the
CPU.

Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
Fixes: 219d04992b68 ("mptcp: push pending frames when subflow has free space")
Cc: [email protected]
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-6-4c1c11e571ff@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
tools/testing/selftests/net/mptcp/simult_flows.sh | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index 4a417f9d51d67..06ad0510469e3 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -301,10 +301,10 @@ done

setup
run_test 10 10 0 0 "balanced bwidth"
-run_test 10 10 1 50 "balanced bwidth with unbalanced delay"
+run_test 10 10 1 25 "balanced bwidth with unbalanced delay"

# we still need some additional infrastructure to pass the following test-cases
-run_test 30 10 0 0 "unbalanced bwidth"
-run_test 30 10 1 50 "unbalanced bwidth with unbalanced delay"
-run_test 30 10 50 1 "unbalanced bwidth with opposed, unbalanced delay"
+run_test 10 3 0 0 "unbalanced bwidth"
+run_test 10 3 1 25 "unbalanced bwidth with unbalanced delay"
+run_test 10 3 25 1 "unbalanced bwidth with opposed, unbalanced delay"
exit $ret
--
2.43.0


2024-03-13 17:23:47

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 44/71] Documentation/hw-vuln: Add documentation for RFDS

From: Pawan Gupta <[email protected]>

commit 4e42765d1be01111df0c0275bbaf1db1acef346e upstream.

Add the documentation for transient execution vulnerability Register
File Data Sampling (RFDS) that affects Intel Atom CPUs.

Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../hw-vuln/reg-file-data-sampling.rst | 104 ++++++++++++++++++
2 files changed, 105 insertions(+)
create mode 100644 Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst

diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index 6828102baaa7a..3e4a14e38b49e 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -21,3 +21,4 @@ are configurable at compile, boot or run time.
cross-thread-rsb.rst
gather_data_sampling.rst
srso
+ reg-file-data-sampling
diff --git a/Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst b/Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
new file mode 100644
index 0000000000000..0585d02b9a6cb
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
@@ -0,0 +1,104 @@
+==================================
+Register File Data Sampling (RFDS)
+==================================
+
+Register File Data Sampling (RFDS) is a microarchitectural vulnerability that
+only affects Intel Atom parts(also branded as E-cores). RFDS may allow
+a malicious actor to infer data values previously used in floating point
+registers, vector registers, or integer registers. RFDS does not provide the
+ability to choose which data is inferred. CVE-2023-28746 is assigned to RFDS.
+
+Affected Processors
+===================
+Below is the list of affected Intel processors [#f1]_:
+
+ =================== ============
+ Common name Family_Model
+ =================== ============
+ ATOM_GOLDMONT 06_5CH
+ ATOM_GOLDMONT_D 06_5FH
+ ATOM_GOLDMONT_PLUS 06_7AH
+ ATOM_TREMONT_D 06_86H
+ ATOM_TREMONT 06_96H
+ ALDERLAKE 06_97H
+ ALDERLAKE_L 06_9AH
+ ATOM_TREMONT_L 06_9CH
+ RAPTORLAKE 06_B7H
+ RAPTORLAKE_P 06_BAH
+ ATOM_GRACEMONT 06_BEH
+ RAPTORLAKE_S 06_BFH
+ =================== ============
+
+As an exception to this table, Intel Xeon E family parts ALDERLAKE(06_97H) and
+RAPTORLAKE(06_B7H) codenamed Catlow are not affected. They are reported as
+vulnerable in Linux because they share the same family/model with an affected
+part. Unlike their affected counterparts, they do not enumerate RFDS_CLEAR or
+CPUID.HYBRID. This information could be used to distinguish between the
+affected and unaffected parts, but it is deemed not worth adding complexity as
+the reporting is fixed automatically when these parts enumerate RFDS_NO.
+
+Mitigation
+==========
+Intel released a microcode update that enables software to clear sensitive
+information using the VERW instruction. Like MDS, RFDS deploys the same
+mitigation strategy to force the CPU to clear the affected buffers before an
+attacker can extract the secrets. This is achieved by using the otherwise
+unused and obsolete VERW instruction in combination with a microcode update.
+The microcode clears the affected CPU buffers when the VERW instruction is
+executed.
+
+Mitigation points
+-----------------
+VERW is executed by the kernel before returning to user space, and by KVM
+before VMentry. None of the affected cores support SMT, so VERW is not required
+at C-state transitions.
+
+New bits in IA32_ARCH_CAPABILITIES
+----------------------------------
+Newer processors and microcode update on existing affected processors added new
+bits to IA32_ARCH_CAPABILITIES MSR. These bits can be used to enumerate
+vulnerability and mitigation capability:
+
+- Bit 27 - RFDS_NO - When set, processor is not affected by RFDS.
+- Bit 28 - RFDS_CLEAR - When set, processor is affected by RFDS, and has the
+ microcode that clears the affected buffers on VERW execution.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+The kernel command line allows to control RFDS mitigation at boot time with the
+parameter "reg_file_data_sampling=". The valid arguments are:
+
+ ========== =================================================================
+ on If the CPU is vulnerable, enable mitigation; CPU buffer clearing
+ on exit to userspace and before entering a VM.
+ off Disables mitigation.
+ ========== =================================================================
+
+Mitigation default is selected by CONFIG_MITIGATION_RFDS.
+
+Mitigation status information
+-----------------------------
+The Linux kernel provides a sysfs interface to enumerate the current
+vulnerability status of the system: whether the system is vulnerable, and
+which mitigations are active. The relevant sysfs file is:
+
+ /sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling
+
+The possible values in this file are:
+
+ .. list-table::
+
+ * - 'Not affected'
+ - The processor is not vulnerable
+ * - 'Vulnerable'
+ - The processor is vulnerable, but no mitigation enabled
+ * - 'Vulnerable: No microcode'
+ - The processor is vulnerable but microcode is not updated.
+ * - 'Mitigation: Clear Register File'
+ - The processor is vulnerable and the CPU buffer clearing mitigation is
+ enabled.
+
+References
+----------
+.. [#f1] Affected Processors
+ https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
--
2.43.0


2024-03-13 17:24:20

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 53/71] blk-wbt: pass a gendisk to wbt_{enable,disable}_default

From: Christoph Hellwig <[email protected]>

[ Upstream commit 04aad37be1a88de6a1919996a615437ac74de479 ]

Pass a gendisk to wbt_enable_default and wbt_disable_default to
prepare for phasing out usage of the request_queue in the blk-cgroup
code.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
Signed-off-by: Sasha Levin <[email protected]>
---
block/bfq-iosched.c | 4 ++--
block/blk-iocost.c | 4 ++--
block/blk-sysfs.c | 2 +-
block/blk-wbt.c | 7 ++++---
block/blk-wbt.h | 8 ++++----
5 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index e4699291aee23..84b4763b2b223 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -7060,7 +7060,7 @@ static void bfq_exit_queue(struct elevator_queue *e)

blk_stat_disable_accounting(bfqd->queue);
clear_bit(ELEVATOR_FLAG_DISABLE_WBT, &e->flags);
- wbt_enable_default(bfqd->queue);
+ wbt_enable_default(bfqd->queue->disk);

kfree(bfqd);
}
@@ -7206,7 +7206,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);

set_bit(ELEVATOR_FLAG_DISABLE_WBT, &eq->flags);
- wbt_disable_default(q);
+ wbt_disable_default(q->disk);
blk_stat_enable_accounting(q);

return 0;
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 3788774a7b729..72ca07f24b3c0 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -3281,11 +3281,11 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
blk_stat_enable_accounting(disk->queue);
blk_queue_flag_set(QUEUE_FLAG_RQ_ALLOC_TIME, disk->queue);
ioc->enabled = true;
- wbt_disable_default(disk->queue);
+ wbt_disable_default(disk);
} else {
blk_queue_flag_clear(QUEUE_FLAG_RQ_ALLOC_TIME, disk->queue);
ioc->enabled = false;
- wbt_enable_default(disk->queue);
+ wbt_enable_default(disk);
}

if (user) {
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index a82bdec923b21..c59c4d3ee7a27 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -837,7 +837,7 @@ int blk_register_queue(struct gendisk *disk)
goto put_dev;

blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
- wbt_enable_default(q);
+ wbt_enable_default(disk);
blk_throtl_register(disk);

/* Now everything is ready and send out KOBJ_ADD uevent */
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index afb1782b4255e..8d4f075f13e2f 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -637,8 +637,9 @@ void wbt_set_write_cache(struct request_queue *q, bool write_cache_on)
/*
* Enable wbt if defaults are configured that way
*/
-void wbt_enable_default(struct request_queue *q)
+void wbt_enable_default(struct gendisk *disk)
{
+ struct request_queue *q = disk->queue;
struct rq_qos *rqos;
bool disable_flag = q->elevator &&
test_bit(ELEVATOR_FLAG_DISABLE_WBT, &q->elevator->flags);
@@ -705,9 +706,9 @@ static void wbt_exit(struct rq_qos *rqos)
/*
* Disable wbt, if enabled by default.
*/
-void wbt_disable_default(struct request_queue *q)
+void wbt_disable_default(struct gendisk *disk)
{
- struct rq_qos *rqos = wbt_rq_qos(q);
+ struct rq_qos *rqos = wbt_rq_qos(disk->queue);
struct rq_wb *rwb;
if (!rqos)
return;
diff --git a/block/blk-wbt.h b/block/blk-wbt.h
index 7e44eccc676dd..58c226fe33d48 100644
--- a/block/blk-wbt.h
+++ b/block/blk-wbt.h
@@ -89,8 +89,8 @@ static inline unsigned int wbt_inflight(struct rq_wb *rwb)
#ifdef CONFIG_BLK_WBT

int wbt_init(struct request_queue *);
-void wbt_disable_default(struct request_queue *);
-void wbt_enable_default(struct request_queue *);
+void wbt_disable_default(struct gendisk *disk);
+void wbt_enable_default(struct gendisk *disk);

u64 wbt_get_min_lat(struct request_queue *q);
void wbt_set_min_lat(struct request_queue *q, u64 val);
@@ -105,10 +105,10 @@ static inline int wbt_init(struct request_queue *q)
{
return -EINVAL;
}
-static inline void wbt_disable_default(struct request_queue *q)
+static inline void wbt_disable_default(struct gendisk *disk)
{
}
-static inline void wbt_enable_default(struct request_queue *q)
+static inline void wbt_enable_default(struct gendisk *disk)
{
}
static inline void wbt_set_write_cache(struct request_queue *q, bool wc)
--
2.43.0


2024-03-13 17:25:18

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 42/71] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit

From: Friedrich Vock <[email protected]>

[ Upstream commit 7330256268664ea0a7dd5b07a3fed363093477dd ]

Allows us to detect subsequent IH ring buffer overflows as well.

Cc: Joshua Ashton <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Christian König <[email protected]>
Cc: [email protected]
Signed-off-by: Friedrich Vock <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/cik_ih.c | 6 ++++++
drivers/gpu/drm/amd/amdgpu/cz_ih.c | 5 +++++
drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 5 +++++
drivers/gpu/drm/amd/amdgpu/ih_v6_0.c | 6 ++++++
drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++++++
drivers/gpu/drm/amd/amdgpu/si_ih.c | 6 ++++++
drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 6 ++++++
drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 ++++++
drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 6 ++++++
9 files changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
index df385ffc97683..6578ca1b90afa 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
@@ -204,6 +204,12 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev,
tmp = RREG32(mmIH_RB_CNTL);
tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
WREG32(mmIH_RB_CNTL, tmp);
+
+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+ * can be detected.
+ */
+ tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
+ WREG32(mmIH_RB_CNTL, tmp);
}
return (wptr & ih->ptr_mask);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
index b8c47e0cf37ad..c19681492efa7 100644
--- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
@@ -216,6 +216,11 @@ static u32 cz_ih_get_wptr(struct amdgpu_device *adev,
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
WREG32(mmIH_RB_CNTL, tmp);

+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+ * can be detected.
+ */
+ tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+ WREG32(mmIH_RB_CNTL, tmp);

out:
return (wptr & ih->ptr_mask);
diff --git a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
index aecad530b10a6..2c02ae69883d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
@@ -215,6 +215,11 @@ static u32 iceland_ih_get_wptr(struct amdgpu_device *adev,
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
WREG32(mmIH_RB_CNTL, tmp);

+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+ * can be detected.
+ */
+ tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+ WREG32(mmIH_RB_CNTL, tmp);

out:
return (wptr & ih->ptr_mask);
diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
index 7cd79a3844b24..657e4ca6f9dd2 100644
--- a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
@@ -417,6 +417,12 @@ static u32 ih_v6_0_get_wptr(struct amdgpu_device *adev,
tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+
+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+ * can be detected.
+ */
+ tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+ WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
out:
return (wptr & ih->ptr_mask);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index eec13cb5bf758..84e8e8b008ef6 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -442,6 +442,12 @@ static u32 navi10_ih_get_wptr(struct amdgpu_device *adev,
tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+
+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+ * can be detected.
+ */
+ tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+ WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
out:
return (wptr & ih->ptr_mask);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/si_ih.c b/drivers/gpu/drm/amd/amdgpu/si_ih.c
index 9a24f17a57502..cada9f300a7f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_ih.c
@@ -119,6 +119,12 @@ static u32 si_ih_get_wptr(struct amdgpu_device *adev,
tmp = RREG32(IH_RB_CNTL);
tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
WREG32(IH_RB_CNTL, tmp);
+
+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+ * can be detected.
+ */
+ tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
+ WREG32(IH_RB_CNTL, tmp);
}
return (wptr & ih->ptr_mask);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
index b08905d1c00f0..07a5d95be07f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
@@ -219,6 +219,12 @@ static u32 tonga_ih_get_wptr(struct amdgpu_device *adev,
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
WREG32(mmIH_RB_CNTL, tmp);

+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+ * can be detected.
+ */
+ tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+ WREG32(mmIH_RB_CNTL, tmp);
+
out:
return (wptr & ih->ptr_mask);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 1e83db0c5438d..74c94df423455 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -373,6 +373,12 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev,
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);

+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+ * can be detected.
+ */
+ tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+ WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+
out:
return (wptr & ih->ptr_mask);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
index 59dfca093155c..f1ba76c35cd6e 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
@@ -424,6 +424,12 @@ static u32 vega20_ih_get_wptr(struct amdgpu_device *adev,
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);

+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+ * can be detected.
+ */
+ tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+ WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+
out:
return (wptr & ih->ptr_mask);
}
--
2.43.0


2024-03-13 17:25:43

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 40/71] xhci: process isoc TD properly when there was a transaction error mid TD.

From: Mathias Nyman <[email protected]>

[ Upstream commit 5372c65e1311a16351ef03dd096ff576e6477674 ]

The last TRB of a isoc TD might not trigger an event if there was
an error event for a TRB mid TD. This is seen on a NEC Corporation
uPD720200 USB 3.0 Host

After an error mid a multi-TRB TD the xHC should according to xhci 4.9.1
generate events for passed TRBs with IOC flag set if it proceeds to the
next TD. This event is either a copy of the original error, or a
"success" transfer event.

If that event is missing then the driver and xHC host get out of sync as
the driver is still expecting a transfer event for that first TD, while
xHC host is already sending events for the next TD in the list.
This leads to
"Transfer event TRB DMA ptr not part of current TD" messages.

As a solution we tag the isoc TDs that get error events mid TD.
If an event doesn't match the first TD, then check if the tag is
set, and event points to the next TD.
In that case give back the fist TD and process the next TD normally

Make sure TD status and transferred length stay valid in both cases
with and without final TD completion event.

Reported-by: Michał Pecio <[email protected]>
Closes: https://lore.kernel.org/linux-usb/20240112235205.1259f60c@foxbook/
Tested-by: Michał Pecio <[email protected]>
Cc: [email protected]
Signed-off-by: Mathias Nyman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/usb/host/xhci-ring.c | 74 +++++++++++++++++++++++++++++-------
drivers/usb/host/xhci.h | 1 +
2 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 1239e06dfe411..e4441a71368e5 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2363,6 +2363,9 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
/* handle completion code */
switch (trb_comp_code) {
case COMP_SUCCESS:
+ /* Don't overwrite status if TD had an error, see xHCI 4.9.1 */
+ if (td->error_mid_td)
+ break;
if (remaining) {
frame->status = short_framestatus;
if (xhci->quirks & XHCI_TRUST_TX_LENGTH)
@@ -2388,8 +2391,9 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
break;
case COMP_USB_TRANSACTION_ERROR:
frame->status = -EPROTO;
+ sum_trbs_for_length = true;
if (ep_trb != td->last_trb)
- return 0;
+ td->error_mid_td = true;
break;
case COMP_STOPPED:
sum_trbs_for_length = true;
@@ -2409,6 +2413,9 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
break;
}

+ if (td->urb_length_set)
+ goto finish_td;
+
if (sum_trbs_for_length)
frame->actual_length = sum_trb_lengths(xhci, ep->ring, ep_trb) +
ep_trb_len - remaining;
@@ -2417,6 +2424,14 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,

td->urb->actual_length += frame->actual_length;

+finish_td:
+ /* Don't give back TD yet if we encountered an error mid TD */
+ if (td->error_mid_td && ep_trb != td->last_trb) {
+ xhci_dbg(xhci, "Error mid isoc TD, wait for final completion event\n");
+ td->urb_length_set = true;
+ return 0;
+ }
+
return finish_td(xhci, ep, ep_ring, td, trb_comp_code);
}

@@ -2801,17 +2816,51 @@ static int handle_tx_event(struct xhci_hcd *xhci,
}

if (!ep_seg) {
- if (!ep->skip ||
- !usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
- /* Some host controllers give a spurious
- * successful event after a short transfer.
- * Ignore it.
- */
- if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
- ep_ring->last_td_was_short) {
- ep_ring->last_td_was_short = false;
- goto cleanup;
+
+ if (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
+ skip_isoc_td(xhci, td, ep, status);
+ goto cleanup;
+ }
+
+ /*
+ * Some hosts give a spurious success event after a short
+ * transfer. Ignore it.
+ */
+ if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
+ ep_ring->last_td_was_short) {
+ ep_ring->last_td_was_short = false;
+ goto cleanup;
+ }
+
+ /*
+ * xhci 4.10.2 states isoc endpoints should continue
+ * processing the next TD if there was an error mid TD.
+ * So host like NEC don't generate an event for the last
+ * isoc TRB even if the IOC flag is set.
+ * xhci 4.9.1 states that if there are errors in mult-TRB
+ * TDs xHC should generate an error for that TRB, and if xHC
+ * proceeds to the next TD it should genete an event for
+ * any TRB with IOC flag on the way. Other host follow this.
+ * So this event might be for the next TD.
+ */
+ if (td->error_mid_td &&
+ !list_is_last(&td->td_list, &ep_ring->td_list)) {
+ struct xhci_td *td_next = list_next_entry(td, td_list);
+
+ ep_seg = trb_in_td(xhci, td_next->start_seg, td_next->first_trb,
+ td_next->last_trb, ep_trb_dma, false);
+ if (ep_seg) {
+ /* give back previous TD, start handling new */
+ xhci_dbg(xhci, "Missing TD completion event after mid TD error\n");
+ ep_ring->dequeue = td->last_trb;
+ ep_ring->deq_seg = td->last_trb_seg;
+ inc_deq(xhci, ep_ring);
+ xhci_td_cleanup(xhci, td, ep_ring, td->status);
+ td = td_next;
}
+ }
+
+ if (!ep_seg) {
/* HC is busted, give up! */
xhci_err(xhci,
"ERROR Transfer event TRB DMA ptr not "
@@ -2823,9 +2872,6 @@ static int handle_tx_event(struct xhci_hcd *xhci,
ep_trb_dma, true);
return -ESHUTDOWN;
}
-
- skip_isoc_td(xhci, td, ep, status);
- goto cleanup;
}
if (trb_comp_code == COMP_SHORT_PACKET)
ep_ring->last_td_was_short = true;
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 1354310cb37b1..fc25a5b09710c 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1570,6 +1570,7 @@ struct xhci_td {
struct xhci_segment *bounce_seg;
/* actual_length of the URB has already been set */
bool urb_length_set;
+ bool error_mid_td;
unsigned int num_trbs;
};

--
2.43.0


2024-03-13 17:26:07

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 07/71] geneve: make sure to pull inner header in geneve_rx()

From: Eric Dumazet <[email protected]>

[ Upstream commit 1ca1ba465e55b9460e4e75dec9fff31e708fec74 ]

syzbot triggered a bug in geneve_rx() [1]

Issue is similar to the one I fixed in commit 8d975c15c0cd
("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()")

We have to save skb->network_header in a temporary variable
in order to be able to recompute the network_header pointer
after a pskb_inet_may_pull() call.

pskb_inet_may_pull() makes sure the needed headers are in skb->head.

[1]
BUG: KMSAN: uninit-value in IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
BUG: KMSAN: uninit-value in geneve_rx drivers/net/geneve.c:279 [inline]
BUG: KMSAN: uninit-value in geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391
IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
geneve_rx drivers/net/geneve.c:279 [inline]
geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391
udp_queue_rcv_one_skb+0x1d39/0x1f20 net/ipv4/udp.c:2108
udp_queue_rcv_skb+0x6ae/0x6e0 net/ipv4/udp.c:2186
udp_unicast_rcv_skb+0x184/0x4b0 net/ipv4/udp.c:2346
__udp4_lib_rcv+0x1c6b/0x3010 net/ipv4/udp.c:2422
udp_rcv+0x7d/0xa0 net/ipv4/udp.c:2604
ip_protocol_deliver_rcu+0x264/0x1300 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x2b8/0x440 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:461 [inline]
ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_rcv+0x46f/0x760 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5534 [inline]
__netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5648
process_backlog+0x480/0x8b0 net/core/dev.c:5976
__napi_poll+0xe3/0x980 net/core/dev.c:6576
napi_poll net/core/dev.c:6645 [inline]
net_rx_action+0x8b8/0x1870 net/core/dev.c:6778
__do_softirq+0x1b7/0x7c5 kernel/softirq.c:553
do_softirq+0x9a/0xf0 kernel/softirq.c:454
__local_bh_enable_ip+0x9b/0xa0 kernel/softirq.c:381
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline]
__dev_queue_xmit+0x2768/0x51c0 net/core/dev.c:4378
dev_queue_xmit include/linux/netdevice.h:3171 [inline]
packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x735/0xa10 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b

Uninit was created at:
slab_post_alloc_hook mm/slub.c:3819 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
kmem_cache_alloc_node+0x5cb/0xbc0 mm/slub.c:3903
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560
__alloc_skb+0x352/0x790 net/core/skbuff.c:651
alloc_skb include/linux/skbuff.h:1296 [inline]
alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6394
sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2783
packet_alloc_skb net/packet/af_packet.c:2930 [inline]
packet_snd net/packet/af_packet.c:3024 [inline]
packet_sendmsg+0x70c2/0x9f10 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x735/0xa10 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b

Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Reported-and-tested-by: [email protected]
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/geneve.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index f393e454f45ca..3f8da6f0b25ce 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -221,7 +221,7 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
struct genevehdr *gnvh = geneve_hdr(skb);
struct metadata_dst *tun_dst = NULL;
unsigned int len;
- int err = 0;
+ int nh, err = 0;
void *oiph;

if (ip_tunnel_collect_metadata() || gs->collect_md) {
@@ -272,9 +272,23 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
skb->pkt_type = PACKET_HOST;
}

- oiph = skb_network_header(skb);
+ /* Save offset of outer header relative to skb->head,
+ * because we are going to reset the network header to the inner header
+ * and might change skb->head.
+ */
+ nh = skb_network_header(skb) - skb->head;
+
skb_reset_network_header(skb);

+ if (!pskb_inet_may_pull(skb)) {
+ DEV_STATS_INC(geneve->dev, rx_length_errors);
+ DEV_STATS_INC(geneve->dev, rx_errors);
+ goto drop;
+ }
+
+ /* Get the outer header. */
+ oiph = skb->head + nh;
+
if (geneve_get_sk_family(gs) == AF_INET)
err = IP_ECN_decapsulate(oiph, skb);
#if IS_ENABLED(CONFIG_IPV6)
--
2.43.0


2024-03-13 17:26:11

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 11/71] igc: avoid returning frame twice in XDP_REDIRECT

From: Florian Kauer <[email protected]>

[ Upstream commit ef27f655b438bed4c83680e4f01e1cde2739854b ]

When a frame can not be transmitted in XDP_REDIRECT
(e.g. due to a full queue), it is necessary to free
it by calling xdp_return_frame_rx_napi.

However, this is the responsibility of the caller of
the ndo_xdp_xmit (see for example bq_xmit_all in
kernel/bpf/devmap.c) and thus calling it inside
igc_xdp_xmit (which is the ndo_xdp_xmit of the igc
driver) as well will lead to memory corruption.

In fact, bq_xmit_all expects that it can return all
frames after the last successfully transmitted one.
Therefore, break for the first not transmitted frame,
but do not call xdp_return_frame_rx_napi in igc_xdp_xmit.
This is equally implemented in other Intel drivers
such as the igb.

There are two alternatives to this that were rejected:
1. Return num_frames as all the frames would have been
transmitted and release them inside igc_xdp_xmit.
While it might work technically, it is not what
the return value is meant to represent (i.e. the
number of SUCCESSFULLY transmitted packets).
2. Rework kernel/bpf/devmap.c and all drivers to
support non-consecutively dropped packets.
Besides being complex, it likely has a negative
performance impact without a significant gain
since it is anyway unlikely that the next frame
can be transmitted if the previous one was dropped.

The memory corruption can be reproduced with
the following script which leads to a kernel panic
after a few seconds. It basically generates more
traffic than a i225 NIC can transmit and pushes it
via XDP_REDIRECT from a virtual interface to the
physical interface where frames get dropped.

#!/bin/bash
INTERFACE=enp4s0
INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex`

sudo ip link add dev veth1 type veth peer name veth2
sudo ip link set up $INTERFACE
sudo ip link set up veth1
sudo ip link set up veth2

cat << EOF > redirect.bpf.c

SEC("prog")
int redirect(struct xdp_md *ctx)
{
return bpf_redirect($INTERFACE_IDX, 0);
}

char _license[] SEC("license") = "GPL";
EOF
clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o
sudo ip link set veth2 xdp obj redirect.bpf.o

cat << EOF > pass.bpf.c

SEC("prog")
int pass(struct xdp_md *ctx)
{
return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
EOF
clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o
sudo ip link set $INTERFACE xdp obj pass.bpf.o

cat << EOF > trafgen.cfg

{
/* Ethernet Header */
0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
const16(ETH_P_IP),

/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum

/* UDP Header */
10, 0, 1, 1, # IP Src - adapt as needed
10, 0, 1, 2, # IP Dest - adapt as needed
const16(6666), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum

/* Payload */
fill('W', 1000),
}
EOF

sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp

Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action")
Signed-off-by: Florian Kauer <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/igc/igc_main.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 4b6f882b380dc..e052f49cc08d7 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6330,7 +6330,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
int cpu = smp_processor_id();
struct netdev_queue *nq;
struct igc_ring *ring;
- int i, drops;
+ int i, nxmit;

if (unlikely(!netif_carrier_ok(dev)))
return -ENETDOWN;
@@ -6346,16 +6346,15 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
/* Avoid transmit queue timeout since we share it with the slow path */
txq_trans_cond_update(nq);

- drops = 0;
+ nxmit = 0;
for (i = 0; i < num_frames; i++) {
int err;
struct xdp_frame *xdpf = frames[i];

err = igc_xdp_init_tx_descriptor(ring, xdpf);
- if (err) {
- xdp_return_frame_rx_napi(xdpf);
- drops++;
- }
+ if (err)
+ break;
+ nxmit++;
}

if (flags & XDP_XMIT_FLUSH)
@@ -6363,7 +6362,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,

__netif_tx_unlock(nq);

- return num_frames - drops;
+ return nxmit;
}

static void igc_trigger_rxtxq_interrupt(struct igc_adapter *adapter,
--
2.43.0


2024-03-13 17:26:24

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 43/71] x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set

From: Pawan Gupta <[email protected]>

commit e95df4ec0c0c9791941f112db699fae794b9862a upstream.

Currently MMIO Stale Data mitigation for CPUs not affected by MDS/TAA is
to only deploy VERW at VMentry by enabling mmio_stale_data_clear static
branch. No mitigation is needed for kernel->user transitions. If such
CPUs are also affected by RFDS, its mitigation may set
X86_FEATURE_CLEAR_CPU_BUF to deploy VERW at kernel->user and VMentry.
This could result in duplicate VERW at VMentry.

Fix this by disabling mmio_stale_data_clear static branch when
X86_FEATURE_CLEAR_CPU_BUF is enabled.

Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d1895930e6eb8..c66f6eb40afb1 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -421,6 +421,13 @@ static void __init mmio_select_mitigation(void)
if (boot_cpu_has_bug(X86_BUG_MDS) || (boot_cpu_has_bug(X86_BUG_TAA) &&
boot_cpu_has(X86_FEATURE_RTM)))
setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
+
+ /*
+ * X86_FEATURE_CLEAR_CPU_BUF could be enabled by other VERW based
+ * mitigations, disable KVM-only mitigation in that case.
+ */
+ if (boot_cpu_has(X86_FEATURE_CLEAR_CPU_BUF))
+ static_branch_disable(&mmio_stale_data_clear);
else
static_branch_enable(&mmio_stale_data_clear);

@@ -497,8 +504,11 @@ static void __init md_clear_update_mitigation(void)
taa_mitigation = TAA_MITIGATION_VERW;
taa_select_mitigation();
}
- if (mmio_mitigation == MMIO_MITIGATION_OFF &&
- boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA)) {
+ /*
+ * MMIO_MITIGATION_OFF is not checked here so that mmio_stale_data_clear
+ * gets updated correctly as per X86_FEATURE_CLEAR_CPU_BUF state.
+ */
+ if (boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA)) {
mmio_mitigation = MMIO_MITIGATION_VERW;
mmio_select_mitigation();
}
--
2.43.0


2024-03-13 17:28:13

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 62/71] getrusage: add the "signal_struct *sig" local variable

From: Oleg Nesterov <[email protected]>

[ Upstream commit c7ac8231ace9b07306d0299969e42073b189c70a ]

No functional changes, cleanup/preparation.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Stable-dep-of: daa694e41375 ("getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()")
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/sys.c | 37 +++++++++++++++++++------------------
1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index c85e1abf7b7c7..177155ba50cd3 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1779,6 +1779,7 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
unsigned long flags;
u64 tgutime, tgstime, utime, stime;
unsigned long maxrss = 0;
+ struct signal_struct *sig = p->signal;

memset((char *)r, 0, sizeof (*r));
utime = stime = 0;
@@ -1786,7 +1787,7 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
if (who == RUSAGE_THREAD) {
task_cputime_adjusted(current, &utime, &stime);
accumulate_thread_rusage(p, r);
- maxrss = p->signal->maxrss;
+ maxrss = sig->maxrss;
goto out;
}

@@ -1796,15 +1797,15 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
switch (who) {
case RUSAGE_BOTH:
case RUSAGE_CHILDREN:
- utime = p->signal->cutime;
- stime = p->signal->cstime;
- r->ru_nvcsw = p->signal->cnvcsw;
- r->ru_nivcsw = p->signal->cnivcsw;
- r->ru_minflt = p->signal->cmin_flt;
- r->ru_majflt = p->signal->cmaj_flt;
- r->ru_inblock = p->signal->cinblock;
- r->ru_oublock = p->signal->coublock;
- maxrss = p->signal->cmaxrss;
+ utime = sig->cutime;
+ stime = sig->cstime;
+ r->ru_nvcsw = sig->cnvcsw;
+ r->ru_nivcsw = sig->cnivcsw;
+ r->ru_minflt = sig->cmin_flt;
+ r->ru_majflt = sig->cmaj_flt;
+ r->ru_inblock = sig->cinblock;
+ r->ru_oublock = sig->coublock;
+ maxrss = sig->cmaxrss;

if (who == RUSAGE_CHILDREN)
break;
@@ -1814,14 +1815,14 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
thread_group_cputime_adjusted(p, &tgutime, &tgstime);
utime += tgutime;
stime += tgstime;
- r->ru_nvcsw += p->signal->nvcsw;
- r->ru_nivcsw += p->signal->nivcsw;
- r->ru_minflt += p->signal->min_flt;
- r->ru_majflt += p->signal->maj_flt;
- r->ru_inblock += p->signal->inblock;
- r->ru_oublock += p->signal->oublock;
- if (maxrss < p->signal->maxrss)
- maxrss = p->signal->maxrss;
+ r->ru_nvcsw += sig->nvcsw;
+ r->ru_nivcsw += sig->nivcsw;
+ r->ru_minflt += sig->min_flt;
+ r->ru_majflt += sig->maj_flt;
+ r->ru_inblock += sig->inblock;
+ r->ru_oublock += sig->oublock;
+ if (maxrss < sig->maxrss)
+ maxrss = sig->maxrss;
t = p;
do {
accumulate_thread_rusage(t, r);
--
2.43.0


2024-03-13 17:28:29

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 63/71] getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()

From: Oleg Nesterov <[email protected]>

[ Upstream commit daa694e4137571b4ebec330f9a9b4d54aa8b8089 ]

Patch series "getrusage: use sig->stats_lock", v2.

This patch (of 2):

thread_group_cputime() does its own locking, we can safely shift
thread_group_cputime_adjusted() which does another for_each_thread loop
outside of ->siglock protected section.

This is also preparation for the next patch which changes getrusage() to
use stats_lock instead of siglock, thread_group_cputime() takes the same
lock. With the current implementation recursive read_seqbegin_or_lock()
is fine, thread_group_cputime() can't enter the slow mode if the caller
holds stats_lock, yet this looks more safe and better performance-wise.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Reported-by: Dylan Hatch <[email protected]>
Tested-by: Dylan Hatch <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/sys.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 177155ba50cd3..2646047fe5513 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1778,17 +1778,19 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
struct task_struct *t;
unsigned long flags;
u64 tgutime, tgstime, utime, stime;
- unsigned long maxrss = 0;
+ unsigned long maxrss;
+ struct mm_struct *mm;
struct signal_struct *sig = p->signal;

- memset((char *)r, 0, sizeof (*r));
+ memset(r, 0, sizeof(*r));
utime = stime = 0;
+ maxrss = 0;

if (who == RUSAGE_THREAD) {
task_cputime_adjusted(current, &utime, &stime);
accumulate_thread_rusage(p, r);
maxrss = sig->maxrss;
- goto out;
+ goto out_thread;
}

if (!lock_task_sighand(p, &flags))
@@ -1812,9 +1814,6 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
fallthrough;

case RUSAGE_SELF:
- thread_group_cputime_adjusted(p, &tgutime, &tgstime);
- utime += tgutime;
- stime += tgstime;
r->ru_nvcsw += sig->nvcsw;
r->ru_nivcsw += sig->nivcsw;
r->ru_minflt += sig->min_flt;
@@ -1834,19 +1833,24 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
}
unlock_task_sighand(p, &flags);

-out:
- r->ru_utime = ns_to_kernel_old_timeval(utime);
- r->ru_stime = ns_to_kernel_old_timeval(stime);
+ if (who == RUSAGE_CHILDREN)
+ goto out_children;

- if (who != RUSAGE_CHILDREN) {
- struct mm_struct *mm = get_task_mm(p);
+ thread_group_cputime_adjusted(p, &tgutime, &tgstime);
+ utime += tgutime;
+ stime += tgstime;

- if (mm) {
- setmax_mm_hiwater_rss(&maxrss, mm);
- mmput(mm);
- }
+out_thread:
+ mm = get_task_mm(p);
+ if (mm) {
+ setmax_mm_hiwater_rss(&maxrss, mm);
+ mmput(mm);
}
+
+out_children:
r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
+ r->ru_utime = ns_to_kernel_old_timeval(utime);
+ r->ru_stime = ns_to_kernel_old_timeval(stime);
}

SYSCALL_DEFINE2(getrusage, int, who, struct rusage __user *, ru)
--
2.43.0


2024-03-13 17:29:09

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 10/71] net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink()

From: Rand Deeb <[email protected]>

[ Upstream commit 06e456a05d669ca30b224b8ed962421770c1496c ]

The function ice_bridge_setlink() may encounter a NULL pointer dereference
if nlmsg_find_attr() returns NULL and br_spec is dereferenced subsequently
in nla_for_each_nested(). To address this issue, add a check to ensure that
br_spec is not NULL before proceeding with the nested attribute iteration.

Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
Signed-off-by: Rand Deeb <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/ice/ice_main.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index ab46cfca4028d..3117f65253b37 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -7681,6 +7681,8 @@ ice_bridge_setlink(struct net_device *dev, struct nlmsghdr *nlh,
pf_sw = pf->first_sw;
/* find the attribute in the netlink message */
br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
+ if (!br_spec)
+ return -EINVAL;

nla_for_each_nested(attr, br_spec, rem) {
__u16 mode;
--
2.43.0


2024-03-13 17:29:13

by Sasha Levin

[permalink] [raw]
Subject: [PATCH 6.1 65/71] getrusage: use sig->stats_lock rather than lock_task_sighand()

From: Oleg Nesterov <[email protected]>

[ Upstream commit f7ec1cd5cc7ef3ad964b677ba82b8b77f1c93009 ]

lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
getrusage() at the same time and the process has NR_THREADS, spin_lock_irq
will spin with irqs disabled O(NR_CPUS * NR_THREADS) time.

Change getrusage() to use sig->stats_lock, it was specifically designed
for this type of use. This way it runs lockless in the likely case.

TODO:
- Change do_task_stat() to use sig->stats_lock too, then we can
remove spin_lock_irq(siglock) in wait_task_zombie().

- Turn sig->stats_lock into seqcount_rwlock_t, this way the
readers in the slow mode won't exclude each other. See
https://lore.kernel.org/all/[email protected]/

- stats_lock has to disable irqs because ->siglock can be taken
in irq context, it would be very nice to change __exit_signal()
to avoid the siglock->stats_lock dependency.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Reported-by: Dylan Hatch <[email protected]>
Tested-by: Dylan Hatch <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/sys.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 04102538cf43f..d06eda1387b69 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1781,7 +1781,9 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
unsigned long maxrss;
struct mm_struct *mm;
struct signal_struct *sig = p->signal;
+ unsigned int seq = 0;

+retry:
memset(r, 0, sizeof(*r));
utime = stime = 0;
maxrss = 0;
@@ -1793,8 +1795,7 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
goto out_thread;
}

- if (!lock_task_sighand(p, &flags))
- return;
+ flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);

switch (who) {
case RUSAGE_BOTH:
@@ -1822,14 +1823,23 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_oublock += sig->oublock;
if (maxrss < sig->maxrss)
maxrss = sig->maxrss;
+
+ rcu_read_lock();
__for_each_thread(sig, t)
accumulate_thread_rusage(t, r);
+ rcu_read_unlock();
+
break;

default:
BUG();
}
- unlock_task_sighand(p, &flags);
+
+ if (need_seqretry(&sig->stats_lock, seq)) {
+ seq = 1;
+ goto retry;
+ }
+ done_seqretry_irqrestore(&sig->stats_lock, seq, flags);

if (who == RUSAGE_CHILDREN)
goto out_children;
--
2.43.0


2024-03-13 20:20:35

by Mateusz Jończyk

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

W dniu 13.03.2024 o 17:38, Sasha Levin pisze:
> This is the start of the stable review cycle for the 6.1.82 release.
> There are 71 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
>
> Thanks,
> Sasha
>
Hello,

Kernel hangs during early boot. No console messages, nothing in pstore.

Tested on a HP 17-by0001nw laptop with an Intel Kaby Lake CPU (Intel i3-7020U) and Ubuntu 20.04.

This CPU is not affected by RFDS (at least according to the Kconfig message), so I have set

CONFIG_MITIGATION_RFDS=n

in Kconfig. I do not have any updated microcode (if any will be provided at all for this CPU).

Greetings,

Mateusz


2024-03-13 21:34:18

by Mateusz Jończyk

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

W dniu 13.03.2024 o 21:13, Mateusz Jończyk pisze:
> W dniu 13.03.2024 o 17:38, Sasha Levin pisze:
>> This is the start of the stable review cycle for the 6.1.82 release.
>> There are 71 patches in this series, all will be posted as a response
>> to this one. If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
>> Anything received after that time might be too late.
>>
>> The whole patch series can be found in one patch at:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
>> or in the git tree and branch at:
>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
>> and the diffstat can be found below.
>>
>> Thanks,
>> Sasha
>>
> Hello,
>
> Kernel hangs during early boot. No console messages, nothing in pstore.
>
> Tested on a HP 17-by0001nw laptop with an Intel Kaby Lake CPU (Intel i3-7020U) and Ubuntu 20.04.
>
> This CPU is not affected by RFDS (at least according to the Kconfig message), so I have set
>
> CONFIG_MITIGATION_RFDS=n
>
> in Kconfig. I do not have any updated microcode (if any will be provided at all for this CPU).
>
> Greetings,
>
> Mateusz
>
I have tested with the following patches reverted:

                [PATCH 6.1 43/71] x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
                [PATCH 6.1 44/71] Documentation/hw-vuln: Add documentation for RFDS
                [PATCH 6.1 45/71] x86/rfds: Mitigate Register File Data Sampling (RFDS)
                [PATCH 6.1 46/71] KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests

and the kernel still doesn't boot.

I have realized that in Kconfig, I have CONFIG_DEBUG_PREEMPT enabled for some weird reason.
I alsa have CONFIG_EARLY_PRINTK_USB_XDBC=y, which may cause error messages not to
appear on screen (according to Kconfig description). Will test with both options disabled.

Greetings,

Mateusz


2024-03-14 14:44:53

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

On Wed, 13 Mar 2024 at 22:10, Sasha Levin <[email protected]> wrote:
>
>
> This is the start of the stable review cycle for the 6.1.82 release.
> There are 71 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
>
> Thanks,
> Sasha


Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing <[email protected]>

## Build
* kernel: 6.1.82-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-6.1.y
* git commit: 27d7d4053a11de9e6fbdbc6b6cc3f8f5937e4f2b
* git describe: v6.1.81-71-g27d7d4053a11
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.81-71-g27d7d4053a11

## Test Regressions (compared to v6.1.81)

## Metric Regressions (compared to v6.1.81)

## Test Fixes (compared to v6.1.81)

## Metric Fixes (compared to v6.1.81)

## Test result summary
total: 134217, pass: 114741, fail: 1989, skip: 17348, xfail: 139

## Build Summary
* arc: 5 total, 5 passed, 0 failed
* arm: 139 total, 139 passed, 0 failed
* arm64: 41 total, 41 passed, 0 failed
* i386: 31 total, 31 passed, 0 failed
* mips: 26 total, 26 passed, 0 failed
* parisc: 4 total, 4 passed, 0 failed
* powerpc: 36 total, 34 passed, 2 failed
* riscv: 11 total, 11 passed, 0 failed
* s390: 16 total, 16 passed, 0 failed
* sh: 10 total, 10 passed, 0 failed
* sparc: 8 total, 8 passed, 0 failed
* x86_64: 37 total, 36 passed, 1 failed

## Test suites summary
* boot
* kselftest-android
* kselftest-arm64
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-drivers-dma-buf
* kselftest-efivarfs
* kselftest-exec
* kselftest-filesystems
* kselftest-filesystems-binderfs
* kselftest-filesystems-epoll
* kselftest-firmware
* kselftest-fpu
* kselftest-ftrace
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mm
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-net-forwarding
* kselftest-net-mptcp
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-user_events
* kselftest-vDSO
* kselftest-watchdog
* kselftest-x86
* kselftest-zram
* kunit
* kvm-unit-tests
* libgpiod
* log-parser-boot
* log-parser-test
* ltp-cap_bounds
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-filecaps
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-hugetlb
* ltp-io
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-pty
* ltp-sched
* ltp-securebits
* ltp-smoke
* ltp-smoketest
* ltp-syscalls
* ltp-tracing
* perf
* rcutorture

--
Linaro LKFT
https://lkft.linaro.org

2024-03-14 20:46:01

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

On 3/13/24 09:38, Sasha Levin wrote:
>
> This is the start of the stable review cycle for the 6.1.82 release.
> There are 71 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
>
> Thanks,
> Sasha

On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on
BMIPS_GENERIC:

Tested-by: Florian Fainelli <[email protected]>
--
Florian


2024-03-14 21:13:06

by Mateusz Jończyk

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

W dniu 13.03.2024 o 22:27, Mateusz Jończyk pisze:
> W dniu 13.03.2024 o 21:13, Mateusz Jończyk pisze:
>> W dniu 13.03.2024 o 17:38, Sasha Levin pisze:
>>> This is the start of the stable review cycle for the 6.1.82 release.
>>> There are 71 patches in this series, all will be posted as a response
>>> to this one. If anyone has any issues with these being applied, please
>>> let me know.
>>>
>>> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
>>> Anything received after that time might be too late.
>>>
>>> The whole patch series can be found in one patch at:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
>>> or in the git tree and branch at:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
>>> and the diffstat can be found below.
>>>
>>> Thanks,
>>> Sasha
>>>
>> Hello,
>>
>> Kernel hangs during early boot. No console messages, nothing in pstore.
>>
>> Tested on a HP 17-by0001nw laptop with an Intel Kaby Lake CPU (Intel i3-7020U) and Ubuntu 20.04.
>>
>> This CPU is not affected by RFDS (at least according to the Kconfig message), so I have set
>>
>> CONFIG_MITIGATION_RFDS=n
>>
>> in Kconfig. I do not have any updated microcode (if any will be provided at all for this CPU).
>>
>> Greetings,
>>
>> Mateusz
>>
> [snip]

Bisected down to

commit d3d517a95e83a7d89e1ff511da1a0a31c9234155
Author: Christoph Hellwig <[email protected]>
Date:   Fri Feb 3 16:03:54 2023 +0100

    blk-rq-qos: make rq_qos_add and rq_qos_del more useful
    
    [ Upstream commit ce57b558604e68277d31ca5ce49ec4579a8618c5 ]
    
    Switch to passing a gendisk, and make rq_qos_add initialize all required
    fields and drop the not required q argument from rq_qos_del.
    
    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Andreas Herrmann <[email protected]>
    Acked-by: Tejun Heo <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Stable-dep-of: f814bdda774c ("blk-wbt: Fix detection of dirty-throttled tasks")
    Signed-off-by: Sasha Levin <[email protected]>

 block/blk-iocost.c    | 13 +++----------
 block/blk-iolatency.c | 14 ++++----------
 block/blk-rq-qos.c    | 13 ++++++++++---
 block/blk-rq-qos.h    |  5 +++--
 block/blk-wbt.c       |  5 +----
 5 files changed, 21 insertions(+), 29 deletions(-)

Greetings,

Mateusz


2024-03-14 22:05:13

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

On 3/14/24 3:12 PM, Mateusz Jończyk wrote:
> W dniu 13.03.2024 o 22:27, Mateusz Jończyk pisze:
>> W dniu 13.03.2024 o 21:13, Mateusz Jończyk pisze:
>>> W dniu 13.03.2024 o 17:38, Sasha Levin pisze:
>>>> This is the start of the stable review cycle for the 6.1.82 release.
>>>> There are 71 patches in this series, all will be posted as a response
>>>> to this one. If anyone has any issues with these being applied, please
>>>> let me know.
>>>>
>>>> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
>>>> Anything received after that time might be too late.
>>>>
>>>> The whole patch series can be found in one patch at:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
>>>> or in the git tree and branch at:
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
>>>> and the diffstat can be found below.
>>>>
>>>> Thanks,
>>>> Sasha
>>>>
>>> Hello,
>>>
>>> Kernel hangs during early boot. No console messages, nothing in pstore.
>>>
>>> Tested on a HP 17-by0001nw laptop with an Intel Kaby Lake CPU (Intel i3-7020U) and Ubuntu 20.04.
>>>
>>> This CPU is not affected by RFDS (at least according to the Kconfig message), so I have set
>>>
>>> CONFIG_MITIGATION_RFDS=n
>>>
>>> in Kconfig. I do not have any updated microcode (if any will be provided at all for this CPU).
>>>
>>> Greetings,
>>>
>>> Mateusz
>>>
>> [snip]
>
> Bisected down to
>
> commit d3d517a95e83a7d89e1ff511da1a0a31c9234155
> Author: Christoph Hellwig <[email protected]>
> Date:   Fri Feb 3 16:03:54 2023 +0100
>
>     blk-rq-qos: make rq_qos_add and rq_qos_del more useful

Do you have:

commit e33b93650fc5364f773985a3e961e24349330d97
Author: Breno Leitao <[email protected]>
Date: Tue Feb 28 03:16:54 2023 -0800

blk-iocost: Pass gendisk to ioc_refresh_params

in there?

--
Jens Axboe



2024-03-14 22:36:01

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

On Thu, Mar 14, 2024 at 04:04:59PM -0600, Jens Axboe wrote:
>On 3/14/24 3:12 PM, Mateusz Jończyk wrote:
>> W dniu 13.03.2024 o 22:27, Mateusz Jończyk pisze:
>>> W dniu 13.03.2024 o 21:13, Mateusz Jończyk pisze:
>>>> W dniu 13.03.2024 o 17:38, Sasha Levin pisze:
>>>>> This is the start of the stable review cycle for the 6.1.82 release.
>>>>> There are 71 patches in this series, all will be posted as a response
>>>>> to this one. If anyone has any issues with these being applied, please
>>>>> let me know.
>>>>>
>>>>> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
>>>>> Anything received after that time might be too late.
>>>>>
>>>>> The whole patch series can be found in one patch at:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
>>>>> or in the git tree and branch at:
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
>>>>> and the diffstat can be found below.
>>>>>
>>>>> Thanks,
>>>>> Sasha
>>>>>
>>>> Hello,
>>>>
>>>> Kernel hangs during early boot. No console messages, nothing in pstore.
>>>>
>>>> Tested on a HP 17-by0001nw laptop with an Intel Kaby Lake CPU (Intel i3-7020U) and Ubuntu 20.04.
>>>>
>>>> This CPU is not affected by RFDS (at least according to the Kconfig message), so I have set
>>>>
>>>> CONFIG_MITIGATION_RFDS=n
>>>>
>>>> in Kconfig. I do not have any updated microcode (if any will be provided at all for this CPU).
>>>>
>>>> Greetings,
>>>>
>>>> Mateusz
>>>>
>>> [snip]
>>
>> Bisected down to
>>
>> commit d3d517a95e83a7d89e1ff511da1a0a31c9234155
>> Author: Christoph Hellwig <[email protected]>
>> Date:   Fri Feb 3 16:03:54 2023 +0100
>>
>>     blk-rq-qos: make rq_qos_add and rq_qos_del more useful
>
>Do you have:
>
>commit e33b93650fc5364f773985a3e961e24349330d97
>Author: Breno Leitao <[email protected]>
>Date: Tue Feb 28 03:16:54 2023 -0800
>
> blk-iocost: Pass gendisk to ioc_refresh_params
>
>in there?

It's not in the 6.1 tree, do we need it?

--
Thanks,
Sasha

2024-03-14 22:41:06

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

On 3/14/24 4:35 PM, Sasha Levin wrote:
> On Thu, Mar 14, 2024 at 04:04:59PM -0600, Jens Axboe wrote:
>> On 3/14/24 3:12 PM, Mateusz Jo?czyk wrote:
>>> W dniu 13.03.2024 o 22:27, Mateusz Jo?czyk pisze:
>>>> W dniu 13.03.2024 o 21:13, Mateusz Jo?czyk pisze:
>>>>> W dniu 13.03.2024 o 17:38, Sasha Levin pisze:
>>>>>> This is the start of the stable review cycle for the 6.1.82 release.
>>>>>> There are 71 patches in this series, all will be posted as a response
>>>>>> to this one. If anyone has any issues with these being applied, please
>>>>>> let me know.
>>>>>>
>>>>>> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
>>>>>> Anything received after that time might be too late.
>>>>>>
>>>>>> The whole patch series can be found in one patch at:
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
>>>>>> or in the git tree and branch at:
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
>>>>>> and the diffstat can be found below.
>>>>>>
>>>>>> Thanks,
>>>>>> Sasha
>>>>>>
>>>>> Hello,
>>>>>
>>>>> Kernel hangs during early boot. No console messages, nothing in pstore.
>>>>>
>>>>> Tested on a HP 17-by0001nw laptop with an Intel Kaby Lake CPU (Intel i3-7020U) and Ubuntu 20.04.
>>>>>
>>>>> This CPU is not affected by RFDS (at least according to the Kconfig message), so I have set
>>>>>
>>>>> CONFIG_MITIGATION_RFDS=n
>>>>>
>>>>> in Kconfig. I do not have any updated microcode (if any will be provided at all for this CPU).
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Mateusz
>>>>>
>>>> [snip]
>>>
>>> Bisected down to
>>>
>>> commit d3d517a95e83a7d89e1ff511da1a0a31c9234155
>>> Author: Christoph Hellwig <[email protected]>
>>> Date: Fri Feb 3 16:03:54 2023 +0100
>>>
>>> blk-rq-qos: make rq_qos_add and rq_qos_del more useful
>>
>> Do you have:
>>
>> commit e33b93650fc5364f773985a3e961e24349330d97
>> Author: Breno Leitao <[email protected]>
>> Date: Tue Feb 28 03:16:54 2023 -0800
>>
>> blk-iocost: Pass gendisk to ioc_refresh_params
>>
>> in there?
>
> It's not in the 6.1 tree, do we need it?

If the bisected commit is in there, then yes we need it. It's marked as
fixes that, so puzzled why it isn't in there?

--
Jens Axboe


2024-03-13 20:05:16

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

Hi!

> This is the start of the stable review cycle for the 6.1.82 release.
> There are 71 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.

CIP testing did not find any problems here:

https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-6.1.y

Tested-by: Pavel Machek (CIP) <[email protected]>

It would be good to quote sha1 of the release, so that we could check
it against our build system.

Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Attachments:
(No filename) (761.00 B)
signature.asc (201.00 B)
Download all attachments

2024-03-15 12:15:58

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

On Thu, Mar 14, 2024 at 04:40:53PM -0600, Jens Axboe wrote:
>On 3/14/24 4:35 PM, Sasha Levin wrote:
>> On Thu, Mar 14, 2024 at 04:04:59PM -0600, Jens Axboe wrote:
>>> On 3/14/24 3:12 PM, Mateusz Jo?czyk wrote:
>>>> W dniu 13.03.2024 o 22:27, Mateusz Jo?czyk pisze:
>>>>> W dniu 13.03.2024 o 21:13, Mateusz Jo?czyk pisze:
>>>>>> W dniu 13.03.2024 o 17:38, Sasha Levin pisze:
>>>>>>> This is the start of the stable review cycle for the 6.1.82 release.
>>>>>>> There are 71 patches in this series, all will be posted as a response
>>>>>>> to this one. If anyone has any issues with these being applied, please
>>>>>>> let me know.
>>>>>>>
>>>>>>> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
>>>>>>> Anything received after that time might be too late.
>>>>>>>
>>>>>>> The whole patch series can be found in one patch at:
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
>>>>>>> or in the git tree and branch at:
>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
>>>>>>> and the diffstat can be found below.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sasha
>>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Kernel hangs during early boot. No console messages, nothing in pstore.
>>>>>>
>>>>>> Tested on a HP 17-by0001nw laptop with an Intel Kaby Lake CPU (Intel i3-7020U) and Ubuntu 20.04.
>>>>>>
>>>>>> This CPU is not affected by RFDS (at least according to the Kconfig message), so I have set
>>>>>>
>>>>>> CONFIG_MITIGATION_RFDS=n
>>>>>>
>>>>>> in Kconfig. I do not have any updated microcode (if any will be provided at all for this CPU).
>>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> Mateusz
>>>>>>
>>>>> [snip]
>>>>
>>>> Bisected down to
>>>>
>>>> commit d3d517a95e83a7d89e1ff511da1a0a31c9234155
>>>> Author: Christoph Hellwig <[email protected]>
>>>> Date: Fri Feb 3 16:03:54 2023 +0100
>>>>
>>>> blk-rq-qos: make rq_qos_add and rq_qos_del more useful
>>>
>>> Do you have:
>>>
>>> commit e33b93650fc5364f773985a3e961e24349330d97
>>> Author: Breno Leitao <[email protected]>
>>> Date: Tue Feb 28 03:16:54 2023 -0800
>>>
>>> blk-iocost: Pass gendisk to ioc_refresh_params
>>>
>>> in there?
>>
>> It's not in the 6.1 tree, do we need it?
>
>If the bisected commit is in there, then yes we need it. It's marked as
>fixes that, so puzzled why it isn't in there?

Sorry, I take it back - both e33b93650fc5 ("blk-iocost: Pass gendisk to
ioc_refresh_params") and d3d517a95e83 ("blk-rq-qos: make rq_qos_add and
rq_qos_del more useful") are currently in the 6.1 tree.

--
Thanks,
Sasha

2024-03-15 14:43:05

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

On Fri, Mar 15, 2024 at 08:14:57AM -0400, Sasha Levin wrote:
>On Thu, Mar 14, 2024 at 04:40:53PM -0600, Jens Axboe wrote:
>>On 3/14/24 4:35 PM, Sasha Levin wrote:
>>>On Thu, Mar 14, 2024 at 04:04:59PM -0600, Jens Axboe wrote:
>>>>On 3/14/24 3:12 PM, Mateusz Jo?czyk wrote:
>>>>>W dniu 13.03.2024 o 22:27, Mateusz Jo?czyk pisze:
>>>>>>W dniu 13.03.2024 o 21:13, Mateusz Jo?czyk pisze:
>>>>>>>W dniu 13.03.2024 o 17:38, Sasha Levin pisze:
>>>>>>>>This is the start of the stable review cycle for the 6.1.82 release.
>>>>>>>>There are 71 patches in this series, all will be posted as a response
>>>>>>>>to this one. If anyone has any issues with these being applied, please
>>>>>>>>let me know.
>>>>>>>>
>>>>>>>>Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
>>>>>>>>Anything received after that time might be too late.
>>>>>>>>
>>>>>>>>The whole patch series can be found in one patch at:
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
>>>>>>>>or in the git tree and branch at:
>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
>>>>>>>>and the diffstat can be found below.
>>>>>>>>
>>>>>>>>Thanks,
>>>>>>>>Sasha
>>>>>>>>
>>>>>>>Hello,
>>>>>>>
>>>>>>>Kernel hangs during early boot. No console messages, nothing in pstore.
>>>>>>>
>>>>>>>Tested on a HP 17-by0001nw laptop with an Intel Kaby Lake CPU (Intel i3-7020U) and Ubuntu 20.04.
>>>>>>>
>>>>>>>This CPU is not affected by RFDS (at least according to the Kconfig message), so I have set
>>>>>>>
>>>>>>>CONFIG_MITIGATION_RFDS=n
>>>>>>>
>>>>>>>in Kconfig. I do not have any updated microcode (if any will be provided at all for this CPU).
>>>>>>>
>>>>>>>Greetings,
>>>>>>>
>>>>>>>Mateusz
>>>>>>>
>>>>>>[snip]
>>>>>
>>>>>Bisected down to
>>>>>
>>>>>commit d3d517a95e83a7d89e1ff511da1a0a31c9234155
>>>>>Author: Christoph Hellwig <[email protected]>
>>>>>Date: Fri Feb 3 16:03:54 2023 +0100
>>>>>
>>>>> blk-rq-qos: make rq_qos_add and rq_qos_del more useful
>>>>
>>>>Do you have:
>>>>
>>>>commit e33b93650fc5364f773985a3e961e24349330d97
>>>>Author: Breno Leitao <[email protected]>
>>>>Date: Tue Feb 28 03:16:54 2023 -0800
>>>>
>>>> blk-iocost: Pass gendisk to ioc_refresh_params
>>>>
>>>>in there?
>>>
>>>It's not in the 6.1 tree, do we need it?
>>
>>If the bisected commit is in there, then yes we need it. It's marked as
>>fixes that, so puzzled why it isn't in there?
>
>Sorry, I take it back - both e33b93650fc5 ("blk-iocost: Pass gendisk to
>ioc_refresh_params") and d3d517a95e83 ("blk-rq-qos: make rq_qos_add and
>rq_qos_del more useful") are currently in the 6.1 tree.

I'll go ahead and drop the backport of f814bdda774c ("blk-wbt: Fix
detection of dirty-throttled tasks") as well as the dependencies (which
is where this issue bisected to), and all follow-up fixes.

We can revisit this for the next release.

--
Thanks,
Sasha

2024-03-15 14:51:03

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

On 3/15/24 8:42 AM, Sasha Levin wrote:
> On Fri, Mar 15, 2024 at 08:14:57AM -0400, Sasha Levin wrote:
>> On Thu, Mar 14, 2024 at 04:40:53PM -0600, Jens Axboe wrote:
>>> On 3/14/24 4:35 PM, Sasha Levin wrote:
>>>> On Thu, Mar 14, 2024 at 04:04:59PM -0600, Jens Axboe wrote:
>>>>> On 3/14/24 3:12 PM, Mateusz Jo?czyk wrote:
>>>>>> W dniu 13.03.2024 o 22:27, Mateusz Jo?czyk pisze:
>>>>>>> W dniu 13.03.2024 o 21:13, Mateusz Jo?czyk pisze:
>>>>>>>> W dniu 13.03.2024 o 17:38, Sasha Levin pisze:
>>>>>>>>> This is the start of the stable review cycle for the 6.1.82 release.
>>>>>>>>> There are 71 patches in this series, all will be posted as a response
>>>>>>>>> to this one. If anyone has any issues with these being applied, please
>>>>>>>>> let me know.
>>>>>>>>>
>>>>>>>>> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
>>>>>>>>> Anything received after that time might be too late.
>>>>>>>>>
>>>>>>>>> The whole patch series can be found in one patch at:
>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
>>>>>>>>> or in the git tree and branch at:
>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
>>>>>>>>> and the diffstat can be found below.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sasha
>>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Kernel hangs during early boot. No console messages, nothing in pstore.
>>>>>>>>
>>>>>>>> Tested on a HP 17-by0001nw laptop with an Intel Kaby Lake CPU (Intel i3-7020U) and Ubuntu 20.04.
>>>>>>>>
>>>>>>>> This CPU is not affected by RFDS (at least according to the Kconfig message), so I have set
>>>>>>>>
>>>>>>>> CONFIG_MITIGATION_RFDS=n
>>>>>>>>
>>>>>>>> in Kconfig. I do not have any updated microcode (if any will be provided at all for this CPU).
>>>>>>>>
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> Mateusz
>>>>>>>>
>>>>>>> [snip]
>>>>>>
>>>>>> Bisected down to
>>>>>>
>>>>>> commit d3d517a95e83a7d89e1ff511da1a0a31c9234155
>>>>>> Author: Christoph Hellwig <[email protected]>
>>>>>> Date: Fri Feb 3 16:03:54 2023 +0100
>>>>>>
>>>>>> blk-rq-qos: make rq_qos_add and rq_qos_del more useful
>>>>>
>>>>> Do you have:
>>>>>
>>>>> commit e33b93650fc5364f773985a3e961e24349330d97
>>>>> Author: Breno Leitao <[email protected]>
>>>>> Date: Tue Feb 28 03:16:54 2023 -0800
>>>>>
>>>>> blk-iocost: Pass gendisk to ioc_refresh_params
>>>>>
>>>>> in there?
>>>>
>>>> It's not in the 6.1 tree, do we need it?
>>>
>>> If the bisected commit is in there, then yes we need it. It's marked as
>>> fixes that, so puzzled why it isn't in there?
>>
>> Sorry, I take it back - both e33b93650fc5 ("blk-iocost: Pass gendisk to
>> ioc_refresh_params") and d3d517a95e83 ("blk-rq-qos: make rq_qos_add and
>> rq_qos_del more useful") are currently in the 6.1 tree.

I didn't see e33b93650fc5 in there, but maybe it was part of the series
that this is about.

> I'll go ahead and drop the backport of f814bdda774c ("blk-wbt: Fix
> detection of dirty-throttled tasks") as well as the dependencies (which
> is where this issue bisected to), and all follow-up fixes.
>
> We can revisit this for the next release.

Sounds reasonable.

--
Jens Axboe


2024-03-15 15:35:22

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

On Wed, Mar 13, 2024 at 12:38:46PM -0400, Sasha Levin wrote:
>
> This is the start of the stable review cycle for the 6.1.82 release.
> There are 71 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.

Tested-by: Mark Brown <[email protected]>


Attachments:
(No filename) (348.00 B)
signature.asc (499.00 B)
Download all attachments

2024-03-15 19:33:02

by Ron Economos

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

On 3/15/24 7:49 AM, Jens Axboe wrote:
> On 3/15/24 8:42 AM, Sasha Levin wrote:
>> On Fri, Mar 15, 2024 at 08:14:57AM -0400, Sasha Levin wrote:
>>> On Thu, Mar 14, 2024 at 04:40:53PM -0600, Jens Axboe wrote:
>>>> On 3/14/24 4:35 PM, Sasha Levin wrote:
>>>>> On Thu, Mar 14, 2024 at 04:04:59PM -0600, Jens Axboe wrote:
>>>>>> On 3/14/24 3:12 PM, Mateusz Jo?czyk wrote:
>>>>>>> W dniu 13.03.2024 o 22:27, Mateusz Jo?czyk pisze:
>>>>>>>> W dniu 13.03.2024 o 21:13, Mateusz Jo?czyk pisze:
>>>>>>>>> W dniu 13.03.2024 o 17:38, Sasha Levin pisze:
>>>>>>>>>> This is the start of the stable review cycle for the 6.1.82 release.
>>>>>>>>>> There are 71 patches in this series, all will be posted as a response
>>>>>>>>>> to this one. If anyone has any issues with these being applied, please
>>>>>>>>>> let me know.
>>>>>>>>>>
>>>>>>>>>> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
>>>>>>>>>> Anything received after that time might be too late.
>>>>>>>>>>
>>>>>>>>>> The whole patch series can be found in one patch at:
>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
>>>>>>>>>> or in the git tree and branch at:
>>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
>>>>>>>>>> and the diffstat can be found below.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sasha
>>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Kernel hangs during early boot. No console messages, nothing in pstore.
>>>>>>>>>
>>>>>>>>> Tested on a HP 17-by0001nw laptop with an Intel Kaby Lake CPU (Intel i3-7020U) and Ubuntu 20.04.
>>>>>>>>>
>>>>>>>>> This CPU is not affected by RFDS (at least according to the Kconfig message), so I have set
>>>>>>>>>
>>>>>>>>> CONFIG_MITIGATION_RFDS=n
>>>>>>>>>
>>>>>>>>> in Kconfig. I do not have any updated microcode (if any will be provided at all for this CPU).
>>>>>>>>>
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> Mateusz
>>>>>>>>>
>>>>>>>> [snip]
>>>>>>> Bisected down to
>>>>>>>
>>>>>>> commit d3d517a95e83a7d89e1ff511da1a0a31c9234155
>>>>>>> Author: Christoph Hellwig <[email protected]>
>>>>>>> Date: Fri Feb 3 16:03:54 2023 +0100
>>>>>>>
>>>>>>> blk-rq-qos: make rq_qos_add and rq_qos_del more useful
>>>>>> Do you have:
>>>>>>
>>>>>> commit e33b93650fc5364f773985a3e961e24349330d97
>>>>>> Author: Breno Leitao <[email protected]>
>>>>>> Date: Tue Feb 28 03:16:54 2023 -0800
>>>>>>
>>>>>> blk-iocost: Pass gendisk to ioc_refresh_params
>>>>>>
>>>>>> in there?
>>>>> It's not in the 6.1 tree, do we need it?
>>>> If the bisected commit is in there, then yes we need it. It's marked as
>>>> fixes that, so puzzled why it isn't in there?
>>> Sorry, I take it back - both e33b93650fc5 ("blk-iocost: Pass gendisk to
>>> ioc_refresh_params") and d3d517a95e83 ("blk-rq-qos: make rq_qos_add and
>>> rq_qos_del more useful") are currently in the 6.1 tree.
> I didn't see e33b93650fc5 in there, but maybe it was part of the series
> that this is about.
>
>> I'll go ahead and drop the backport of f814bdda774c ("blk-wbt: Fix
>> detection of dirty-throttled tasks") as well as the dependencies (which
>> is where this issue bisected to), and all follow-up fixes.
>>
>> We can revisit this for the next release.
> Sounds reasonable.
>
Seeing this on RISC-V also. Here's the oops.

[    2.030135] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000050
[    2.038233] Oops [#1]
[    2.040420] Modules linked in:
[    2.043461] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.1.82-rc1 #2
[    2.049717] Hardware name: SiFive HiFive Unmatched A00 (DT)
[    2.055276] epc : wbt_queue_depth_changed+0x18/0xb6
[    2.060138]  ra : wbt_init+0x118/0x198
[    2.063871] epc : ffffffff8050e942 ra : ffffffff8050f16e sp :
ffffffd880087c80
[    2.071087]  gp : ffffffff81a3e3a8 tp : ffffffd87ffe8d40 t0 :
ffffffdbfed53668
[    2.078294]  t1 : 0000000000000000 t2 : 0000000000000000 s0 :
ffffffd880087ca0
[    2.085507]  s1 : ffffffd8800fa458 a0 : ffffffd8800fa458 a1 :
ffffffff813442a8
[    2.092714]  a2 : ffffffff81b148e0 a3 : 00000000001e8480 a4 :
0000000000000000
[    2.099923]  a5 : 0000000000000000 a6 : 0000000000000000 a7 :
0000000000000000
[    2.107131]  s2 : ffffffd88086a800 s3 : ffffffd880870000 s4 :
ffffffd8808702c0
[    2.114340]  s5 : ffffffd880870088 s6 : ffffffd8808702a0 s7 :
ffffffff80e6d580
[    2.121550]  s8 : 0000000000000008 s9 : ffffffff80c00106 s10:
0000000000000000
[    2.128759]  s11: 0000000000000000 t3 : 0000000000000000 t4 :
0000000000000000
[    2.135967]  t5 : 0000000000000000 t6 : 0000000000000000
[    2.141264] status: 0000000200000120 badaddr: 0000000000000050 cause:
000000000000000d
[    2.149171] [<ffffffff8050e942>] wbt_queue_depth_changed+0x18/0xb6
[    2.155337] [<ffffffff8050f16e>] wbt_init+0x118/0x198
[    2.160371] [<ffffffff8050f25a>] wbt_enable_default+0x6c/0x90
[    2.166104] [<ffffffff804d7822>] blk_register_queue+0x17c/0x1b2
[    2.172012] [<ffffffff804ec420>] device_add_disk+0x1f6/0x36c
[    2.177657] [<ffffffff806c3e7a>] loop_add+0x2a0/0x31a
[    2.182696] [<ffffffff80c36452>] loop_init+0x10c/0x138
[    2.187819] [<ffffffff8000293e>] do_one_initcall+0x5a/0x1e2
[    2.193381] [<ffffffff80c018c2>] kernel_init_freeable+0x28c/0x308
[    2.199460] [<ffffffff80af7e0c>] kernel_init+0x32/0x16e
[    2.204669] [<ffffffff80003ed4>] ret_from_exception+0x0/0x16
[    2.210372] ---[ end trace 0000000000000000 ]---
[    2.214956] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b
[    2.222572] SMP: stopping secondary CPUs
[    2.226493] ---[ end Kernel panic - not syncing: Attempted to kill
init! exitcode=0x0000000b ]---


2024-03-15 10:37:46

by Shreeya Patel

[permalink] [raw]
Subject: Re: [PATCH 6.1 00/71] 6.1.82-rc1 review

On Wednesday, March 13, 2024 22:08 IST, Sasha Levin <[email protected]> wrote:

>
> This is the start of the stable review cycle for the 6.1.82 release.
> There are 71 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Mar 15 04:39:56 PM UTC 2024.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.1.y&id2=v6.1.81
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
>

KernelCI report for stable-rc/linux-6.1.y report for this week :-

## stable-rc HEAD for linux-6.1.y:
Date: 2024-03-13
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/log/?h=27d7d4053a11de9e6fbdbc6b6cc3f8f5937e4f2b

## Build failures:
No build failures seen for the stable-rc/linux-6.1.y commit head \o/

## Boot failures:
No **new** boot failures seen for the stable-rc/linux-6.1.y commit head \o/

Tested-by: kernelci.org bot <[email protected]>

Thanks,
Shreeya Patel