Hi,
I notice a regression report on Bugzilla [1]. Quoting from it:
> Linux version 6.5.0-rc1+ - up to 831fe284d8275987596b7d640518dddba5735f61
>
> [ 7.312665] ------------[ cut here ]------------
> [ 7.312671] WARNING: CPU: 8 PID: 659 at mm/page_alloc.c:4453 __alloc_pages+0x329/0x340
> [ 7.312683] Modules linked in: exfat ntfs3 bnep btusb btrtl btbcm btintel btmtk bluetooth ecdh_generic ecc iwlmvm mac80211 libarc4 snd_hda_codec_hdmi iwlwifi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio cfg80211 snd_hda_intel snd_hda_codec snd_hwdep intel_rapl_msr x86_pkg_temp_thermal intel_powerclamp snd_hda_core tpm_tis intel_rapl_common snd_intel_dspcfg tpm_tis_core idma64 rfkill tpm_crb tpm rng_core pkcs8_key_parser fuse dmi_sysfs
> [ 7.312725] CPU: 8 PID: 659 Comm: kworker/8:2 Not tainted 6.5.0-rc1 #1
> [ 7.312729] Hardware name: Intel(R) Client Systems NUC12WSKi7/NUC12WSBi7, BIOS WSADL357.0087.2023.0306.1931 03/06/2023
> [ 7.312731] Workqueue: events cfg80211_wiphy_work [cfg80211]
> [ 7.312785] RIP: 0010:__alloc_pages+0x329/0x340
> [ 7.312791] Code: a8 44 89 e6 89 df c6 45 c8 00 4c 89 6d b0 41 89 de e8 0b ef ff ff 49 89 c7 e9 90 fe ff ff 80 e3 3f eb c0 c6 05 69 49 f7 09 01 <0f> 0b eb 98 e8 1e ec 6d 01 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> [ 7.312794] RSP: 0018:ffffad8e80defbf8 EFLAGS: 00010246
> [ 7.312798] RAX: 0000000000000000 RBX: 0000000000040dc0 RCX: 0000000000000000
> [ 7.312800] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000040dc0
> [ 7.312802] RBP: ffffad8e80defc50 R08: 0000000000000005 R09: 0000000000000018
> [ 7.312804] R10: ffff938899d4a800 R11: ffff938891a9c800 R12: 0000000000000034
> [ 7.312806] R13: 0000000000000000 R14: ffffffffc02b150d R15: fffffffffffffc90
> [ 7.312808] FS: 0000000000000000(0000) GS:ffff938fb6600000(0000) knlGS:0000000000000000
> [ 7.312811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7.312813] CR2: 00007f540c94b710 CR3: 000000059502c000 CR4: 0000000000f50ea0
> [ 7.312815] PKRU: 55555554
> [ 7.312817] Call Trace:
> [ 7.312819] <TASK>
> [ 7.312822] ? show_regs+0x69/0x80
> [ 7.312827] ? __warn+0x89/0x140
> [ 7.312833] ? __alloc_pages+0x329/0x340
> [ 7.312837] ? report_bug+0x15d/0x180
> [ 7.312843] ? handle_bug+0x42/0x80
> [ 7.312848] ? exc_invalid_op+0x1c/0x70
> [ 7.312852] ? asm_exc_invalid_op+0x1f/0x30
> [ 7.312857] ? cfg80211_scan_6ghz+0x12d/0xcb0 [cfg80211]
> [ 7.312911] ? __alloc_pages+0x329/0x340
> [ 7.312916] ? cfg80211_scan_6ghz+0x12d/0xcb0 [cfg80211]
> [ 7.312968] __kmalloc_large_node+0x7e/0x170
> [ 7.312973] __kmalloc+0xb9/0x120
> [ 7.312976] cfg80211_scan_6ghz+0x12d/0xcb0 [cfg80211]
> [ 7.313028] ? sched_clock_noinstr+0xd/0x20
> [ 7.313034] ? sched_clock_cpu+0x14/0x190
> [ 7.313040] ? raw_spin_rq_lock_nested+0x12/0x20
> [ 7.313046] ___cfg80211_scan_done+0x1e0/0x250 [cfg80211]
> [ 7.313099] __cfg80211_scan_done+0x23/0x30 [cfg80211]
> [ 7.313153] cfg80211_wiphy_work+0xae/0xd0 [cfg80211]
> [ 7.313201] process_one_work+0x1f1/0x3e0
> [ 7.313204] worker_thread+0x51/0x3f0
> [ 7.313207] ? _raw_spin_unlock_irqrestore+0x26/0x30
> [ 7.313212] ? __pfx_worker_thread+0x10/0x10
> [ 7.313217] kthread+0xdb/0x110
> [ 7.313222] ? __pfx_kthread+0x10/0x10
> [ 7.313226] ret_from_fork+0x3e/0x60
> [ 7.313229] ? __pfx_kthread+0x10/0x10
> [ 7.313233] ret_from_fork_asm+0x1b/0x30
> [ 7.313237] </TASK>
> [ 7.313239] ---[ end trace 0000000000000000 ]---
See Bugzilla for the full thread.
Anyway, I'm adding it to regzbot to make sure it doesn't fall through
cracks unnoticed:
#regzbot introduced: v6.5-rc1..831fe284d82759 https://bugzilla.kernel.org/show_bug.cgi?id=217675
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217675
--
An old man doll... just what I always wanted! - Clara
On Sun, Jul 16, 2023 at 06:10:44PM +0700, Bagas Sanjaya wrote:
> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
Maybe you could try doing some work on this bug before just spamming
people with it?
if (WARN_ON_ONCE_GFP(order > MAX_ORDER, gfp))
return NULL;
This is the page allocator telling the caller that they've asked for an
unreasonably large allocation.
Now, this bug is actually interesting to the MM because the caller
called kmalloc() with a ridiculous size. Arguable kmalloc should
protect callers from themselves (alloc_pages() is more low level
and can presume its users know what they're doing).
Vlastimil, what do you think? Something like ...
+++ b/mm/slab_common.c
@@ -1119,6 +1119,8 @@ static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
void *ptr = NULL;
unsigned int order = get_order(size);
+ if (order > MAX_ORDER)
+ return NULL;
if (unlikely(flags & GFP_SLAB_BUG_MASK))
flags = kmalloc_fix_flags(flags);
Hi Sanjaya,
Can you please check if the attached patch fixes your issue?
Thanks in advance,
Ilan.
> -----Original Message-----
> From: Bagas Sanjaya <[email protected]>
> Sent: Sunday, July 16, 2023 14:11
> To: Rudi Heitbaum <[email protected]>; Johannes Berg
> <[email protected]>; David S. Miller <[email protected]>; Eric
> Dumazet <[email protected]>; Jakub Kicinski <[email protected]>;
> Paolo Abeni <[email protected]>; Kirill A. Shutemov
> <[email protected]>; Michael Ellerman
> <[email protected]>; Andrew Morton <[email protected]>
> Cc: Linux Kernel Mailing List <[email protected]>; Linux
> Regressions <[email protected]>; Linux Memory Management List
> <[email protected]>; Linux Networking <[email protected]>; Linux
> Wireless <[email protected]>
> Subject: Fwd: mm/page_alloc.c:4453 with cfg80211_wiphy_work [cfg80211]
>
> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
> > Linux version 6.5.0-rc1+ - up to
> > 831fe284d8275987596b7d640518dddba5735f61
> >
> > [ 7.312665] ------------[ cut here ]------------
> > [ 7.312671] WARNING: CPU: 8 PID: 659 at mm/page_alloc.c:4453
> __alloc_pages+0x329/0x340
> > [ 7.312683] Modules linked in: exfat ntfs3 bnep btusb btrtl btbcm btintel
> btmtk bluetooth ecdh_generic ecc iwlmvm mac80211 libarc4
> snd_hda_codec_hdmi iwlwifi snd_hda_codec_realtek
> snd_hda_codec_generic ledtrig_audio cfg80211 snd_hda_intel
> snd_hda_codec snd_hwdep intel_rapl_msr x86_pkg_temp_thermal
> intel_powerclamp snd_hda_core tpm_tis intel_rapl_common
> snd_intel_dspcfg tpm_tis_core idma64 rfkill tpm_crb tpm rng_core
> pkcs8_key_parser fuse dmi_sysfs
> > [ 7.312725] CPU: 8 PID: 659 Comm: kworker/8:2 Not tainted 6.5.0-rc1 #1
> > [ 7.312729] Hardware name: Intel(R) Client Systems
> NUC12WSKi7/NUC12WSBi7, BIOS WSADL357.0087.2023.0306.1931 03/06/2023
> > [ 7.312731] Workqueue: events cfg80211_wiphy_work [cfg80211]
> > [ 7.312785] RIP: 0010:__alloc_pages+0x329/0x340
> > [ 7.312791] Code: a8 44 89 e6 89 df c6 45 c8 00 4c 89 6d b0 41 89 de e8 0b ef
> ff ff 49 89 c7 e9 90 fe ff ff 80 e3 3f eb c0 c6 05 69 49 f7 09 01 <0f> 0b eb 98 e8
> 1e ec 6d 01 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> > [ 7.312794] RSP: 0018:ffffad8e80defbf8 EFLAGS: 00010246
> > [ 7.312798] RAX: 0000000000000000 RBX: 0000000000040dc0 RCX:
> 0000000000000000
> > [ 7.312800] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 0000000000040dc0
> > [ 7.312802] RBP: ffffad8e80defc50 R08: 0000000000000005 R09:
> 0000000000000018
> > [ 7.312804] R10: ffff938899d4a800 R11: ffff938891a9c800 R12:
> 0000000000000034
> > [ 7.312806] R13: 0000000000000000 R14: ffffffffc02b150d R15:
> fffffffffffffc90
> > [ 7.312808] FS: 0000000000000000(0000) GS:ffff938fb6600000(0000)
> knlGS:0000000000000000
> > [ 7.312811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 7.312813] CR2: 00007f540c94b710 CR3: 000000059502c000 CR4:
> 0000000000f50ea0
> > [ 7.312815] PKRU: 55555554
> > [ 7.312817] Call Trace:
> > [ 7.312819] <TASK>
> > [ 7.312822] ? show_regs+0x69/0x80
> > [ 7.312827] ? __warn+0x89/0x140
> > [ 7.312833] ? __alloc_pages+0x329/0x340
> > [ 7.312837] ? report_bug+0x15d/0x180
> > [ 7.312843] ? handle_bug+0x42/0x80
> > [ 7.312848] ? exc_invalid_op+0x1c/0x70
> > [ 7.312852] ? asm_exc_invalid_op+0x1f/0x30
> > [ 7.312857] ? cfg80211_scan_6ghz+0x12d/0xcb0 [cfg80211]
> > [ 7.312911] ? __alloc_pages+0x329/0x340
> > [ 7.312916] ? cfg80211_scan_6ghz+0x12d/0xcb0 [cfg80211]
> > [ 7.312968] __kmalloc_large_node+0x7e/0x170
> > [ 7.312973] __kmalloc+0xb9/0x120
> > [ 7.312976] cfg80211_scan_6ghz+0x12d/0xcb0 [cfg80211]
> > [ 7.313028] ? sched_clock_noinstr+0xd/0x20
> > [ 7.313034] ? sched_clock_cpu+0x14/0x190
> > [ 7.313040] ? raw_spin_rq_lock_nested+0x12/0x20
> > [ 7.313046] ___cfg80211_scan_done+0x1e0/0x250 [cfg80211]
> > [ 7.313099] __cfg80211_scan_done+0x23/0x30 [cfg80211]
> > [ 7.313153] cfg80211_wiphy_work+0xae/0xd0 [cfg80211]
> > [ 7.313201] process_one_work+0x1f1/0x3e0
> > [ 7.313204] worker_thread+0x51/0x3f0
> > [ 7.313207] ? _raw_spin_unlock_irqrestore+0x26/0x30
> > [ 7.313212] ? __pfx_worker_thread+0x10/0x10
> > [ 7.313217] kthread+0xdb/0x110
> > [ 7.313222] ? __pfx_kthread+0x10/0x10
> > [ 7.313226] ret_from_fork+0x3e/0x60
> > [ 7.313229] ? __pfx_kthread+0x10/0x10
> > [ 7.313233] ret_from_fork_asm+0x1b/0x30
> > [ 7.313237] </TASK>
> > [ 7.313239] ---[ end trace 0000000000000000 ]---
>
> See Bugzilla for the full thread.
>
> Anyway, I'm adding it to regzbot to make sure it doesn't fall through cracks
> unnoticed:
>
> #regzbot introduced: v6.5-rc1..831fe284d82759
> https://bugzilla.kernel.org/show_bug.cgi?id=217675
>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217675
>
> --
> An old man doll... just what I always wanted! - Clara
On 7/16/23 13:28, Matthew Wilcox wrote:
> On Sun, Jul 16, 2023 at 06:10:44PM +0700, Bagas Sanjaya wrote:
>> Hi,
>>
>> I notice a regression report on Bugzilla [1]. Quoting from it:
>
> Maybe you could try doing some work on this bug before just spamming
> people with it?
>
> if (WARN_ON_ONCE_GFP(order > MAX_ORDER, gfp))
> return NULL;
>
> This is the page allocator telling the caller that they've asked for an
> unreasonably large allocation.
>
> Now, this bug is actually interesting to the MM because the caller
> called kmalloc() with a ridiculous size. Arguable kmalloc should
> protect callers from themselves (alloc_pages() is more low level
> and can presume its users know what they're doing).
>
> Vlastimil, what do you think? Something like ...
Hmm should be more robust to check size > KMALLOC_MAX_SIZE before even doing
get_order(size). Ultimately it checks the same limit.
But I'm unsure about just returning NULL. I think warn_on_once might be
useful even there - in case a bug is introduced/exposed, even a
inexperienced user will be easily able to report sufficient information wich
a WARN and its stacktrace, even if the callsite's alloc check doesn't
provide it in an obvious way?
> +++ b/mm/slab_common.c
> @@ -1119,6 +1119,8 @@ static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
> void *ptr = NULL;
> unsigned int order = get_order(size);
>
> + if (order > MAX_ORDER)
> + return NULL;
> if (unlikely(flags & GFP_SLAB_BUG_MASK))
> flags = kmalloc_fix_flags(flags);
>
>
>
> From 5a6fcc4b2a91608917697f8446ec3e10b2ce9178 Mon Sep 17 00:00:00 2001
> From: Ilan Peer <[email protected]>
> Date: Tue, 18 Jul 2023 00:33:04 +0300
> Subject: [PATCH] wifi: cfg80211: Fix return value in scan logic
>
> As cfg80211_parse_colocated_ap() is not expected to return a negative
> value return 0 and not a negative value if cfg80211_calc_short_ssid()
> fails.
>
> This bug was introduced in commit c8cb5b854b40f
> ('nl80211/cfg80211: support 6 GHz scanning').
>
> Signed-off-by: Ilan Peer <[email protected]>
> ---
> net/wireless/scan.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/wireless/scan.c b/net/wireless/scan.c
>index 8bf00caf5d29..0cf1ce7b6934 100644
>--- a/net/wireless/scan.c
>+++ b/net/wireless/scan.c
>@@ -657,7 +657,7 @@ static int cfg80211_parse_colocated_ap(const struct cfg80211_bss_ies *ies,
>
> ret = cfg80211_calc_short_ssid(ies, &ssid_elem, &s_ssid_tmp);
> if (ret)
>- return ret;
>+ return 0;
>
> for_each_element_id(elem, WLAN_EID_REDUCED_NEIGHBOR_REPORT,
> ies->data, ies->len) {
>--
>2.25.1
Hi Ilan,
I can confirm that this fixes the error in my dmesg.
Regards
Rudi
Tested-by: Rudi Heitbaum <[email protected]>
On Mon, Jul 17, 2023 at 01:41:43PM +0000, Peer, Ilan wrote:
> Hi Sanjaya,
>
> Can you please check if the attached patch fixes your issue?
>
> Thanks in advance,
>
> Ilan.
>
> > -----Original Message-----
> > From: Bagas Sanjaya <[email protected]>
> > Sent: Sunday, July 16, 2023 14:11
> > To: Rudi Heitbaum <[email protected]>; Johannes Berg
> > <[email protected]>; David S. Miller <[email protected]>; Eric
> > Dumazet <[email protected]>; Jakub Kicinski <[email protected]>;
> > Paolo Abeni <[email protected]>; Kirill A. Shutemov
> > <[email protected]>; Michael Ellerman
> > <[email protected]>; Andrew Morton <[email protected]>
> > Cc: Linux Kernel Mailing List <[email protected]>; Linux
> > Regressions <[email protected]>; Linux Memory Management List
> > <[email protected]>; Linux Networking <[email protected]>; Linux
> > Wireless <[email protected]>
> > Subject: Fwd: mm/page_alloc.c:4453 with cfg80211_wiphy_work [cfg80211]
> >
> > Hi,
> >
> > I notice a regression report on Bugzilla [1]. Quoting from it:
> >
> > > Linux version 6.5.0-rc1+ - up to
> > > 831fe284d8275987596b7d640518dddba5735f61
> > >
> > > [ 7.312665] ------------[ cut here ]------------
> > > [ 7.312671] WARNING: CPU: 8 PID: 659 at mm/page_alloc.c:4453
> > __alloc_pages+0x329/0x340
> > > [ 7.312683] Modules linked in: exfat ntfs3 bnep btusb btrtl btbcm btintel
> > btmtk bluetooth ecdh_generic ecc iwlmvm mac80211 libarc4
> > snd_hda_codec_hdmi iwlwifi snd_hda_codec_realtek
> > snd_hda_codec_generic ledtrig_audio cfg80211 snd_hda_intel
> > snd_hda_codec snd_hwdep intel_rapl_msr x86_pkg_temp_thermal
> > intel_powerclamp snd_hda_core tpm_tis intel_rapl_common
> > snd_intel_dspcfg tpm_tis_core idma64 rfkill tpm_crb tpm rng_core
> > pkcs8_key_parser fuse dmi_sysfs
> > > [ 7.312725] CPU: 8 PID: 659 Comm: kworker/8:2 Not tainted 6.5.0-rc1 #1
> > > [ 7.312729] Hardware name: Intel(R) Client Systems
> > NUC12WSKi7/NUC12WSBi7, BIOS WSADL357.0087.2023.0306.1931 03/06/2023
> > > [ 7.312731] Workqueue: events cfg80211_wiphy_work [cfg80211]
> > > [ 7.312785] RIP: 0010:__alloc_pages+0x329/0x340
> > > [ 7.312791] Code: a8 44 89 e6 89 df c6 45 c8 00 4c 89 6d b0 41 89 de e8 0b ef
> > ff ff 49 89 c7 e9 90 fe ff ff 80 e3 3f eb c0 c6 05 69 49 f7 09 01 <0f> 0b eb 98 e8
> > 1e ec 6d 01 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> > > [ 7.312794] RSP: 0018:ffffad8e80defbf8 EFLAGS: 00010246
> > > [ 7.312798] RAX: 0000000000000000 RBX: 0000000000040dc0 RCX:
> > 0000000000000000
> > > [ 7.312800] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > 0000000000040dc0
> > > [ 7.312802] RBP: ffffad8e80defc50 R08: 0000000000000005 R09:
> > 0000000000000018
> > > [ 7.312804] R10: ffff938899d4a800 R11: ffff938891a9c800 R12:
> > 0000000000000034
> > > [ 7.312806] R13: 0000000000000000 R14: ffffffffc02b150d R15:
> > fffffffffffffc90
> > > [ 7.312808] FS: 0000000000000000(0000) GS:ffff938fb6600000(0000)
> > knlGS:0000000000000000
> > > [ 7.312811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 7.312813] CR2: 00007f540c94b710 CR3: 000000059502c000 CR4:
> > 0000000000f50ea0
> > > [ 7.312815] PKRU: 55555554
> > > [ 7.312817] Call Trace:
> > > [ 7.312819] <TASK>
> > > [ 7.312822] ? show_regs+0x69/0x80
> > > [ 7.312827] ? __warn+0x89/0x140
> > > [ 7.312833] ? __alloc_pages+0x329/0x340
> > > [ 7.312837] ? report_bug+0x15d/0x180
> > > [ 7.312843] ? handle_bug+0x42/0x80
> > > [ 7.312848] ? exc_invalid_op+0x1c/0x70
> > > [ 7.312852] ? asm_exc_invalid_op+0x1f/0x30
> > > [ 7.312857] ? cfg80211_scan_6ghz+0x12d/0xcb0 [cfg80211]
> > > [ 7.312911] ? __alloc_pages+0x329/0x340
> > > [ 7.312916] ? cfg80211_scan_6ghz+0x12d/0xcb0 [cfg80211]
> > > [ 7.312968] __kmalloc_large_node+0x7e/0x170
> > > [ 7.312973] __kmalloc+0xb9/0x120
> > > [ 7.312976] cfg80211_scan_6ghz+0x12d/0xcb0 [cfg80211]
> > > [ 7.313028] ? sched_clock_noinstr+0xd/0x20
> > > [ 7.313034] ? sched_clock_cpu+0x14/0x190
> > > [ 7.313040] ? raw_spin_rq_lock_nested+0x12/0x20
> > > [ 7.313046] ___cfg80211_scan_done+0x1e0/0x250 [cfg80211]
> > > [ 7.313099] __cfg80211_scan_done+0x23/0x30 [cfg80211]
> > > [ 7.313153] cfg80211_wiphy_work+0xae/0xd0 [cfg80211]
> > > [ 7.313201] process_one_work+0x1f1/0x3e0
> > > [ 7.313204] worker_thread+0x51/0x3f0
> > > [ 7.313207] ? _raw_spin_unlock_irqrestore+0x26/0x30
> > > [ 7.313212] ? __pfx_worker_thread+0x10/0x10
> > > [ 7.313217] kthread+0xdb/0x110
> > > [ 7.313222] ? __pfx_kthread+0x10/0x10
> > > [ 7.313226] ret_from_fork+0x3e/0x60
> > > [ 7.313229] ? __pfx_kthread+0x10/0x10
> > > [ 7.313233] ret_from_fork_asm+0x1b/0x30
> > > [ 7.313237] </TASK>
> > > [ 7.313239] ---[ end trace 0000000000000000 ]---
> >
> > See Bugzilla for the full thread.
> >
> > Anyway, I'm adding it to regzbot to make sure it doesn't fall through cracks
> > unnoticed:
> >
> > #regzbot introduced: v6.5-rc1..831fe284d82759
> > https://bugzilla.kernel.org/show_bug.cgi?id=217675
> >
> > Thanks.
> >
> > [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217675
> >
> > --
> > An old man doll... just what I always wanted! - Clara
Hi,
> Hi Ilan,
>
> I can confirm that this fixes the error in my dmesg.
>
> Regards
> Rudi
>
Thanks for the update. I'll push a patch to the wireless mailing list.
Regards,
Ilan.