2022-12-15 14:49:55

by Mikhail Gavrilov

Subject: [6.2][regression] after commit ffcb754584603adf7039d7972564fbf6febdc542 all sound devices disappeared (due BUG at mm/page_alloc.c:3592!)

Hi,
The kernel 6.2 preparation cycle has begun, and yesterday, after the
kernel was updated on my Fedora Rawhide system, all audio devices disappeared.

The backtrace of the issue looks like:
[ 133.033269] page:00000000e4a2c44b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x207490
[ 133.033353] head:00000000e4a2c44b order:2 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0
[ 133.033360] flags: 0x17ffffc0010000(head|node=0|zone=2|lastcpupid=0x1fffff)
[ 133.033369] raw: 0017ffffc0010000 0000000000000000 dead000000000122 0000000000000000
[ 133.033376] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 133.033381] page dumped because: VM_BUG_ON_PAGE(PageCompound(page))
[ 133.033392] ------------[ cut here ]------------
[ 133.033397] kernel BUG at mm/page_alloc.c:3592!
[ 133.033406] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 133.033410] CPU: 22 PID: 1673 Comm: wireplumber Tainted: G W L ------- --- 6.2.0-0.rc0.20221214gite2ca6ba6ba01.3.fc38.x86_64 #1
[ 133.033415] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4408 10/28/2022
[ 133.033417] RIP: 0010:split_page+0xa2/0x160
[ 133.033425] Code: 00 48 83 c7 40 48 39 d7 75 d7 0f 1f 44 00 00 89 ee 48 89 df 5b 5d e9 2d fe 06 00 48 c7 c6 d8 ca 9a 95 48 89 df e8 8e 77 fc ff <0f> 0b 48 89 f8 f7 c7 ff 0f 00 00 0f 85 7a ff ff ff 48 8b 17 f7 c2
[ 133.033428] RSP: 0018:ffff9f5645177b98 EFLAGS: 00010286
[ 133.033432] RAX: 0000000000000037 RBX: ffffeb89c81d2400 RCX: 0000000000000000
[ 133.033435] RDX: 0000000000000001 RSI: ffffffff959f0673 RDI: 00000000ffffffff
[ 133.033438] RBP: 0000000000000002 R08: 0000000000000000 R09: ffff9f5645177a08
[ 133.033440] R10: 0000000000000003 R11: ffff8d032e2fffe8 R12: 0000000000000007
[ 133.033442] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000001
[ 133.033445] FS: 00007f7e55702800(0000) GS:ffff8d02e8200000(0000) knlGS:0000000000000000
[ 133.033448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 133.033450] CR2: 00007f7e556cb000 CR3: 00000001f604e000 CR4: 0000000000350ee0
[ 133.033453] Call Trace:
[ 133.033455] <TASK>
[ 133.033458] __iommu_dma_alloc_noncontiguous.constprop.0+0x2de/0x3e0
[ 133.033468] ? rcu_read_lock_sched_held+0x3f/0x80
[ 133.033475] iommu_dma_alloc_noncontiguous+0x66/0xb0
[ 133.033481] dma_alloc_noncontiguous+0x54/0x1a0
[ 133.033489] snd_dma_noncontig_alloc+0x25/0x120 [snd_pcm]
[ 133.033505] snd_dma_sg_wc_alloc+0x13/0xb0 [snd_pcm]
[ 133.033519] snd_dma_alloc_dir_pages+0x50/0x90 [snd_pcm]
[ 133.033532] do_alloc_pages+0x49/0xa0 [snd_pcm]
[ 133.033546] snd_pcm_lib_malloc_pages+0xf1/0x1e0 [snd_pcm]
[ 133.033560] snd_pcm_hw_params+0x57f/0x620 [snd_pcm]
[ 133.033576] snd_pcm_common_ioctl+0x1e4/0x12a0 [snd_pcm]
[ 133.033595] snd_pcm_ioctl+0x23/0x40 [snd_pcm]
[ 133.033607] __x64_sys_ioctl+0x90/0xd0
[ 133.033613] do_syscall_64+0x5b/0x80
[ 133.033618] ? do_syscall_64+0x67/0x80
[ 133.033622] ? lockdep_hardirqs_on+0x7d/0x100
[ 133.033627] ? do_syscall_64+0x67/0x80
[ 133.033630] ? do_syscall_64+0x67/0x80
[ 133.033633] ? do_syscall_64+0x67/0x80
[ 133.033636] ? do_syscall_64+0x67/0x80
[ 133.033640] ? lockdep_hardirqs_on+0x7d/0x100
[ 133.033644] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 133.033648] RIP: 0033:0x7f7e55b5f65f
[ 133.033671] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 133.033674] RSP: 002b:00007ffd24c51ec0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 133.033678] RAX: ffffffffffffffda RBX: 00007ffd24c520f0 RCX: 00007f7e55b5f65f
[ 133.033681] RDX: 00007ffd24c520f0 RSI: 00000000c2604111 RDI: 0000000000000023
[ 133.033683] RBP: 0000556c04c4ff60 R08: 0000000000000000 R09: 0000000000000000
[ 133.033685] R10: 0000000000000004 R11: 0000000000000246 R12: 0000556c04c4fee0
[ 133.033688] R13: 00007ffd24c52360 R14: 00007ffd24c527b0 R15: 00007ffd24c520f0
[ 133.033696] </TASK>
[ 133.033698] Modules linked in: snd_seq_dummy snd_hrtimer
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
sunrpc binfmt_misc iwlmvm hid_logitech_hidpp btusb btrtl btbcm
snd_seq_midi snd_seq_midi_event btintel btmtk snd_usb_audio bluetooth
snd_usbmidi_lib iwlwifi xpad snd_rawmidi ff_memless mc intel_rapl_msr
joydev intel_rapl_common snd_hda_codec_realtek edac_mce_amd
snd_hda_codec_generic snd_hda_codec_hdmi mt76x2u snd_hda_intel kvm_amd
snd_intel_dspcfg mt76x2_common snd_intel_sdw_acpi mt76x02_usb
snd_hda_codec asus_ec_sensors mt76_usb kvm vfat snd_hda_core fat
mt76x02_lib snd_hwdep eeepc_wmi mt76 snd_seq asus_wmi ledtrig_audio
snd_seq_device irqbypass sparse_keymap snd_pcm rapl platform_profile
wmi_bmof pcspkr snd_timer mac80211 k10temp snd i2c_piix4 soundcore
libarc4 acpi_cpufreq
[ 133.033777] cfg80211 hid_logitech_dj rfkill zram amdgpu
drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul video crc32c_intel
polyval_clmulni iommu_v2 gpu_sched polyval_generic drm_buddy nvme
ucsi_ccg drm_display_helper typec_ucsi ghash_clmulni_intel ccp igb
sha512_ssse3 typec nvme_core sp5100_tco cec dca nvme_common wmi
ip6_tables ip_tables fuse
[ 133.033832] ---[ end trace 0000000000000000 ]---

I bisected the problem and found this:
ffcb754584603adf7039d7972564fbf6febdc542 is the first bad commit
commit ffcb754584603adf7039d7972564fbf6febdc542
Author: Christoph Hellwig <[email protected]>
Date: Wed Nov 9 08:37:17 2022 +0100

dma-mapping: reject __GFP_COMP in dma_alloc_attrs

DMA allocations can never be turned back into a page pointer, so
requesting compound pages doesn't make sense and it can't even be
supported at all by various backends.

Reject __GFP_COMP with a warning in dma_alloc_attrs, and stop clearing
the flag in the arm dma ops and dma-iommu.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Marek Szyprowski <[email protected]>

arch/arm/mm/dma-mapping.c | 17 -----------------
drivers/iommu/dma-iommu.c | 3 ---
kernel/dma/mapping.c | 8 ++++++++
3 files changed, 8 insertions(+), 20 deletions(-)

Reverting this commit and rebuilding the kernel confirmed the
bisection result.
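The failure path can be illustrated with a small userspace model (not kernel code). The MODEL_* flag values and the model_* functions are hypothetical stand-ins; only __GFP_COMP, dma_alloc_attrs(), dma_alloc_noncontiguous() and split_page() come from the report. The model assumes, as the backtrace suggests, that the noncontiguous path never passes through the new check in dma_alloc_attrs(), so once dma-iommu stops clearing the flag, a higher-order allocation (the oops shows order:2) comes back as a compound page and is then handed to split_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel's gfp_t encoding differs. */
#define MODEL_GFP_KERNEL 0x01u
#define MODEL_GFP_COMP   0x02u

enum model_result {
	MODEL_OK,       /* allocation proceeds normally */
	MODEL_REJECTED, /* caller gets NULL plus a warning */
	MODEL_BUG,      /* split_page() hits VM_BUG_ON_PAGE(PageCompound(page)) */
};

/* Models the check the commit adds to dma_alloc_attrs(). */
static enum model_result model_dma_alloc_attrs(unsigned int gfp)
{
	return (gfp & MODEL_GFP_COMP) ? MODEL_REJECTED : MODEL_OK;
}

/*
 * Models dma_alloc_noncontiguous() reaching the iommu-dma page allocator,
 * with no __GFP_COMP check on this path. strip_in_iommu = true is the
 * behaviour before the commit, when dma-iommu cleared __GFP_COMP itself.
 * A higher-order allocation that still carries the flag yields a compound
 * page, and splitting it into order-0 pages is what triggers the BUG.
 */
static enum model_result model_dma_alloc_noncontiguous(unsigned int gfp,
						       unsigned int order,
						       bool strip_in_iommu)
{
	if (strip_in_iommu)
		gfp &= ~MODEL_GFP_COMP;
	if (order > 0 && (gfp & MODEL_GFP_COMP))
		return MODEL_BUG;
	return MODEL_OK;
}
```

In this model the same __GFP_COMP request is rejected cleanly by dma_alloc_attrs(), but on the noncontiguous path it only stays harmless while dma-iommu keeps stripping the flag.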

I hope my report helps fix the problem quickly.

Full kernel log is here: https://pastebin.com/5hsuhifY

--
Best Regards,
Mike Gavrilov.


2022-12-15 18:26:49

by Kai Vehmanen

Subject: Re: [6.2][regression] after commit ffcb754584603adf7039d7972564fbf6febdc542 all sound devices disappeared (due BUG at mm/page_alloc.c:3592!)

Hi,

On Thu, 15 Dec 2022, Mikhail Gavrilov wrote:

> The kernel 6.2 preparation cycle has begun, and yesterday, after the
> kernel was updated on my Fedora Rawhide system, all audio devices disappeared.

I can confirm this breaks audio in our SOF tests if I cherry-pick the
identified patch ffcb754584603a into the sound tree. This affects audio
on a very large number of x86 systems.

Br, Kai

2022-12-16 00:41:47

by Joan Bruguera Micó

Subject: Re: [6.2][regression] after commit ffcb754584603adf7039d7972564fbf6febdc542 all sound devices disappeared (due BUG at mm/page_alloc.c:3592!)

The code passing the __GFP_COMP flag appears to be in sound/core/memalloc.c;
see also commit e529d3507a93d3c9528580081bbaf931a50de154.
Removing the flags also fixes the sound issues and warnings for me.

*Resent with fixed Message-ID - sorry!
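The caller-side alternative, dropping __GFP_COMP from the mask the sound code passes down, amounts to clearing a single bit while keeping every other modifier. In this sketch the flag values, MODEL_DEFAULT_GFP and model_dma_safe_gfp() are hypothetical; the mask merely assumes the sound allocator combines __GFP_COMP with modifiers such as __GFP_NORETRY and __GFP_NOWARN, which may not match memalloc.c exactly.

```c
#include <assert.h>

/* Hypothetical bit values; the kernel's gfp_t encoding differs. */
#define MODEL_GFP_KERNEL  0x01u
#define MODEL_GFP_COMP    0x02u
#define MODEL_GFP_NORETRY 0x04u
#define MODEL_GFP_NOWARN  0x08u

/* Assumed default mask in the style of the sound allocator. */
#define MODEL_DEFAULT_GFP \
	(MODEL_GFP_KERNEL | MODEL_GFP_COMP | MODEL_GFP_NORETRY | MODEL_GFP_NOWARN)

/* Compile-time check that the cleared mask no longer requests compound pages. */
static_assert(((MODEL_DEFAULT_GFP & ~MODEL_GFP_COMP) & MODEL_GFP_COMP) == 0,
	      "mask passed to dma_alloc_* must not contain __GFP_COMP");

/*
 * Caller-side fix: clear only __GFP_COMP before handing the mask to a
 * dma_alloc_* helper, leaving every other modifier intact.
 */
static unsigned int model_dma_safe_gfp(unsigned int gfp)
{
	return gfp & ~MODEL_GFP_COMP;
}
```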

2022-12-16 00:43:35

by Joan Bruguera Micó

Subject: Re: [6.2][regression] after commit ffcb754584603adf7039d7972564fbf6febdc542 all sound devices disappeared (due BUG at mm/page_alloc.c:3592!)

The code passing the __GFP_COMP flag appears to be in sound/core/memalloc.c;
see also commit e529d3507a93d3c9528580081bbaf931a50de154.
Removing the flags also fixes the sound issues and warnings for me.

2022-12-16 07:04:07

by Christoph Hellwig

Subject: Re: [6.2][regression] after commit ffcb754584603adf7039d7972564fbf6febdc542 all sound devices disappeared (due BUG at mm/page_alloc.c:3592!)

Ok, it seems like this is tripped by the sound noncontig alloc code,
whose __GFP_COMP mapping I already commented on as potentially bogus.
I think for now the right thing would be to revert the hunk in
dma-iommu.c (see patch below). The other thing to try would be to
remove both uses of __GFP_COMP in sound/core/memalloc.c, which should
have the same effect.

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9297b741f5e80e..f798c44e090337 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -744,9 +744,6 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 	/* IOMMU can map any pages, so himem can also be used here */
 	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
 
-	/* It makes no sense to muck about with huge pages */
-	gfp &= ~__GFP_COMP;
-
 	while (count) {
 		struct page *page = NULL;
 		unsigned int order_size;
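The removed lines sit just above the allocation loop in __iommu_dma_alloc_pages(). As a simplified sketch (fls_u() and plan_orders() are hypothetical helpers; it assumes every allocation succeeds and that bit k of the order mask permits blocks of 2^k pages), the loop repeatedly picks the largest permitted order that still fits the remaining page count. In the kernel each higher-order block is then broken into order-0 pages with split_page(), which is where a compound page left over from __GFP_COMP triggers VM_BUG_ON_PAGE(PageCompound(page)).

```c
#include <assert.h>
#include <stddef.h>

/* Index of the most significant set bit, like the kernel's __fls(); x != 0. */
static unsigned int fls_u(unsigned int x)
{
	unsigned int bit = 0;

	assert(x != 0);
	while (x >>= 1)
		bit++;
	return bit;
}

/*
 * Hypothetical model of the order selection: bit k of order_mask allows
 * order-k blocks (2^k pages). Writes each chosen order to orders[] and
 * returns the number of blocks, or 0 if no permitted order fits the
 * remaining count or orders[] is too small.
 */
static size_t plan_orders(unsigned int count, unsigned int order_mask,
			  unsigned int *orders, size_t max_blocks)
{
	size_t n = 0;

	while (count) {
		unsigned int order;

		/* Keep only orders whose block size does not exceed count. */
		order_mask &= (2u << fls_u(count)) - 1;
		if (!order_mask || n == max_blocks)
			return 0;
		order = fls_u(order_mask); /* largest permitted order */
		orders[n++] = order;
		/* The kernel would call split_page() here when order > 0. */
		count -= 1u << order;
	}
	return n;
}
```

For example, 7 pages with orders 0 and 2 permitted come out as one order-2 block followed by three order-0 pages.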

2022-12-16 11:51:57

by Robin Murphy

Subject: Re: [6.2][regression] after commit ffcb754584603adf7039d7972564fbf6febdc542 all sound devices disappeared (due BUG at mm/page_alloc.c:3592!)

On 2022-12-16 06:46, Christoph Hellwig wrote:
> Ok, it seems like this is tripped by the sound noncontig alloc code,
> whose __GFP_COMP mapping I already commented on as potentially bogus.
> I think for now the right thing would be to revert the hunk in
> dma-iommu.c (see patch below). The other thing to try would be to
> remove both uses of __GFP_COMP in sound/core/memalloc.c, which should
> have the same effect.

Or we explicitly strip the flag in dma_alloc_noncontiguous() (and maybe
dma_alloc_pages() as well) for consistency with dma_alloc_attrs(). That
seems like it might be the most robust option.

Robin.

> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 9297b741f5e80e..f798c44e090337 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -744,9 +744,6 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
>  	/* IOMMU can map any pages, so himem can also be used here */
>  	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
>  
> -	/* It makes no sense to muck about with huge pages */
> -	gfp &= ~__GFP_COMP;
> -
>  	while (count) {
>  		struct page *page = NULL;
>  		unsigned int order_size;
>

2022-12-16 12:39:29

by Christoph Hellwig

Subject: Re: [6.2][regression] after commit ffcb754584603adf7039d7972564fbf6febdc542 all sound devices disappeared (due BUG at mm/page_alloc.c:3592!)

On Fri, Dec 16, 2022 at 11:40:57AM +0000, Robin Murphy wrote:
> On 2022-12-16 06:46, Christoph Hellwig wrote:
>> Ok, it seems like this is tripped by the sound noncontig alloc code,
>> whose __GFP_COMP mapping I already commented on as potentially bogus.
>> I think for now the right thing would be to revert the hunk in
>> dma-iommu.c (see patch below). The other thing to try would be to
>> remove both uses of __GFP_COMP in sound/core/memalloc.c, which should
>> have the same effect.
>
> Or we explicitly strip the flag in dma_alloc_noncontiguous() (and maybe
> dma_alloc_pages() as well) for consistency with dma_alloc_attrs(). That
> seems like it might be the most robust option.

In the long run, warning there and returning an error seems like the
right thing to do, yes. I'm just a little worried about doing this right
now, after the merge window.
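The longer-term policy described here, warning and failing when __GFP_COMP reaches dma_alloc_noncontiguous(), could look roughly like the following userspace sketch. The flag value, struct model_sgt (standing in for the struct sg_table the real function returns) and model_alloc_noncontiguous() are hypothetical, and fprintf() stands in for whatever kernel warning macro a real patch would use.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bit value; the kernel's gfp_t encoding differs. */
#define MODEL_GFP_COMP 0x02u

/* Stand-in for the struct sg_table that dma_alloc_noncontiguous() returns. */
struct model_sgt {
	size_t size;
	unsigned int gfp;
};

/*
 * Reject policy: a request carrying __GFP_COMP is treated as a caller bug,
 * so warn and fail instead of silently fixing up the flags.
 */
static struct model_sgt *model_alloc_noncontiguous(size_t size, unsigned int gfp)
{
	struct model_sgt *sgt;

	if (gfp & MODEL_GFP_COMP) {
		/* Stand-in for a kernel warning (assumed, not the actual patch). */
		fprintf(stderr, "warning: __GFP_COMP passed to noncontiguous allocation\n");
		return NULL;
	}
	sgt = malloc(sizeof(*sgt));
	if (!sgt)
		return NULL;
	sgt->size = size;
	sgt->gfp = gfp;
	return sgt;
}
```

Failing loudly surfaces bogus callers immediately, which is also why it is riskier right after the merge window: every such caller breaks at once, as the sound code did here.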

2022-12-16 13:14:13

by Robin Murphy

Subject: Re: [6.2][regression] after commit ffcb754584603adf7039d7972564fbf6febdc542 all sound devices disappeared (due BUG at mm/page_alloc.c:3592!)

On 2022-12-16 12:15, Christoph Hellwig wrote:
> On Fri, Dec 16, 2022 at 11:40:57AM +0000, Robin Murphy wrote:
>> On 2022-12-16 06:46, Christoph Hellwig wrote:
>>> Ok, it seems like this is tripped by the sound noncontig alloc code,
>>> whose __GFP_COMP mapping I already commented on as potentially bogus.
>>> I think for now the right thing would be to revert the hunk in
>>> dma-iommu.c (see patch below). The other thing to try would be to
>>> remove both uses of __GFP_COMP in sound/core/memalloc.c, which should
>>> have the same effect.
>>
>> Or we explicitly strip the flag in dma_alloc_noncontiguous() (and maybe
>> dma_alloc_pages() as well) for consistency with dma_alloc_attrs(). That
>> seems like it might be the most robust option.
>
> In the long run, warning there and returning an error seems like the
> right thing to do, yes. I'm just a little worried about doing this right
> now, after the merge window.

Fair point. I guess nobody else actually implements
dma_alloc_noncontiguous(), and dma_alloc_pages() seems a bit of a grey
area since it is more of an explicit page allocator. So yeah, just
restoring the __GFP_COMP clearing in iommu-dma (perhaps with a mild
VM_WARN_ON?) seems like a sufficiently safe and sensible fix for the
short term. You can have my pre-emptive ack for that.

Cheers,
Robin.
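The short-term option acked here, clearing __GFP_COMP again in iommu-dma and perhaps warning about it, can be sketched the same way. The flag values, model_warn_count and model_iommu_prepare_gfp() are hypothetical; the fprintf() stands in for VM_WARN_ON(), which in the kernel only fires when CONFIG_DEBUG_VM is enabled. The __GFP_NOWARN | __GFP_HIGHMEM adjustment mirrors the context lines of the diff above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit values; the kernel's gfp_t encoding differs. */
#define MODEL_GFP_COMP    0x02u
#define MODEL_GFP_NOWARN  0x08u
#define MODEL_GFP_HIGHMEM 0x10u

/* How many times the model has warned, so the behaviour can be observed. */
static unsigned int model_warn_count;

/*
 * Models the start of the iommu-dma page allocator with the removed hunk
 * restored: complain about __GFP_COMP, clear it, and carry on, rather than
 * failing the allocation.
 */
static unsigned int model_iommu_prepare_gfp(unsigned int gfp)
{
	if (gfp & MODEL_GFP_COMP) {
		/* Stand-in for VM_WARN_ON(): report, but keep going. */
		model_warn_count++;
		fprintf(stderr, "warning: clearing __GFP_COMP for iommu-dma allocation\n");
		gfp &= ~MODEL_GFP_COMP;
	}
	/* IOMMU can map any pages, so highmem is fine here too. */
	return gfp | MODEL_GFP_NOWARN | MODEL_GFP_HIGHMEM;
}
```

Unlike the reject policy, existing callers such as the sound code keep working; they just become visible in debug builds.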

2022-12-22 12:52:58

by Thorsten Leemhuis

Subject: Re: [6.2][regression] after commit ffcb754584603adf7039d7972564fbf6febdc542 all sound devices disappeared (due BUG at mm/page_alloc.c:3592!) #forregzbot

[Note: this mail contains only information for Linux kernel regression
tracking. Mails like these contain '#forregzbot' in the subject to make
them easy to spot and filter out. The author also tried to remove most
or all individuals from the list of recipients to spare them the hassle.]

On 15.12.22 15:17, Mikhail Gavrilov wrote:
> Hi,
> The kernel 6.2 preparation cycle has begun, and yesterday, after the
> kernel was updated on my Fedora Rawhide system, all audio devices disappeared.

Thanks for the report. To be sure the issue below doesn't fall through
the cracks unnoticed, I'm adding it to regzbot, my Linux kernel
regression tracking bot:

#regzbot ^introduced ffcb754584603adf
#regzbot title dma-mapping: audio devices disappeared
#regzbot monitor:
https://lore.kernel.org/all/[email protected]/
#regzbot fix: dma-mapping: reject GFP_COMP for noncohernt allocaions
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.



2022-12-24 07:38:38

by Thorsten Leemhuis

Subject: Re: [6.2][regression] after commit ffcb754584603adf7039d7972564fbf6febdc542 all sound devices disappeared (due BUG at mm/page_alloc.c:3592!) #forregzbot



On 22.12.22 13:17, Thorsten Leemhuis wrote:
> [Note: this mail contains only information for Linux kernel regression
> tracking. Mails like these contain '#forregzbot' in the subject to make
> them easy to spot and filter out. The author also tried to remove most
> or all individuals from the list of recipients to spare them the hassle.]
>
> On 15.12.22 15:17, Mikhail Gavrilov wrote:
>> Hi,
>> The kernel 6.2 preparation cycle has begun, and yesterday, after the
>> kernel was updated on my Fedora Rawhide system, all audio devices disappeared.
>
> Thanks for the report. To be sure the issue below doesn't fall through
> the cracks unnoticed, I'm adding it to regzbot, my Linux kernel
> regression tracking bot:
>
> #regzbot ^introduced ffcb754584603adf
> #regzbot title dma-mapping: audio devices disappeared
> #regzbot monitor:
> https://lore.kernel.org/all/[email protected]/
> #regzbot fix: dma-mapping: reject GFP_COMP for noncohernt allocaions

The typo in the subject of the fix was fixed, hence this is needed:

#regzbot fix: 3622b86f49f8