I didn't get any responses to this.
git bisect shows that the problem did actually exist in 4.5.0-rc6, but
has gotten worse by many orders of magnitude (< 1/week to ~20M/hour).
Presently with 4.9-rc5, it's now writing ~2.5GB/hour to syslog.
The list of addresses in that time is only ~80 unique ranges, each
appearing ~320K times. They don't appear exactly in order, so the kernel
does not squelch the log message for appearing too frequently.
Could somebody at least make a suggestion on how to trace the printed
range to somewhere in the kernel?
On Sat, Nov 19, 2016 at 03:25:32AM +0000, Robin H. Johnson wrote:
> (Replies CC to list and direct to me please)
>
> Summary:
> --------
> dmesg spammed with alloc_contig_range: [XX, YY) PFNs busy
>
> Description:
> ------------
> I recently upgraded to 4.9-rc5 (previous kernel 4.5.0-rc6-00141-g6794402),
> and since then my dmesg has been absolutely flooded with 'PFNs busy'
> (>3GiB/day). My config did not change (all new options =n).
>
> It's not consistent addresses, so the squelch of identical printk lines
> hasn't helped.
> Eg output:
> [187487.621916] alloc_contig_range: [83f0a9, 83f0aa) PFNs busy
> [187487.621924] alloc_contig_range: [83f0ce, 83f0cf) PFNs busy
> [187487.621976] alloc_contig_range: [83f125, 83f126) PFNs busy
> [187487.622013] alloc_contig_range: [83f127, 83f128) PFNs busy
>
> Keywords:
> ---------
> mm, alloc_contig_range, CMA
>
> Most recent kernel version which did not have the bug:
> ------------------------------------------------------
> Known 4.5.0-rc6-00141-g6794402
>
> ver_linux:
> ----------
> Linux bohr-int 4.9.0-rc5-00177-g81bcfe5 #12 SMP Wed Nov 16 13:16:32 PST
> 2016 x86_64 Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz GenuineIntel
> GNU/Linux
>
> GNU C 5.3.0
> GNU Make 4.2.1
> Binutils 2.25.1
> Util-linux 2.29
> Mount 2.29
> Quota-tools 4.03
> Linux C Library 2.23
> Dynamic linker (ldd) 2.23
> readlink: missing operand
> Try 'readlink --help' for more information.
> Procps 3.3.12
> Net-tools 1.60
> Kbd 2.0.3
> Console-tools 2.0.3
> Sh-utils 8.25
> Udev 230
> Modules Loaded 3w_sas 3w_xxxx ablk_helper aesni_intel
> aes_x86_64 af_packet ahci aic79xx amdgpu async_memcpy async_pq
> async_raid6_recov async_tx async_xor ata_piix auth_rpcgss binfmt_misc
> bluetooth bnep bnx2 bonding btbcm btintel btrfs btrtl btusb button cdrom
> cn configs coretemp crc32c_intel crc32_pclmul crc_ccitt crc_itu_t
> crct10dif_pclmul cryptd dca dm_bio_prison dm_bufio dm_cache dm_cache_smq
> dm_crypt dm_delay dm_flakey dm_log dm_log_userspace dm_mirror dm_mod
> dm_multipath dm_persistent_data dm_queue_length dm_raid dm_region_hash
> dm_round_robin dm_service_time dm_snapshot dm_thin_pool dm_zero drm
> drm_kms_helper dummy e1000 e1000e evdev ext2 fat fb_sys_fops
> firewire_core firewire_ohci fjes fscache fuse ghash_clmulni_intel
> glue_helper grace hangcheck_timer hid_a4tech hid_apple hid_belkin
> hid_cherry hid_chicony hid_cypress hid_ezkey hid_generic hid_gyration
> hid_logitech hid_logitech_dj hid_microsoft hid_monterey hid_petalynx
> hid_pl hid_samsung hid_sony hid_sunplus hwmon_vid i2c_algo_bit i2c_i801
> i2c_smbus igb input_leds intel_rapl ip6_udp_tunnel ipv6 irqbypass
> iscsi_tcp iTCO_vendor_support iTCO_wdt ixgb ixgbe jfs kvm kvm_intel
> libahci libata libcrc32c libiscsi libiscsi_tcp linear lockd lpc_ich lpfc
> lrw macvlan mdio md_mod megaraid_mbox megaraid_mm megaraid_sas mii
> mptbase mptfc mptsas mptscsih mptspi multipath nfs nfs_acl nfsd
> nls_cp437 nls_iso8859_1 nvram ohci_hcd pata_jmicron pata_marvell
> pata_platform pcspkr psmouse qla1280 qla2xxx r8169 radeon raid0 raid10
> raid1 raid456 raid6_pq reiserfs rfkill sata_mv sata_sil24
> scsi_transport_fc scsi_transport_iscsi scsi_transport_sas
> scsi_transport_spi sd_mod sg sky2 snd snd_hda_codec
> snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_codec_realtek
> snd_hda_core snd_hda_intel snd_hwdep snd_pcm snd_timer soundcore sr_mod
> sunrpc syscopyarea sysfillrect sysimgblt tg3 ttm uas udp_tunnel
> usb_storage vfat virtio virtio_net virtio_ring vxlan w83627ehf
> x86_pkg_temp_thermal xfs xhci_hcd xhci_pci xor zlib_deflate
--
Robin Hugh Johnson
E-Mail : [email protected]
Home Page : http://www.orbis-terrarum.net/?l=people.robbat2
ICQ# : 30269588 or 41961639
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
[Let's CC linux-mm and Michal]
On Tue 29-11-16 22:43:08, Robin H. Johnson wrote:
> I didn't get any responses to this.
>
> git bisect shows that the problem did actually exist in 4.5.0-rc6, but
> has gotten worse by many orders of magnitude (< 1/week to ~20M/hour).
>
> Presently with 4.9-rc5, it's now writing ~2.5GB/hour to syslog.
This is really not helpful. I think we should simply make it pr_debug or
add some rate limiting. AFAIU the message is far from serious.
> The list of addresses in that time is only ~80 unique ranges, each
> appearing ~320K times. They don't appear exactly in order, so the kernel
> does not squelch the log message for appearing too frequently.
>
> Could somebody at least make a suggestion on how to trace the printed
> range to somewhere in the kernel?
>
> On Sat, Nov 19, 2016 at 03:25:32AM +0000, Robin H. Johnson wrote:
> > (Replies CC to list and direct to me please)
> >
> > Summary:
> > --------
> > dmesg spammed with alloc_contig_range: [XX, YY) PFNs busy
> >
> > Description:
> > ------------
> > I recently upgraded to 4.9-rc5 (previous kernel 4.5.0-rc6-00141-g6794402),
> > and since then my dmesg has been absolutely flooded with 'PFNs busy'
> > (>3GiB/day). My config did not change (all new options =n).
> >
> > It's not consistent addresses, so the squelch of identical printk lines
> > hasn't helped.
> > Eg output:
> > [187487.621916] alloc_contig_range: [83f0a9, 83f0aa) PFNs busy
> > [187487.621924] alloc_contig_range: [83f0ce, 83f0cf) PFNs busy
> > [187487.621976] alloc_contig_range: [83f125, 83f126) PFNs busy
> > [187487.622013] alloc_contig_range: [83f127, 83f128) PFNs busy
> >
> > Keywords:
> > ---------
> > mm, alloc_contig_range, CMA
> >
> > Most recent kernel version which did not have the bug:
> > ------------------------------------------------------
> > Known 4.5.0-rc6-00141-g6794402
> >
> > ver_linux:
> > ----------
> > Linux bohr-int 4.9.0-rc5-00177-g81bcfe5 #12 SMP Wed Nov 16 13:16:32 PST
> > 2016 x86_64 Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz GenuineIntel
> > GNU/Linux
> >
> > GNU C 5.3.0
> > GNU Make 4.2.1
> > Binutils 2.25.1
> > Util-linux 2.29
> > Mount 2.29
> > Quota-tools 4.03
> > Linux C Library 2.23
> > Dynamic linker (ldd) 2.23
> > readlink: missing operand
> > Try 'readlink --help' for more information.
> > Procps 3.3.12
> > Net-tools 1.60
> > Kbd 2.0.3
> > Console-tools 2.0.3
> > Sh-utils 8.25
> > Udev 230
> > Modules Loaded 3w_sas 3w_xxxx ablk_helper aesni_intel
> > aes_x86_64 af_packet ahci aic79xx amdgpu async_memcpy async_pq
> > async_raid6_recov async_tx async_xor ata_piix auth_rpcgss binfmt_misc
> > bluetooth bnep bnx2 bonding btbcm btintel btrfs btrtl btusb button cdrom
> > cn configs coretemp crc32c_intel crc32_pclmul crc_ccitt crc_itu_t
> > crct10dif_pclmul cryptd dca dm_bio_prison dm_bufio dm_cache dm_cache_smq
> > dm_crypt dm_delay dm_flakey dm_log dm_log_userspace dm_mirror dm_mod
> > dm_multipath dm_persistent_data dm_queue_length dm_raid dm_region_hash
> > dm_round_robin dm_service_time dm_snapshot dm_thin_pool dm_zero drm
> > drm_kms_helper dummy e1000 e1000e evdev ext2 fat fb_sys_fops
> > firewire_core firewire_ohci fjes fscache fuse ghash_clmulni_intel
> > glue_helper grace hangcheck_timer hid_a4tech hid_apple hid_belkin
> > hid_cherry hid_chicony hid_cypress hid_ezkey hid_generic hid_gyration
> > hid_logitech hid_logitech_dj hid_microsoft hid_monterey hid_petalynx
> > hid_pl hid_samsung hid_sony hid_sunplus hwmon_vid i2c_algo_bit i2c_i801
> > i2c_smbus igb input_leds intel_rapl ip6_udp_tunnel ipv6 irqbypass
> > iscsi_tcp iTCO_vendor_support iTCO_wdt ixgb ixgbe jfs kvm kvm_intel
> > libahci libata libcrc32c libiscsi libiscsi_tcp linear lockd lpc_ich lpfc
> > lrw macvlan mdio md_mod megaraid_mbox megaraid_mm megaraid_sas mii
> > mptbase mptfc mptsas mptscsih mptspi multipath nfs nfs_acl nfsd
> > nls_cp437 nls_iso8859_1 nvram ohci_hcd pata_jmicron pata_marvell
> > pata_platform pcspkr psmouse qla1280 qla2xxx r8169 radeon raid0 raid10
> > raid1 raid456 raid6_pq reiserfs rfkill sata_mv sata_sil24
> > scsi_transport_fc scsi_transport_iscsi scsi_transport_sas
> > scsi_transport_spi sd_mod sg sky2 snd snd_hda_codec
> > snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_codec_realtek
> > snd_hda_core snd_hda_intel snd_hwdep snd_pcm snd_timer soundcore sr_mod
> > sunrpc syscopyarea sysfillrect sysimgblt tg3 ttm uas udp_tunnel
> > usb_storage vfat virtio virtio_net virtio_ring vxlan w83627ehf
> > x86_pkg_temp_thermal xfs xhci_hcd xhci_pci xor zlib_deflate
>
> --
> Robin Hugh Johnson
> E-Mail : [email protected]
> Home Page : http://www.orbis-terrarum.net/?l=people.robbat2
> ICQ# : 30269588 or 41961639
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
--
Michal Hocko
SUSE Labs
On Wed, Nov 30 2016, Michal Hocko wrote:
> [Let's CC linux-mm and Michal]
>
> On Tue 29-11-16 22:43:08, Robin H. Johnson wrote:
>> I didn't get any responses to this.
>>
>> git bisect shows that the problem did actually exist in 4.5.0-rc6, but
>> has gotten worse by many orders of magnitude (< 1/week to ~20M/hour).
>>
>> Presently with 4.9-rc5, it's now writing ~2.5GB/hour to syslog.
>
> This is really not helpful. I think we should simply make it pr_debug or
> add some rate limiting. AFAIU the message is far from serious.
On the other hand, if this didn’t happen and now happens all the time,
this indicates a regression in CMA’s capability to allocate pages so
just rate limiting the output would hide the potential actual issue.
>
>> The list of addresses in that time is only ~80 unique ranges, each
>> appearing ~320K times. They don't appear exactly in order, so the kernel
>> does not squelch the log message for appearing too frequently.
>>
>> Could somebody at least make a suggestion on how to trace the printed
>> range to somewhere in the kernel?
>>
>> On Sat, Nov 19, 2016 at 03:25:32AM +0000, Robin H. Johnson wrote:
>> > (Replies CC to list and direct to me please)
>> >
>> > Summary:
>> > --------
>> > dmesg spammed with alloc_contig_range: [XX, YY) PFNs busy
>> >
>> > Description:
>> > ------------
>> > I recently upgraded to 4.9-rc5 (previous kernel 4.5.0-rc6-00141-g6794402),
>> > and since then my dmesg has been absolutely flooded with 'PFNs busy'
>> > (>3GiB/day). My config did not change (all new options =n).
>> >
>> > It's not consistent addresses, so the squelch of identical printk lines
>> > hasn't helped.
>> > Eg output:
>> > [187487.621916] alloc_contig_range: [83f0a9, 83f0aa) PFNs busy
>> > [187487.621924] alloc_contig_range: [83f0ce, 83f0cf) PFNs busy
>> > [187487.621976] alloc_contig_range: [83f125, 83f126) PFNs busy
>> > [187487.622013] alloc_contig_range: [83f127, 83f128) PFNs busy
>> >
>> > Keywords:
>> > ---------
>> > mm, alloc_contig_range, CMA
>> >
>> > Most recent kernel version which did not have the bug:
>> > ------------------------------------------------------
>> > Known 4.5.0-rc6-00141-g6794402
>> >
>> > ver_linux:
>> > ----------
>> > Linux bohr-int 4.9.0-rc5-00177-g81bcfe5 #12 SMP Wed Nov 16 13:16:32 PST
>> > 2016 x86_64 Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz GenuineIntel
>> > GNU/Linux
>> >
>> > GNU C 5.3.0
>> > GNU Make 4.2.1
>> > Binutils 2.25.1
>> > Util-linux 2.29
>> > Mount 2.29
>> > Quota-tools 4.03
>> > Linux C Library 2.23
>> > Dynamic linker (ldd) 2.23
>> > readlink: missing operand
>> > Try 'readlink --help' for more information.
>> > Procps 3.3.12
>> > Net-tools 1.60
>> > Kbd 2.0.3
>> > Console-tools 2.0.3
>> > Sh-utils 8.25
>> > Udev 230
>> > Modules Loaded 3w_sas 3w_xxxx ablk_helper aesni_intel
>> > aes_x86_64 af_packet ahci aic79xx amdgpu async_memcpy async_pq
>> > async_raid6_recov async_tx async_xor ata_piix auth_rpcgss binfmt_misc
>> > bluetooth bnep bnx2 bonding btbcm btintel btrfs btrtl btusb button cdrom
>> > cn configs coretemp crc32c_intel crc32_pclmul crc_ccitt crc_itu_t
>> > crct10dif_pclmul cryptd dca dm_bio_prison dm_bufio dm_cache dm_cache_smq
>> > dm_crypt dm_delay dm_flakey dm_log dm_log_userspace dm_mirror dm_mod
>> > dm_multipath dm_persistent_data dm_queue_length dm_raid dm_region_hash
>> > dm_round_robin dm_service_time dm_snapshot dm_thin_pool dm_zero drm
>> > drm_kms_helper dummy e1000 e1000e evdev ext2 fat fb_sys_fops
>> > firewire_core firewire_ohci fjes fscache fuse ghash_clmulni_intel
>> > glue_helper grace hangcheck_timer hid_a4tech hid_apple hid_belkin
>> > hid_cherry hid_chicony hid_cypress hid_ezkey hid_generic hid_gyration
>> > hid_logitech hid_logitech_dj hid_microsoft hid_monterey hid_petalynx
>> > hid_pl hid_samsung hid_sony hid_sunplus hwmon_vid i2c_algo_bit i2c_i801
>> > i2c_smbus igb input_leds intel_rapl ip6_udp_tunnel ipv6 irqbypass
>> > iscsi_tcp iTCO_vendor_support iTCO_wdt ixgb ixgbe jfs kvm kvm_intel
>> > libahci libata libcrc32c libiscsi libiscsi_tcp linear lockd lpc_ich lpfc
>> > lrw macvlan mdio md_mod megaraid_mbox megaraid_mm megaraid_sas mii
>> > mptbase mptfc mptsas mptscsih mptspi multipath nfs nfs_acl nfsd
>> > nls_cp437 nls_iso8859_1 nvram ohci_hcd pata_jmicron pata_marvell
>> > pata_platform pcspkr psmouse qla1280 qla2xxx r8169 radeon raid0 raid10
>> > raid1 raid456 raid6_pq reiserfs rfkill sata_mv sata_sil24
>> > scsi_transport_fc scsi_transport_iscsi scsi_transport_sas
>> > scsi_transport_spi sd_mod sg sky2 snd snd_hda_codec
>> > snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_codec_realtek
>> > snd_hda_core snd_hda_intel snd_hwdep snd_pcm snd_timer soundcore sr_mod
>> > sunrpc syscopyarea sysfillrect sysimgblt tg3 ttm uas udp_tunnel
>> > usb_storage vfat virtio virtio_net virtio_ring vxlan w83627ehf
>> > x86_pkg_temp_thermal xfs xhci_hcd xhci_pci xor zlib_deflate
>>
>> --
>> Robin Hugh Johnson
>> E-Mail : [email protected]
>> Home Page : http://www.orbis-terrarum.net/?l=people.robbat2
>> ICQ# : 30269588 or 41961639
>> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
>
>
>
> --
> Michal Hocko
> SUSE Labs
--
Best regards
ミハウ “mina86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»
On Wed 30-11-16 14:08:00, Michal Nazarewicz wrote:
> On Wed, Nov 30 2016, Michal Hocko wrote:
> > [Let's CC linux-mm and Michal]
> >
> > On Tue 29-11-16 22:43:08, Robin H. Johnson wrote:
> >> I didn't get any responses to this.
> >>
> >> git bisect shows that the problem did actually exist in 4.5.0-rc6, but
> >> has gotten worse by many orders of magnitude (< 1/week to ~20M/hour).
> >>
> >> Presently with 4.9-rc5, it's now writing ~2.5GB/hour to syslog.
> >
> > This is really not helpful. I think we should simply make it pr_debug or
> > add some rate limiting. AFAIU the message is far from serious.
>
> On the other hand, if this didn’t happen and now happens all the time,
> this indicates a regression in CMA’s capability to allocate pages so
> just rate limiting the output would hide the potential actual issue.
Or there might be just a much larger demand on those large blocks, no?
But seriously, dumping those messages again and again into the log (see
the 2.5_GB_/h going to the log) is just insane. So there really should be some
throttling.
Does the following help you, Robin? At least to not get swamped by those
messages.
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0fbfead6aa7d..96eb8d107582 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7424,7 +7424,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
/* Make sure the range is really isolated. */
if (test_pages_isolated(outer_start, end, false)) {
- pr_info("%s: [%lx, %lx) PFNs busy\n",
+ printk_ratelimited(KERN_DEBUG "%s: [%lx, %lx) PFNs busy\n",
__func__, outer_start, end);
ret = -EBUSY;
goto done;
I would also suggest to add dump_stack() to that path to see who is
actually demanding so much large continuous blocks.
--
Michal Hocko
SUSE Labs
(I'm going to respond directly to this email with the stack trace.)
On Wed, Nov 30, 2016 at 02:28:49PM +0100, Michal Hocko wrote:
> > On the other hand, if this didn’t happen and now happens all the time,
> > this indicates a regression in CMA’s capability to allocate pages so
> > just rate limiting the output would hide the potential actual issue.
>
> Or there might be just a much larger demand on those large blocks, no?
> But seriously, dumping those messages again and again into the log (see
> the 2.5_GB_/h going to the log) is just insane. So there really should be some
> throttling.
>
> Does the following help you, Robin? At least to not get swamped by those
> messages.
Here's what I whipped up based on that, to ensure that dump_stack got
rate-limited at the same pass as PFNs-busy. It dropped the dmesg spew to
~25MB/hour (and is suppressing ~43 entries/second right now).
commit 6ad4037e18ec2199f8755274d8a745a9904241a1
Author: Robin H. Johnson <[email protected]>
Date: Wed Nov 30 10:32:57 2016 -0800
mm: ratelimit & trace PFNs busy.
Signed-off-by: Robin H. Johnson <[email protected]>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6de9440e3ae2..3c28ec3d18f8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7289,8 +7289,15 @@ int alloc_contig_range(unsigned long start, unsigned long end,
/* Make sure the range is really isolated. */
if (test_pages_isolated(outer_start, end, false)) {
- pr_info("%s: [%lx, %lx) PFNs busy\n",
- __func__, outer_start, end);
+ static DEFINE_RATELIMIT_STATE(ratelimit_pfn_busy,
+ DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+ if (__ratelimit(&ratelimit_pfn_busy)) {
+ pr_info("%s: [%lx, %lx) PFNs busy\n",
+ __func__, outer_start, end);
+ dump_stack();
+ }
+
ret = -EBUSY;
goto done;
}
--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail : [email protected]
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
Somewhere in the Radeon/DRM codebase, CMA page allocation has either
regressed in the timeline of 4.5->4.9, and/or the drm/radeon code is
doing something different with pages.
Given that I haven't seen ANY other reports of this, I'm inclined to
believe the problem is drm/radeon specific (if I don't start X, I can't
reproduce the problem).
The rate of the problem starts slow, and also is relatively low on an idle
system (my screens blank at night, no xscreensaver running), but it still ramps
up over time (to the point of generating 2.5GB/hour of "(timestamp)
alloc_contig_range: [83e4d9, 83e4da) PFNs busy"), with various addresses (~100
unique ranges for a day).
My X workload is ~50 chrome tabs and ~20 terminals (over 3x 24" monitors w/ 9
virtual desktops per monitor).
I added a stack trace & rate limit to alloc_contig_range's PFNs busy message
(patch in previous email on LKML/-MM lists); and they point to radeon.
alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
CPU: 3 PID: 8518 Comm: X Not tainted 4.9.0-rc7-00024-g6ad4037e18ec #27
Hardware name: System manufacturer System Product Name/P8Z68 DELUXE, BIOS 0501 05/09/2011
ffffad50c3d7f730 ffffffffb236c873 000000000083f2a3 000000000083f2a4
ffffad50c3d7f810 ffffffffb2183b38 ffff999dff4d8040 0000000020fca8c0
000000000083f400 000000000083f000 000000000083f2a3 0000000000000004
Call Trace:
[<ffffffffb236c873>] dump_stack+0x85/0xc2
[<ffffffffb2183b38>] alloc_contig_range+0x368/0x370
[<ffffffffb2202d37>] cma_alloc+0x127/0x2e0
[<ffffffffb24c4b28>] dma_alloc_from_contiguous+0x38/0x40
[<ffffffffb2020b01>] dma_generic_alloc_coherent+0x91/0x1d0
[<ffffffffb2049b75>] x86_swiotlb_alloc_coherent+0x25/0x50
[<ffffffffc0ef17da>] ttm_dma_populate+0x48a/0x9a0 [ttm]
[<ffffffffb21df8d6>] ? __kmalloc+0x1b6/0x250
[<ffffffffc0f2a3ea>] radeon_ttm_tt_populate+0x22a/0x2d0 [radeon]
[<ffffffffc0ee80f7>] ? ttm_dma_tt_init+0x67/0xc0 [ttm]
[<ffffffffc0ee7cc7>] ttm_tt_bind+0x37/0x70 [ttm]
[<ffffffffc0ee9e58>] ttm_bo_handle_move_mem+0x528/0x5a0 [ttm]
[<ffffffffb219464a>] ? shmem_alloc_inode+0x1a/0x30
[<ffffffffc0eead24>] ttm_bo_validate+0x114/0x130 [ttm]
[<ffffffffb269346e>] ? _raw_write_unlock+0xe/0x10
[<ffffffffc0eeb05d>] ttm_bo_init+0x31d/0x3f0 [ttm]
[<ffffffffc0f2b7ab>] radeon_bo_create+0x19b/0x260 [radeon]
[<ffffffffc0f2b2e0>] ? radeon_update_memory_usage.isra.0+0x50/0x50 [radeon]
[<ffffffffc0f3e29d>] radeon_gem_object_create+0xad/0x180 [radeon]
[<ffffffffc0f3e6ff>] radeon_gem_create_ioctl+0x5f/0xf0 [radeon]
[<ffffffffc0e3a9eb>] drm_ioctl+0x21b/0x4d0 [drm]
[<ffffffffc0f3e6a0>] ? radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
[<ffffffffc0f0d04c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
[<ffffffffb221bae2>] do_vfs_ioctl+0x92/0x5c0
[<ffffffffb221c089>] SyS_ioctl+0x79/0x90
[<ffffffffb2002bf3>] do_syscall_64+0x73/0x190
[<ffffffffb26936c8>] entry_SYSCALL64_slow_path+0x25/0x25
The Radeon card in my case is a VisionTek HD 7750 Eyefinity 6, which is
reported as:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] (prog-if 00 [VGA controller])
Subsystem: VISIONTEK Cape Verde PRO [Radeon HD 7750/8740 / R7 250E]
Flags: bus master, fast devsel, latency 0, IRQ 58
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at fbe00000 (64-bit, non-prefetchable) [size=256K]
I/O ports at e000 [size=256]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Kernel driver in use: radeon
Kernel modules: radeon, amdgpu
--
Robin Hugh Johnson
E-Mail : [email protected]
Home Page : http://www.orbis-terrarum.net/?l=people.robbat2
ICQ# : 30269588 or 41961639
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
[add more CC's]
On 11/30/2016 09:19 PM, Robin H. Johnson wrote:
> Somewhere in the Radeon/DRM codebase, CMA page allocation has either
> regressed in the timeline of 4.5->4.9, and/or the drm/radeon code is
> doing something different with pages.
Could be that it didn't use dma_generic_alloc_coherent() before, or you didn't
have the generic CMA pool configured. What's the output of "grep CMA" on your
.config? Or any kernel boot options with cma in name? By default config this
should not be used on x86.
> Given that I haven't seen ANY other reports of this, I'm inclined to
> believe the problem is drm/radeon specific (if I don't start X, I can't
> reproduce the problem).
It's rather CMA specific; the allocation attempts just can't be 100% reliable due
to how CMA works. The question is if it should be spewing in the log in the
context of dma-cma, which has a fallback allocation option. It even uses
__GFP_NOWARN, perhaps the CMA path should respect that?
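As a rough illustration (not a patch: in 4.9 the gfp mask is not plumbed down
to alloc_contig_range(), so 'gfp_mask' below is hypothetical), respecting it
could look like:

        /* gfp_mask would have to be passed down from the dma-cma caller */
        if (test_pages_isolated(outer_start, end, false)) {
                if (!(gfp_mask & __GFP_NOWARN))
                        pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
                                            __func__, outer_start, end);
                ret = -EBUSY;
                goto done;
        }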
> The rate of the problem starts slow, and also is relatively low on an idle
> system (my screens blank at night, no xscreensaver running), but it still ramps
> up over time (to the point of generating 2.5GB/hour of "(timestamp)
> alloc_contig_range: [83e4d9, 83e4da) PFNs busy"), with various addresses (~100
> unique ranges for a day).
>
> My X workload is ~50 chrome tabs and ~20 terminals (over 3x 24" monitors w/ 9
> virtual desktops per monitor).
So IIUC, except the messages, everything actually works fine?
> I added a stack trace & rate limit to alloc_contig_range's PFNs busy message
> (patch in previous email on LKML/-MM lists); and they point to radeon.
>
> alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
> CPU: 3 PID: 8518 Comm: X Not tainted 4.9.0-rc7-00024-g6ad4037e18ec #27
> Hardware name: System manufacturer System Product Name/P8Z68 DELUXE, BIOS 0501 05/09/2011
> ffffad50c3d7f730 ffffffffb236c873 000000000083f2a3 000000000083f2a4
> ffffad50c3d7f810 ffffffffb2183b38 ffff999dff4d8040 0000000020fca8c0
> 000000000083f400 000000000083f000 000000000083f2a3 0000000000000004
> Call Trace:
> [<ffffffffb236c873>] dump_stack+0x85/0xc2
> [<ffffffffb2183b38>] alloc_contig_range+0x368/0x370
> [<ffffffffb2202d37>] cma_alloc+0x127/0x2e0
> [<ffffffffb24c4b28>] dma_alloc_from_contiguous+0x38/0x40
> [<ffffffffb2020b01>] dma_generic_alloc_coherent+0x91/0x1d0
> [<ffffffffb2049b75>] x86_swiotlb_alloc_coherent+0x25/0x50
> [<ffffffffc0ef17da>] ttm_dma_populate+0x48a/0x9a0 [ttm]
> [<ffffffffb21df8d6>] ? __kmalloc+0x1b6/0x250
> [<ffffffffc0f2a3ea>] radeon_ttm_tt_populate+0x22a/0x2d0 [radeon]
> [<ffffffffc0ee80f7>] ? ttm_dma_tt_init+0x67/0xc0 [ttm]
> [<ffffffffc0ee7cc7>] ttm_tt_bind+0x37/0x70 [ttm]
> [<ffffffffc0ee9e58>] ttm_bo_handle_move_mem+0x528/0x5a0 [ttm]
> [<ffffffffb219464a>] ? shmem_alloc_inode+0x1a/0x30
> [<ffffffffc0eead24>] ttm_bo_validate+0x114/0x130 [ttm]
> [<ffffffffb269346e>] ? _raw_write_unlock+0xe/0x10
> [<ffffffffc0eeb05d>] ttm_bo_init+0x31d/0x3f0 [ttm]
> [<ffffffffc0f2b7ab>] radeon_bo_create+0x19b/0x260 [radeon]
> [<ffffffffc0f2b2e0>] ? radeon_update_memory_usage.isra.0+0x50/0x50 [radeon]
> [<ffffffffc0f3e29d>] radeon_gem_object_create+0xad/0x180 [radeon]
> [<ffffffffc0f3e6ff>] radeon_gem_create_ioctl+0x5f/0xf0 [radeon]
> [<ffffffffc0e3a9eb>] drm_ioctl+0x21b/0x4d0 [drm]
> [<ffffffffc0f3e6a0>] ? radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
> [<ffffffffc0f0d04c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
> [<ffffffffb221bae2>] do_vfs_ioctl+0x92/0x5c0
> [<ffffffffb221c089>] SyS_ioctl+0x79/0x90
> [<ffffffffb2002bf3>] do_syscall_64+0x73/0x190
> [<ffffffffb26936c8>] entry_SYSCALL64_slow_path+0x25/0x25
>
> The Radeon card in my case is a VisionTek HD 7750 Eyefinity 6, which is
> reported as:
>
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] (prog-if 00 [VGA controller])
> Subsystem: VISIONTEK Cape Verde PRO [Radeon HD 7750/8740 / R7 250E]
> Flags: bus master, fast devsel, latency 0, IRQ 58
> Memory at c0000000 (64-bit, prefetchable) [size=256M]
> Memory at fbe00000 (64-bit, non-prefetchable) [size=256K]
> I/O ports at e000 [size=256]
> Expansion ROM at 000c0000 [disabled] [size=128K]
> Capabilities: [48] Vendor Specific Information: Len=08 <?>
> Capabilities: [50] Power Management version 3
> Capabilities: [58] Express Legacy Endpoint, MSI 00
> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> Capabilities: [150] Advanced Error Reporting
> Kernel driver in use: radeon
> Kernel modules: radeon, amdgpu
>
On Wed, Nov 30 2016, Robin H. Johnson wrote:
> (I'm going to respond directly to this email with the stack trace.)
>
> On Wed, Nov 30, 2016 at 02:28:49PM +0100, Michal Hocko wrote:
>> > On the other hand, if this didn’t happen and now happens all the time,
>> > this indicates a regression in CMA’s capability to allocate pages so
>> > just rate limiting the output would hide the potential actual issue.
>>
>> Or there might be just a much larger demand on those large blocks, no?
>> But seriously, dumping those messages again and again into the log (see
>> the 2.5_GB_/h going to the log) is just insane. So there really should be some
>> throttling.
>>
>> Does the following help you, Robin? At least to not get swamped by those
>> messages.
> Here's what I whipped up based on that, to ensure that dump_stack got
> rate-limited at the same pass as PFNs-busy. It dropped the dmesg spew to
> ~25MB/hour (and is suppressing ~43 entries/second right now).
>
> commit 6ad4037e18ec2199f8755274d8a745a9904241a1
> Author: Robin H. Johnson <[email protected]>
> Date: Wed Nov 30 10:32:57 2016 -0800
>
> mm: ratelimit & trace PFNs busy.
>
> Signed-off-by: Robin H. Johnson <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6de9440e3ae2..3c28ec3d18f8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7289,8 +7289,15 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>
> /* Make sure the range is really isolated. */
> if (test_pages_isolated(outer_start, end, false)) {
> - pr_info("%s: [%lx, %lx) PFNs busy\n",
> - __func__, outer_start, end);
> + static DEFINE_RATELIMIT_STATE(ratelimit_pfn_busy,
> + DEFAULT_RATELIMIT_INTERVAL,
> + DEFAULT_RATELIMIT_BURST);
> + if (__ratelimit(&ratelimit_pfn_busy)) {
> + pr_info("%s: [%lx, %lx) PFNs busy\n",
> + __func__, outer_start, end);
I’m thinking out loud here, but maybe it would be useful to include
a count of how many times this message has been suppressed?
> + dump_stack();
Perhaps do it only if CMA_DEBUG?
+ if (IS_ENABLED(CONFIG_CMA_DEBUG))
+ dump_stack();
> + }
> +
> ret = -EBUSY;
> goto done;
> }
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
> E-Mail : [email protected]
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
--
Best regards
ミハウ “mina86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»
On Wed, Nov 30, 2016 at 10:24:59PM +0100, Vlastimil Babka wrote:
> [add more CC's]
>
> On 11/30/2016 09:19 PM, Robin H. Johnson wrote:
> > Somewhere in the Radeon/DRM codebase, CMA page allocation has either
> > regressed in the timeline of 4.5->4.9, and/or the drm/radeon code is
> > doing something different with pages.
>
> Could be that it didn't use dma_generic_alloc_coherent() before, or you didn't
> have the generic CMA pool configured.
v4.9-rc7-23-gded6e842cf49:
[ 0.000000] cma: Reserved 16 MiB at 0x000000083e400000
[ 0.000000] Memory: 32883108K/33519432K available (6752K kernel code, 1244K
rwdata, 4716K rodata, 1772K init, 2720K bss, 619940K reserved, 16384K
cma-reserved)
> What's the output of "grep CMA" on your
> .config?
# grep CMA .config |grep -v -e SECMARK= -e CONFIG_BCMA -e CONFIG_USB_HCD_BCMA -e INPUT_CMA3000 -e CRYPTO_CMAC
CONFIG_CMA=y
# CONFIG_CMA_DEBUG is not set
# CONFIG_CMA_DEBUGFS is not set
CONFIG_CMA_AREAS=7
CONFIG_DMA_CMA=y
CONFIG_CMA_SIZE_MBYTES=16
CONFIG_CMA_SIZE_SEL_MBYTES=y
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8
> Or any kernel boot options with cma in name?
None.
> By default config this should not be used on x86.
What do you mean by that statement?
It should be disallowed to enable CONFIG_CMA? Radeon and CMA should be
mutually exclusive?
> > Given that I haven't seen ANY other reports of this, I'm inclined to
> > believe the problem is drm/radeon specific (if I don't start X, I can't
> > reproduce the problem).
>
> It's rather CMA specific; the allocation attempts just can't be 100% reliable due
> to how CMA works. The question is if it should be spewing in the log in the
> context of dma-cma, which has a fallback allocation option. It even uses
> __GFP_NOWARN, perhaps the CMA path should respect that?
Yes, I'd say if there's a fallback without much penalty, nowarn makes
sense. If the fallback just tries multiple addresses until success, then
the warning should only be issued when too many attempts have been made.
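As a sketch only (roughly following the retry loop in mm/cma.c's cma_alloc();
'busy_ranges' is a new local, everything else mirrors the existing structure,
untested), that could mean counting the busy ranges and warning only once the
whole area has been scanned without success:

        unsigned long busy_ranges = 0;

        for (;;) {
                /* ... existing bitmap search for a free range ... */
                ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
                if (ret == 0) {
                        page = pfn_to_page(pfn);
                        break;
                }
                cma_clear_bitmap(cma, pfn, count);
                if (ret != -EBUSY)
                        break;
                busy_ranges++;
                /* try again with a bit different memory target */
                start = bitmap_no + mask + 1;
        }

        if (!page && busy_ranges)
                pr_info_ratelimited("cma: allocation of %zu pages failed, %lu busy ranges skipped\n",
                                    count, busy_ranges);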
>
> > The rate of the problem starts slow, and also is relatively low on an idle
> > system (my screens blank at night, no xscreensaver running), but it still ramps
> > up over time (to the point of generating 2.5GB/hour of "(timestamp)
> > alloc_contig_range: [83e4d9, 83e4da) PFNs busy"), with various addresses (~100
> > unique ranges for a day).
> >
> > My X workload is ~50 chrome tabs and ~20 terminals (over 3x 24" monitors w/ 9
> > virtual desktops per monitor).
> So IIUC, except the messages, everything actually works fine?
There's high kernel CPU usage that seems to roughly correlate with the
messages, but I can't yet tell if that's due to the syslog itself, or
repeated alloc_contig_range requests.
--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail : [email protected]
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
On Wed 30-11-16 20:19:03, Robin H. Johnson wrote:
[...]
> alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
Huh, do I get it right that the request was for a _single_ page? Why do
we need CMA for that?
--
Michal Hocko
SUSE Labs
Forgot to CC Joonsoo. The email thread starts more or less here
http://lkml.kernel.org/r/[email protected]
On Thu 01-12-16 08:15:07, Michal Hocko wrote:
> On Wed 30-11-16 20:19:03, Robin H. Johnson wrote:
> [...]
> > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
>
> Huh, do I get it right that the request was for a _single_ page? Why do
> we need CMA for that?
--
Michal Hocko
SUSE Labs
On 12/01/2016 07:21 AM, Robin H. Johnson wrote:
> On Wed, Nov 30, 2016 at 10:24:59PM +0100, Vlastimil Babka wrote:
>> [add more CC's]
>>
>> On 11/30/2016 09:19 PM, Robin H. Johnson wrote:
>> > Somewhere in the Radeon/DRM codebase, CMA page allocation has either
>> > regressed in the timeline of 4.5->4.9, and/or the drm/radeon code is
>> > doing something different with pages.
>>
>> Could be that it didn't use dma_generic_alloc_coherent() before, or you didn't
>> have the generic CMA pool configured.
> v4.9-rc7-23-gded6e842cf49:
> [ 0.000000] cma: Reserved 16 MiB at 0x000000083e400000
> [ 0.000000] Memory: 32883108K/33519432K available (6752K kernel code, 1244K
> rwdata, 4716K rodata, 1772K init, 2720K bss, 619940K reserved, 16384K
> cma-reserved)
>
>> What's the output of "grep CMA" on your
>> .config?
>
> # grep CMA .config |grep -v -e SECMARK= -e CONFIG_BCMA -e CONFIG_USB_HCD_BCMA -e INPUT_CMA3000 -e CRYPTO_CMAC
> CONFIG_CMA=y
> # CONFIG_CMA_DEBUG is not set
> # CONFIG_CMA_DEBUGFS is not set
> CONFIG_CMA_AREAS=7
> CONFIG_DMA_CMA=y
> CONFIG_CMA_SIZE_MBYTES=16
> CONFIG_CMA_SIZE_SEL_MBYTES=y
> # CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
> # CONFIG_CMA_SIZE_SEL_MIN is not set
> # CONFIG_CMA_SIZE_SEL_MAX is not set
> CONFIG_CMA_ALIGNMENT=8
>
>> Or any kernel boot options with cma in name?
> None.
>
>
>> By default config this should not be used on x86.
> What do you mean by that statement?
I mean that the 16 mbytes for generic CMA area is not a default on x86:
config CMA_SIZE_MBYTES
int "Size in Mega Bytes"
depends on !CMA_SIZE_SEL_PERCENTAGE
default 0 if X86
default 16
Which explains why it's rare to see these reports in a context such as yours.
I'd recommend just disabling it, as the primary use case for CMA is devices on
mobile phones that don't have any other fallback (unlike the dma alloc).
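For reference (assuming the generic dma-contiguous area is the only CMA user
on this machine), disabling it should just be a matter of setting

        CONFIG_CMA_SIZE_MBYTES=0

(which is the current x86 default), or booting with

        cma=0

to override the configured size of the default area.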
> It should be disallowed to enable CONFIG_CMA? Radeon and CMA should be
> mutually exclusive?
I don't think this is a specific problem of radeon. But looks like it's a heavy
user of the dma alloc. There might be others.
>> > Given that I haven't seen ANY other reports of this, I'm inclined to
>> > believe the problem is drm/radeon specific (if I don't start X, I can't
>> > reproduce the problem).
>>
>> It's rather CMA specific; the allocation attempts just can't be 100% reliable due
>> to how CMA works. The question is if it should be spewing in the log in the
>> context of dma-cma, which has a fallback allocation option. It even uses
>> __GFP_NOWARN, perhaps the CMA path should respect that?
> Yes, I'd say if there's a fallback without much penalty, nowarn makes
> sense. If the fallback just tries multiple addresses until success, then
> the warning should only be issued when too many attempts have been made.
On the other hand, if the warnings are correlated with high kernel CPU usage,
it's arguably better to be warned.
>>
>> > The rate of the problem starts slow, and also is relatively low on an idle
>> > system (my screens blank at night, no xscreensaver running), but it still ramps
>> > up over time (to the point of generating 2.5GB/hour of "(timestamp)
>> > alloc_contig_range: [83e4d9, 83e4da) PFNs busy"), with various addresses (~100
>> > unique ranges for a day).
>> >
>> > My X workload is ~50 chrome tabs and ~20 terminals (over 3x 24" monitors w/ 9
>> > virtual desktops per monitor).
>> So IIUC, except the messages, everything actually works fine?
> There's high kernel CPU usage that seems to roughly correlate with the
> messages, but I can't yet tell if that's due to the syslog itself, or
> repeated alloc_contig_range requests.
You could try running perf top.
On 12/01/2016 08:21 AM, Michal Hocko wrote:
> Forgot to CC Joonsoo. The email thread starts more or less here
> http://lkml.kernel.org/r/[email protected]
>
> On Thu 01-12-16 08:15:07, Michal Hocko wrote:
>> On Wed 30-11-16 20:19:03, Robin H. Johnson wrote:
>> [...]
>> > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
>>
>> Huh, do I get it right that the request was for a _single_ page? Why do
>> we need CMA for that?
Ugh, good point. I assumed that was just the PFNs that it failed to migrate
away, but it seems that's indeed the whole requested range. Yeah sounds some
part of the dma-cma chain could be smarter and attempt CMA only for e.g. costly
orders.
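Roughly, as an untested sketch against the x86 dma_generic_alloc_coherent()
path (the order threshold is an arbitrary choice):

        /* Only bother CMA for costly orders; small requests go straight
         * to the normal page allocator. */
        page = NULL;
        if (gfpflags_allow_blocking(flag) &&
            get_order(size) >= PAGE_ALLOC_COSTLY_ORDER)
                page = dma_alloc_from_contiguous(dev, count, get_order(size));
        if (!page)
                page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));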
On Thu, Dec 01, 2016 at 08:38:15AM +0100, Vlastimil Babka wrote:
> >> By default config this should not be used on x86.
> > What do you mean by that statement?
>
> I mean that the 16 mbytes for generic CMA area is not a default on x86:
>
> config CMA_SIZE_MBYTES
> int "Size in Mega Bytes"
> depends on !CMA_SIZE_SEL_PERCENTAGE
> default 0 if X86
> default 16
d7be003a9d275299f5ee36bbdf156654f59e08e9 (v3.18-2122-gd7be003a9d27)
is where the 0MB-if-x86 default was added to the tree. Prior to that, it
was 16MiB, and that's where my system picked up the value from.
I have a record of all my kconfigs, because I use oldconfig each time
(going back 8 years to 2.6.27)
# Added in 3.12.0-00001-g5f258d0
CONFIG_CMA=y
# Added in 3.16.0-rc6-00042-g67dd8f3
CONFIG_CMA_ALIGNMENT=8
CONFIG_CMA_AREAS=7
CONFIG_CMA_SIZE_MBYTES=16
CONFIG_CMA_SIZE_SEL_MBYTES=y
CONFIG_DMA_CMA=y
So the next question is why I picked up CMA in
3.16.0-rc6-00042-g67dd8f3... I'll poke at that.
> > Yes, I'd say if there's a fallback without much penalty, nowarn makes
> > sense. If the fallback just tries multiple addresses until success, then
> > the warning should only be issued when too many attempts have been made.
> On the other hand, if the warnings are correlated with high kernel CPU usage,
> it's arguably better to be warned.
Keep the rate-limit on the warning for cases like this?
> >> > The rate of the problem starts slow, and also is relatively low on an idle
> >> > system (my screens blank at night, no xscreensaver running), but it still ramps
> >> > up over time (to the point of generating 2.5GB/hour of "(timestamp)
> >> > alloc_contig_range: [83e4d9, 83e4da) PFNs busy"), with various addresses (~100
> >> > unique ranges for a day).
> >> >
> >> > My X workload is ~50 chrome tabs and ~20 terminals (over 3x 24" monitors w/ 9
> >> > virtual desktops per monitor).
> >> So IIUC, except the messages, everything actually works fine?
> > There's high kernel CPU usage that seems to roughly correlate with the
> > messages, but I can't yet tell if that's due to the syslog itself, or
> > repeated alloc_contig_range requests.
> You could try running perf top.
Will do in the morning.
--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail : [email protected]
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
Let's also CC Marek
On Thu 01-12-16 08:43:40, Vlastimil Babka wrote:
> On 12/01/2016 08:21 AM, Michal Hocko wrote:
> > Forgot to CC Joonsoo. The email thread starts more or less here
> > http://lkml.kernel.org/r/[email protected]
> >
> > On Thu 01-12-16 08:15:07, Michal Hocko wrote:
> > > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote:
> > > [...]
> > > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
> > >
> > > Huh, do I get it right that the request was for a _single_ page? Why do
> > > we need CMA for that?
>
> Ugh, good point. I assumed that was just the PFNs that it failed to migrate
> away, but it seems that's indeed the whole requested range. Yeah sounds some
> part of the dma-cma chain could be smarter and attempt CMA only for e.g.
> costly orders.
Is there any reason why the DMA api doesn't try the page allocator first
before falling back to the CMA? I simply have a hard time to see why the
CMA should be used (and fragment) for small requests size.
--
Michal Hocko
SUSE Labs
On Thu, Dec 01 2016, Michal Hocko wrote:
> Let's also CC Marek
>
> On Thu 01-12-16 08:43:40, Vlastimil Babka wrote:
>> On 12/01/2016 08:21 AM, Michal Hocko wrote:
>> > Forgot to CC Joonsoo. The email thread starts more or less here
>> > http://lkml.kernel.org/r/[email protected]
>> >
>> > On Thu 01-12-16 08:15:07, Michal Hocko wrote:
>> > > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote:
>> > > [...]
>> > > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
>> > >
>> > > Huh, do I get it right that the request was for a _single_ page? Why do
>> > > we need CMA for that?
>>
>> Ugh, good point. I assumed that was just the PFNs that it failed to migrate
>> away, but it seems that's indeed the whole requested range. Yeah sounds some
>> part of the dma-cma chain could be smarter and attempt CMA only for e.g.
>> costly orders.
>
> Is there any reason why the DMA api doesn't try the page allocator first
> before falling back to the CMA? I simply have a hard time to see why the
> CMA should be used (and fragment) for small requests size.
There actually may be reasons to always go with CMA even if small
regions are requested. CMA areas may be defined to map to particular
physical addresses and given device may require allocations from those
addresses. This may be more than just a matter of DMA address space.
I cannot give you specific examples though and I might be talking
nonsense.
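(One concrete knob, at least: the documented 'cma=' syntax allows pinning the
default area, e.g.

        cma=64M@2G

would ask for a 64 MiB area starting at 2 GiB. Whether any x86 device actually
depends on such placement is a different question.)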
> --
> Michal Hocko
> SUSE Labs
--
Best regards
ミハウ “mina86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»
On Thu 01-12-16 17:03:52, Michal Nazarewicz wrote:
> On Thu, Dec 01 2016, Michal Hocko wrote:
> > Let's also CC Marek
> >
> > On Thu 01-12-16 08:43:40, Vlastimil Babka wrote:
> >> On 12/01/2016 08:21 AM, Michal Hocko wrote:
> >> > Forgot to CC Joonsoo. The email thread starts more or less here
> >> > http://lkml.kernel.org/r/[email protected]
> >> >
> >> > On Thu 01-12-16 08:15:07, Michal Hocko wrote:
> >> > > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote:
> >> > > [...]
> >> > > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
> >> > >
> >> > > Huh, do I get it right that the request was for a _single_ page? Why do
> >> > > we need CMA for that?
> >>
> >> Ugh, good point. I assumed that was just the PFNs that it failed to migrate
> >> away, but it seems that's indeed the whole requested range. Yeah sounds some
> >> part of the dma-cma chain could be smarter and attempt CMA only for e.g.
> >> costly orders.
> >
> > Is there any reason why the DMA api doesn't try the page allocator first
> > before falling back to the CMA? I simply have a hard time to see why the
> > CMA should be used (and fragment) for small requests size.
>
> There actually may be reasons to always go with CMA even if small
> regions are requested. CMA areas may be defined to map to particular
> physical addresses and given device may require allocations from those
> addresses. This may be more than just a matter of DMA address space.
> I cannot give you specific examples though and I might be talking
> nonsense.
I am not familiar with this code so I cannot really argue but a quick
look at rmem_cma_setup doesn't suggest any specific placing or
anything...
--
Michal Hocko
SUSE Labs
On Thu, Dec 01 2016, Michal Hocko wrote:
> I am not familiar with this code so I cannot really argue but a quick
> look at rmem_cma_setup doesn't suggest any specific placing or
> anything...
early_cma parses ‘cma’ command line argument which can specify where
exactly the default CMA area is to be located. Furthermore, CMA areas
can be assigned per-device (via the Device Tree IIRC).
--
Best regards
ミハウ “mina86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»
On 12/01/2016 10:02 PM, Michal Nazarewicz wrote:
> On Thu, Dec 01 2016, Michal Hocko wrote:
>> I am not familiar with this code so I cannot really argue but a quick
>> look at rmem_cma_setup doesn't suggest any specific placing or
>> anything...
>
> early_cma parses ‘cma’ command line argument which can specify where
> exactly the default CMA area is to be located. Furthermore, CMA areas
> can be assigned per-device (via the Device Tree IIRC).
OK, but the context of this bug report is a generic cma pool and generic
dma alloc, which tries cma first and then falls back to
alloc_pages_node(). If a device really requires specific placing as you
suggest, then it probably uses a different allocation interface,
otherwise there would be some flag to disallow the alloc_pages_node()
fallback?
Am Donnerstag, den 01.12.2016, 15:11 +0100 schrieb Michal Hocko:
> Let's also CC Marek
>
> On Thu 01-12-16 08:43:40, Vlastimil Babka wrote:
> > On 12/01/2016 08:21 AM, Michal Hocko wrote:
> > > Forgot to CC Joonsoo. The email thread starts more or less here
> > > http://lkml.kernel.org/r/[email protected]
> > >
> > > On Thu 01-12-16 08:15:07, Michal Hocko wrote:
> > > > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote:
> > > > [...]
> > > > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
> > > >
> > > > Huh, do I get it right that the request was for a _single_ page? Why do
> > > > we need CMA for that?
> >
> > Ugh, good point. I assumed that was just the PFNs that it failed to migrate
> > away, but it seems that's indeed the whole requested range. Yeah sounds some
> > part of the dma-cma chain could be smarter and attempt CMA only for e.g.
> > costly orders.
>
> Is there any reason why the DMA api doesn't try the page allocator first
> before falling back to the CMA? I simply have a hard time to see why the
> CMA should be used (and fragment) for small requests size.
On x86 that is true, but on ARM CMA is the only (low memory) region that
can change the memory attributes, by being excluded from the lowmem
section mapping. Changing the memory attributes to
uncached/writecombined for DMA is crucial on ARM to fulfill the
requirement that there aren't any conflicting mappings of the same
physical page.
On ARM we can possibly do the optimization of asking the page allocator,
but only if we can request _only_ highmem pages.
Regards,
Lucas
On Fri, Dec 02, 2016 at 11:26:02AM +0100, Lucas Stach wrote:
> Am Donnerstag, den 01.12.2016, 15:11 +0100 schrieb Michal Hocko:
> > Let's also CC Marek
> >
> > On Thu 01-12-16 08:43:40, Vlastimil Babka wrote:
> > > On 12/01/2016 08:21 AM, Michal Hocko wrote:
> > > > Forgot to CC Joonsoo. The email thread starts more or less here
> > > > http://lkml.kernel.org/r/[email protected]
> > > >
> > > > On Thu 01-12-16 08:15:07, Michal Hocko wrote:
> > > > > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote:
> > > > > [...]
> > > > > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
> > > > >
> > > > > Huh, do I get it right that the request was for a _single_ page? Why do
> > > > > we need CMA for that?
> > >
> > > Ugh, good point. I assumed that was just the PFNs that it failed to migrate
> > > away, but it seems that's indeed the whole requested range. Yeah sounds some
> > > part of the dma-cma chain could be smarter and attempt CMA only for e.g.
> > > costly orders.
> >
> > Is there any reason why the DMA api doesn't try the page allocator first
> > before falling back to the CMA? I simply have a hard time to see why the
> > CMA should be used (and fragment) for small requests size.
>
> On x86 that is true, but on ARM CMA is the only (low memory) region that
> can change the memory attributes, by being excluded from the lowmem
> section mapping. Changing the memory attributes to
> uncached/writecombined for DMA is crucial on ARM to fulfill the
> requirement that no there aren't any conflicting mappings of the same
> physical page.
>
> On ARM we can possibly do the optimization of asking the page allocator,
> but only if we can request _only_ highmem pages.
>
So this memory allocation strategy should only apply to ARM and not x86. We
already had fallout a couple of years ago when Ubuntu decided to enable CMA on
x86, where it does not make sense, as I don't think we have any single device
we care about that is not behind an IOMMU, and thus we do not require
contiguous memory allocation.
The DMA API should only use CMA on architectures where it is necessary, not
on all of them.
Cheers,
Jérôme