2013-03-08 05:54:46

by WANG Chao

Subject: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

Hi all,

On 3.9-rc1, I loaded the crash kernel with the latest kexec-tools (up to commit 28d413a), but
the second kernel panics early in boot:
[ 2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer
[ 2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1
[ 2.965426] Call Trace:
[ 2.967866] [<ffffffff81619acd>] panic+0xc1/0x1d0
[ 2.972644] [<ffffffff8131e50c>] swiotlb_tbl_map_single+0x27c/0x280
[ 2.978991] [<ffffffff8131ed59>] map_single+0x19/0x20
[ 2.984115] [<ffffffff8131ef1e>] swiotlb_map_page+0x6e/0x160
[ 2.989845] [<ffffffff81447e40>] usb_hcd_map_urb_for_dma+0x230/0x4a0
[ 2.996268] [<ffffffff81448345>] usb_hcd_submit_urb+0x295/0x8e0
[ 3.002258] [<ffffffff8109b90f>] ? __dequeue_entity+0x2f/0x50
[ 3.008076] [<ffffffff8101358e>] ? __switch_to+0x13e/0x4a0
[ 3.013632] [<ffffffff81449a2f>] usb_submit_urb+0xff/0x3d0
[ 3.019186] [<ffffffff816241be>] ? __schedule+0x3de/0x7e0
[ 3.024657] [<ffffffff8144ab6a>] usb_start_wait_urb+0x6a/0x160
[ 3.030560] [<ffffffff81189d15>] ? __kmalloc+0x55/0x210
[ 3.035856] [<ffffffff8144977e>] ? usb_alloc_urb+0x1e/0x50
[ 3.041411] [<ffffffff8144aece>] usb_control_msg+0xde/0x140
[ 3.047056] [<ffffffff81440680>] ? hub_port_init+0x310/0xaf0
[ 3.052785] [<ffffffff8144065b>] ? hub_port_init+0x2eb/0xaf0
[ 3.058515] [<ffffffff814406a8>] hub_port_init+0x338/0xaf0
[ 3.064071] [<ffffffff81407679>] ? update_autosuspend+0x39/0x60
[ 3.070062] [<ffffffff81407759>] ? pm_runtime_set_autosuspend_delay+0x49/0x70
[ 3.077264] [<ffffffff814438ba>] hub_port_connect_change+0x24a/0xaa0
[ 3.083684] [<ffffffff814443fa>] hub_events+0x2ea/0x910
[ 3.088981] [<ffffffff816241be>] ? __schedule+0x3de/0x7e0
[ 3.094451] [<ffffffff81444a55>] hub_thread+0x35/0x1e0
[ 3.099661] [<ffffffff81087480>] ? wake_up_bit+0x40/0x40
[ 3.105045] [<ffffffff81444a20>] ? hub_events+0x910/0x910
[ 3.110514] [<ffffffff81086a70>] kthread+0xc0/0xd0
[ 3.115378] [<ffffffff810869b0>] ? kthread_create_on_node+0x120/0x120
[ 3.121887] [<ffffffff8162e86c>] ret_from_fork+0x7c/0xb0
[ 3.127271] [<ffffffff810869b0>] ? kthread_create_on_node+0x120/0x120
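
When reading the trace above, it can help to drop the entries marked `?`. The kernel prints those for addresses it found on the stack but could not confirm as part of the call chain. The sketch below is a hypothetical helper (the names `TRACE`, `FRAME`, and `reliable_frames` are illustrative and not part of any kernel tool), applied to a few frames copied from the trace:

```python
import re

# A few frames copied from the call trace above. Entries marked '?' are
# addresses found on the stack that the unwinder could not confirm.
TRACE = """\
[ 2.967866] [<ffffffff81619acd>] panic+0xc1/0x1d0
[ 2.972644] [<ffffffff8131e50c>] swiotlb_tbl_map_single+0x27c/0x280
[ 2.978991] [<ffffffff8131ed59>] map_single+0x19/0x20
[ 2.984115] [<ffffffff8131ef1e>] swiotlb_map_page+0x6e/0x160
[ 2.989845] [<ffffffff81447e40>] usb_hcd_map_urb_for_dma+0x230/0x4a0
[ 2.996268] [<ffffffff81448345>] usb_hcd_submit_urb+0x295/0x8e0
[ 3.002258] [<ffffffff8109b90f>] ? __dequeue_entity+0x2f/0x50
[ 3.013632] [<ffffffff81449a2f>] usb_submit_urb+0xff/0x3d0
"""

# Matches "[<address>] [? ]symbol+0xoffset/0xsize".
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace):
    """Return symbol names of frames that are not marked with '?'."""
    frames = []
    for line in trace.splitlines():
        match = FRAME.search(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


for name in reliable_frames(TRACE):
    print(name)
```

On this excerpt the remaining chain runs from `usb_submit_urb` through `swiotlb_map_page` to `panic`, which suggests the panic is raised when the USB host controller code requests a DMA mapping that needs a bounce buffer.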


Here's the full log:
# grep 'Crash' /proc/iomem
146000000-14dffffff : Crash kernel

# dmesg | grep -i reserving
[ 0.000000] Reserving 128MB of memory at 5216MB for crashkernel (System RAM: 3977MB)
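
The reservation above can be cross-checked with a little arithmetic. The sketch below uses the `/proc/iomem` range and the `memmap=` options from the crash kernel's `Command line:` entry in the log to work out how much usable RAM the second kernel sees below 4 GiB. The helper names are hypothetical, and the 64 MiB SWIOTLB size is an assumption about the default rather than something stated in this report:

```python
import re

MIB = 1 << 20
GIB = 1 << 30
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Values copied from this report.
IOMEM_LINE = "146000000-14dffffff : Crash kernel"
CMDLINE = ("console=ttyS0,115200n81 memmap=exactmap memmap=615K@4K "
           "memmap=130432K@5341184K elfcorehdr=5471616K "
           "memmap=2560K#2799016K memmap=84K#2801576K")

# Assumption: 64 MiB is used here as the default SWIOTLB size.
SWIOTLB_ASSUMED = 64 * MIB


def parse_iomem_range(line):
    """Parse a /proc/iomem line 'START-END : name' into (start, end)."""
    span = line.split(":", 1)[0].strip()
    start, end = (int(x, 16) for x in span.split("-"))
    return start, end


def usable_memmap(cmdline):
    """Return (start, size) in bytes for each 'memmap=SIZE@START' option.

    Only the '@' form (usable RAM) is matched; '#' entries (ACPI data)
    are skipped.
    """
    pattern = r"memmap=(\d+)([KMG])@(\d+)([KMG])"
    return [(int(start) * UNITS[start_unit], int(size) * UNITS[size_unit])
            for size, size_unit, start, start_unit in re.findall(pattern, cmdline)]


def usable_below_4g(regions):
    """Sum the sizes of regions that lie entirely below 4 GiB."""
    return sum(size for start, size in regions if start + size <= 4 * GIB)


start, end = parse_iomem_range(IOMEM_LINE)
regions = usable_memmap(CMDLINE)
low = usable_below_4g(regions)

print(f"crash kernel region: {(end - start + 1) // MIB} MiB at {start // MIB} MiB")
print(f"region starts above 4 GiB: {start >= 4 * GIB}")
print(f"usable RAM below 4 GiB: {low // 1024} KiB")
print(f"enough for an assumed {SWIOTLB_ASSUMED // MIB} MiB SWIOTLB: {low >= SWIOTLB_ASSUMED}")
```

The computed size and offset agree with the "Reserving 128MB of memory at 5216MB" message. If the bounce buffer does have to come from memory below 4 GiB, the 615K usable there would be far too small, which would be consistent with the "Cannot allocate SWIOTLB buffer" line in the second kernel's log.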

# kexec -p /boot/vmlinuz-3.9.0-rc1+ --command-line='console=ttyS0,115200n81'
# echo c > /proc/sysrq-trigger

[ 217.879315] SysRq : Trigger a crash
[ 217.882836] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 217.890674] IP: [<ffffffff813c19a6>] sysrq_handle_crash+0x16/0x20
[ 217.896773] PGD 13df22067 PUD 139726067 PMD 0
[ 217.901244] Oops: 0002 [#1] SMP
[ 217.904491] Modules linked in: lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sg coretemp kvm_intel kvm e1000e iTCO_wdt crc32c_intel iTCO_vendor_support ptp ghash_clmulni_intel pps_core mei microcode pcspkr i2c_i801 lpc_ich mfd_core xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif i915 i2c_algo_bit drm_kms_helper drm ahci libahci libata i2c_core video dm_mirror dm_region_hash dm_log dm_mod
[ 217.963690] CPU 0
[ 217.965526] Pid: 1206, comm: bash Not tainted 3.9.0-rc1+ #1 Intel Corporation 2012 Client Platform/Emerald Lake 2
[ 217.975948] RIP: 0010:[<ffffffff813c19a6>] [<ffffffff813c19a6>] sysrq_handle_crash+0x16/0x20
[ 217.984468] RSP: 0018:ffff8801367e9e38 EFLAGS: 00010092
[ 217.989765] RAX: 000000000000000f RBX: ffffffff819b67c0 RCX: ffff88014e20ffe8
[ 217.996881] RDX: 0000000000000000 RSI: ffff88014e20e3b8 RDI: 0000000000000063
[ 218.003998] RBP: ffff8801367e9e38 R08: ffffffff81c06280 R09: 0000000000000419
[ 218.011113] R10: 0000000000000002 R11: 0000000000000418 R12: 0000000000000063
[ 218.018230] R13: 0000000000000286 R14: 0000000000000000 R15: 0000000000000007
[ 218.025346] FS: 00007fdd48ace740(0000) GS:ffff88014e200000(0000) knlGS:0000000000000000
[ 218.033416] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 218.039147] CR2: 0000000000000000 CR3: 000000013a67c000 CR4: 00000000001407f0
[ 218.046263] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 218.053379] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 218.060496] Process bash (pid: 1206, threadinfo ffff8801367e8000, task ffff88013e8ae5c0)
[ 218.068564] Stack:
[ 218.070570] ffff8801367e9e78 ffffffff813c2147 ffff88013e8ae5c0 0000000000000002
[ 218.078001] ffff88013c4f9200 00007fdd48ad1000 0000000000000002 ffff8801367e9f50
[ 218.085427] ffff8801367e9ea8 ffffffff813c21fa ffff88013c4f9200 00007fdd48ad1000
[ 218.092854] Call Trace:
[ 218.095298] [<ffffffff813c2147>] __handle_sysrq+0x127/0x190
[ 218.100947] [<ffffffff813c21fa>] write_sysrq_trigger+0x4a/0x50
[ 218.106854] [<ffffffff812064d5>] proc_reg_write+0x75/0xb0
[ 218.112329] [<ffffffff811a1d2c>] vfs_write+0xac/0x180
[ 218.117456] [<ffffffff811a2072>] sys_write+0x52/0xa0
[ 218.122499] [<ffffffff8162a23e>] ? do_page_fault+0xe/0x10
[ 218.127977] [<ffffffff8162e919>] system_call_fastpath+0x16/0x1b
[ 218.133970] Code: 89 ef e8 ee f7 ff ff eb c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 c7 05 64 44 84 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48
[ 218.153653] RIP [<ffffffff813c19a6>] sysrq_handle_crash+0x16/0x20
[ 218.159834] RSP <ffff8801367e9e38>
[ 218.163311] CR2: 0000000000000000
I'm in purgatory
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.9.0-rc1+ (root@localhost) (gcc version 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC) ) #1 SMP Wed Mar 6 23:38:21 EST 2013
[ 0.000000] Command line: console=ttyS0,115200n81 memmap=exactmap memmap=615K@4K memmap=130432K@5341184K elfcorehdr=5471616K memmap=2560K#2799016K memmap=84K#2801576K
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000100-0x000000000009abff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009ac00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001fffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000020000000-0x00000000201fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000020200000-0x0000000040003fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000040004000-0x0000000040004fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000040005000-0x00000000a8aaefff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000a8aaf000-0x00000000a8af1fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000a8af2000-0x00000000aa5c3fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000aa5c4000-0x00000000aad69fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000aad6a000-0x00000000aafe9fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000aafea000-0x00000000aaffefff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000aafff000-0x00000000aaffffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000ab000000-0x00000000af9fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed13fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed18000-0x00000000fed19fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff900000-0x00000000ffbfffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ffd00000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000014e5fffff] usable
[ 0.000000] e820: last_pfn = 0x14e600 max_arch_pfn = 0x400000000
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] e820: user-defined physical RAM map:
[ 0.000000] user: [mem 0x0000000000001000-0x000000000009abff] usable
[ 0.000000] user: [mem 0x00000000aad6a000-0x00000000aaffefff] ACPI data
[ 0.000000] user: [mem 0x0000000146000000-0x000000014df5ffff] usable
[ 0.000000] SMBIOS 2.6 present.
[ 0.000000] No AGP bridge found
[ 0.000000] e820: last_pfn = 0x14df60 max_arch_pfn = 0x400000000
[ 0.000000] x86 PAT enabled: cpu 0, old 0x7010600070106, new 0x7010600070106
[ 0.000000] total RAM covered: 3990M
[ 0.000000] gran_size: 64K chunk_size: 64K num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 64K chunk_size: 128K num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 64K chunk_size: 256K num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 64K chunk_size: 512K num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 64K chunk_size: 1M num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 64K chunk_size: 2M num_reg: 10 lose cover RAM: 2M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 4M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 8M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 16M num_reg: 10 lose cover RAM: -10M
[ 0.000000] gran_size: 64K chunk_size: 32M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 64K chunk_size: 64M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 64K chunk_size: 128M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 64K chunk_size: 256M num_reg: 10 lose cover RAM: 0G
[ 0.000000] *BAD*gran_size: 64K chunk_size: 512M num_reg: 10 lose cover RAM: -256M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 1G num_reg: 10 lose cover RAM: -512M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 2G num_reg: 10 lose cover RAM: -512M
[ 0.000000] gran_size: 128K chunk_size: 128K num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 128K chunk_size: 256K num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 128K chunk_size: 512K num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 128K chunk_size: 1M num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 128K chunk_size: 2M num_reg: 10 lose cover RAM: 2M
[ 0.000000] *BAD*gran_size: 128K chunk_size: 4M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 128K chunk_size: 8M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 128K chunk_size: 16M num_reg: 10 lose cover RAM: -10M
[ 0.000000] gran_size: 128K chunk_size: 32M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 128K chunk_size: 64M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 128K chunk_size: 128M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 128K chunk_size: 256M num_reg: 10 lose cover RAM: 0G
[ 0.000000] *BAD*gran_size: 128K chunk_size: 512M num_reg: 10 lose cover RAM: -256M
[ 0.000000] *BAD*gran_size: 128K chunk_size: 1G num_reg: 10 lose cover RAM: -512M
[ 0.000000] *BAD*gran_size: 128K chunk_size: 2G num_reg: 10 lose cover RAM: -512M
[ 0.000000] gran_size: 256K chunk_size: 256K num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 256K chunk_size: 512K num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 256K chunk_size: 1M num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 256K chunk_size: 2M num_reg: 10 lose cover RAM: 2M
[ 0.000000] *BAD*gran_size: 256K chunk_size: 4M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 256K chunk_size: 8M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 256K chunk_size: 16M num_reg: 10 lose cover RAM: -10M
[ 0.000000] gran_size: 256K chunk_size: 32M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 256K chunk_size: 64M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 256K chunk_size: 128M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 256K chunk_size: 256M num_reg: 10 lose cover RAM: 0G
[ 0.000000] *BAD*gran_size: 256K chunk_size: 512M num_reg: 10 lose cover RAM: -256M
[ 0.000000] *BAD*gran_size: 256K chunk_size: 1G num_reg: 10 lose cover RAM: -512M
[ 0.000000] *BAD*gran_size: 256K chunk_size: 2G num_reg: 10 lose cover RAM: -512M
[ 0.000000] gran_size: 512K chunk_size: 512K num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 512K chunk_size: 1M num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 512K chunk_size: 2M num_reg: 10 lose cover RAM: 2M
[ 0.000000] *BAD*gran_size: 512K chunk_size: 4M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 512K chunk_size: 8M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 512K chunk_size: 16M num_reg: 10 lose cover RAM: -10M
[ 0.000000] gran_size: 512K chunk_size: 32M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 512K chunk_size: 64M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 512K chunk_size: 128M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 512K chunk_size: 256M num_reg: 10 lose cover RAM: 0G
[ 0.000000] *BAD*gran_size: 512K chunk_size: 512M num_reg: 10 lose cover RAM: -256M
[ 0.000000] *BAD*gran_size: 512K chunk_size: 1G num_reg: 10 lose cover RAM: -512M
[ 0.000000] *BAD*gran_size: 512K chunk_size: 2G num_reg: 10 lose cover RAM: -512M
[ 0.000000] gran_size: 1M chunk_size: 1M num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 1M chunk_size: 2M num_reg: 10 lose cover RAM: 2M
[ 0.000000] *BAD*gran_size: 1M chunk_size: 4M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 1M chunk_size: 8M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 1M chunk_size: 16M num_reg: 10 lose cover RAM: -10M
[ 0.000000] gran_size: 1M chunk_size: 32M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 1M chunk_size: 64M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 1M chunk_size: 128M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 1M chunk_size: 256M num_reg: 10 lose cover RAM: 0G
[ 0.000000] *BAD*gran_size: 1M chunk_size: 512M num_reg: 10 lose cover RAM: -256M
[ 0.000000] *BAD*gran_size: 1M chunk_size: 1G num_reg: 10 lose cover RAM: -512M
[ 0.000000] *BAD*gran_size: 1M chunk_size: 2G num_reg: 10 lose cover RAM: -512M
[ 0.000000] gran_size: 2M chunk_size: 2M num_reg: 10 lose cover RAM: 2M
[ 0.000000] *BAD*gran_size: 2M chunk_size: 4M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 2M chunk_size: 8M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 2M chunk_size: 16M num_reg: 10 lose cover RAM: -10M
[ 0.000000] gran_size: 2M chunk_size: 32M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 2M chunk_size: 64M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 2M chunk_size: 128M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 2M chunk_size: 256M num_reg: 10 lose cover RAM: 0G
[ 0.000000] *BAD*gran_size: 2M chunk_size: 512M num_reg: 10 lose cover RAM: -256M
[ 0.000000] *BAD*gran_size: 2M chunk_size: 1G num_reg: 10 lose cover RAM: -512M
[ 0.000000] *BAD*gran_size: 2M chunk_size: 2G num_reg: 10 lose cover RAM: -512M
[ 0.000000] gran_size: 4M chunk_size: 4M num_reg: 10 lose cover RAM: 2M
[ 0.000000] *BAD*gran_size: 4M chunk_size: 8M num_reg: 10 lose cover RAM: -2M
[ 0.000000] *BAD*gran_size: 4M chunk_size: 16M num_reg: 10 lose cover RAM: -10M
[ 0.000000] gran_size: 4M chunk_size: 32M num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 4M chunk_size: 64M num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 4M chunk_size: 128M num_reg: 10 lose cover RAM: 2M
[ 0.000000] gran_size: 4M chunk_size: 256M num_reg: 10 lose cover RAM: 2M
[ 0.000000] *BAD*gran_size: 4M chunk_size: 512M num_reg: 10 lose cover RAM: -254M
[ 0.000000] *BAD*gran_size: 4M chunk_size: 1G num_reg: 10 lose cover RAM: -510M
[ 0.000000] *BAD*gran_size: 4M chunk_size: 2G num_reg: 10 lose cover RAM: -510M
[ 0.000000] gran_size: 8M chunk_size: 8M num_reg: 9 lose cover RAM: 6M
[ 0.000000] gran_size: 8M chunk_size: 16M num_reg: 9 lose cover RAM: 6M
[ 0.000000] gran_size: 8M chunk_size: 32M num_reg: 9 lose cover RAM: 6M
[ 0.000000] gran_size: 8M chunk_size: 64M num_reg: 8 lose cover RAM: 6M
[ 0.000000] gran_size: 8M chunk_size: 128M num_reg: 8 lose cover RAM: 6M
[ 0.000000] gran_size: 8M chunk_size: 256M num_reg: 8 lose cover RAM: 6M
[ 0.000000] gran_size: 8M chunk_size: 512M num_reg: 9 lose cover RAM: 6M
[ 0.000000] gran_size: 8M chunk_size: 1G num_reg: 9 lose cover RAM: 6M
[ 0.000000] gran_size: 8M chunk_size: 2G num_reg: 9 lose cover RAM: 6M
[ 0.000000] gran_size: 16M chunk_size: 16M num_reg: 9 lose cover RAM: 6M
[ 0.000000] gran_size: 16M chunk_size: 32M num_reg: 9 lose cover RAM: 6M
[ 0.000000] gran_size: 16M chunk_size: 64M num_reg: 8 lose cover RAM: 6M
[ 0.000000] gran_size: 16M chunk_size: 128M num_reg: 8 lose cover RAM: 6M
[ 0.000000] gran_size: 16M chunk_size: 256M num_reg: 8 lose cover RAM: 6M
[ 0.000000] gran_size: 16M chunk_size: 512M num_reg: 9 lose cover RAM: 6M
[ 0.000000] gran_size: 16M chunk_size: 1G num_reg: 9 lose cover RAM: 6M
[ 0.000000] gran_size: 16M chunk_size: 2G num_reg: 9 lose cover RAM: 6M
[ 0.000000] gran_size: 32M chunk_size: 32M num_reg: 8 lose cover RAM: 22M
[ 0.000000] gran_size: 32M chunk_size: 64M num_reg: 8 lose cover RAM: 22M
[ 0.000000] gran_size: 32M chunk_size: 128M num_reg: 8 lose cover RAM: 22M
[ 0.000000] gran_size: 32M chunk_size: 256M num_reg: 8 lose cover RAM: 22M
[ 0.000000] gran_size: 32M chunk_size: 512M num_reg: 9 lose cover RAM: 22M
[ 0.000000] gran_size: 32M chunk_size: 1G num_reg: 9 lose cover RAM: 22M
[ 0.000000] gran_size: 32M chunk_size: 2G num_reg: 9 lose cover RAM: 22M
[ 0.000000] gran_size: 64M chunk_size: 64M num_reg: 6 lose cover RAM: 86M
[ 0.000000] gran_size: 64M chunk_size: 128M num_reg: 6 lose cover RAM: 86M
[ 0.000000] gran_size: 64M chunk_size: 256M num_reg: 7 lose cover RAM: 86M
[ 0.000000] gran_size: 64M chunk_size: 512M num_reg: 8 lose cover RAM: 86M
[ 0.000000] gran_size: 64M chunk_size: 1G num_reg: 8 lose cover RAM: 86M
[ 0.000000] gran_size: 64M chunk_size: 2G num_reg: 8 lose cover RAM: 86M
[ 0.000000] gran_size: 128M chunk_size: 128M num_reg: 5 lose cover RAM: 150M
[ 0.000000] gran_size: 128M chunk_size: 256M num_reg: 7 lose cover RAM: 150M
[ 0.000000] gran_size: 128M chunk_size: 512M num_reg: 8 lose cover RAM: 150M
[ 0.000000] gran_size: 128M chunk_size: 1G num_reg: 8 lose cover RAM: 150M
[ 0.000000] gran_size: 128M chunk_size: 2G num_reg: 8 lose cover RAM: 150M
[ 0.000000] gran_size: 256M chunk_size: 256M num_reg: 3 lose cover RAM: 406M
[ 0.000000] gran_size: 256M chunk_size: 512M num_reg: 3 lose cover RAM: 406M
[ 0.000000] gran_size: 256M chunk_size: 1G num_reg: 4 lose cover RAM: 406M
[ 0.000000] gran_size: 256M chunk_size: 2G num_reg: 4 lose cover RAM: 406M
[ 0.000000] gran_size: 512M chunk_size: 512M num_reg: 3 lose cover RAM: 406M
[ 0.000000] gran_size: 512M chunk_size: 1G num_reg: 4 lose cover RAM: 406M
[ 0.000000] gran_size: 512M chunk_size: 2G num_reg: 4 lose cover RAM: 406M
[ 0.000000] gran_size: 1G chunk_size: 1G num_reg: 2 lose cover RAM: 918M
[ 0.000000] gran_size: 1G chunk_size: 2G num_reg: 2 lose cover RAM: 918M
[ 0.000000] gran_size: 2G chunk_size: 2G num_reg: 1 lose cover RAM: 1942M
[ 0.000000] mtrr_cleanup: can not find optimal value
[ 0.000000] please specify mtrr_gran_size/mtrr_chunk_size
[ 0.000000] x2apic enabled by BIOS, switching to x2apic ops
[ 0.000000] e820: last_pfn = 0x9a max_arch_pfn = 0x400000000
[ 0.000000] found SMP MP-table at [mem 0x000fc9f0-0x000fc9ff] mapped at [ffff8800000fc9f0]
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] init_memory_mapping: [mem 0x14de00000-0x14df5ffff]
[ 0.000000] init_memory_mapping: [mem 0x14c000000-0x14ddfffff]
[ 0.000000] init_memory_mapping: [mem 0x146000000-0x14bffffff]
[ 0.000000] ACPI: RSDP 00000000000f0410 00024 (v02 INTEL)
[ 0.000000] ACPI: XSDT 00000000aaffde18 00074 (v01 INTEL IVB-PPT 06222004 MSFT 00010013)
[ 0.000000] ACPI: FACP 00000000aafe3d98 000F4 (v04 INTEL IVB-PPT 06222004 MSFT 00010013)
[ 0.000000] ACPI: DSDT 00000000aafbe018 0FD25 (v02 INTEL IVB-PPT 00000000 INTL 20110623)
[ 0.000000] ACPI: FACS 00000000aafe9e40 00040
[ 0.000000] ACPI: APIC 00000000aaffcf18 000CC (v02 INTEL IVB-PPT 06222004 MSFT 00010013)
[ 0.000000] ACPI: HPET 00000000aafe8f18 00038 (v01 A M I PCHHPET 06222004 AMI. 00000003)
[ 0.000000] ACPI: SSDT 00000000aafe5018 010A8 (v01 TrmRef PtidDevc 00001000 INTL 20110623)
[ 0.000000] ACPI: SSDT 00000000aafe4a18 00461 (v01 AMI PerfTune 00001000 INTL 20110623)
[ 0.000000] ACPI: MCFG 00000000aafe8e98 0003C (v01 INTEL SNDYBRDG 06222004 MSFT 00000097)
[ 0.000000] ACPI: SSDT 00000000aafd2018 009AA (v01 PmRef Cpu0Ist 00003000 INTL 20110623)
[ 0.000000] ACPI: SSDT 00000000aafd1018 00A92 (v01 PmRef CpuPm 00003000 INTL 20110623)
[ 0.000000] ACPI: DMAR 00000000aafe4f18 000B8 (v01 INTEL SNB 00000001 INTL 00000001)
[ 0.000000] ACPI: FPDT 00000000aaff4018 00064 (v01 INTEL IVB-CPT 00010000 INTL 20111107)
[ 0.000000] Setting APIC routing to cluster x2apic.
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000014df5ffff]
[ 0.000000] Initmem setup node 0 [mem 0x00000000-0x14df5ffff]
[ 0.000000] NODE_DATA [mem 0x14df39000-0x14df5ffff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
[ 0.000000] Normal [mem 0x100000000-0x14df5ffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x00099fff]
[ 0.000000] node 0: [mem 0x146000000-0x14df5ffff]
[ 0.000000] ACPI: PM-Timer IO Port: 0x408
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x06] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x08] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x09] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0a] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0b] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0c] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0d] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0e] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x10] lapic_id[0x0f] disabled)
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[ 0.000000] smpboot: Allowing 16 CPUs, 8 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: 000000000009a000 - 00000000aad6a000
[ 0.000000] PM: Registered nosave memory: 00000000aad6a000 - 00000000aafff000
[ 0.000000] PM: Registered nosave memory: 00000000aafff000 - 0000000146000000
[ 0.000000] e820: [mem 0x0009ac00-0xaad69fff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on bare hardware
[ 0.000000] setup_percpu: NR_CPUS:4096 nr_cpumask_bits:16 nr_cpu_ids:16 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 29 pages/cpu @ffff88014dc00000 s86272 r8192 d24320 u131072
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32227
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: console=ttyS0,115200n81 memmap=exactmap memmap=615K@4K memmap=130432K@5341184K elfcorehdr=5471616K memmap=2560K#2799016K memmap=84K#2801576K
[ 0.000000] PID hash table entries: 512 (order: 0, 4096 bytes)
[ 0.000000] __ex_table already sorted, skipping sort
[ 0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340
[ 0.000000] Cannot allocate SWIOTLB buffer
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Memory: 108184k/5471616k available (6346k kernel code, 5340572k absent, 22860k reserved, 3983k data, 1484k init)
[ 0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=4096 to nr_cpu_ids=16.
[ 0.000000] NR_IRQS:262400 nr_irqs:808 16
[ 0.000000] Spurious LAPIC timer interrupt on cpu 0
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [ttyS0] enabled
[ 0.000000] allocated 1572864 bytes of page_cgroup
[ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[ 0.001000] tsc: Fast TSC calibration using PIT
[ 0.002000] tsc: Detected 2593.881 MHz processor
[ 0.000005] Calibrating delay loop (skipped), value calculated using timer frequency.. 5187.76 BogoMIPS (lpj=2593881)
[ 0.010628] pid_max: default: 32768 minimum: 301
[ 0.015311] Security Framework initialized
[ 0.019418] SELinux: Initializing.
[ 0.023012] Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.030043] Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.036777] Mount-cache hash table entries: 256
[ 0.041629] Initializing cgroup subsys cpuacct
[ 0.046079] Initializing cgroup subsys memory
[ 0.050447] Initializing cgroup subsys devices
[ 0.054891] Initializing cgroup subsys freezer
[ 0.059335] Initializing cgroup subsys net_cls
[ 0.063777] Initializing cgroup subsys blkio
[ 0.068045] Initializing cgroup subsys perf_event
[ 0.072794] CPU: Physical Processor ID: 0
[ 0.076806] CPU: Processor Core ID: 0
[ 0.081210] mce: CPU supports 9 MCE banks
[ 0.085254] Last level iTLB entries: 4KB 512, 2MB 0, 4MB 0
[ 0.085254] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32
[ 0.085254] tlb_flushall_shift: 1
[ 0.099844] Freeing SMP alternatives: 24k freed
[ 0.106487] ACPI: Core revision 20130117
[ 0.122586] ACPI: All ACPI Tables successfully acquired
[ 0.127868] ftrace: allocating 23601 entries in 93 pages
[ 0.154031] dmar: Host address width 36
[ 0.157876] dmar: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.163194] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020e60262 ecap f0101a
[ 0.171277] dmar: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.176590] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f010da
[ 0.184672] dmar: RMRR base: 0x000000aac95000 end: 0x000000aacb2fff
[ 0.190934] dmar: RMRR base: 0x000000ab800000 end: 0x000000af9fffff
[ 0.197268] IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
[ 0.202825] HPET id 0 under DRHD base 0xfed91000
[ 0.207714] Enabled IRQ remapping in x2apic mode
[ 0.212866] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.228870] smpboot: CPU0: Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz (fam: 06, model: 3a, stepping: 09)
[ 0.238319] Performance Events: PEBS fmt1+, 16-deep LBR, IvyBridge events, Broken BIOS detected, complain to your hardware vendor.
[ 0.250098] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is b0)
[ 0.257739] Intel PMU driver.
[ 0.260701] ... version: 3
[ 0.264704] ... bit width: 48
[ 0.268790] ... generic registers: 4
[ 0.272791] ... value mask: 0000ffffffffffff
[ 0.278092] ... max period: 000000007fffffff
[ 0.283391] ... fixed-purpose events: 3
[ 0.287391] ... event mask: 000000070000000f
[ 0.294917] smpboot: Booting Node 0, Processors #1[ 0.314489] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
#2 #3 #4 #5 #6 #7
[ 0.408750] Brought up 8 CPUs
[ 0.411901] smpboot: Total of 8 processors activated (41502.09 BogoMIPS)
[ 0.432134] devtmpfs: initialized
[ 0.437882] atomic64 test passed for x86-64 platform with CX8 and with SSE
[ 0.444833] NET: Registered protocol family 16
[ 0.449455] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[ 0.457010] ACPI: bus type pci registered
[ 0.461122] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
[ 0.470411] PCI: not using MMCONFIG
[ 0.473897] PCI: Using configuration type 1 for base access
[ 0.480792] bio: create slab <bio-0> at 0
[ 0.484937] ACPI: Added _OSI(Module Device)
[ 0.489116] ACPI: Added _OSI(Processor Device)
[ 0.493554] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.498254] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.509686] ACPI: Executed 1 blocks of module-level executable AML code
[ 0.529844] ACPI: SSDT 00000000aaccd018 00A4F (v01 PmRef Cpu0Cst 00003001 INTL 20110623)
[ 0.538912] ACPI: Dynamic OEM Table Load:
[ 0.542947] ACPI: SSDT (null) 00A4F (v01 PmRef Cpu0Cst 00003001 INTL 20110623)
[ 0.556403] ACPI: SSDT 00000000aaccea98 00303 (v01 PmRef ApIst 00003000 INTL 20110623)
[ 0.565515] ACPI: Dynamic OEM Table Load:
[ 0.569549] ACPI: SSDT (null) 00303 (v01 PmRef ApIst 00003000 INTL 20110623)
[ 0.582778] ACPI: SSDT 00000000aacccd98 00119 (v01 PmRef ApCst 00003000 INTL 20110623)
[ 0.591819] ACPI: Dynamic OEM Table Load:
[ 0.595855] ACPI: SSDT (null) 00119 (v01 PmRef ApCst 00003000 INTL 20110623)
[ 1.418743] ACPI: Interpreter enabled
[ 1.422408] ACPI: (supports S0 S1ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\\_S2_] (20130117/hwxface-568)
[ 1.433515] S3 S4 S5)
[ 1.435935] ACPI: Using IOAPIC for interrupt routing
[ 1.440920] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
[ 1.451100] PCI: MMCONFIG at [mem 0xf8000000-0xfbffffff] reserved in ACPI motherboard resources
[ 1.464742] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 1.488550] ACPI: Power Resource [FN00] (off)
[ 1.493020] ACPI: Power Resource [FN01] (off)
[ 1.497481] ACPI: Power Resource [FN02] (on)
[ 1.502241] ACPI: Power Resource [FN03] (on)
[ 1.506935] ACPI: Power Resource [FN04] (on)
[ 1.512503] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-3e])
[ 1.519005] acpi PNP0A08:00: ACPI _OSC support notification failed, disabling PCIe ASPM
[ 1.526995] acpi PNP0A08:00: Unable to request _OSC control (_OSC support mask: 0x08)
[ 1.535557] PCI host bridge to bus 0000:00
[ 1.539653] pci_bus 0000:00: root bus resource [bus 00-3e]
[ 1.545132] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
[ 1.551304] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff]
[ 1.557474] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[ 1.564338] pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000d3fff]
[ 1.571201] pci_bus 0000:00: root bus resource [mem 0x000d4000-0x000d7fff]
[ 1.578064] pci_bus 0000:00: root bus resource [mem 0x000d8000-0x000dbfff]
[ 1.584923] pci_bus 0000:00: root bus resource [mem 0x000dc000-0x000dffff]
[ 1.591784] pci_bus 0000:00: root bus resource [mem 0x000e0000-0x000e3fff]
[ 1.598644] pci_bus 0000:00: root bus resource [mem 0x000e4000-0x000e7fff]
[ 1.605503] pci_bus 0000:00: root bus resource [mem 0xafa00000-0xfeafffff]
[ 1.613008] pci 0000:00:14.0: System wakeup disabled by ACPI
[ 1.619582] pci 0000:00:19.0: System wakeup disabled by ACPI
[ 1.625584] pci 0000:00:1a.0: System wakeup disabled by ACPI
[ 1.631518] pci 0000:00:1c.0: System wakeup disabled by ACPI
[ 1.637463] pci 0000:00:1c.6: System wakeup disabled by ACPI
[ 1.643459] pci 0000:00:1d.0: System wakeup disabled by ACPI
[ 1.650069] pci 0000:00:1c.0: PCI bridge to [bus 01]
[ 1.655146] pci 0000:00:1c.6: PCI bridge to [bus 02-09]
[ 1.660411] ACPI _OSC control for PCIe not granted, disabling ASPM
[ 1.668220] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15), disabled.
[ 1.676298] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 10 11 12 14 15) *0, disabled.
[ 1.684560] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 10 *11 12 14 15), disabled.
[ 1.692628] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 *10 11 12 14 15), disabled.
[ 1.700698] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 *5 6 10 11 12 14 15), disabled.
[ 1.708765] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 10 11 12 14 15) *0, disabled.
[ 1.717023] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 10 11 12 14 15) *0, disabled.
[ 1.725280] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 *10 11 12 14 15), disabled.
[ 1.733756] ACPI: Enabled 5 GPEs in block 00 to 3F
[ 1.738777] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[ 1.746393] ACPI: ACPI Dock Station Driver: 1 docks/bays found
[ 1.752307] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[ 1.760387] vgaarb: loaded
[ 1.763093] vgaarb: bridge control possible 0000:00:02.0
[ 1.768498] SCSI subsystem initialized
[ 1.772270] ACPI: bus type usb registered
[ 1.776297] usbcore: registered new interface driver usbfs
[ 1.781783] usbcore: registered new interface driver hub
[ 1.787122] usbcore: registered new device driver usb
[ 1.792242] PCI: Using ACPI for IRQ routing
[ 1.800410] NetLabel: Initializing
[ 1.803809] NetLabel: domain hash size = 128
[ 1.808159] NetLabel: protocols = UNLABELED CIPSOv4
[ 1.813125] NetLabel: unlabeled traffic allowed by default
[ 1.818736] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0
[ 1.825032] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
[ 1.832871] Switching to clocksource hpet
[ 1.845000] pnp: PnP ACPI init
[ 1.848089] ACPI: bus type pnp registered
[ 1.852490] system 00:00: [io 0x06a4] has been reserved
[ 1.857802] system 00:00: [io 0x06a0] has been reserved
[ 1.863449] system 00:05: [io 0x0680-0x069f] has been reserved
[ 1.869370] system 00:05: [io 0x1004-0x1013] has been reserved
[ 1.875291] system 00:05: [io 0xffff] has been reserved
[ 1.880604] system 00:05: [io 0xffff] has been reserved
[ 1.885918] system 00:05: [io 0x0400-0x0453] has been reserved
[ 1.891836] system 00:05: [io 0x0458-0x047f] has been reserved
[ 1.897754] system 00:05: [io 0x0500-0x057f] has been reserved
[ 1.903673] system 00:05: [io 0x164e-0x164f] has been reserved
[ 1.909718] system 00:07: [io 0x0454-0x0457] has been reserved
[ 1.916807] system 00:0b: [mem 0xfed1c000-0xfed1ffff] has been reserved
[ 1.923425] system 00:0b: [mem 0xfed10000-0xfed17fff] has been reserved
[ 1.930043] system 00:0b: [mem 0xfed18000-0xfed18fff] has been reserved
[ 1.936656] system 00:0b: [mem 0xfed19000-0xfed19fff] has been reserved
[ 1.943267] system 00:0b: [mem 0xf8000000-0xfbffffff] has been reserved
[ 1.949879] system 00:0b: [mem 0xfed20000-0xfed3ffff] has been reserved
[ 1.956492] system 00:0b: [mem 0xfed90000-0xfed93fff] could not be reserved
[ 1.963449] system 00:0b: [mem 0xfed45000-0xfed8ffff] has been reserved
[ 1.970062] system 00:0b: [mem 0xff000000-0xffffffff] has been reserved
[ 1.976677] system 00:0b: [mem 0xfee00000-0xfeefffff] has been reserved
[ 1.983292] system 00:0b: [mem 0xafa00000-0xafa00fff] has been reserved
[ 1.990474] system 00:0c: [mem 0x00000000-0x0009cfff] could not be reserved
[ 1.997547] system 00:0d: [mem 0x20000000-0x201fffff] has been reserved
[ 2.004160] system 00:0d: [mem 0x40004000-0x40004fff] has been reserved
[ 2.010844] pnp: PnP ACPI: found 14 devices
[ 2.015025] ACPI: ACPI bus type pnp unregistered
[ 2.027290] pci 0000:00:1c.0: PCI bridge to [bus 01]
[ 2.032277] pci 0000:00:1c.6: PCI bridge to [bus 02-09]
[ 2.037501] pci 0000:00:1c.6: bridge window [io 0x2000-0x5fff]
[ 2.043599] pci 0000:00:1c.6: bridge window [mem 0xb0000000-0xb10fffff]
[ 2.050391] pci 0000:00:1c.6: bridge window [mem 0xd0000000-0xd10fffff 64bit pref]
[ 2.058446] NET: Registered protocol family 2
[ 2.063068] TCP established hash table entries: 1024 (order: 2, 16384 bytes)
[ 2.070133] TCP bind hash table entries: 1024 (order: 2, 16384 bytes)
[ 2.076578] TCP: Hash tables configured (established 1024 bind 1024)
[ 2.082996] TCP: reno registered
[ 2.086255] UDP hash table entries: 256 (order: 1, 8192 bytes)
[ 2.092092] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[ 2.098509] NET: Registered protocol family 1
[ 2.103710] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 2.110160] software IO TLB: No low mem
[ 2.116187] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
[ 2.123111] Initialise module verification
[ 2.127274] audit: initializing netlink socket (disabled)
[ 2.132684] type=2000 audit(1362698281.479:1): initialized
[ 2.185575] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 2.193943] VFS: Disk quotas dquot_6.5.2
[ 2.197950] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 2.205330] msgmni has been set to 211
[ 2.210287] alg: No test for stdrng (krng)
[ 2.214402] NET: Registered protocol family 38
[ 2.218851] Key type asymmetric registered
[ 2.222951] Asymmetric key parser 'x509' registered
[ 2.227888] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[ 2.235322] io scheduler noop registered
[ 2.239247] io scheduler deadline registered (default)
[ 2.244403] io scheduler cfq registered
[ 2.248531] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[ 2.254132] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[ 2.260741] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 2.267305] acpiphp_glue: can't evaluate _ADR (0x5)
[ 2.272577] ACPI: AC Adapter [ADP1] (on-line)
[ 2.277080] input: Lid Switch as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input0
[ 2.285391] ACPI: Lid Switch [LID0]
[ 2.288954] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input1
[ 2.297299] ACPI: Power Button [PWRB]
[ 2.301017] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
[ 2.308410] ACPI: Power Button [PWRF]
[ 2.312155] ACPI: Fan [FAN0] (off)
[ 2.315615] ACPI: Fan [FAN1] (off)
[ 2.319067] ACPI: Fan [FAN2] (on)
[ 2.322435] ACPI: Fan [FAN3] (on)
[ 2.325793] ACPI: Fan [FAN4] (on)
[ 2.329202] ACPI: Requesting acpi_cpufreq
[ 2.340021] thermal LNXTHERM:00: registered as thermal_zone0
[ 2.345688] ACPI: Thermal Zone [TZ00] (32 C)
[ 2.351008] thermal LNXTHERM:01: registered as thermal_zone1
[ 2.356672] ACPI: Thermal Zone [TZ01] (33 C)
[ 2.364912] GHES: HEST is not enabled!
[ 2.365203] ACPI: Battery Slot [BAT0] (battery present)
[ 2.365238] ACPI: Battery Slot [BAT1] (battery absent)
[ 2.365350] ACPI: Battery Slot [BAT2] (battery absent)
[ 2.384217] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 2.411446] 00:08: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 2.438082] serial8250: ttyS2 at I/O 0x3e8 (irq = 4) is a 16550A
[ 2.464897] 0000:00:16.3: ttyS1 at I/O 0x60e0 (irq = 19) is a 16550A
[ 2.471729] Non-volatile memory driver v1.3
[ 2.475917] Linux agpgart interface v0.103
[ 2.481169] loop: module loaded
[ 2.484342] rdac: device handler registered
[ 2.488599] hp_sw: device handler registered
[ 2.492869] emc: device handler registered
[ 2.496970] alua: device handler registered
[ 2.501205] libphy: Fixed MDIO Bus: probed
[ 2.505370] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 2.511893] ehci-pci: EHCI PCI platform driver
[ 2.516470] ehci-pci 0000:00:1a.0: EHCI Host Controller
[ 2.521763] ehci-pci 0000:00:1a.0: new USB bus registered, assigned bus number 1
[ 2.529178] ehci-pci 0000:00:1a.0: debug port 2
[ 2.537665] ehci-pci 0000:00:1a.0: irq 16, io mem 0xb1160000
[ 2.548387] ehci-pci 0000:00:1a.0: USB 2.0 started, EHCI 1.00
[ 2.554157] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[ 2.560941] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.568161] usb usb1: Product: EHCI Host Controller
[ 2.573041] usb usb1: Manufacturer: Linux 3.9.0-rc1+ ehci_hcd
[ 2.578784] usb usb1: SerialNumber: 0000:00:1a.0
[ 2.583547] hub 1-0:1.0: USB hub found
[ 2.587305] hub 1-0:1.0: 3 ports detected
[ 2.591639] ehci-pci 0000:00:1d.0: EHCI Host Controller
[ 2.596930] ehci-pci 0000:00:1d.0: new USB bus registered, assigned bus number 2
[ 2.604343] ehci-pci 0000:00:1d.0: debug port 2
[ 2.612811] ehci-pci 0000:00:1d.0: irq 23, io mem 0xb1150000
[ 2.624437] ehci-pci 0000:00:1d.0: USB 2.0 started, EHCI 1.00
[ 2.630204] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[ 2.636987] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.644206] usb usb2: Product: EHCI Host Controller
[ 2.649084] usb usb2: Manufacturer: Linux 3.9.0-rc1+ ehci_hcd
[ 2.654830] usb usb2: SerialNumber: 0000:00:1d.0
[ 2.659586] hub 2-0:1.0: USB hub found
[ 2.663345] hub 2-0:1.0: 3 ports detected
[ 2.667593] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 2.673792] uhci_hcd: USB Universal Host Controller Interface driver
[ 2.680291] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 2.685578] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 3
[ 2.693119] xhci_hcd 0000:00:14.0: irq 16, io mem 0xb11a0000
[ 2.698942] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
[ 2.705729] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.712948] usb usb3: Product: xHCI Host Controller
[ 2.717823] usb usb3: Manufacturer: Linux 3.9.0-rc1+ xhci_hcd
[ 2.723564] usb usb3: SerialNumber: 0000:00:14.0
[ 2.728328] hub 3-0:1.0: USB hub found
[ 2.732094] hub 3-0:1.0: 4 ports detected
[ 2.736783] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 2.742078] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 4
[ 2.749514] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
[ 2.756299] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.763516] usb usb4: Product: xHCI Host Controller
[ 2.768400] usb usb4: Manufacturer: Linux 3.9.0-rc1+ xhci_hcd
[ 2.774148] usb usb4: SerialNumber: 0000:00:14.0
[ 2.778964] hub 4-0:1.0: USB hub found
[ 2.782737] hub 4-0:1.0: 4 ports detected
[ 2.791587] usbcore: registered new interface driver usbserial
[ 2.797430] usbcore: registered new interface driver usbserial_generic
[ 2.803956] usbserial: USB Serial support registered for generic
[ 2.810001] i8042: PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
[ 2.816782] i8042: PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
[ 2.827473] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 2.832534] mousedev: PS/2 mouse device common for all mice
[ 2.838423] rtc_cmos 00:06: RTC can wake from S4
[ 2.843292] rtc_cmos 00:06: rtc core: registered rtc_cmos as rtc0
[ 2.849434] rtc_cmos 00:06: alarms up to one month, y3k, 242 bytes nvram, hpet irqs
[ 2.857324] cpuidle: using governor ladder
[ 2.861761] cpuidle: using governor menu
[ 2.866158] EFI Variables Facility v0.08 2004-May-17
[ 2.871140] hidraw: raw HID events driver (C) Jiri Kosina
[ 2.873905] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input3
[ 2.885248] usbcore: registered new interface driver usbhid
[ 2.890818] usbhid: USB HID core driver
[ 2.894647] usb 1-1: new high-speed USB device number 2 using ehci-pci
[ 2.894695] drop_monitor: Initializing network drop monitor service
[ 2.894813] TCP: cubic registered
[ 2.894815] Initializing XFRM netlink socket
[ 2.894988] NET: Registered protocol family 10
[ 2.895269] NET: Registered protocol family 17
[ 2.895770] Loading module verification certificates
[ 2.897372] MODSIGN: Loaded cert 'Magrathea: Glacier signing key: b14fa6fba81316fe0bb193bbf458deba6d430978'
[ 2.897384] registered taskstats version 1
[ 2.897388] IMA: No TPM chip found, activating TPM-bypass!
[ 2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer
[ 2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1
[ 2.965426] Call Trace:
[ 2.967866] [<ffffffff81619acd>] panic+0xc1/0x1d0
[ 2.972644] [<ffffffff8131e50c>] swiotlb_tbl_map_single+0x27c/0x280
[ 2.978991] [<ffffffff8131ed59>] map_single+0x19/0x20
[ 2.984115] [<ffffffff8131ef1e>] swiotlb_map_page+0x6e/0x160
[ 2.989845] [<ffffffff81447e40>] usb_hcd_map_urb_for_dma+0x230/0x4a0
[ 2.996268] [<ffffffff81448345>] usb_hcd_submit_urb+0x295/0x8e0
[ 3.002258] [<ffffffff8109b90f>] ? __dequeue_entity+0x2f/0x50
[ 3.008076] [<ffffffff8101358e>] ? __switch_to+0x13e/0x4a0
[ 3.013632] [<ffffffff81449a2f>] usb_submit_urb+0xff/0x3d0
[ 3.019186] [<ffffffff816241be>] ? __schedule+0x3de/0x7e0
[ 3.024657] [<ffffffff8144ab6a>] usb_start_wait_urb+0x6a/0x160
[ 3.030560] [<ffffffff81189d15>] ? __kmalloc+0x55/0x210
[ 3.035856] [<ffffffff8144977e>] ? usb_alloc_urb+0x1e/0x50
[ 3.041411] [<ffffffff8144aece>] usb_control_msg+0xde/0x140
[ 3.047056] [<ffffffff81440680>] ? hub_port_init+0x310/0xaf0
[ 3.052785] [<ffffffff8144065b>] ? hub_port_init+0x2eb/0xaf0
[ 3.058515] [<ffffffff814406a8>] hub_port_init+0x338/0xaf0
[ 3.064071] [<ffffffff81407679>] ? update_autosuspend+0x39/0x60
[ 3.070062] [<ffffffff81407759>] ? pm_runtime_set_autosuspend_delay+0x49/0x70
[ 3.077264] [<ffffffff814438ba>] hub_port_connect_change+0x24a/0xaa0
[ 3.083684] [<ffffffff814443fa>] hub_events+0x2ea/0x910
[ 3.088981] [<ffffffff816241be>] ? __schedule+0x3de/0x7e0
[ 3.094451] [<ffffffff81444a55>] hub_thread+0x35/0x1e0
[ 3.099661] [<ffffffff81087480>] ? wake_up_bit+0x40/0x40
[ 3.105045] [<ffffffff81444a20>] ? hub_events+0x910/0x910
[ 3.110514] [<ffffffff81086a70>] kthread+0xc0/0xd0
[ 3.115378] [<ffffffff810869b0>] ? kthread_create_on_node+0x120/0x120
[ 3.121887] [<ffffffff8162e86c>] ret_from_fork+0x7c/0xb0
[ 3.127271] [<ffffffff810869b0>] ? kthread_create_on_node+0x120/0x120

--
Thanks,
WANG Chao


2013-03-08 06:03:48

by Qian Cai

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

CC'ing the kexec ML. It is also worth mentioning that 3.8 has no such issue.

The following message looks suspicious: the reservation address appears to
be out of range, whereas the 3.8 reservation is within range.

[ 0.000000] Reserving 128MB of memory at 5216MB for crashkernel (System RAM: 3977MB)

I wonder if this has anything to do with memblock again...
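
As a rough cross-check of the figures above (an illustrative sketch, not part
of the original report), the "Crash kernel" range from /proc/iomem quoted
below can be converted to MB and compared against the 4 GiB boundary. The
range string is copied from the report; the helper name parse_iomem_range is
hypothetical, and the idea that placement above 4 GiB is what matters for the
SWIOTLB failure is an assumption that this thread has not confirmed.

```python
# Sketch: convert the "Crash kernel" range from /proc/iomem to MB and check
# whether it lies entirely below 4 GiB. Only the range string comes from the
# report; everything else is illustrative.
IOMEM_LINE = "146000000-14dffffff : Crash kernel"  # copied from the report

FOUR_GIB = 1 << 32
MIB = 1 << 20


def parse_iomem_range(line):
    """Return (start, end) as integers; end is inclusive, as in /proc/iomem."""
    span, _, _name = line.partition(" : ")
    start_hex, end_hex = span.strip().split("-")
    return int(start_hex, 16), int(end_hex, 16)


start, end = parse_iomem_range(IOMEM_LINE)
start_mib = start // MIB               # where the reservation begins
size_mib = (end - start + 1) // MIB    # inclusive end, hence the +1
below_4g = end < FOUR_GIB              # assumption: 4 GiB is the boundary of interest

print(f"start={start_mib}MB size={size_mib}MB below_4GiB={below_4g}")
```

Run as-is, this prints start=5216MB size=128MB below_4GiB=False, which agrees
with the "Reserving 128MB of memory at 5216MB" line. The BIOS e820 map quoted
below lists usable RAM up to 0x14e5fffff, so the address is backed by real
memory even though it exceeds the 3977MB "System RAM" total.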

CAI Qian

----- Original Message -----
> From: "WANG Chao" <[email protected]>
> To: "LKML" vger.kernel.org>
> Cc: "CAI Qian" <[email protected]>
> Sent: Friday, March 8, 2013 1:54:37 PM
> Subject: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you
> with the DMA bounce buffer
>
> Hi, All
>
> On 3.9-rc1, I load crash kernel with latest kexec-tools(up to
> 28d413a), but
> 2nd kernel panic at early time:
> [ 2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB
> buffer earlier and can't now provide you with the DMA bounce buffer
> [ 2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1
> [ 2.965426] Call Trace:
> [ 2.967866] [<ffffffff81619acd>] panic+0xc1/0x1d0
> [ 2.972644] [<ffffffff8131e50c>] swiotlb_tbl_map_single+0x27c/0x280
> [ 2.978991] [<ffffffff8131ed59>] map_single+0x19/0x20
> [ 2.984115] [<ffffffff8131ef1e>] swiotlb_map_page+0x6e/0x160
> [ 2.989845] [<ffffffff81447e40>] usb_hcd_map_urb_for_dma+0x230/0x4a0
> [ 2.996268] [<ffffffff81448345>] usb_hcd_submit_urb+0x295/0x8e0
> [ 3.002258] [<ffffffff8109b90f>] ? __dequeue_entity+0x2f/0x50
> [ 3.008076] [<ffffffff8101358e>] ? __switch_to+0x13e/0x4a0
> [ 3.013632] [<ffffffff81449a2f>] usb_submit_urb+0xff/0x3d0
> [ 3.019186] [<ffffffff816241be>] ? __schedule+0x3de/0x7e0
> [ 3.024657] [<ffffffff8144ab6a>] usb_start_wait_urb+0x6a/0x160
> [ 3.030560] [<ffffffff81189d15>] ? __kmalloc+0x55/0x210
> [ 3.035856] [<ffffffff8144977e>] ? usb_alloc_urb+0x1e/0x50
> [ 3.041411] [<ffffffff8144aece>] usb_control_msg+0xde/0x140
> [ 3.047056] [<ffffffff81440680>] ? hub_port_init+0x310/0xaf0
> [ 3.052785] [<ffffffff8144065b>] ? hub_port_init+0x2eb/0xaf0
> [ 3.058515] [<ffffffff814406a8>] hub_port_init+0x338/0xaf0
> [ 3.064071] [<ffffffff81407679>] ? update_autosuspend+0x39/0x60
> [ 3.070062] [<ffffffff81407759>] ? pm_runtime_set_autosuspend_delay+0x49/0x70
> [ 3.077264] [<ffffffff814438ba>] hub_port_connect_change+0x24a/0xaa0
> [ 3.083684] [<ffffffff814443fa>] hub_events+0x2ea/0x910
> [ 3.088981] [<ffffffff816241be>] ? __schedule+0x3de/0x7e0
> [ 3.094451] [<ffffffff81444a55>] hub_thread+0x35/0x1e0
> [ 3.099661] [<ffffffff81087480>] ? wake_up_bit+0x40/0x40
> [ 3.105045] [<ffffffff81444a20>] ? hub_events+0x910/0x910
> [ 3.110514] [<ffffffff81086a70>] kthread+0xc0/0xd0
> [ 3.115378] [<ffffffff810869b0>] ? kthread_create_on_node+0x120/0x120
> [ 3.121887] [<ffffffff8162e86c>] ret_from_fork+0x7c/0xb0
> [ 3.127271] [<ffffffff810869b0>] ? kthread_create_on_node+0x120/0x120
>
>
> Here's the full log:
> # grep 'Crash' /proc/iomem
> 146000000-14dffffff : Crash kernel
>
> # dmesg | grep -i reserving
> [ 0.000000] Reserving 128MB of memory at 5216MB for crashkernel
> (System RAM: 3977MB)
>
> # kexec -p /boot/vmlinuz-3.9.0-rc1+
> --command-line='console=ttyS0,115200n81'
> # echo c > /proc/sysrq-trigger
>
> [ 217.879315] SysRq : Trigger a crash
> [ 217.882836] BUG: unable to handle kernel NULL pointer dereference
> at (null)
> [ 217.890674] IP: [] sysrq_handle_crash+0x16/0x20
> [ 217.896773] PGD 13df22067 PUD 139726067 PMD 0
> [ 217.901244] Oops: 0002 [#1] SMP
> [ 217.904491] Modules linked in: lockd sunrpc
> nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE
> ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> iptable_nat nf_nat_ip
> v4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
> xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter
> ip6_tables iptable_filter ip_tables sg coretemp kvm_inte
> l kvm e1000e iTCO_wdt crc32c_intel iTCO_vendor_support ptp
> ghash_clmulni_intel pps_core mei microcode pcspkr i2c_i801 lpc_ich
> mfd_core xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif i915 i2c_al
> go_bit drm_kms_helper drm ahci libahci libata i2c_core video
> dm_mirror dm_region_hash dm_log dm_mod
> [ 217.963690] CPU 0
> [ 217.965526] Pid: 1206, comm: bash Not tainted 3.9.0-rc1+ #1 Intel
> Corporation 2012 Client Platform/Emerald Lake 2
> [ 217.975948] RIP: 0010:[] []
> sysrq_handle_crash+0x16/0x20
> [ 217.984468] RSP: 0018:ffff8801367e9e38 EFLAGS: 00010092
> [ 217.989765] RAX: 000000000000000f RBX: ffffffff819b67c0 RCX:
> ffff88014e20ffe8
> [ 217.996881] RDX: 0000000000000000 RSI: ffff88014e20e3b8 RDI:
> 0000000000000063
> [ 218.003998] RBP: ffff8801367e9e38 R08: ffffffff81c06280 R09:
> 0000000000000419
> [ 218.011113] R10: 0000000000000002 R11: 0000000000000418 R12:
> 0000000000000063
> [ 218.018230] R13: 0000000000000286 R14: 0000000000000000 R15:
> 0000000000000007
> [ 218.025346] FS: 00007fdd48ace740(0000) GS:ffff88014e200000(0000)
> knlGS:0000000000000000
> [ 218.033416] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 218.039147] CR2: 0000000000000000 CR3: 000000013a67c000 CR4:
> 00000000001407f0
> [ 218.046263] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 218.053379] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [ 218.060496] Process bash (pid: 1206, threadinfo ffff8801367e8000,
> task ffff88013e8ae5c0)
> [ 218.068564] Stack:
> [ 218.070570] ffff8801367e9e78 ffffffff813c2147 ffff88013e8ae5c0
> 0000000000000002
> [ 218.078001] ffff88013c4f9200 00007fdd48ad1000 0000000000000002
> ffff8801367e9f50
> [ 218.085427] ffff8801367e9ea8 ffffffff813c21fa ffff88013c4f9200
> 00007fdd48ad1000
> [ 218.092854] Call Trace:
> [ 218.095298] [] __handle_sysrq+0x127/0x190
> [ 218.100947] [] write_sysrq_trigger+0x4a/0x50
> [ 218.106854] [] proc_reg_write+0x75/0xb0
> [ 218.112329] [] vfs_write+0xac/0x180
> [ 218.117456] [] sys_write+0x52/0xa0
> [ 218.122499] [] ? do_page_fault+0xe/0x10
> [ 218.127977] [] system_call_fastpath+0x16/0x1b
> [ 218.133970] Code: 89 ef e8 ee f7 ff ff eb c3 66 2e 0f 1f 84 00 00
> 00 00 00 66 90 0f 1f 44 00 00 55 c7 05 64 44 84 00 01 00 00 00 48 89
> e5 0f ae f8 04 25 00 00 00 00 01 5d c3 0f 1f 44
> 00 00 55 48 89 e5 53 48
> [ 218.153653] RIP [] sysrq_handle_crash+0x16/0x20
> [ 218.159834] RSP
> [ 218.163311] CR2: 0000000000000000
> I'm in purgatory
> [ 0.000000] Initializing cgroup subsys cpuset
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Linux version 3.9.0-rc1+ (root@localhost) (gcc version
> 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC) ) #1 SMP Wed Mar 6 23:38:21
> EST 2013
> [ 0.000000] Command line: console=ttyS0,115200n81 memmap=exactmap
> memmap=615K@4K memmap=130432K@5341184K elfcorehdr=5471616K
> memmap=2560K#2799016K memmap=84K#2801576K
> [ 0.000000] e820: BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: [mem 0x0000000000000100-0x000000000009abff]
> usable
> [ 0.000000] BIOS-e820: [mem 0x000000000009ac00-0x000000000009ffff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001fffffff]
> usable
> [ 0.000000] BIOS-e820: [mem 0x0000000020000000-0x00000000201fffff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000020200000-0x0000000040003fff]
> usable
> [ 0.000000] BIOS-e820: [mem 0x0000000040004000-0x0000000040004fff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000040005000-0x00000000a8aaefff]
> usable
> [ 0.000000] BIOS-e820: [mem 0x00000000a8aaf000-0x00000000a8af1fff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000a8af2000-0x00000000aa5c3fff]
> usable
> [ 0.000000] BIOS-e820: [mem 0x00000000aa5c4000-0x00000000aad69fff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000aad6a000-0x00000000aafe9fff]
> ACPI NVS
> [ 0.000000] BIOS-e820: [mem 0x00000000aafea000-0x00000000aaffefff]
> ACPI data
> [ 0.000000] BIOS-e820: [mem 0x00000000aafff000-0x00000000aaffffff]
> usable
> [ 0.000000] BIOS-e820: [mem 0x00000000ab000000-0x00000000af9fffff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed13fff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000fed18000-0x00000000fed19fff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000ff900000-0x00000000ffbfffff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000ffd00000-0x00000000ffffffff]
> reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000014e5fffff]
> usable
> [ 0.000000] e820: last_pfn = 0x14e600 max_arch_pfn = 0x400000000
> [ 0.000000] NX (Execute Disable) protection: active
> [ 0.000000] e820: user-defined physical RAM map:
> [ 0.000000] user: [mem 0x0000000000001000-0x000000000009abff]
> usable
> [ 0.000000] user: [mem 0x00000000aad6a000-0x00000000aaffefff] ACPI
> data
> [ 0.000000] user: [mem 0x0000000146000000-0x000000014df5ffff]
> usable
> [ 0.000000] SMBIOS 2.6 present.
> [ 0.000000] No AGP bridge found
> [ 0.000000] e820: last_pfn = 0x14df60 max_arch_pfn = 0x400000000
> [ 0.000000] x86 PAT enabled: cpu 0, old 0x7010600070106, new
> 0x7010600070106
> [ 0.000000] total RAM covered: 3990M
> [ 0.000000] gran_size: 64K chunk_size: 64K num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] gran_size: 64K chunk_size: 128K num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] gran_size: 64K chunk_size: 256K num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] gran_size: 64K chunk_size: 512K num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] gran_size: 64K chunk_size: 1M num_reg: 10 lose
> cover RAM: 2M
> [ 0.000000] gran_size: 64K chunk_size: 2M num_reg: 10 lose
> cover RAM: 2M
> [ 0.000000] *BAD*gran_size: 64K chunk_size: 4M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 64K chunk_size: 8M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 64K chunk_size: 16M
> num_reg: 10 lose cover RAM: -10M
> [ 0.000000] gran_size: 64K chunk_size: 32M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] gran_size: 64K chunk_size: 64M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] gran_size: 64K chunk_size: 128M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] gran_size: 64K chunk_size: 256M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] *BAD*gran_size: 64K chunk_size: 512M
> num_reg: 10 lose cover RAM: -256M
> [ 0.000000] *BAD*gran_size: 64K chunk_size: 1G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] *BAD*gran_size: 64K chunk_size: 2G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] gran_size: 128K chunk_size: 128K
> num_reg: 10 lose cover RAM: 2M
> [ 0.000000] gran_size: 128K chunk_size: 256K
> num_reg: 10 lose cover RAM: 2M
> [ 0.000000] gran_size: 128K chunk_size: 512K
> num_reg: 10 lose cover RAM: 2M
> [ 0.000000] gran_size: 128K chunk_size: 1M num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] gran_size: 128K chunk_size: 2M num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] *BAD*gran_size: 128K chunk_size: 4M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 128K chunk_size: 8M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 128K chunk_size: 16M
> num_reg: 10 lose cover RAM: -10M
> [ 0.000000] gran_size: 128K chunk_size: 32M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] gran_size: 128K chunk_size: 64M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] gran_size: 128K chunk_size: 128M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] gran_size: 128K chunk_size: 256M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] *BAD*gran_size: 128K chunk_size: 512M
> num_reg: 10 lose cover RAM: -256M
> [ 0.000000] *BAD*gran_size: 128K chunk_size: 1G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] *BAD*gran_size: 128K chunk_size: 2G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] gran_size: 256K chunk_size: 256K
> num_reg: 10 lose cover RAM: 2M
> [ 0.000000] gran_size: 256K chunk_size: 512K
> num_reg: 10 lose cover RAM: 2M
> [ 0.000000] gran_size: 256K chunk_size: 1M num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] gran_size: 256K chunk_size: 2M num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] *BAD*gran_size: 256K chunk_size: 4M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 256K chunk_size: 8M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 256K chunk_size: 16M
> num_reg: 10 lose cover RAM: -10M
> [ 0.000000] gran_size: 256K chunk_size: 32M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] gran_size: 256K chunk_size: 64M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] gran_size: 256K chunk_size: 128M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] gran_size: 256K chunk_size: 256M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] *BAD*gran_size: 256K chunk_size: 512M
> num_reg: 10 lose cover RAM: -256M
> [ 0.000000] *BAD*gran_size: 256K chunk_size: 1G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] *BAD*gran_size: 256K chunk_size: 2G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] gran_size: 512K chunk_size: 512K
> num_reg: 10 lose cover RAM: 2M
> [ 0.000000] gran_size: 512K chunk_size: 1M num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] gran_size: 512K chunk_size: 2M num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] *BAD*gran_size: 512K chunk_size: 4M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 512K chunk_size: 8M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 512K chunk_size: 16M
> num_reg: 10 lose cover RAM: -10M
> [ 0.000000] gran_size: 512K chunk_size: 32M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] gran_size: 512K chunk_size: 64M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] gran_size: 512K chunk_size: 128M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] gran_size: 512K chunk_size: 256M
> num_reg: 10 lose cover RAM: 0G
> [ 0.000000] *BAD*gran_size: 512K chunk_size: 512M
> num_reg: 10 lose cover RAM: -256M
> [ 0.000000] *BAD*gran_size: 512K chunk_size: 1G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] *BAD*gran_size: 512K chunk_size: 2G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] gran_size: 1M chunk_size: 1M num_reg: 10 lose
> cover RAM: 2M
> [ 0.000000] gran_size: 1M chunk_size: 2M num_reg: 10 lose
> cover RAM: 2M
> [ 0.000000] *BAD*gran_size: 1M chunk_size: 4M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 1M chunk_size: 8M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 1M chunk_size: 16M
> num_reg: 10 lose cover RAM: -10M
> [ 0.000000] gran_size: 1M chunk_size: 32M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] gran_size: 1M chunk_size: 64M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] gran_size: 1M chunk_size: 128M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] gran_size: 1M chunk_size: 256M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] *BAD*gran_size: 1M chunk_size: 512M
> num_reg: 10 lose cover RAM: -256M
> [ 0.000000] *BAD*gran_size: 1M chunk_size: 1G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] *BAD*gran_size: 1M chunk_size: 2G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] gran_size: 2M chunk_size: 2M num_reg: 10 lose
> cover RAM: 2M
> [ 0.000000] *BAD*gran_size: 2M chunk_size: 4M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 2M chunk_size: 8M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 2M chunk_size: 16M
> num_reg: 10 lose cover RAM: -10M
> [ 0.000000] gran_size: 2M chunk_size: 32M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] gran_size: 2M chunk_size: 64M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] gran_size: 2M chunk_size: 128M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] gran_size: 2M chunk_size: 256M num_reg: 10
> lose cover RAM: 0G
> [ 0.000000] *BAD*gran_size: 2M chunk_size: 512M
> num_reg: 10 lose cover RAM: -256M
> [ 0.000000] *BAD*gran_size: 2M chunk_size: 1G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] *BAD*gran_size: 2M chunk_size: 2G num_reg: 10
> lose cover RAM: -512M
> [ 0.000000] gran_size: 4M chunk_size: 4M num_reg: 10 lose
> cover RAM: 2M
> [ 0.000000] *BAD*gran_size: 4M chunk_size: 8M num_reg: 10
> lose cover RAM: -2M
> [ 0.000000] *BAD*gran_size: 4M chunk_size: 16M
> num_reg: 10 lose cover RAM: -10M
> [ 0.000000] gran_size: 4M chunk_size: 32M num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] gran_size: 4M chunk_size: 64M num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] gran_size: 4M chunk_size: 128M num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] gran_size: 4M chunk_size: 256M num_reg: 10
> lose cover RAM: 2M
> [ 0.000000] *BAD*gran_size: 4M chunk_size: 512M
> num_reg: 10 lose cover RAM: -254M
> [ 0.000000] *BAD*gran_size: 4M chunk_size: 1G num_reg: 10
> lose cover RAM: -510M
> [ 0.000000] *BAD*gran_size: 4M chunk_size: 2G num_reg: 10
> lose cover RAM: -510M
> [ 0.000000] gran_size: 8M chunk_size: 8M num_reg: 9 lose
> cover RAM: 6M
> [ 0.000000] gran_size: 8M chunk_size: 16M num_reg: 9
> lose cover RAM: 6M
> [ 0.000000] gran_size: 8M chunk_size: 32M num_reg: 9
> lose cover RAM: 6M
> [ 0.000000] gran_size: 8M chunk_size: 64M num_reg: 8
> lose cover RAM: 6M
> [ 0.000000] gran_size: 8M chunk_size: 128M num_reg: 8
> lose cover RAM: 6M
> [ 0.000000] gran_size: 8M chunk_size: 256M num_reg: 8
> lose cover RAM: 6M
> [ 0.000000] gran_size: 8M chunk_size: 512M num_reg: 9
> lose cover RAM: 6M
> [ 0.000000] gran_size: 8M chunk_size: 1G num_reg: 9 lose
> cover RAM: 6M
> [ 0.000000] gran_size: 8M chunk_size: 2G num_reg: 9 lose
> cover RAM: 6M
> [ 0.000000] gran_size: 16M chunk_size: 16M num_reg: 9
> lose cover RAM: 6M
> [ 0.000000] gran_size: 16M chunk_size: 32M num_reg: 9
> lose cover RAM: 6M
> [ 0.000000] gran_size: 16M chunk_size: 64M num_reg: 8
> lose cover RAM: 6M
> [ 0.000000] gran_size: 16M chunk_size: 128M num_reg: 8
> lose cover RAM: 6M
> [ 0.000000] gran_size: 16M chunk_size: 256M num_reg: 8
> lose cover RAM: 6M
> [ 0.000000] gran_size: 16M chunk_size: 512M num_reg: 9
> lose cover RAM: 6M
> [ 0.000000] gran_size: 16M chunk_size: 1G num_reg: 9 lose
> cover RAM: 6M
> [ 0.000000] gran_size: 16M chunk_size: 2G num_reg: 9 lose
> cover RAM: 6M
> [ 0.000000] gran_size: 32M chunk_size: 32M num_reg: 8
> lose cover RAM: 22M
> [ 0.000000] gran_size: 32M chunk_size: 64M num_reg: 8
> lose cover RAM: 22M
> [ 0.000000] gran_size: 32M chunk_size: 128M num_reg: 8
> lose cover RAM: 22M
> [ 0.000000] gran_size: 32M chunk_size: 256M num_reg: 8
> lose cover RAM: 22M
> [ 0.000000] gran_size: 32M chunk_size: 512M num_reg: 9
> lose cover RAM: 22M
> [ 0.000000] gran_size: 32M chunk_size: 1G num_reg: 9 lose
> cover RAM: 22M
> [ 0.000000] gran_size: 32M chunk_size: 2G num_reg: 9 lose
> cover RAM: 22M
> [ 0.000000] gran_size: 64M chunk_size: 64M num_reg: 6
> lose cover RAM: 86M
> [ 0.000000] gran_size: 64M chunk_size: 128M num_reg: 6
> lose cover RAM: 86M
> [ 0.000000] gran_size: 64M chunk_size: 256M num_reg: 7
> lose cover RAM: 86M
> [ 0.000000] gran_size: 64M chunk_size: 512M num_reg: 8
> lose cover RAM: 86M
> [ 0.000000] gran_size: 64M chunk_size: 1G num_reg: 8 lose
> cover RAM: 86M
> [ 0.000000] gran_size: 64M chunk_size: 2G num_reg: 8 lose
> cover RAM: 86M
> [ 0.000000] gran_size: 128M chunk_size: 128M
> num_reg: 5 lose cover RAM: 150M
> [ 0.000000] gran_size: 128M chunk_size: 256M
> num_reg: 7 lose cover RAM: 150M
> [ 0.000000] gran_size: 128M chunk_size: 512M
> num_reg: 8 lose cover RAM: 150M
> [ 0.000000] gran_size: 128M chunk_size: 1G num_reg: 8
> lose cover RAM: 150M
> [ 0.000000] gran_size: 128M chunk_size: 2G num_reg: 8
> lose cover RAM: 150M
> [ 0.000000] gran_size: 256M chunk_size: 256M
> num_reg: 3 lose cover RAM: 406M
> [ 0.000000] gran_size: 256M chunk_size: 512M
> num_reg: 3 lose cover RAM: 406M
> [ 0.000000] gran_size: 256M chunk_size: 1G num_reg: 4
> lose cover RAM: 406M
> [ 0.000000] gran_size: 256M chunk_size: 2G num_reg: 4
> lose cover RAM: 406M
> [ 0.000000] gran_size: 512M chunk_size: 512M
> num_reg: 3 lose cover RAM: 406M
> [ 0.000000] gran_size: 512M chunk_size: 1G num_reg: 4
> lose cover RAM: 406M
> [ 0.000000] gran_size: 512M chunk_size: 2G num_reg: 4
> lose cover RAM: 406M
> [ 0.000000] gran_size: 1G chunk_size: 1G num_reg: 2 lose
> cover RAM: 918M
> [ 0.000000] gran_size: 1G chunk_size: 2G num_reg: 2 lose
> cover RAM: 918M
> [ 0.000000] gran_size: 2G chunk_size: 2G num_reg: 1 lose
> cover RAM: 1942M
> [ 0.000000] mtrr_cleanup: can not find optimal value
> [ 0.000000] please specify mtrr_gran_size/mtrr_chunk_size
> [ 0.000000] x2apic enabled by BIOS, switching to x2apic ops
> [ 0.000000] e820: last_pfn = 0x9a max_arch_pfn = 0x400000000
> [ 0.000000] found SMP MP-table at [mem 0x000fc9f0-0x000fc9ff]
> mapped at [ffff8800000fc9f0]
> [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
> [ 0.000000] init_memory_mapping: [mem 0x14de00000-0x14df5ffff]
> [ 0.000000] init_memory_mapping: [mem 0x14c000000-0x14ddfffff]
> [ 0.000000] init_memory_mapping: [mem 0x146000000-0x14bffffff]
> [ 0.000000] ACPI: RSDP 00000000000f0410 00024 (v02 INTEL)
> [ 0.000000] ACPI: XSDT 00000000aaffde18 00074 (v01 INTEL IVB-PPT
> 06222004 MSFT 00010013)
> [ 0.000000] ACPI: FACP 00000000aafe3d98 000F4 (v04 INTEL IVB-PPT
> 06222004 MSFT 00010013)
> [ 0.000000] ACPI: DSDT 00000000aafbe018 0FD25 (v02 INTEL IVB-PPT
> 00000000 INTL 20110623)
> [ 0.000000] ACPI: FACS 00000000aafe9e40 00040
> [ 0.000000] ACPI: APIC 00000000aaffcf18 000CC (v02 INTEL IVB-PPT
> 06222004 MSFT 00010013)
> [ 0.000000] ACPI: HPET 00000000aafe8f18 00038 (v01 A M I PCHHPET
> 06222004 AMI. 00000003)
> [ 0.000000] ACPI: SSDT 00000000aafe5018 010A8 (v01 TrmRef PtidDevc
> 00001000 INTL 20110623)
> [ 0.000000] ACPI: SSDT 00000000aafe4a18 00461 (v01 AMI PerfTune
> 00001000 INTL 20110623)
> [ 0.000000] ACPI: MCFG 00000000aafe8e98 0003C (v01 INTEL SNDYBRDG
> 06222004 MSFT 00000097)
> [ 0.000000] ACPI: SSDT 00000000aafd2018 009AA (v01 PmRef Cpu0Ist
> 00003000 INTL 20110623)
> [ 0.000000] ACPI: SSDT 00000000aafd1018 00A92 (v01 PmRef CpuPm
> 00003000 INTL 20110623)
> [ 0.000000] ACPI: DMAR 00000000aafe4f18 000B8 (v01 INTEL SNB
> 00000001 INTL 00000001)
> [ 0.000000] ACPI: FPDT 00000000aaff4018 00064 (v01 INTEL IVB-CPT
> 00010000 INTL 20111107)
> [ 0.000000] Setting APIC routing to cluster x2apic.
> [ 0.000000] No NUMA configuration found
> [ 0.000000] Faking a node at [mem
> 0x0000000000000000-0x000000014df5ffff]
> [ 0.000000] Initmem setup node 0 [mem 0x00000000-0x14df5ffff]
> [ 0.000000] NODE_DATA [mem 0x14df39000-0x14df5ffff]
> [ 0.000000] Zone ranges:
> [ 0.000000] DMA [mem 0x00001000-0x00ffffff]
> [ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
> [ 0.000000] Normal [mem 0x100000000-0x14df5ffff]
> [ 0.000000] Movable zone start for each node
> [ 0.000000] Early memory node ranges
> [ 0.000000] node 0: [mem 0x00001000-0x00099fff]
> [ 0.000000] node 0: [mem 0x146000000-0x14df5ffff]
> [ 0.000000] ACPI: PM-Timer IO Port: 0x408
> [ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x06] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x08] disabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x09] disabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0a] disabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0b] disabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0c] disabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0d] disabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0e] disabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x10] lapic_id[0x0f] disabled)
> [ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000]
> gsi_base[0])
> [ 0.000000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000,
> GSI 0-23
> [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl
> dfl)
> [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high
> level)
> [ 0.000000] Using ACPI (MADT) for SMP configuration information
> [ 0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
> [ 0.000000] smpboot: Allowing 16 CPUs, 8 hotplug CPUs
> [ 0.000000] PM: Registered nosave memory: 000000000009a000 -
> 00000000aad6a000
> [ 0.000000] PM: Registered nosave memory: 00000000aad6a000 -
> 00000000aafff000
> [ 0.000000] PM: Registered nosave memory: 00000000aafff000 -
> 0000000146000000
> [ 0.000000] e820: [mem 0x0009ac00-0xaad69fff] available for PCI
> devices
> [ 0.000000] Booting paravirtualized kernel on bare hardware
> [ 0.000000] setup_percpu: NR_CPUS:4096 nr_cpumask_bits:16
> nr_cpu_ids:16 nr_node_ids:1
> [ 0.000000] PERCPU: Embedded 29 pages/cpu @ffff88014dc00000 s86272
> r8192 d24320 u131072
> [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on.
> Total pages: 32227
> [ 0.000000] Policy zone: Normal
> [ 0.000000] Kernel command line: console=ttyS0,115200n81
> memmap=exactmap memmap=615K@4K memmap=130432K@5341184K
> elfcorehdr=5471616K memmap=2560K#2799016K memmap=84K#2801576K
> [ 0.000000] PID hash table entries: 512 (order: 0, 4096 bytes)
> [ 0.000000] __ex_table already sorted, skipping sort
> [ 0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340
> [ 0.000000] Cannot allocate SWIOTLB buffer
> [ 0.000000] Checking aperture...
> [ 0.000000] No AGP bridge found
> [ 0.000000] Memory: 108184k/5471616k available (6346k kernel code,
> 5340572k absent, 22860k reserved, 3983k data, 1484k init)
> [ 0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3,
> MinObjects=0, CPUs=16, Nodes=1
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] RCU restricting CPUs from NR_CPUS=4096 to
> nr_cpu_ids=16.
> [ 0.000000] NR_IRQS:262400 nr_irqs:808 16
> [ 0.000000] Spurious LAPIC timer interrupt on cpu 0
> [ 0.000000] Console: colour VGA+ 80x25
> [ 0.000000] console [ttyS0] enabled
> [ 0.000000] allocated 1572864 bytes of page_cgroup
> [ 0.000000] please try 'cgroup_disable=memory' option if you don't
> want memory cgroups
> [ 0.001000] tsc: Fast TSC calibration using PIT
> [ 0.002000] tsc: Detected 2593.881 MHz processor
> [ 0.000005] Calibrating delay loop (skipped), value calculated
> using timer frequency.. 5187.76 BogoMIPS (lpj=2593881)
> [ 0.010628] pid_max: default: 32768 minimum: 301
> [ 0.015311] Security Framework initialized
> [ 0.019418] SELinux: Initializing.
> [ 0.023012] Dentry cache hash table entries: 16384 (order: 5,
> 131072 bytes)
> [ 0.030043] Inode-cache hash table entries: 8192 (order: 4, 65536
> bytes)
> [ 0.036777] Mount-cache hash table entries: 256
> [ 0.041629] Initializing cgroup subsys cpuacct
> [ 0.046079] Initializing cgroup subsys memory
> [ 0.050447] Initializing cgroup subsys devices
> [ 0.054891] Initializing cgroup subsys freezer
> [ 0.059335] Initializing cgroup subsys net_cls
> [ 0.063777] Initializing cgroup subsys blkio
> [ 0.068045] Initializing cgroup subsys perf_event
> [ 0.072794] CPU: Physical Processor ID: 0
> [ 0.076806] CPU: Processor Core ID: 0
> [ 0.081210] mce: CPU supports 9 MCE banks
> [ 0.085254] Last level iTLB entries: 4KB 512, 2MB 0, 4MB 0
> [ 0.085254] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32
> [ 0.085254] tlb_flushall_shift: 1
> [ 0.099844] Freeing SMP alternatives: 24k freed
> [ 0.106487] ACPI: Core revision 20130117
> [ 0.122586] ACPI: All ACPI Tables successfully acquired
> [ 0.127868] ftrace: allocating 23601 entries in 93 pages
> [ 0.154031] dmar: Host address width 36
> [ 0.157876] dmar: DRHD base: 0x000000fed90000 flags: 0x0
> [ 0.163194] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap
> c0000020e60262 ecap f0101a
> [ 0.171277] dmar: DRHD base: 0x000000fed91000 flags: 0x1
> [ 0.176590] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap
> c9008020660262 ecap f010da
> [ 0.184672] dmar: RMRR base: 0x000000aac95000 end:
> 0x000000aacb2fff
> [ 0.190934] dmar: RMRR base: 0x000000ab800000 end:
> 0x000000af9fffff
> [ 0.197268] IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
> [ 0.202825] HPET id 0 under DRHD base 0xfed91000
> [ 0.207714] Enabled IRQ remapping in x2apic mode
> [ 0.212866] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 0.228870] smpboot: CPU0: Intel(R) Core(TM) i7-3720QM CPU @
> 2.60GHz (fam: 06, model: 3a, stepping: 09)
> [ 0.238319] Performance Events: PEBS fmt1+, 16-deep LBR, IvyBridge
> events, Broken BIOS detected, complain to your hardware vendor.
> [ 0.250098] [Firmware Bug]: the BIOS has corrupted hw-PMU
> resources (MSR 38d is b0)
> [ 0.257739] Intel PMU driver.
> [ 0.260701] ... version: 3
> [ 0.264704] ... bit width: 48
> [ 0.268790] ... generic registers: 4
> [ 0.272791] ... value mask: 0000ffffffffffff
> [ 0.278092] ... max period: 000000007fffffff
> [ 0.283391] ... fixed-purpose events: 3
> [ 0.287391] ... event mask: 000000070000000f
> [ 0.294917] smpboot: Booting Node 0, Processors #1[
> 0.314489] NMI watchdog: enabled on all CPUs, permanently consumes
> one hw-PMU counter.
> #2 #3 #4 #5 #6 #7
> [ 0.408750] Brought up 8 CPUs
> [ 0.411901] smpboot: Total of 8 processors activated (41502.09
> BogoMIPS)
> [ 0.432134] devtmpfs: initialized
> [ 0.437882] atomic64 test passed for x86-64 platform with CX8 and
> with SSE
> [ 0.444833] NET: Registered protocol family 16
> [ 0.449455] ACPI FADT declares the system doesn't support PCIe
> ASPM, so disable it
> [ 0.457010] ACPI: bus type pci registered
> [ 0.461122] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem
> 0xf8000000-0xfbffffff] (base 0xf8000000)
> [ 0.470411] PCI: not using MMCONFIG
> [ 0.473897] PCI: Using configuration type 1 for base access
> [ 0.480792] bio: create slab at 0
> [ 0.484937] ACPI: Added _OSI(Module Device)
> [ 0.489116] ACPI: Added _OSI(Processor Device)
> [ 0.493554] ACPI: Added _OSI(3.0 _SCP Extensions)
> [ 0.498254] ACPI: Added _OSI(Processor Aggregator Device)
> [ 0.509686] ACPI: Executed 1 blocks of module-level executable AML
> code
> [ 0.529844] ACPI: SSDT 00000000aaccd018 00A4F (v01 PmRef Cpu0Cst
> 00003001 INTL 20110623)
> [ 0.538912] ACPI: Dynamic OEM Table Load:
> [ 0.542947] ACPI: SSDT (null) 00A4F (v01 PmRef Cpu0Cst
> 00003001 INTL 20110623)
> [ 0.556403] ACPI: SSDT 00000000aaccea98 00303 (v01 PmRef ApIst
> 00003000 INTL 20110623)
> [ 0.565515] ACPI: Dynamic OEM Table Load:
> [ 0.569549] ACPI: SSDT (null) 00303 (v01 PmRef ApIst
> 00003000 INTL 20110623)
> [ 0.582778] ACPI: SSDT 00000000aacccd98 00119 (v01 PmRef ApCst
> 00003000 INTL 20110623)
> [ 0.591819] ACPI: Dynamic OEM Table Load:
> [ 0.595855] ACPI: SSDT (null) 00119 (v01 PmRef ApCst
> 00003000 INTL 20110623)
> [ 1.418743] ACPI: Interpreter enabled
> [ 1.422408] ACPI: (supports S0 S1ACPI Exception: AE_NOT_FOUND,
> While evaluating Sleep State [\_S2_] (20130117/hwxface-568)
> [ 1.433515] S3 S4 S5)
> [ 1.435935] ACPI: Using IOAPIC for interrupt routing
> [ 1.440920] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem
> 0xf8000000-0xfbffffff] (base 0xf8000000)
> [ 1.451100] PCI: MMCONFIG at [mem 0xf8000000-0xfbffffff] reserved
> in ACPI motherboard resources
> [ 1.464742] PCI: Using host bridge windows from ACPI; if
> necessary, use "pci=nocrs" and report a bug
> [ 1.488550] ACPI: Power Resource [FN00] (off)
> [ 1.493020] ACPI: Power Resource [FN01] (off)
> [ 1.497481] ACPI: Power Resource [FN02] (on)
> [ 1.502241] ACPI: Power Resource [FN03] (on)
> [ 1.506935] ACPI: Power Resource [FN04] (on)
> [ 1.512503] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-3e])
> [ 1.519005] acpi PNP0A08:00: ACPI _OSC support notification
> failed, disabling PCIe ASPM
> [ 1.526995] acpi PNP0A08:00: Unable to request _OSC control (_OSC
> support mask: 0x08)
> [ 1.535557] PCI host bridge to bus 0000:00
> [ 1.539653] pci_bus 0000:00: root bus resource [bus 00-3e]
> [ 1.545132] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
> [ 1.551304] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff]
> [ 1.557474] pci_bus 0000:00: root bus resource [mem
> 0x000a0000-0x000bffff]
> [ 1.564338] pci_bus 0000:00: root bus resource [mem
> 0x000d0000-0x000d3fff]
> [ 1.571201] pci_bus 0000:00: root bus resource [mem
> 0x000d4000-0x000d7fff]
> [ 1.578064] pci_bus 0000:00: root bus resource [mem
> 0x000d8000-0x000dbfff]
> [ 1.584923] pci_bus 0000:00: root bus resource [mem
> 0x000dc000-0x000dffff]
> [ 1.591784] pci_bus 0000:00: root bus resource [mem
> 0x000e0000-0x000e3fff]
> [ 1.598644] pci_bus 0000:00: root bus resource [mem
> 0x000e4000-0x000e7fff]
> [ 1.605503] pci_bus 0000:00: root bus resource [mem
> 0xafa00000-0xfeafffff]
> [ 1.613008] pci 0000:00:14.0: System wakeup disabled by ACPI
> [ 1.619582] pci 0000:00:19.0: System wakeup disabled by ACPI
> [ 1.625584] pci 0000:00:1a.0: System wakeup disabled by ACPI
> [ 1.631518] pci 0000:00:1c.0: System wakeup disabled by ACPI
> [ 1.637463] pci 0000:00:1c.6: System wakeup disabled by ACPI
> [ 1.643459] pci 0000:00:1d.0: System wakeup disabled by ACPI
> [ 1.650069] pci 0000:00:1c.0: PCI bridge to [bus 01]
> [ 1.655146] pci 0000:00:1c.6: PCI bridge to [bus 02-09]
> [ 1.660411] ACPI _OSC control for PCIe not granted, disabling ASPM
> [ 1.668220] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11
> 12 14 15), disabled.
> [ 1.676298] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 10 11 12
> 14 15) *0, disabled.
> [ 1.684560] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 10 *11
> 12 14 15), disabled.
> [ 1.692628] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 *10 11
> 12 14 15), disabled.
> [ 1.700698] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 *5 6 10 11
> 12 14 15), disabled.
> [ 1.708765] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 10 11 12
> 14 15) *0, disabled.
> [ 1.717023] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 10 11 12
> 14 15) *0, disabled.
> [ 1.725280] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 *10 11
> 12 14 15), disabled.
> [ 1.733756] ACPI: Enabled 5 GPEs in block 00 to 3F
> [ 1.738777] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data
> = 0x62
> [ 1.746393] ACPI: ACPI Dock Station Driver: 1 docks/bays found
> [ 1.752307] vgaarb: device added:
> PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [ 1.760387] vgaarb: loaded
> [ 1.763093] vgaarb: bridge control possible 0000:00:02.0
> [ 1.768498] SCSI subsystem initialized
> [ 1.772270] ACPI: bus type usb registered
> [ 1.776297] usbcore: registered new interface driver usbfs
> [ 1.781783] usbcore: registered new interface driver hub
> [ 1.787122] usbcore: registered new device driver usb
> [ 1.792242] PCI: Using ACPI for IRQ routing
> [ 1.800410] NetLabel: Initializing
> [ 1.803809] NetLabel: domain hash size = 128
> [ 1.808159] NetLabel: protocols = UNLABELED CIPSOv4
> [ 1.813125] NetLabel: unlabeled traffic allowed by default
> [ 1.818736] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0
> [ 1.825032] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
> [ 1.832871] Switching to clocksource hpet
> [ 1.845000] pnp: PnP ACPI init
> [ 1.848089] ACPI: bus type pnp registered
> [ 1.852490] system 00:00: [io 0x06a4] has been reserved
> [ 1.857802] system 00:00: [io 0x06a0] has been reserved
> [ 1.863449] system 00:05: [io 0x0680-0x069f] has been reserved
> [ 1.869370] system 00:05: [io 0x1004-0x1013] has been reserved
> [ 1.875291] system 00:05: [io 0xffff] has been reserved
> [ 1.880604] system 00:05: [io 0xffff] has been reserved
> [ 1.885918] system 00:05: [io 0x0400-0x0453] has been reserved
> [ 1.891836] system 00:05: [io 0x0458-0x047f] has been reserved
> [ 1.897754] system 00:05: [io 0x0500-0x057f] has been reserved
> [ 1.903673] system 00:05: [io 0x164e-0x164f] has been reserved
> [ 1.909718] system 00:07: [io 0x0454-0x0457] has been reserved
> [ 1.916807] system 00:0b: [mem 0xfed1c000-0xfed1ffff] has been
> reserved
> [ 1.923425] system 00:0b: [mem 0xfed10000-0xfed17fff] has been
> reserved
> [ 1.930043] system 00:0b: [mem 0xfed18000-0xfed18fff] has been
> reserved
> [ 1.936656] system 00:0b: [mem 0xfed19000-0xfed19fff] has been
> reserved
> [ 1.943267] system 00:0b: [mem 0xf8000000-0xfbffffff] has been
> reserved
> [ 1.949879] system 00:0b: [mem 0xfed20000-0xfed3ffff] has been
> reserved
> [ 1.956492] system 00:0b: [mem 0xfed90000-0xfed93fff] could not be
> reserved
> [ 1.963449] system 00:0b: [mem 0xfed45000-0xfed8ffff] has been
> reserved
> [ 1.970062] system 00:0b: [mem 0xff000000-0xffffffff] has been
> reserved
> [ 1.976677] system 00:0b: [mem 0xfee00000-0xfeefffff] has been
> reserved
> [ 1.983292] system 00:0b: [mem 0xafa00000-0xafa00fff] has been
> reserved
> [ 1.990474] system 00:0c: [mem 0x00000000-0x0009cfff] could not be
> reserved
> [ 1.997547] system 00:0d: [mem 0x20000000-0x201fffff] has been
> reserved
> [ 2.004160] system 00:0d: [mem 0x40004000-0x40004fff] has been
> reserved
> [ 2.010844] pnp: PnP ACPI: found 14 devices
> [ 2.015025] ACPI: ACPI bus type pnp unregistered
> [ 2.027290] pci 0000:00:1c.0: PCI bridge to [bus 01]
> [ 2.032277] pci 0000:00:1c.6: PCI bridge to [bus 02-09]
> [ 2.037501] pci 0000:00:1c.6: bridge window [io 0x2000-0x5fff]
> [ 2.043599] pci 0000:00:1c.6: bridge window [mem
> 0xb0000000-0xb10fffff]
> [ 2.050391] pci 0000:00:1c.6: bridge window [mem
> 0xd0000000-0xd10fffff 64bit pref]
> [ 2.058446] NET: Registered protocol family 2
> [ 2.063068] TCP established hash table entries: 1024 (order: 2,
> 16384 bytes)
> [ 2.070133] TCP bind hash table entries: 1024 (order: 2, 16384
> bytes)
> [ 2.076578] TCP: Hash tables configured (established 1024 bind
> 1024)
> [ 2.082996] TCP: reno registered
> [ 2.086255] UDP hash table entries: 256 (order: 1, 8192 bytes)
> [ 2.092092] UDP-Lite hash table entries: 256 (order: 1, 8192
> bytes)
> [ 2.098509] NET: Registered protocol family 1
> [ 2.103710] PCI-DMA: Using software bounce buffering for IO
> (SWIOTLB)
> [ 2.110160] software IO TLB: No low mem
> [ 2.116187] alg: No test for __gcm-aes-aesni
> (__driver-gcm-aes-aesni)
> [ 2.123111] Initialise module verification
> [ 2.127274] audit: initializing netlink socket (disabled)
> [ 2.132684] type=2000 audit(1362698281.479:1): initialized
> [ 2.185575] HugeTLB registered 2 MB page size, pre-allocated 0
> pages
> [ 2.193943] VFS: Disk quotas dquot_6.5.2
> [ 2.197950] Dquot-cache hash table entries: 512 (order 0, 4096
> bytes)
> [ 2.205330] msgmni has been set to 211
> [ 2.210287] alg: No test for stdrng (krng)
> [ 2.214402] NET: Registered protocol family 38
> [ 2.218851] Key type asymmetric registered
> [ 2.222951] Asymmetric key parser 'x509' registered
> [ 2.227888] Block layer SCSI generic (bsg) driver version 0.4
> loaded (major 252)
> [ 2.235322] io scheduler noop registered
> [ 2.239247] io scheduler deadline registered (default)
> [ 2.244403] io scheduler cfq registered
> [ 2.248531] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
> [ 2.254132] pciehp: PCI Express Hot Plug Controller Driver
> version: 0.4
> [ 2.260741] acpiphp: ACPI Hot Plug PCI Controller Driver version:
> 0.5
> [ 2.267305] acpiphp_glue: can't evaluate _ADR (0x5)
> [ 2.272577] ACPI: AC Adapter [ADP1] (on-line)
> [ 2.277080] input: Lid Switch as
> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input0
> [ 2.285391] ACPI: Lid Switch [LID0]
> [ 2.288954] input: Power Button as
> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input1
> [ 2.297299] ACPI: Power Button [PWRB]
> [ 2.301017] input: Power Button as
> /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
> [ 2.308410] ACPI: Power Button [PWRF]
> [ 2.312155] ACPI: Fan [FAN0] (off)
> [ 2.315615] ACPI: Fan [FAN1] (off)
> [ 2.319067] ACPI: Fan [FAN2] (on)
> [ 2.322435] ACPI: Fan [FAN3] (on)
> [ 2.325793] ACPI: Fan [FAN4] (on)
> [ 2.329202] ACPI: Requesting acpi_cpufreq
> [ 2.340021] thermal LNXTHERM:00: registered as thermal_zone0
> [ 2.345688] ACPI: Thermal Zone [TZ00] (32 C)
> [ 2.351008] thermal LNXTHERM:01: registered as thermal_zone1
> [ 2.356672] ACPI: Thermal Zone [TZ01] (33 C)
> [ 2.364912] GHES: HEST is not enabled!
> [ 2.365203] ACPI: Battery Slot [BAT0] (battery present)
> [ 2.365238] ACPI: Battery Slot [BAT1] (battery absent)
> [ 2.365350] ACPI: Battery Slot [BAT2] (battery absent)
> [ 2.384217] Serial: 8250/16550 driver, 4 ports, IRQ sharing
> enabled
> [ 2.411446] 00:08: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> [ 2.438082] serial8250: ttyS2 at I/O 0x3e8 (irq = 4) is a 16550A
> [ 2.464897] 0000:00:16.3: ttyS1 at I/O 0x60e0 (irq = 19) is a
> 16550A
> [ 2.471729] Non-volatile memory driver v1.3
> [ 2.475917] Linux agpgart interface v0.103
> [ 2.481169] loop: module loaded
> [ 2.484342] rdac: device handler registered
> [ 2.488599] hp_sw: device handler registered
> [ 2.492869] emc: device handler registered
> [ 2.496970] alua: device handler registered
> [ 2.501205] libphy: Fixed MDIO Bus: probed
> [ 2.505370] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI)
> Driver
> [ 2.511893] ehci-pci: EHCI PCI platform driver
> [ 2.516470] ehci-pci 0000:00:1a.0: EHCI Host Controller
> [ 2.521763] ehci-pci 0000:00:1a.0: new USB bus registered,
> assigned bus number 1
> [ 2.529178] ehci-pci 0000:00:1a.0: debug port 2
> [ 2.537665] ehci-pci 0000:00:1a.0: irq 16, io mem 0xb1160000
> [ 2.548387] ehci-pci 0000:00:1a.0: USB 2.0 started, EHCI 1.00
> [ 2.554157] usb usb1: New USB device found, idVendor=1d6b,
> idProduct=0002
> [ 2.560941] usb usb1: New USB device strings: Mfr=3, Product=2,
> SerialNumber=1
> [ 2.568161] usb usb1: Product: EHCI Host Controller
> [ 2.573041] usb usb1: Manufacturer: Linux 3.9.0-rc1+ ehci_hcd
> [ 2.578784] usb usb1: SerialNumber: 0000:00:1a.0
> [ 2.583547] hub 1-0:1.0: USB hub found
> [ 2.587305] hub 1-0:1.0: 3 ports detected
> [ 2.591639] ehci-pci 0000:00:1d.0: EHCI Host Controller
> [ 2.596930] ehci-pci 0000:00:1d.0: new USB bus registered,
> assigned bus number 2
> [ 2.604343] ehci-pci 0000:00:1d.0: debug port 2
> [ 2.612811] ehci-pci 0000:00:1d.0: irq 23, io mem 0xb1150000
> [ 2.624437] ehci-pci 0000:00:1d.0: USB 2.0 started, EHCI 1.00
> [ 2.630204] usb usb2: New USB device found, idVendor=1d6b,
> idProduct=0002
> [ 2.636987] usb usb2: New USB device strings: Mfr=3, Product=2,
> SerialNumber=1
> [ 2.644206] usb usb2: Product: EHCI Host Controller
> [ 2.649084] usb usb2: Manufacturer: Linux 3.9.0-rc1+ ehci_hcd
> [ 2.654830] usb usb2: SerialNumber: 0000:00:1d.0
> [ 2.659586] hub 2-0:1.0: USB hub found
> [ 2.663345] hub 2-0:1.0: 3 ports detected
> [ 2.667593] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> [ 2.673792] uhci_hcd: USB Universal Host Controller Interface
> driver
> [ 2.680291] xhci_hcd 0000:00:14.0: xHCI Host Controller
> [ 2.685578] xhci_hcd 0000:00:14.0: new USB bus registered,
> assigned bus number 3
> [ 2.693119] xhci_hcd 0000:00:14.0: irq 16, io mem 0xb11a0000
> [ 2.698942] usb usb3: New USB device found, idVendor=1d6b,
> idProduct=0002
> [ 2.705729] usb usb3: New USB device strings: Mfr=3, Product=2,
> SerialNumber=1
> [ 2.712948] usb usb3: Product: xHCI Host Controller
> [ 2.717823] usb usb3: Manufacturer: Linux 3.9.0-rc1+ xhci_hcd
> [ 2.723564] usb usb3: SerialNumber: 0000:00:14.0
> [ 2.728328] hub 3-0:1.0: USB hub found
> [ 2.732094] hub 3-0:1.0: 4 ports detected
> [ 2.736783] xhci_hcd 0000:00:14.0: xHCI Host Controller
> [ 2.742078] xhci_hcd 0000:00:14.0: new USB bus registered,
> assigned bus number 4
> [ 2.749514] usb usb4: New USB device found, idVendor=1d6b,
> idProduct=0003
> [ 2.756299] usb usb4: New USB device strings: Mfr=3, Product=2,
> SerialNumber=1
> [ 2.763516] usb usb4: Product: xHCI Host Controller
> [ 2.768400] usb usb4: Manufacturer: Linux 3.9.0-rc1+ xhci_hcd
> [ 2.774148] usb usb4: SerialNumber: 0000:00:14.0
> [ 2.778964] hub 4-0:1.0: USB hub found
> [ 2.782737] hub 4-0:1.0: 4 ports detected
> [ 2.791587] usbcore: registered new interface driver usbserial
> [ 2.797430] usbcore: registered new interface driver
> usbserial_generic
> [ 2.803956] usbserial: USB Serial support registered for generic
> [ 2.810001] i8042: PNP: PS/2 Controller [PNP0303:PS2K] at
> 0x60,0x64 irq 1
> [ 2.816782] i8042: PNP: PS/2 appears to have AUX port disabled, if
> this is incorrect please boot with i8042.nopnp
> [ 2.827473] serio: i8042 KBD port at 0x60,0x64 irq 1
> [ 2.832534] mousedev: PS/2 mouse device common for all mice
> [ 2.838423] rtc_cmos 00:06: RTC can wake from S4
> [ 2.843292] rtc_cmos 00:06: rtc core: registered rtc_cmos as rtc0
> [ 2.849434] rtc_cmos 00:06: alarms up to one month, y3k, 242 bytes
> nvram, hpet irqs
> [ 2.857324] cpuidle: using governor ladder
> [ 2.861761] cpuidle: using governor menu
> [ 2.866158] EFI Variables Facility v0.08 2004-May-17
> [ 2.871140] hidraw: raw HID events driver (C) Jiri Kosina
> [ 2.873905] input: AT Translated Set 2 keyboard as
> /devices/platform/i8042/serio0/input/input3
> [ 2.885248] usbcore: registered new interface driver usbhid
> [ 2.890818] usbhid: USB HID core driver
> [ 2.894647] usb 1-1: new high-speed USB device number 2 using
> ehci-pci
> [ 2.894695] drop_monitor: Initializing network drop monitor
> service
> [ 2.894813] TCP: cubic registered
> [ 2.894815] Initializing XFRM netlink socket
> [ 2.894988] NET: Registered protocol family 10
> [ 2.895269] NET: Registered protocol family 17
> [ 2.895770] Loading module verification certificates
> [ 2.897372] MODSIGN: Loaded cert 'Magrathea: Glacier signing key:
> b14fa6fba81316fe0bb193bbf458deba6d430978'
> [ 2.897384] registered taskstats version 1
> [ 2.897388] IMA: No TPM chip found, activating TPM-bypass!
> [ 2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB
> buffer earlier and can't now provide you with the DMA bounce buffer
> [ 2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1
> [ 2.965426] Call Trace:
> [ 2.967866] [<ffffffff81619acd>] panic+0xc1/0x1d0
> [ 2.972644] [<ffffffff8131e50c>] swiotlb_tbl_map_single+0x27c/0x280
> [ 2.978991] [<ffffffff8131ed59>] map_single+0x19/0x20
> [ 2.984115] [<ffffffff8131ef1e>] swiotlb_map_page+0x6e/0x160
> [ 2.989845] [<ffffffff81447e40>] usb_hcd_map_urb_for_dma+0x230/0x4a0
> [ 2.996268] [<ffffffff81448345>] usb_hcd_submit_urb+0x295/0x8e0
> [ 3.002258] [<ffffffff8109b90f>] ? __dequeue_entity+0x2f/0x50
> [ 3.008076] [<ffffffff8101358e>] ? __switch_to+0x13e/0x4a0
> [ 3.013632] [<ffffffff81449a2f>] usb_submit_urb+0xff/0x3d0
> [ 3.019186] [<ffffffff816241be>] ? __schedule+0x3de/0x7e0
> [ 3.024657] [<ffffffff8144ab6a>] usb_start_wait_urb+0x6a/0x160
> [ 3.030560] [<ffffffff81189d15>] ? __kmalloc+0x55/0x210
> [ 3.035856] [<ffffffff8144977e>] ? usb_alloc_urb+0x1e/0x50
> [ 3.041411] [<ffffffff8144aece>] usb_control_msg+0xde/0x140
> [ 3.047056] [<ffffffff81440680>] ? hub_port_init+0x310/0xaf0
> [ 3.052785] [<ffffffff8144065b>] ? hub_port_init+0x2eb/0xaf0
> [ 3.058515] [<ffffffff814406a8>] hub_port_init+0x338/0xaf0
> [ 3.064071] [<ffffffff81407679>] ? update_autosuspend+0x39/0x60
> [ 3.070062] [<ffffffff81407759>] ? pm_runtime_set_autosuspend_delay+0x49/0x70
> [ 3.077264] [<ffffffff814438ba>] hub_port_connect_change+0x24a/0xaa0
> [ 3.083684] [<ffffffff814443fa>] hub_events+0x2ea/0x910
> [ 3.088981] [<ffffffff816241be>] ? __schedule+0x3de/0x7e0
> [ 3.094451] [<ffffffff81444a55>] hub_thread+0x35/0x1e0
> [ 3.099661] [<ffffffff81087480>] ? wake_up_bit+0x40/0x40
> [ 3.105045] [<ffffffff81444a20>] ? hub_events+0x910/0x910
> [ 3.110514] [] kthread+0xc0/0xd0
> [ 3.115378] [] ? kthread_create_on_node+0x120/0x120
> [ 3.121887] [] ret_from_fork+0x7c/0xb0
> [ 3.127271] [] ? kthread_create_on_node+0x120/0x120
>
> --
> Thanks,
> WANG Chao
>

2013-03-08 06:32:41

by Yinghai Lu

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

On Thu, Mar 7, 2013 at 10:03 PM, CAI Qian <[email protected]> wrote:
> CC'ing kexec ML. Also mentioned that 3.8 has no such issue.
>
> This message looks suspicious and out of range while 3.8 reservation
> looks within the range.
>
> [ 0.000000] Reserving 128MB of memory at 5216MB for crashkernel
> (System RAM: 3977MB)
>
> Wondering if anything to do with memblock again...

that is intended...

> ----- Original Message -----
>> From: "WANG Chao" <[email protected]>
>> To: "LKML" <linux-kernel@vger.kernel.org>
>> Cc: "CAI Qian" <[email protected]>
>> Sent: Friday, March 8, 2013 1:54:37 PM
>> Subject: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you
>> with the DMA bounce buffer
>>
>> Hi, All
>>
>> On 3.9-rc1, I load crash kernel with latest kexec-tools(up to
>> 28d413a), but
>> 2nd kernel panic at early time:
>> [ 2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB
>> buffer earlier and can't now provide you with the DMA bounce buffer
>> [ 2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1

You need to add crashkernel_low=64M to the first kernel's command line,
as your system does not support DMA remapping.

Thanks

Yinghai
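
A rough check of why the second kernel has no room for SWIOTLB: the crash kernel command line in the quoted log passes memmap=615K@4K and memmap=130432K@5341184K, so the only usable RAM is a sliver below 1 MB plus the reserved block at 5216 MB. The Python sketch below parses those entries and sums the usable memory under 4 GiB. The helper names and the 64 MiB bounce-buffer size are assumptions for illustration (the size mirrors the crashkernel_low=64M suggested above); this is not kernel code.

```python
import re

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
FOUR_GIB = 4 << 30
ASSUMED_BOUNCE_BUFFER = 64 << 20  # assumption: 64 MiB, as in crashkernel_low=64M

# Command line copied from the quoted boot log of the second kernel.
CMDLINE = (
    "console=ttyS0,115200n81 memmap=exactmap memmap=615K@4K "
    "memmap=130432K@5341184K elfcorehdr=5471616K "
    "memmap=2560K#2799016K memmap=84K#2801576K"
)


def parse_size(text):
    """Convert a size such as '615K' or '64M' to bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


def usable_ranges(cmdline):
    """Return (start, end) byte ranges for each memmap=size@start entry.

    Only the '@' form marks usable RAM; the '#' entries (ACPI data) and
    memmap=exactmap do not match the pattern and are skipped.
    """
    ranges = []
    for size, start in re.findall(r"memmap=(\w+)@(\w+)", cmdline):
        begin = parse_size(start)
        ranges.append((begin, begin + parse_size(size)))
    return ranges


def bytes_below_4g(ranges):
    """Sum the part of each usable range that lies below 4 GiB."""
    return sum(max(0, min(end, FOUR_GIB) - start) for start, end in ranges)


low = bytes_below_4g(usable_ranges(CMDLINE))
print(f"usable below 4 GiB: {low // 1024}K")
print("room for bounce buffer:", low >= ASSUMED_BOUNCE_BUFFER)
```

With only a few hundred kilobytes of usable memory below 4 GiB, an early low-memory allocation of that size cannot succeed, which is consistent with the "Cannot allocate SWIOTLB buffer" line in the log.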

2013-03-08 06:36:14

by Yinghai Lu

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

On Thu, Mar 7, 2013 at 10:32 PM, Yinghai Lu <[email protected]> wrote:
> On Thu, Mar 7, 2013 at 10:03 PM, CAI Qian <[email protected]> wrote:
>> CC'ing kexec ML. Also mentioned that 3.8 has no such issue.
>>
>> This message looks suspicious and out of range while 3.8 reservation
>> looks within the range.
>>
>> [ 0.000000] Reserving 128MB of memory at 5216MB for crashkernel
>> (System RAM: 3977MB)
>>
>> Wondering if anything to do with memblock again...
>
> that is intended...
>
>> ----- Original Message -----
>>> From: "WANG Chao" <[email protected]>
>>> To: "LKML" <linux-kernel@vger.kernel.org>
>>> Cc: "CAI Qian" <[email protected]>
>>> Sent: Friday, March 8, 2013 1:54:37 PM
>>> Subject: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you
>>> with the DMA bounce buffer
>>>
>>> Hi, All
>>>
>>> On 3.9-rc1, I load crash kernel with latest kexec-tools(up to
>>> 28d413a), but
>>> 2nd kernel panic at early time:
>>> [ 2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB
>>> buffer earlier and can't now provide you with the DMA bounce buffer
>>> [ 2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1
>
> You need to add crashkernel_low=64M to the first kernel's command line,
> as your system does not support DMA remapping.

Looks like your system does have a DMAR table; please enable DMAR
remapping in your kernel config.

Yinghai

2013-03-08 07:21:06

by WANG Chao

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

On 03/08/2013 02:36 PM, Yinghai Lu wrote:
> On Thu, Mar 7, 2013 at 10:32 PM, Yinghai Lu <[email protected]> wrote:
>> On Thu, Mar 7, 2013 at 10:03 PM, CAI Qian <[email protected]> wrote:
>>> CC'ing kexec ML. Also mentioned that 3.8 has no such issue.
>>>
>>> This message looks suspicious and out of range while 3.8 reservation
>>> looks within the range.
>>>
>>> [ 0.000000] Reserving 128MB of memory at 5216MB for crashkernel
>>> (System RAM: 3977MB)
>>>
>>> Wondering if anything to do with memblock again...
>>
>> that is intended...
>>
>>> ----- Original Message -----
>>>> From: "WANG Chao" <[email protected]>
>>>> To: "LKML" <linux-kernel@vger.kernel.org>
>>>> Cc: "CAI Qian" <[email protected]>
>>>> Sent: Friday, March 8, 2013 1:54:37 PM
>>>> Subject: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you
>>>> with the DMA bounce buffer
>>>>
>>>> Hi, All
>>>>
>>>> On 3.9-rc1, I load crash kernel with latest kexec-tools(up to
>>>> 28d413a), but
>>>> 2nd kernel panic at early time:
>>>> [ 2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB
>>>> buffer earlier and can't now provide you with the DMA bounce buffer
>>>> [ 2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1
>>
>> You need to add crashkernel_low=64M to the first kernel's command line,
>> as your system does not support DMA remapping.
>
> Looks like your system does have a DMAR table; please enable DMAR
> remapping in your kernel config.

I've already got the following config:
CONFIG_DMAR_TABLE=y
CONFIG_INTEL_IOMMU=y
CONFIG_IRQ_REMAP=y

But I don't have intel_iommu=on on the kernel cmdline. IIRC, the IOMMU will
prevent the 2nd kernel from booting ...

I tested crashkernel=128M and crashkernel_low=64M; it seems the 2nd kernel/kexec
only works when the two params are used in combination.

Thanks,
WANG Chao

>
> Yinghai
>

2013-03-08 07:27:15

by Yinghai Lu

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

On Thu, Mar 7, 2013 at 11:20 PM, WANG Chao <[email protected]> wrote:
>>
>> looks like your system DO have DMAR table, please enable dmar
>> remapping in your kernel config.
>
> I've already got following config:
> CONFIG_DMAR_TABLE=y
> CONFIG_INTEL_IOMMU=y
> CONFIG_IRQ_REMAP=y
>
> but I don't have intel_iommu=on in kernel cmdline. IIRC, iommu will prevent
> 2nd kernel from booting ...

Did you put intel_iommu=on on both the first and second kernel?

Thanks

Yinghai

2013-03-08 07:33:37

by WANG Chao

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

On 03/08/2013 03:27 PM, Yinghai Lu wrote:
> On Thu, Mar 7, 2013 at 11:20 PM, WANG Chao <[email protected]> wrote:
>>>
>>> looks like your system DO have DMAR table, please enable dmar
>>> remapping in your kernel config.
>>
>> I've already got following config:
>> CONFIG_DMAR_TABLE=y
>> CONFIG_INTEL_IOMMU=y
>> CONFIG_IRQ_REMAP=y
>>
>> but I don't have intel_iommu=on in kernel cmdline. IIRC, iommu will prevent
>> 2nd kernel from booting ...
>
> Did you put intel_iommu=on on first and second cpu both?

I tried; the 2nd kernel didn't boot and kept spitting errors like these:
[ 2.106939] DMAR: No ATSR found
[ 2.110121] IOMMU 0 0xfed90000: using Queued invalidation
[ 2.115522] IOMMU 1 0xfed91000: using Queued invalidation
[ 2.120919] IOMMU: Setting RMRR:
[ 2.124162] IOMMU: Setting identity map for device 0000:00:02.0 [0xab800000 - 0xaf9fffff]
[ 2.133099] IOMMU: Setting identity map for device 0000:00:1d.0 [0xaac95000 - 0xaacb2fff]
[ 2.141305] IOMMU: Setting identity map for device 0000:00:1a.0 [0xaac95000 - 0xaacb2fff]
[ 2.149503] IOMMU: Setting identity map for device 0000:00:14.0 [0xaac95000 - 0xaacb2fff]
[ 2.157690] IOMMU: Prepare 0-16MiB unity mapping for LPC
[ 2.163011] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[Errors, here we go]
[ 2.170932] dmar: DRHD: handling fault status reg 3
[ 2.170933] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
[ 2.182486] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr ffffe000
[ 2.182486] DMAR:[fault reason 05] PTE Write access is not set
[ 2.195705] dmar: DRHD: handling fault status reg 3
[ 2.200570] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr ff873000
[ 2.200570] DMAR:[fault reason 06] PTE Read access is not set
[ 2.213618] dmar: DRHD: handling fault status reg 3
[..]

Thanks,
WANG Chao
>
> Thanks
>
> Yinghai
>

2013-03-08 07:50:03

by Yinghai Lu

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

On Thu, Mar 7, 2013 at 11:33 PM, WANG Chao <[email protected]> wrote:
> On 03/08/2013 03:27 PM, Yinghai Lu wrote:
>> On Thu, Mar 7, 2013 at 11:20 PM, WANG Chao <[email protected]> wrote:
>>>>
>>>> looks like your system DO have DMAR table, please enable dmar
>>>> remapping in your kernel config.
>>>
>>> I've already got following config:
>>> CONFIG_DMAR_TABLE=y
>>> CONFIG_INTEL_IOMMU=y
>>> CONFIG_IRQ_REMAP=y
>>>
>>> but I don't have intel_iommu=on in kernel cmdline. IIRC, iommu will prevent
>>> 2nd kernel from booting ...
>>
>> Did you put intel_iommu=on on first and second cpu both?
>
> I tried, 2nd kernel didn't boot and keep splitting errors like these:
> [ 2.106939] DMAR: No ATSR found
> [ 2.110121] IOMMU 0 0xfed90000: using Queued invalidation
> [ 2.115522] IOMMU 1 0xfed91000: using Queued invalidation
> [ 2.120919] IOMMU: Setting RMRR:
> [ 2.124162] IOMMU: Setting identity map for device 0000:00:02.0 [0xab800000
> - 0xaf9fffff]
> [ 2.133099] IOMMU: Setting identity map for device 0000:00:1d.0 [0xaac95000
> - 0xaacb2fff]
> [ 2.141305] IOMMU: Setting identity map for device 0000:00:1a.0 [0xaac95000
> - 0xaacb2fff]
> [ 2.149503] IOMMU: Setting identity map for device 0000:00:14.0 [0xaac95000
> - 0xaacb2fff]
> [ 2.157690] IOMMU: Prepare 0-16MiB unity mapping for LPC
> [ 2.163011] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff
> [Errors, here we go]
> [ 2.170932] dmar: DRHD: handling fault status reg 3
> [ 2.170933] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
> [ 2.182486] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr ffffe000
> [ 2.182486] DMAR:[fault reason 05] PTE Write access is not set
> [ 2.195705] dmar: DRHD: handling fault status reg 3
> [ 2.200570] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr ff873000
> [ 2.200570] DMAR:[fault reason 06] PTE Read access is not set
> [ 2.213618] dmar: DRHD: handling fault status reg 3

My Nehalem-EX and Westmere-EX systems are working with IOMMU enabled in the
second kernel.

What is 00:02.0 in your system?

Is your kernel an upstream kernel or a Red Hat flavored one?

Thanks

Yinghai

2013-03-08 07:53:13

by Takao Indoh

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

(2013/03/08 16:33), WANG Chao wrote:
> On 03/08/2013 03:27 PM, Yinghai Lu wrote:
>> On Thu, Mar 7, 2013 at 11:20 PM, WANG Chao <[email protected]> wrote:
>>>>
>>>> looks like your system DO have DMAR table, please enable dmar
>>>> remapping in your kernel config.
>>>
>>> I've already got following config:
>>> CONFIG_DMAR_TABLE=y
>>> CONFIG_INTEL_IOMMU=y
>>> CONFIG_IRQ_REMAP=y
>>>
>>> but I don't have intel_iommu=on in kernel cmdline. IIRC, iommu will prevent
>>> 2nd kernel from booting ...
>>
>> Did you put intel_iommu=on on first and second cpu both?
>
> I tried, 2nd kernel didn't boot and keep splitting errors like these:
> [ 2.106939] DMAR: No ATSR found
> [ 2.110121] IOMMU 0 0xfed90000: using Queued invalidation
> [ 2.115522] IOMMU 1 0xfed91000: using Queued invalidation
> [ 2.120919] IOMMU: Setting RMRR:
> [ 2.124162] IOMMU: Setting identity map for device 0000:00:02.0 [0xab800000
> - 0xaf9fffff]
> [ 2.133099] IOMMU: Setting identity map for device 0000:00:1d.0 [0xaac95000
> - 0xaacb2fff]
> [ 2.141305] IOMMU: Setting identity map for device 0000:00:1a.0 [0xaac95000
> - 0xaacb2fff]
> [ 2.149503] IOMMU: Setting identity map for device 0000:00:14.0 [0xaac95000
> - 0xaacb2fff]
> [ 2.157690] IOMMU: Prepare 0-16MiB unity mapping for LPC
> [ 2.163011] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff
> [Errors, here we go]
> [ 2.170932] dmar: DRHD: handling fault status reg 3
> [ 2.170933] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
> [ 2.182486] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr ffffe000
> [ 2.182486] DMAR:[fault reason 05] PTE Write access is not set
> [ 2.195705] dmar: DRHD: handling fault status reg 3
> [ 2.200570] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr ff873000
> [ 2.200570] DMAR:[fault reason 06] PTE Read access is not set
> [ 2.213618] dmar: DRHD: handling fault status reg 3
> [..]

This is the problem I'm working on.
https://lkml.org/lkml/2012/11/26/814

Thanks,
Takao Indoh

2013-03-08 12:12:52

by WANG Chao

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

On 03/08/2013 03:50 PM, Yinghai Lu wrote:
> On Thu, Mar 7, 2013 at 11:33 PM, WANG Chao <[email protected]> wrote:
>> On 03/08/2013 03:27 PM, Yinghai Lu wrote:
>>> On Thu, Mar 7, 2013 at 11:20 PM, WANG Chao <[email protected]> wrote:
>>>>>
>>>>> looks like your system DO have DMAR table, please enable dmar
>>>>> remapping in your kernel config.
>>>>
>>>> I've already got following config:
>>>> CONFIG_DMAR_TABLE=y
>>>> CONFIG_INTEL_IOMMU=y
>>>> CONFIG_IRQ_REMAP=y
>>>>
>>>> but I don't have intel_iommu=on in kernel cmdline. IIRC, iommu will prevent
>>>> 2nd kernel from booting ...
>>>
>>> Did you put intel_iommu=on on first and second cpu both?
>>
>> I tried, 2nd kernel didn't boot and keep splitting errors like these:
>> [ 2.106939] DMAR: No ATSR found
>> [ 2.110121] IOMMU 0 0xfed90000: using Queued invalidation
>> [ 2.115522] IOMMU 1 0xfed91000: using Queued invalidation
>> [ 2.120919] IOMMU: Setting RMRR:
>> [ 2.124162] IOMMU: Setting identity map for device 0000:00:02.0 [0xab800000
>> - 0xaf9fffff]
>> [ 2.133099] IOMMU: Setting identity map for device 0000:00:1d.0 [0xaac95000
>> - 0xaacb2fff]
>> [ 2.141305] IOMMU: Setting identity map for device 0000:00:1a.0 [0xaac95000
>> - 0xaacb2fff]
>> [ 2.149503] IOMMU: Setting identity map for device 0000:00:14.0 [0xaac95000
>> - 0xaacb2fff]
>> [ 2.157690] IOMMU: Prepare 0-16MiB unity mapping for LPC
>> [ 2.163011] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff
>> [Errors, here we go]
>> [ 2.170932] dmar: DRHD: handling fault status reg 3
>> [ 2.170933] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
>> [ 2.182486] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr ffffe000
>> [ 2.182486] DMAR:[fault reason 05] PTE Write access is not set
>> [ 2.195705] dmar: DRHD: handling fault status reg 3
>> [ 2.200570] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr ff873000
>> [ 2.200570] DMAR:[fault reason 06] PTE Read access is not set
>> [ 2.213618] dmar: DRHD: handling fault status reg 3
>
> my Nehalem-EX and Westmere-EX is working with iommu enabled in second kernel.
>
> what is 00:02.0 in your system?
This IOMMU issue is related to https://lkml.org/lkml/2012/11/26/814. We can
discuss this IOMMU issue in that thread.
Anyway 00:02.0 is a video card, the box is Ivy Bridge.
# lspci -s 00:02.0 -v
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Intel Corporation Device 2211
	Flags: bus master, fast devsel, latency 0, IRQ 44
	Memory at afc00000 (64-bit, non-prefetchable) [size=4M]
	Memory at c0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at 6000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [d0] Power Management version 2
	Capabilities: [a4] PCI Advanced Features
	Kernel driver in use: i915
>
> Is your kernel upsteam kernel or redhat flavor one?
I'm using the upstream vanilla kernel.
I've attached the kernel config.


Is it expected that intel_iommu=on or crashkernel_low is needed to make the
2nd kernel boot in 3.9? Back in 3.8, it worked just fine with only the
crashkernel param.

BTW, have a good weekend!

Thanks,
WANG Chao

>
> Thanks
>
> Yinghai
>


Attachments:
config (119.41 kB)

2013-03-08 18:24:55

by Yinghai Lu

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

On Fri, Mar 8, 2013 at 4:12 AM, WANG Chao <[email protected]> wrote:

>> what is 00:02.0 in your system?
> This IOMMU issue is related to https://lkml.org/lkml/2012/11/26/814. We can
> discuss this IOMMU issue in that thread.
> Anyway 00:02.0 is a video card, the box is Ivy Bridge.
> # lspci -s 00:02.0 -v
> 00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor
> Graphics Controller (rev 09) (prog-if 00 [VGA controller])
> Subsystem: Intel Corporation Device 2211
> Flags: bus master, fast devsel, latency 0, IRQ 44
> Memory at afc00000 (64-bit, non-prefetchable) [size=4M]
> Memory at c0000000 (64-bit, prefetchable) [size=256M]
> I/O ports at 6000 [size=64]
> Expansion ROM at <unassigned> [disabled]
> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
> Capabilities: [d0] Power Management version 2
> Capabilities: [a4] PCI Advanced Features
> Kernel driver in use: i915

Would disabling DRM for i915 make your IOMMU work with kdump?

>
>
> Is it expected to intel_iommu=on or crashkernel_low to make 2nd kernel boot in
> 3.9? Back in 3.8, it works just fine w/ only crashkernel param.

Yes. I really do not want to set a crashkernel low range like 72M
automatically for everyone. That would make systems with proper IOMMU
support lose 72M under 4G in the first kernel.
And we cannot play allocate-and-return tricks, as the first kernel has no
idea whether the IOMMU will work in the second kernel, even if it is
working in the first kernel.

Better to fix IOMMU support first.

For old systems that do not have DMAR, or where the kernel does not have
IOMMU support enabled, or where the user does not pass intel_iommu=on,
we could set the crashkernel low range to 72M automatically.
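
The condition described here can be written as a small predicate. This is only an illustrative sketch: the function name want_auto_low() and its three boolean inputs are hypothetical and are not taken from the kernel source or from any posted patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a default low (below 4G) crashkernel
 * range should be reserved automatically.
 *
 * The second kernel can only rely on the IOMMU when the firmware provides a
 * DMAR table, the kernel was built with IOMMU support, and the user passed
 * intel_iommu=on. If any of these is missing, SWIOTLB bounce buffers below
 * 4G are needed, so a low range should be reserved.
 */
bool want_auto_low(bool has_dmar_table, bool iommu_built_in,
		   bool intel_iommu_on)
{
	bool iommu_expected = has_dmar_table && iommu_built_in && intel_iommu_on;

	return !iommu_expected;
}
```

Note that even when this returns false, the first kernel still cannot know whether the IOMMU will actually work in the second kernel, which is the objection raised above.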

Thanks

Yinghai

2013-03-08 19:39:53

by Yinghai Lu

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

[ Add more to To list ]

On Fri, Mar 8, 2013 at 10:24 AM, Yinghai Lu <[email protected]> wrote:
> On Fri, Mar 8, 2013 at 4:12 AM, WANG Chao <[email protected]> wrote:
>
>>> what is 00:02.0 in your system?
>> This IOMMU issue is related to https://lkml.org/lkml/2012/11/26/814. We can
>> discuss this IOMMU issue in that thread.
>> Anyway 00:02.0 is a video card, the box is Ivy Bridge.
>> # lspci -s 00:02.0 -v
>> 00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor
>> Graphics Controller (rev 09) (prog-if 00 [VGA controller])
>> Subsystem: Intel Corporation Device 2211
>> Flags: bus master, fast devsel, latency 0, IRQ 44
>> Memory at afc00000 (64-bit, non-prefetchable) [size=4M]
>> Memory at c0000000 (64-bit, prefetchable) [size=256M]
>> I/O ports at 6000 [size=64]
>> Expansion ROM at <unassigned> [disabled]
>> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
>> Capabilities: [d0] Power Management version 2
>> Capabilities: [a4] PCI Advanced Features
>> Kernel driver in use: i915
>
> disable drm for i915 will make your iommu work with dump?
>
>>
>>
>> Is it expected to intel_iommu=on or crashkernel_low to make 2nd kernel boot in
>> 3.9? Back in 3.8, it works just fine w/ only crashkernel param.
>
> Yes, I really do not want to set crashkernel low range like 72M
> automatically for all.
> that would have the system with proper iommu support lose 72M under 4G
> in first kernel.
> And can not play allocate and return tricks, as first kernel have no
> idea if iommu will work
> on second kernel even iommu is working on first kernel.
>
> Better to fix iommu support at first.
>
> For old system that does not have DMAR or kernel does not have IOMMU
> support enabled, or
> user does not pass intel_iommu=on.
> We could set crashkernel low range to 72M automatically.

It seems that it is not worth checking for the case where IOMMU is not
supported in the second kernel.

Please check the attached patch, which just sets crashkernel_low
automatically; if the system does support IOMMU with kdump, the user can
specify crashkernel_low=0 to save the low 72M.

Thanks

Yinghai


Attachments:
fix_crashkernel_low.patch (1.76 kB)

2013-03-11 03:43:12

by WANG Chao

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

On 03/09/2013 03:39 AM, Yinghai Lu wrote:
> [ Add more to To list ]
>
> On Fri, Mar 8, 2013 at 10:24 AM, Yinghai Lu <[email protected]> wrote:
>> On Fri, Mar 8, 2013 at 4:12 AM, WANG Chao <[email protected]> wrote:
>>
>>>> what is 00:02.0 in your system?
>>> This IOMMU issue is related to https://lkml.org/lkml/2012/11/26/814. We can
>>> discuss this IOMMU issue in that thread.
>>> Anyway 00:02.0 is a video card, the box is Ivy Bridge.
>>> # lspci -s 00:02.0 -v
>>> 00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor
>>> Graphics Controller (rev 09) (prog-if 00 [VGA controller])
>>> Subsystem: Intel Corporation Device 2211
>>> Flags: bus master, fast devsel, latency 0, IRQ 44
>>> Memory at afc00000 (64-bit, non-prefetchable) [size=4M]
>>> Memory at c0000000 (64-bit, prefetchable) [size=256M]
>>> I/O ports at 6000 [size=64]
>>> Expansion ROM at <unassigned> [disabled]
>>> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>> Capabilities: [d0] Power Management version 2
>>> Capabilities: [a4] PCI Advanced Features
>>> Kernel driver in use: i915
>>
>> disable drm for i915 will make your iommu work with dump?
>>
>>>
>>>
>>> Is it expected to intel_iommu=on or crashkernel_low to make 2nd kernel boot in
>>> 3.9? Back in 3.8, it works just fine w/ only crashkernel param.
>>
>> Yes, I really do not want to set crashkernel low range like 72M
>> automatically for all.
>> that would have the system with proper iommu support lose 72M under 4G
>> in first kernel.
>> And can not play allocate and return tricks, as first kernel have no
>> idea if iommu will work
>> on second kernel even iommu is working on first kernel.
>>
>> Better to fix iommu support at first.
>>
>> For old system that does not have DMAR or kernel does not have IOMMU
>> support enabled, or
>> user does not pass intel_iommu=on.
>> We could set crashkernel low range to 72M automatically.
>
> It seem that it is not worthy to check case that does not support
> IOMMU in second kernel.
>
> Please check attached patch that will just set crashkernel_low auto, and if the
> system DO support iommu with kdump, user can specify crashkernel_low=0
> to save low 72M.

The patch works flawlessly on my box! Thank you, Yinghai!
Let me know if there is anything else I can help with.

WANG Chao

>
> Thanks
>
> Yinghai
>

2013-03-11 04:57:26

by Yinghai Lu

Subject: [PATCH] x86, kdump: Set crashkernel_low automatically

Current code does not set a low range for crashkernel if the user
does not specify one.

That causes regressions on systems that do not support intel_iommu
properly.

Chao said that his system works well on 3.8 without an extra parameter,
even though IOMMU does not work with kdump.

Set crashkernel_low automatically if the user does not specify it.

For systems that do support IOMMU with kdump properly, the user can
specify crashkernel_low=0 to save that 72M of low RAM.

Reported-by: WANG Chao <[email protected]>
Tested-by: WANG Chao <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/kernel/setup.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -547,19 +547,28 @@ static void __init reserve_crashkernel_l
 	unsigned long long low_base = 0, low_size = 0;
 	unsigned long total_low_mem;
 	unsigned long long base;
+	bool auto_set = false;
 	int ret;

 	total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
 	ret = parse_crashkernel_low(boot_command_line, total_low_mem,
 						&low_size, &base);
-	if (ret != 0 || low_size <= 0)
-		return;
+	if (ret != 0) {
+		/* default swiotlb size and overflow: 64M + 8M */
+		low_size = 72UL<<20;
+		auto_set = true;
+	} else {
+		/* passed with crashkernel_low=0 ? */
+		if (!low_size)
+			return;
+	}

 	low_base = memblock_find_in_range(low_size, (1ULL<<32),
 					low_size, alignment);

 	if (!low_base) {
-		pr_info("crashkernel low reservation failed - No suitable area found.\n");
+		if (!auto_set)
+			pr_info("crashkernel low reservation failed - No suitable area found.\n");

 		return;
 	}
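
For readers who want to see the size selection in isolation, the following is a minimal user-space sketch of the decision the patch makes. It is not kernel code: struct low_choice and choose_low_size() are hypothetical names introduced here, and parse_ret simply stands in for the return value of parse_crashkernel_low() as the patch treats it (non-zero when crashkernel_low= is not given).

```c
#include <stdbool.h>
#include <stdint.h>

#define MIB (1ULL << 20)

/* Outcome of deciding how much memory below 4G to reserve. */
struct low_choice {
	uint64_t size;  /* bytes to reserve below 4G; 0 means reserve nothing */
	bool auto_set;  /* true when the size is the built-in default */
};

/*
 * parse_ret:   0 if crashkernel_low= was given and parsed, non-zero otherwise.
 * parsed_size: the user-supplied size, only meaningful when parse_ret == 0.
 */
struct low_choice choose_low_size(int parse_ret, uint64_t parsed_size)
{
	struct low_choice c = { 0, false };

	if (parse_ret != 0) {
		/* default swiotlb size plus overflow: 64M + 8M */
		c.size = 72 * MIB;
		c.auto_set = true;
	} else {
		/* an explicit crashkernel_low=0 opts out of the reservation */
		c.size = parsed_size;
	}
	return c;
}
```

The auto_set flag matters only for reporting: when the default was applied silently, a failed reservation is not logged, since the user never asked for a low range.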

2013-03-11 13:14:42

by Konrad Rzeszutek Wilk

Subject: Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

On Fri, Mar 08, 2013 at 11:39:51AM -0800, Yinghai Lu wrote:
> [ Add more to To list ]
>
> On Fri, Mar 8, 2013 at 10:24 AM, Yinghai Lu <[email protected]> wrote:
> > On Fri, Mar 8, 2013 at 4:12 AM, WANG Chao <[email protected]> wrote:
> >
> >>> what is 00:02.0 in your system?
> >> This IOMMU issue is related to https://lkml.org/lkml/2012/11/26/814. We can
> >> discuss this IOMMU issue in that thread.
> >> Anyway 00:02.0 is a video card, the box is Ivy Bridge.
> >> # lspci -s 00:02.0 -v
> >> 00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor
> >> Graphics Controller (rev 09) (prog-if 00 [VGA controller])
> >> Subsystem: Intel Corporation Device 2211
> >> Flags: bus master, fast devsel, latency 0, IRQ 44
> >> Memory at afc00000 (64-bit, non-prefetchable) [size=4M]
> >> Memory at c0000000 (64-bit, prefetchable) [size=256M]
> >> I/O ports at 6000 [size=64]
> >> Expansion ROM at <unassigned> [disabled]
> >> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >> Capabilities: [d0] Power Management version 2
> >> Capabilities: [a4] PCI Advanced Features
> >> Kernel driver in use: i915
> >
> > disable drm for i915 will make your iommu work with dump?
> >
> >>
> >>
> >> Is it expected to intel_iommu=on or crashkernel_low to make 2nd kernel boot in
> >> 3.9? Back in 3.8, it works just fine w/ only crashkernel param.
> >
> > Yes, I really do not want to set crashkernel low range like 72M
> > automatically for all.
> > that would have the system with proper iommu support lose 72M under 4G
> > in first kernel.
> > And can not play allocate and return tricks, as first kernel have no
> > idea if iommu will work
> > on second kernel even iommu is working on first kernel.
> >
> > Better to fix iommu support at first.

It would seem that if we really want to go that route we should export
the number of megabytes that SWIOTLB is using. And it actually is exported,
via swiotlb_nr_tbl(), though it is not in megabytes but in slabs, so you do
have to do some bit-shifting.

Would you want to use that, and perhaps alter the function to be
swiotlb_size() (with xen-swiotlb doing the proper bit-shifting)?
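
The bit-shifting mentioned above can be sketched as follows. This is a user-space illustration only: the helper names are hypothetical, and the slab shift of 11 (2 KiB per slab) is stated here as an assumption based on the kernel's IO_TLB_SHIFT definition rather than taken from this thread.

```c
#include <stdint.h>

/* Assumption: one SWIOTLB slab is 2 KiB, i.e. a shift of 11 bits. */
#define SKETCH_IO_TLB_SHIFT 11

/* Convert a slab count (as a slab-count getter would return) to bytes. */
uint64_t swiotlb_slabs_to_bytes(unsigned long nslabs)
{
	return (uint64_t)nslabs << SKETCH_IO_TLB_SHIFT;
}

/* Convert a slab count to whole mebibytes, rounding down. */
uint64_t swiotlb_slabs_to_mib(unsigned long nslabs)
{
	return swiotlb_slabs_to_bytes(nslabs) >> 20;
}
```

Under that assumption, a 64M bounce buffer corresponds to 32768 slabs, which matches the 64M part of the 72M default discussed in this thread.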

> >
> > For old system that does not have DMAR or kernel does not have IOMMU
> > support enabled, or
> > user does not pass intel_iommu=on.
> > We could set crashkernel low range to 72M automatically.
>
> It seem that it is not worthy to check case that does not support
> IOMMU in second kernel.
>
> Please check attached patch that will just set crashkernel_low auto, and if the
> system DO support iommu with kdump, user can specify crashkernel_low=0
> to save low 72M.
>
> Thanks
>
> Yinghai

2013-03-11 14:49:09

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Sun, Mar 10, 2013 at 09:56:57PM -0700, Yinghai Lu wrote:
> Current code does not set low range for crashkernel if the user
> does not specify that.
>
> That cause regressions on system that does not support intel_iommu
> properly.
>
> Chao said that his system does work well on 3.8 without extra parameter.
> even iommu does not work with kdump.
>
> Set crashkernel_low automatically if the user does not specify that.
>
> For system that does support IOMMU with kdump properly, user could
> specify crashkernel_low=0 to save that 72M low ram.
>

Hi Yinghai,

Had a question about crashkernel_auto_low. So this is the amount of
memory reserved under 4G. I am not very clear about the semantics here.

So by default memory will always come from areas above 4G (when
crashkernel=X is specified), and if the user needs a reservation in the
lower memory area, it needs to be specified explicitly using crashkernel_low?

But will that not break the case of existing bzImages which are 32-bit?
They currently work if I specify crashkernel=256M. Now suddenly memory
will come from higher addresses and a 32-bit bzImage can't be loaded there.
Or have I understood the syntax part wrong?

P.S. The explanation in kernel-parameters.txt is really short. Can you please
add some explanation in kdump.txt to clarify how crashkernel_low
is supposed to be used?

Thanks
Vivek

2013-03-11 15:03:18

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 10:48:53AM -0400, Vivek Goyal wrote:
> On Sun, Mar 10, 2013 at 09:56:57PM -0700, Yinghai Lu wrote:
> > Current code does not set low range for crashkernel if the user
> > does not specify that.
> >
> > That cause regressions on system that does not support intel_iommu
> > properly.
> >
> > Chao said that his system does work well on 3.8 without extra parameter.
> > even iommu does not work with kdump.
> >
> > Set crashkernel_low automatically if the user does not specify that.
> >
> > For system that does support IOMMU with kdump properly, user could
> > specify crashkernel_low=0 to save that 72M low ram.
> >
>
> Hi Yinghai,
>
> Had a question about crashkernel_auto_low. So this is the amount of
> memory rerved under 4G. I am not very clear about the semantics here.
>
> So by default memory wil always come from areas above 4G (when
> crashkernel=X specified) and if user needs reserveation in lower memory
> area, it needs to be sepcified explicitly using crashkernel_low?
>
> But will that not break the case of exising bzImage which are 32bit.
> They currently work if I specify crashkernel=256M. Now suddenly memory
> will come from higher addresses and 32bit bzImage can't be loaded there.
> Or I understood the syntax part wrong.
>

IOW, wouldn't it be better if crashkernel=X first tried to find the
requested amount of memory in the lowest memory area available/possible?

Thanks
Vivek

2013-03-11 17:58:41

by Yinghai Lu

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 8:02 AM, Vivek Goyal <[email protected]> wrote:
>
> IOW, wouldn't it be better that crashkernel=X first tries to find
> requested amount of memory in lowest memory area available/possible.

Yes, that is much better, and users could even stay with old kexec-tools
on systems that do not have tons of memory.
And I don't need to mess with auto-setting crashkernel_low or exporting
swiotlb_size() etc.

Please check if you are OK with the attached one.

Thanks

Yinghai


Attachments:
fix_crashkernel_low_v2.patch (3.02 kB)

2013-03-11 18:27:12

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 10:58:39AM -0700, Yinghai Lu wrote:
> On Mon, Mar 11, 2013 at 8:02 AM, Vivek Goyal <[email protected]> wrote:
> >
> > IOW, wouldn't it be better that crashkernel=X first tries to find
> > requested amount of memory in lowest memory area available/possible.
>
> Yest, that is much better, and user even could stay with old kexec-tools
> for system that does not tons of memory.
> And I don't need to mess up with auto setting crashkernel_low or export
> swiotlb_size() etc.
>
> Please check if you are ok with attached one.
>
Hi Yinghai,

In mutt your patches are showing as attachments instead of inline. Mutt
thinks the attachment is of type "application/octet-stream". Not sure if
this is a configuration issue on my part or something is going on at your
end.

I have a few more concerns.

- Are we able to reserve 512MB of memory below 896MB now? I remember that
so far it was broken.

- If reserving memory below 896MB fails, we immediately switch to
reserving anywhere up to MAXMEM. Would it make sense to first try
to reserve it below 4G (so that we don't have to worry much about
swiotlb or iommu being on)?
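
The ordering proposed in the second point (below 896MB, then below 4G, then anywhere up to MAXMEM) can be sketched as a tiered search. This is a user-space model, not the kernel's memblock API: struct free_range, find_below() and reserve_tiered() are hypothetical names, free memory is given as a plain array of [start, end) ranges, and a return value of 0 is used to mean "nothing found".

```c
#include <stddef.h>
#include <stdint.h>

/* A free physical memory range, [start, end). */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return the highest align-aligned base such that [base, base + size) lies
 * inside one free range and ends at or below limit. Returns 0 if no range
 * fits. align must be a non-zero power of two and size must be non-zero.
 */
uint64_t find_below(const struct free_range *r, size_t n, uint64_t limit,
		    uint64_t size, uint64_t align)
{
	uint64_t best = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = r[i].end < limit ? r[i].end : limit;
		uint64_t base;

		if (end < size)
			continue;	/* would underflow below */
		base = (end - size) & ~(align - 1);
		if (base >= r[i].start && base > best)
			best = base;
	}
	return best;
}

/* Try each limit in turn: 896M, then 4G, then maxmem. */
uint64_t reserve_tiered(const struct free_range *r, size_t n, uint64_t size,
			uint64_t align, uint64_t maxmem)
{
	const uint64_t limits[] = { 896ULL << 20, 1ULL << 32, maxmem };

	for (size_t i = 0; i < sizeof(limits) / sizeof(limits[0]); i++) {
		uint64_t base = find_below(r, n, limits[i], size, align);

		if (base)
			return base;
	}
	return 0;
}
```

With free ranges of 1M to 512M and 4G to 8G, a 128M request lands at 384M in the first tier, while a 600M request only fits above 4G. Whether a low-first order is desirable at all is a separate question that this sketch does not settle.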

Thanks
Vivek

2013-03-11 18:44:18

by Yinghai Lu

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 11:26 AM, Vivek Goyal <[email protected]> wrote:
> On Mon, Mar 11, 2013 at 10:58:39AM -0700, Yinghai Lu wrote:
>> On Mon, Mar 11, 2013 at 8:02 AM, Vivek Goyal <[email protected]> wrote:
>> >
>> > IOW, wouldn't it be better that crashkernel=X first tries to find
>> > requested amount of memory in lowest memory area available/possible.
>>
>> Yest, that is much better, and user even could stay with old kexec-tools
>> for system that does not tons of memory.
>> And I don't need to mess up with auto setting crashkernel_low or export
>> swiotlb_size() etc.
>>
>> Please check if you are ok with attached one.
>>
> Hi Yinghai,
>
> In mutt your patches are showing as attachment instead of inline. Mutt
> thinks attachment is of type "application/octet-stream". Not sure if
> this is configuration issue on my part or something is going on your
> end.

I sent it via Gmail, which can only send patches as attachments, not inline.

>
> I have few more concerns.
>
> - Are we able to reserve 512MB memory now below 896MB. I remember so
> far it was broken.

It also depends on the BIOS memmap layout. Some BIOSes put reserved regions
in the middle of RAM, like just below 512M or 2G.

>
> - If reserving memory below 896MB fails, we immediately switch to
> reserving anywhere till MAXMEM. Would it make sense to first try
> to reserve it below 4G (so that we don't have to worry much about
> swiotlb or iommu being on).

ok.

Attached again.

Thanks

Yinghai


Attachments:
fix_crashkernel_low_v2.patch (3.25 kB)

2013-03-11 18:49:11

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 11:26 AM, Vivek Goyal wrote:
>>
> Hi Yinghai,
>
> In mutt your patches are showing as attachment instead of inline. Mutt
> thinks attachment is of type "application/octet-stream". Not sure if
> this is configuration issue on my part or something is going on your
> end.
>
> I have few more concerns.
>
> - Are we able to reserve 512MB memory now below 896MB. I remember so
> far it was broken.
>

What is the purpose of reserving that kind of memory below 896 MB? If
you have a 32-bit system, it will likely be useless since you are
robbing the primary of most of lowmem, on a 64-bit system 896 MB is not
a magic value in any way...?

-hpa

2013-03-11 18:50:35

by Yinghai Lu

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 11:46 AM, H. Peter Anvin <[email protected]> wrote:
> On 03/11/2013 11:26 AM, Vivek Goyal wrote:
>>>
>> Hi Yinghai,
>>
>> In mutt your patches are showing as attachment instead of inline. Mutt
>> thinks attachment is of type "application/octet-stream". Not sure if
>> this is configuration issue on my part or something is going on your
>> end.
>>
>> I have few more concerns.
>>
>> - Are we able to reserve 512MB memory now below 896MB. I remember so
>> far it was broken.
>>
>
> What is the purpose of reserving that kind of memory below 896 MB? If
> you have a 32-bit system, it will likely be useless since you are
> robbing the primary of most of lowmem, on a 64-bit system 896 MB is not
> a magic value in any way...?

We did not touch 32-bit systems.

Do you mean that, for 64-bit, we should try under 4G and then MAXMEM,
instead of trying under 896M, then 4G, and then MAXMEM?

Trying 896M first lets users avoid updating their kexec-tools.

Thanks

Yinghai

2013-03-11 18:58:12

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 11:50 AM, Yinghai Lu wrote:
>>
>> What is the purpose of reserving that kind of memory below 896 MB? If
>> you have a 32-bit system, it will likely be useless since you are
>> robbing the primary of most of lowmem, on a 64-bit system 896 MB is not
>> a magic value in any way...?
>
> We did not touch 32 bit system.
>
> Do you mean that we should
> For 64bit, we should try under 4G, and then try MAXMEM
> instead of try under 896M, then 4G, and MAXMEM?
>
> Try 896M at first, we will let user to avoid updating their kexec-tools.
>

Are you saying 896M is somehow hardcoded into kexec-tools?

I actually disagree with trying low memory at all. Push kdump as high
into the memory range as we can go, if there is a performance penalty it
is much better to take it in the kdump kernel.

All the voodoo to try to keep people from updating kexec-tools is
disturbing; although breaking userspace is bad, updating kexec-tools is
probably easier than updating the kernel, and carrying the voodoo on
indefinitely has serious consequences.

-hpa

2013-03-11 19:03:01

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 11:46:45AM -0700, H. Peter Anvin wrote:
> On 03/11/2013 11:26 AM, Vivek Goyal wrote:
> >>
> > Hi Yinghai,
> >
> > In mutt your patches are showing as attachment instead of inline. Mutt
> > thinks attachment is of type "application/octet-stream". Not sure if
> > this is configuration issue on my part or something is going on your
> > end.
> >
> > I have few more concerns.
> >
> > - Are we able to reserve 512MB memory now below 896MB. I remember so
> > far it was broken.
> >
>
> What is the purpose of reserving that kind of memory below 896 MB? If
> you have a 32-bit system, it will likely be useless since you are
> robbing the primary of most of lowmem, on a 64-bit system 896 MB is not
> a magic value in any way...?

Actually, I am not sure where the 896MB magic value came from for
x86_64. I assumed it was some kexec-tools limitation, so trying 896MB
first would keep old kexec-tools working. If it was a kernel
limitation, then I agree it should not be required anymore.

I do remember that old purgatory had a 2G limit. So maybe we can first
try to reserve within the first 2G, then within the first 4G, and then
above 4G. (Assuming 896M was not a kexec-tools limitation and had
something to do with kernel/initramfs.)

Thanks
Vivek

2013-03-11 19:06:08

by Yinghai Lu

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 11:55 AM, H. Peter Anvin <[email protected]> wrote:
> On 03/11/2013 11:50 AM, Yinghai Lu wrote:
>>>
>>> What is the purpose of reserving that kind of memory below 896 MB? If
>>> you have a 32-bit system, it will likely be useless since you are
>>> robbing the primary of most of lowmem, on a 64-bit system 896 MB is not
>>> a magic value in any way...?
>>
>> We did not touch 32 bit system.
>>
>> Do you mean that we should
>> For 64bit, we should try under 4G, and then try MAXMEM
>> instead of try under 896M, then 4G, and MAXMEM?
>>
>> Try 896M at first, we will let user to avoid updating their kexec-tools.
>>
>
> Are you saying 896M is somehow hardcoded into kexec-tools?

yes, before kexec-tools 2.0.4

>
> I actually disagree with trying low memory at all. Push kdump as high
> into the memory range as we can go, if there is a performance penalty it
> is much better to take it in the kdump kernel.

Agreed. It's better to let all of 64-bit use one code path.
We can also find more bugs by loading them all high;
otherwise it would be hard to fix bugs that only happen on systems
that have a bunch of DIMMs.

>
> All the voodoo to try to keep people from updating kexec-tools is
> disturbing; although breaking userspace is bad, updating kexec-tools is
> probably easier than updating the kernel, and carrying the voodoo on
> indefinitely has serious consequences.

Yes.

So please check whether you are happy with this one: -v3, which sets
crashkernel_low automatically.

Thanks

Yinghai


Attachments:
fix_crashkernel_low_v3.patch (3.65 kB)

2013-03-11 19:07:17

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 12:02 PM, Vivek Goyal wrote:
>>
>> What is the purpose of reserving that kind of memory below 896 MB? If
>> you have a 32-bit system, it will likely be useless since you are
>> robbing the primary of most of lowmem, on a 64-bit system 896 MB is not
>> a magic value in any way...?
>
> Actually I am not sure where did 896MB magic value had come from for
> x86_64 so far. I assumed that it was some kexec-tools limitation so
> first trying 896MB will preserve working with old kexec-tools. If it
> was some kernel limitation, then I agree it should not be required anymore.
>
> I do remember that old purgatory had 2G limit. So may be we can first
> try reserve with-in first 2G, then with-in first 4G and then above
> 4G. (Assuming 896M was not kexec-tools limitation and had something
> to do with kernel/initramfs).
>

It is obvious where it *originated* from... it is the *default* (but not
necessarily the actual!) HIGHMEM crossover point on x86-32.

Whether this limitation has crept into somewhere else is a good question.

-hpa

2013-03-11 19:09:01

by Yinghai Lu

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:06 PM, H. Peter Anvin <[email protected]> wrote:
> On 03/11/2013 12:06 PM, Yinghai Lu wrote:
>>>
>>> Are you saying 896M is somehow hardcoded into kexec-tools?
>>
>> yes, before kexec-tools 2.0.4
>>
>
> How old is that?

2.0.4 is not released yet. 2.0.4 will support loading v3.9 kernels,
which support being loaded high.

Thanks

Yinghai

2013-03-11 19:09:26

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 12:06 PM, Yinghai Lu wrote:
>>
>> Are you saying 896M is somehow hardcoded into kexec-tools?
>
> yes, before kexec-tools 2.0.4
>

How old is that?

-hpa

2013-03-11 19:10:37

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 11:55:52AM -0700, H. Peter Anvin wrote:
> On 03/11/2013 11:50 AM, Yinghai Lu wrote:
> >>
> >> What is the purpose of reserving that kind of memory below 896 MB? If
> >> you have a 32-bit system, it will likely be useless since you are
> >> robbing the primary of most of lowmem, on a 64-bit system 896 MB is not
> >> a magic value in any way...?
> >
> > We did not touch 32 bit system.
> >
> > Do you mean that we should
> > For 64bit, we should try under 4G, and then try MAXMEM
> > instead of try under 896M, then 4G, and MAXMEM?
> >
> > Try 896M at first, we will let user to avoid updating their kexec-tools.
> >
>
> Are you saying 896M is somehow hardcoded into kexec-tools?
>
> I actually disagree with trying low memory at all. Push kdump as high
> into the memory range as we can go, if there is a performance penalty it
> is much better to take it in the kdump kernel.

Reserving above 2G by default will break old kexec-tools. Purgatory can't
be loaded above 2G.

Loading above 4G will require swiotlb buffers to be reserved in low
memory area. It will break all the existing setups where iommu is
not enabled.
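
To make that dependency concrete, here is a minimal Python model of the
decision a swiotlb-style mapping path has to make when no iommu is in use.
It is a sketch under stated assumptions, not the kernel's code: the names
needs_bounce and map_single, the NoBounceBuffer exception, the 32-bit
dma_mask, and the (base, length) bounce_pool tuple are all hypothetical.

```python
DMA_BIT_MASK_32 = (1 << 32) - 1  # assumed mask of a 32-bit-only device


class NoBounceBuffer(Exception):
    """Raised when a bounce buffer is required but none was reserved."""


def needs_bounce(phys_addr, size, dma_mask=DMA_BIT_MASK_32):
    # The device can reach the buffer only if its last byte fits in the mask.
    return phys_addr + size - 1 > dma_mask


def map_single(phys_addr, size, bounce_pool, dma_mask=DMA_BIT_MASK_32):
    """Return the address the device should use for this buffer.

    bounce_pool is a hypothetical (base, length) tuple describing low memory
    set aside at boot, or None if nothing was reserved below 4G.
    """
    if not needs_bounce(phys_addr, size, dma_mask):
        return phys_addr
    if bounce_pool is None or size > bounce_pool[1]:
        raise NoBounceBuffer("no low memory reserved for bouncing")
    # In a real implementation the data would be copied through this buffer.
    return bounce_pool[0]
```

With a kernel loaded entirely above 4G and no low reservation, every
mapping for such a device takes the failing branch, which matches the kind
of panic reported at the start of this thread.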

Not breaking existing cases makes sense to me. Maybe we should use
a new parameter crashkernel_high to force memory reservation above
4G, while crashkernel=X continues to reserve memory at lower addresses
and remains backward compatible.

Thanks
Vivek

2013-03-11 19:14:24

by Eric W. Biederman

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

"H. Peter Anvin" <[email protected]> writes:

> On 03/11/2013 11:50 AM, Yinghai Lu wrote:
>>>
>>> What is the purpose of reserving that kind of memory below 896 MB? If
>>> you have a 32-bit system, it will likely be useless since you are
>>> robbing the primary of most of lowmem, on a 64-bit system 896 MB is not
>>> a magic value in any way...?
>>
>> We did not touch 32 bit system.
>>
>> Do you mean that we should
>> For 64bit, we should try under 4G, and then try MAXMEM
>> instead of try under 896M, then 4G, and MAXMEM?
>>
>> Try 896M at first, we will let user to avoid updating their kexec-tools.
>>
>
> Are you saying 896M is somehow hardcoded into kexec-tools?
>
> I actually disagree with trying low memory at all. Push kdump as high
> into the memory range as we can go, if there is a performance penalty it
> is much better to take it in the kdump kernel.
>
> All the voodoo to try to keep people from updating kexec-tools is
> disturbing; although breaking userspace is bad, updating kexec-tools is
> probably easier than updating the kernel, and carrying the voodoo on
> indefinitely has serious consequences.

I don't totally follow the reasoning, but there is one real motivating
example that is not easy to fix and it has little to do with
kexec-tools. There is a practical issue that so far the easiest way
to deal with iommus after a kexec on panic is to just not use them.
The problem is what to do with existing DMA transfers that were set up
by the kernel that crashed and are using the iommu.

When you are loaded above 4G not using iommus can be a challenge.

There are practical consequences to all of this that started this
discussion, and the practical consequences are primarily in kernel
behavior.


Eric

2013-03-11 19:17:49

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:04:38PM -0700, H. Peter Anvin wrote:
> On 03/11/2013 12:02 PM, Vivek Goyal wrote:
> >>
> >> What is the purpose of reserving that kind of memory below 896 MB? If
> >> you have a 32-bit system, it will likely be useless since you are
> >> robbing the primary of most of lowmem, on a 64-bit system 896 MB is not
> >> a magic value in any way...?
> >
> > Actually I am not sure where did 896MB magic value had come from for
> > x86_64 so far. I assumed that it was some kexec-tools limitation so
> > first trying 896MB will preserve working with old kexec-tools. If it
> > was some kernel limitation, then I agree it should not be required anymore.
> >
> > I do remember that old purgatory had 2G limit. So may be we can first
> > try reserve with-in first 2G, then with-in first 4G and then above
> > 4G. (Assuming 896M was not kexec-tools limitation and had something
> > to do with kernel/initramfs).
> >
>
> It is obvious where it *originated* from... it is the *default* (but not
> necessarily the actual!) HIGHMEM crossover point on x86-32.
>

On x86-32, the max addr limit is 512M. The 896M limit is on x86_64. So it
probably came from somewhere else.

Also, always reserving in high memory cuts down on the kinds of bzImage
that can be booted from that address, for example 32-bit x86 kernels.
Hence reserving at low addresses enables booting more types of images
without rebooting.

Thanks
Vivek

2013-03-11 19:20:27

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:06:06PM -0700, Yinghai Lu wrote:

[..]
> > I actually disagree with trying low memory at all. Push kdump as high
> > into the memory range as we can go, if there is a performance penalty it
> > is much better to take it in the kdump kernel.
>
> Agreed, It's better let 64 bit all use one code path.
> And we can find more bugs while load them all high.
> otherwise it would be hard to fix them if the bugs only happens on systems
> that have bunch of dimms.

I find it odd that if a user wants to load a 32bit kernel or use 32bit
entry point then he needs to first reboot the kernel and re-reserve
the memory.

At installation time, one does not necessarily know what kind of kernel
will be used for crashdump. So reserving as high as possible limits
the choices.

I would prefer that users opt in to higher addresses instead of
these being reserved by default.

Thanks
Vivek

2013-03-11 19:22:53

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:14:16PM -0700, Eric W. Biederman wrote:
> "H. Peter Anvin" <[email protected]> writes:
>
> > On 03/11/2013 11:50 AM, Yinghai Lu wrote:
> >>>
> >>> What is the purpose of reserving that kind of memory below 896 MB? If
> >>> you have a 32-bit system, it will likely be useless since you are
> >>> robbing the primary of most of lowmem, on a 64-bit system 896 MB is not
> >>> a magic value in any way...?
> >>
> >> We did not touch 32 bit system.
> >>
> >> Do you mean that we should
> >> For 64bit, we should try under 4G, and then try MAXMEM
> >> instead of try under 896M, then 4G, and MAXMEM?
> >>
> >> Try 896M at first, we will let user to avoid updating their kexec-tools.
> >>
> >
> > Are you saying 896M is somehow hardcoded into kexec-tools?
> >
> > I actually disagree with trying low memory at all. Push kdump as high
> > into the memory range as we can go, if there is a performance penalty it
> > is much better to take it in the kdump kernel.
> >
> > All the voodoo to try to keep people from updating kexec-tools is
> > disturbing; although breaking userspace is bad, updating kexec-tools is
> > probably easier than updating the kernel, and carrying the voodoo on
> > indefinitely has serious consequences.
>
> I don't totally follow the reasoning, but there is one real motivating
> example that is not easy to fix and it has little to do with
> kexec-tools. There is a practical issue that so far the easiest way
> to deal with iommus after a kexec on panic is to just not use them.
> The problem is what to do with existing DMAs transfers that were setup
> by the kernel that crashed and are using the iommu.
>
> When you are loaded above 4G not using iommus can be a challenge.
>
> There are practical consequences to all of this that started this
> discussion, and the practical consequences are primarily in kernel
> behavior.

iommu is a real concern. We have had quite a few issues where kdump did
not work with iommu on. So in practice we keep iommu off in both the
first kernel and the second kernel.

So always reserving memory at the highest address will break all the cases
which work without iommu and rely on swiotlb. I think first we need
to make sure that kdump works reliably with iommu on, and then try
to move to always reserving memory at the highest possible address.

Thanks
Vivek

2013-03-11 19:34:03

by Yinghai Lu

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:10 PM, Vivek Goyal <[email protected]> wrote:
> Not breaking existing cases makes sense to me.

That is the -v2 version:
try 896M, then try 4G, then MAXMEM.

> May be we should use
> a new parameter crashkernel_high to force memory reservation above
> 4G and crashkernel=X continues to reserve memory at lower addresses
> and remains backward compatible.

No need for crashkernel_high; we can just use crashkernel=X@Y instead.

Thanks

Yinghai

2013-03-11 19:38:35

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:34:00PM -0700, Yinghai Lu wrote:
> On Mon, Mar 11, 2013 at 12:10 PM, Vivek Goyal <[email protected]> wrote:
> > Not breaking existing cases makes sense to me.
>
> that is -v2 version:
> try 896M, then try 4G, then MAXMEM.
>
> > May be we should use
> > a new parameter crashkernel_high to force memory reservation above
> > 4G and crashkernel=X continues to reserve memory at lower addresses
> > and remains backward compatible.
>
> No need to use crashkernel_high, we can just crashkernel=X@Y instead.

crashkernel=X@Y is a little different. It assumes the user knows the memory
map, and location "Y" is fixed. There might not be any memory at "Y".

So it is not the same.
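
The difference between the two forms shows up as soon as the option is
parsed. The following Python sketch handles only the simple
size[@offset] form with K/M/G suffixes; the function names are
hypothetical, and the kernel's own parser accepts more than this.

```python
import re

_SUFFIX = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a string such as '128M' into a byte count."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError("unrecognized size: %r" % text)
    return int(match.group(1)) * _SUFFIX[match.group(2)]


def parse_crashkernel(value):
    """Parse the value of crashkernel=size[@offset] into (size, base).

    base is None when no '@offset' is given, meaning the kernel is free
    to pick where the region goes.
    """
    size_text, at, base_text = value.partition("@")
    size = parse_size(size_text)
    base = parse_size(base_text) if at else None
    return size, base
```

With "128M@16M" the caller dictates the base and the reservation fails if
that range is not usable RAM; with "128M" alone the base comes back as
None and the placement policy is the kernel's to choose.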

Thanks
Vivek

2013-03-11 19:39:58

by Yinghai Lu

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:38 PM, Vivek Goyal <[email protected]> wrote:
>> No need to use crashkernel_high, we can just crashkernel=X@Y instead.
>
> crashkernel=X@Y is little different. It assumes user knows the memory
> map and location "Y" is fixed. There might not be any memory at "Y".

then use crashkernel=4G?

2013-03-11 19:44:13

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:39:56PM -0700, Yinghai Lu wrote:
> On Mon, Mar 11, 2013 at 12:38 PM, Vivek Goyal <[email protected]> wrote:
> >> No need to use crashkernel_high, we can just crashkernel=X@Y instead.
> >
> > crashkernel=X@Y is little different. It assumes user knows the memory
> > map and location "Y" is fixed. There might not be any memory at "Y".
>
> then use crashkernel=4G?

But that will reserve 4G of memory. I just want 128/256 MB of memory
reserved for normal cases. This is even worse than crashkernel=X@Y.

Thanks
Vivek

2013-03-11 19:44:19

by Yinghai Lu

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:39 PM, Yinghai Lu <[email protected]> wrote:
> On Mon, Mar 11, 2013 at 12:38 PM, Vivek Goyal <[email protected]> wrote:
>>> No need to use crashkernel_high, we can just crashkernel=X@Y instead.
>>
>> crashkernel=X@Y is little different. It assumes user knows the memory
>> map and location "Y" is fixed. There might not be any memory at "Y".
>
> then use crashkernel=4G?

I have re-attached -v2 and -v3.

I'm OK with either one being applied, or with both being applied together.

Both only affect 64-bit:
-v2: try under 896M, then 4G, then MAXMEM
-v3: set crashkernel_low automatically from the swiotlb size.
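
The -v2 search order can be modelled in a few lines of Python. This is
only an illustration of the tiered fallback, not the patch itself: the
free_ranges list, the 16MB alignment, and the function names are
assumptions made for the example.

```python
MB = 1 << 20
GB = 1 << 30


def find_top_down(free_ranges, size, limit, align=16 * MB):
    """Return the highest aligned base with [base, base + size) free
    and ending at or below limit, or None if nothing fits."""
    best = None
    for start, end in free_ranges:
        end = min(end, limit)
        base = (end - size) // align * align  # round down to alignment
        if base >= start and (best is None or base > best):
            best = base
    return best


def reserve_crashkernel(free_ranges, size, maxmem):
    """Try each ceiling in turn: 896M, then 4G, then MAXMEM."""
    for limit in (896 * MB, 4 * GB, maxmem):
        base = find_top_down(free_ranges, size, limit)
        if base is not None:
            return base, limit
    return None
```

A small request lands under 896M and keeps old kexec-tools working, while
a request that only fits above 4G is the case where a separate low region
for swiotlb is needed, which is what -v3 sets up automatically.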

Thanks

Yinghai


Attachments:
fix_crashkernel_low_v2.patch (3.25 kB)
fix_crashkernel_low_v3.patch (3.65 kB)

2013-03-11 19:58:17

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 12:20 PM, Vivek Goyal wrote:
>
> I find it odd that if a user wants to load a 32bit kernel or use 32bit
> entry point then he needs to first reboot the kernel and re-reserve
> the memory.
>
> At installation time, one does not necessarily know what kind of kernel
> will be used for crashdump. So reserving as high as possible limits
> the choices.
>
> I would rather prefer that user opt in for higher addresses instead of
> these being reserved by default.
>

Quite frankly the whole design seems to be held together with chewing
gum. At the core, the problem is a tight coupling between kexec-tools
version, kexec-tools options, and kernel command line options that have
to be combined in very ugly ways. Part of the reason is that the kernel
isn't actually given the information it needs to do the job required.

As far as "if a user wants to load"... why on Earth should that be the
default? How could that *not* be an exceptional case?

-hpa

2013-03-11 19:59:01

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 12:14 PM, Eric W. Biederman wrote:
>
> I don't totally follow the reasoning, but there is one real motivating
> example that is not easy to fix and it has little to do with
> kexec-tools. There is a practical issue that so far the easiest way
> to deal with iommus after a kexec on panic is to just not use them.
> The problem is what to do with existing DMAs transfers that were setup
> by the kernel that crashed and are using the iommu.
>
> When you are loaded above 4G not using iommus can be a challenge.
>

Isn't that what swiotlb is for?

-hpa

2013-03-11 20:02:06

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 12:22 PM, Vivek Goyal wrote:
>
> So always reserving memory at highest address will break all the cases
> which work without iommu and rely on swiotlb. I think first we need
> to make sure that kdump works reliably with iommu on, and then try
> to move to always reserving memory at highest possible address.
>

We should clearly always reserve an swiotlb window, *or*, probably much
better, teach the kdump kernel to *make* an swiotlb window (by having a
memory buffer in its reserved memory area into which it copies a chunk
of low memory, just as we do for the bottom megabyte. If we are already
in low memory that buffer becomes the swiotlb window, no copy necessary.)
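
As a toy illustration of the idea in the parenthesis, the Python sketch
below treats physical memory as a bytearray. Everything in it is
hypothetical (the function name, the parameters, and the way the window is
chosen); it only shows the copy-then-reuse step and the no-copy shortcut.

```python
def make_swiotlb_window(memory, reserved_buf, window_size, low_chunk, low_limit):
    """Return (window_base, backup_base) for a bounce-buffer window.

    memory       -- bytearray standing in for physical memory
    reserved_buf -- start of a buffer inside the crashkernel reservation
    low_chunk    -- start of a chunk of the old kernel's low memory
    low_limit    -- highest address a limited device can reach, plus one
    """
    if reserved_buf + window_size <= low_limit:
        # The reserved buffer is already in low memory: use it directly.
        return reserved_buf, None
    # Save the old kernel's data so it can still be dumped afterwards...
    memory[reserved_buf:reserved_buf + window_size] = \
        memory[low_chunk:low_chunk + window_size]
    # ...and hand the now-backed-up low chunk over as the bounce window.
    return low_chunk, reserved_buf
```

The dump tool would then have to read the saved copy, not the live low
chunk, when writing out that part of the crashed kernel's memory.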

-hpa

2013-03-11 20:12:56

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:55:55PM -0700, H. Peter Anvin wrote:
> On 03/11/2013 12:20 PM, Vivek Goyal wrote:
> >
> > I find it odd that if a user wants to load a 32bit kernel or use 32bit
> > entry point then he needs to first reboot the kernel and re-reserve
> > the memory.
> >
> > At installation time, one does not necessarily know what kind of kernel
> > will be used for crashdump. So reserving as high as possible limits
> > the choices.
> >
> > I would rather prefer that user opt in for higher addresses instead of
> > these being reserved by default.
> >
>
> Quite frankly the whole design seems to be held together with chewing
> gum. At the core, the problem is a tight coupling between kexec-tools
> version, kexec-tools options, and kernel command line options that have
> to be combined in very ugly ways. Part of the reason is that the kernel
> isn't actually given the information it needs to do the job required.
>
> As far as "if a user wants to load"... why on Earth should that be the
> default? How could that *not* be an exceptional case?

Because it breaks existing use cases. We have had this limitation so far
that bzImage has to be loaded in the first 896MB. And for the 32bit bzImage
entry, I think that is still true?

So how can the kernel assume that the user is always loading a 64bit bzImage
and reserve memory accordingly?

Also, in the past we did not have a relocatable kernel, and memory had to
be reserved at the address the new kernel was built for. Thankfully that is
no longer the case.

Thanks
Vivek

2013-03-11 20:17:42

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:59:44PM -0700, H. Peter Anvin wrote:
> On 03/11/2013 12:22 PM, Vivek Goyal wrote:
> >
> > So always reserving memory at highest address will break all the cases
> > which work without iommu and rely on swiotlb. I think first we need
> > to make sure that kdump works reliably with iommu on, and then try
> > to move to always reserving memory at highest possible address.
> >
>
> We should clearly always reserve an swiotlb window, *or*, probably much
> better, teach the kdump kernel to *make* an swiotlb window (by having a
> memory buffer in its reserved memory area into which it copies a chunk
> of low memory, just as we do for the bottom megabyte. If we are already
> in low memory that buffer becomes the swiotlb window, no copy necessary.)

This sounds like a good idea. It will require exporting how much memory
is needed for swiotlb. Then kexec-tools should be able to back up that
memory area and make that memory available to the second kernel.

Thanks
Vivek

2013-03-11 20:21:38

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 01:12 PM, Vivek Goyal wrote:
>>
>> Quite frankly the whole design seems to be held together with chewing
>> gum. At the core, the problem is a tight coupling between kexec-tools
>> version, kexec-tools options, and kernel command line options that have
>> to be combined in very ugly ways. Part of the reason is that the kernel
>> isn't actually given the information it needs to do the job required.
>>
>> As far as "if a user wants to load"... why on Earth should that be the
>> default? How could that *not* be an exceptional case?
>
> Because it breaks existing user cases. We had this limitation so far
> that bzImage has to be loaded in first 896MB. And for 32bit bzImage
> entry, I think that is still true?
>
> So how can kernel assume that user is always loading a 64bit bzImage
> and reserve memory accordingly.
>
> Also in the past we did not have relocatable kernel and memory had to
> be reserved at the address new kernel is built. Thankfully that is
> no more the case.
>

The problem with this argument here is that we are spiraling down the
drain of increasing user-visible complexity in order to not break
existing but exotic use cases. We need to stop and reverse this trend.
I want to make a few observations on this:

1. Running with an archaic kexec-tools should be considered an anomaly.
If necessary, we could introduce a kernel option to let the kernel know
which kexec-tools version the user will use.

2. As long as memory is available, there is always the option to shift
memory around to accommodate the crashkernel. That probably should have
been done all along.

3. The memory size reserved should be deduced automatically to the
greatest possible extent.

-hpa

2013-03-11 20:39:04

by Eric W. Biederman

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

"H. Peter Anvin" <[email protected]> writes:

> On 03/11/2013 01:12 PM, Vivek Goyal wrote:
>>>
>>> Quite frankly the whole design seems to be held together with chewing
>>> gum. At the core, the problem is a tight coupling between kexec-tools
>>> version, kexec-tools options, and kernel command line options that have
>>> to be combined in very ugly ways. Part of the reason is that the kernel
>>> isn't actually given the information it needs to do the job required.
>>>
>>> As far as "if a user wants to load"... why on Earth should that be the
>>> default? How could that *not* be an exceptional case?
>>
>> Because it breaks existing user cases. We had this limitation so far
>> that bzImage has to be loaded in first 896MB. And for 32bit bzImage
>> entry, I think that is still true?
>>
>> So how can kernel assume that user is always loading a 64bit bzImage
>> and reserve memory accordingly.
>>
>> Also in the past we did not have relocatable kernel and memory had to
>> be reserved at the address new kernel is built. Thankfully that is
>> no more the case.
>>
>
> The problem with this argument here is that we are spiraling down the
> drain of increasing user-visible complexity in order to not break
> existing but exotic use cases. We need to stop and reverse this trend.
> I want to make a few observations on this:

> 1. Running with an archaic kexec-tools should be considered an anomaly.
> If necessary, we could introduce a kernel option to let the kernel know
> which kexec-tools version the user will use.

Sure. Running with the last release of kexec-tools before new changes
were made is not at all unreasonable, as updating both tools in sync is
a practical problem.

Having thought about this a little more: even with no changes and with
memory reserved high, we can run with any memory location we want, as
the syntax crashkernel=AMOUNT@LOC is still supported.

A distro may not be able to automate that but shrug, a distro can
upgrade to the latest and greatest version of the tools assuming
those tools can support loading high.

> 2. As long as memory is available, there is always the option to shift
> memory around to accommodate the crashkernel. That probably should have
> been done all along.

Arguable. The core strategy is to reserve memory at the beginning of
time so we have memory that we know has never been used for DMA, so
there is a very strong chance that memory will never be the target of
a DMA operation. The expectation is that we do the shifting around at
boot time.

I doubt we have a mechanism in place that can actually shift around
memory in the quantities some people are after, after a system boots.

Now quite frankly I think there are some very silly things going on.
Why does makedumpfile need to allocate and create a huge bitmap of
which pages to dump?
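
For a sense of scale, here is a rough calculation in Python. It assumes
one bit per page frame and 4 KiB pages; both are assumptions made for
illustration, since the actual makedumpfile layout is not described in
this thread.

```python
def bitmap_bytes(ram_bytes, page_size=4096):
    """Bytes needed for a one-bit-per-page bitmap covering ram_bytes."""
    pages = -(-ram_bytes // page_size)  # ceiling division
    return -(-pages // 8)
```

Under those assumptions a machine with 1 TiB of RAM has 2^28 page frames,
so a single bitmap takes 32 MiB, and the cost grows linearly with the
amount of memory being dumped.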

Last I was playing with this I had my amount of reserved memory down to
32MiB or was it 8MiB. It was very small and for the small system I was
on it worked fine.

It totally makes sense to figure out how to load a kernel high. I am not
convinced kexec on panic is the best use of that ability. I would argue
that it might be better to figure out how to use a small memory
foot-print and try to keep that foot-print from growing.

Eric

2013-03-11 20:39:14

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 01:19 PM, H. Peter Anvin wrote:
>
> The problem with this argument here is that we are spiraling down the
> drain of increasing user-visible complexity in order to not break
> existing but exotic use cases. We need to stop and reverse this trend.
> I want to make a few observations on this:
>
> 1. Running with an archaic kexec-tools should be considered an anomaly.
> If necessary, we could introduce a kernel option to let the kernel know
> which kexec-tools version the user will use.
>
> 2. As long as memory is available, there is always the option to shift
> memory around to accommodate the crashkernel. That probably should have
> been done all along.
>
> 3. The memory size reserved should be deduced automatically to the
> greatest possible extent.
>

The really big picture problem here is that the host kernel is expected
to predict at boot time what will happen in the future: what are the
requirements of the kdump kernel, and its tools, which haven't been
loaded yet?

Can we get past that as a fundamental problem?

-hpa

2013-03-11 20:44:57

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 01:38 PM, Eric W. Biederman wrote:
>
> It totally makes sense to figure out how to load a kernel high. I am not
> convinced kexec on panic is the best use of that ability. I would argue
> that it might be better to figure out how to use a small memory
> foot-print and try to keep that foot-print from growing.
>

I could not agree more with this. The whole reason this is a problem is
because we are talking about hundreds of megabytes of crashkernel. If
we can fix *that* problem, we are much better off.

-hpa

2013-03-11 20:45:39

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 01:38:53PM -0700, Eric W. Biederman wrote:


[..]
> I would argue
> that it might be better to figure out how to use a small memory
> foot-print and try to keep that foot-print from growing.

In my experience, trying to keep foot-print small has kind of been a
losing battle.

- People want more functionality in second kernel, want to dump to more
complicated IO stacks and that requires pulling in more drivers,
more libraries, more daemons, more user space tools and what not.

- Now we use dracut generated initramfs and it has been growing in size.
Now systemd has been pulled in too.

- Drivers keep on increasing their memory usage.

- makedumpfile needs more memory to dump large machines.

There are so many places where memory usage is going up and trying
to keep track of all that has been very hard.

So I would think that it is a good goal to have but continuously
increasing memory usage is inevitable.

Thanks
Vivek

2013-03-11 20:52:41

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 01:45 PM, Vivek Goyal wrote:
>
> - Now we use dracut generated initramfs and it has been growing in size.
> Now systemd has been pulled in too.
>

And the solution to that isn't obvious?

> - makedumpfile needs more memory to dump large machines.
>
> There are so many places where memory usage is going up and trying
> to keep track of all that has been very hard.

Seriously, in particular the O(n) memory requirements you may want to
think very very hard about.

-hpa

2013-03-11 20:57:43

by Yinghai Lu

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 1:45 PM, Vivek Goyal <[email protected]> wrote:
> In my experience, trying to keep foot-print small has kind of been a
> losing battle.
>
> - People want more functionality in second kernel, want to dump to more
> complicated IO stacks and that requires pulling in more drivers,
> more libraries, more daemons, more user space tools and what not.
>
> - Now we use dracut generated initramfs and it has been growing in size.
> Now systemd has been pulled in too.
>
> - Drivers keep on increasing their memory usage.

If the dump file will only be put in one place, why should all the drivers
for all devices get loaded?
For example, if the dump will be on a disk behind one SCSI controller, can
you load only the driver for that controller, and forget about all the
other storage controller, network, etc. drivers?

Thanks

Yinghai

2013-03-11 21:03:45

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 01:50:21PM -0700, H. Peter Anvin wrote:
> On 03/11/2013 01:45 PM, Vivek Goyal wrote:
> >
> > - Now we use dracut generated initramfs and it has been growing in size.
> > Now systemd has been pulled in too.
> >
>
> And the solution to that isn't obvious?

Sorry, I did not understand what you mean by the above.

If you are suggesting that we move away from dracut, it does not work
in practice. Initially we wrote our custom code to generate custom
initramfs, and we were always lagging in terms of what dump targets
can be supported and kept on constantly fixing the issues which had
been taken care of in dracut one way or other. So it was like
maintaining a duplicate initramfs generation tool.

So we do not want to use non-standard tools just for kdump. dracut
generates the initramfs for first kernel and then it should be able
to for second kernel too.

Another problem is that other user space component developers, they don't
know that they are supposed to work with 64MB in total too. Same is true for
anybody who is writing driver code.

And bloated memory usage is detected, after the fact. After that one
can keep on chasing people, and they say that it is their feature
requirement. And it is not possible to go and optimize every subsystem
so that together they can boot and work with 64MB.

>
> > - makedumpfile needs more memory to dump large machines.
> >
> > There are so many places where memory usage is going up and trying
> > to keep track of all that has been very hard.
>
> Seriously, in particular the O(n) memory requirements you may want to
> think very very hard about.

Well we now also have a mode in makedumpfile where memory requirement is
O(1). Just that it takes more cpu and takes much longer to dump. Maybe it
can be improved further.
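
A rough illustration of how a fixed-memory mode can work: instead of
building one bitmap that covers every page frame (which grows with RAM
size), the page-frame range is walked in fixed-size windows and one small
bitmap is reused for each window. This is only a sketch and not
makedumpfile's actual code; WINDOW_PFNS, pfn_filter_fn, keep_even_pfn and
count_dumpable_windowed are all made-up names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical window size: page frames handled per pass. */
#define WINDOW_PFNS 4096u

/* Hypothetical filter callback: nonzero means "this page should be dumped". */
typedef int (*pfn_filter_fn)(uint64_t pfn);

/* Toy filter used only for demonstration: keep even page frames. */
int keep_even_pfn(uint64_t pfn)
{
	return (pfn % 2) == 0;
}

/*
 * Count the dumpable pages in [0, max_pfn) while using a bitmap whose
 * size depends only on WINDOW_PFNS, never on max_pfn.
 */
uint64_t count_dumpable_windowed(uint64_t max_pfn, pfn_filter_fn keep)
{
	uint8_t bitmap[WINDOW_PFNS / 8];
	uint64_t total = 0;
	uint64_t start, pfn, bit;

	for (start = 0; start < max_pfn; start += WINDOW_PFNS) {
		uint64_t end = start + WINDOW_PFNS;

		if (end > max_pfn)
			end = max_pfn;

		/* Reuse the same small bitmap for every window. */
		memset(bitmap, 0, sizeof(bitmap));

		/* Pass 1: mark the pages of this window that pass the filter. */
		for (pfn = start; pfn < end; pfn++) {
			if (keep(pfn)) {
				bit = pfn - start;
				bitmap[bit / 8] |= (uint8_t)(1u << (bit % 8));
			}
		}

		/* Pass 2: consume the bitmap (a real tool would write pages here). */
		for (bit = 0; bit < end - start; bit++) {
			if (bitmap[bit / 8] & (1u << (bit % 8)))
				total++;
		}
	}

	return total;
}
```

The bitmap stays the same size however large max_pfn is. The sketch does
not model the extra CPU cost mentioned above.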

I am more worried about kernel drivers, and all the user space we need
to pull in to initramfs to meet more advanced requirements in kdump
environment.

Thanks
Vivek

2013-03-11 21:06:34

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 01:57:41PM -0700, Yinghai Lu wrote:
> On Mon, Mar 11, 2013 at 1:45 PM, Vivek Goyal <[email protected]> wrote:
> > In my experience, trying to keep foot-print small has kind of been a
> > losing battle.
> >
> > - People want more functionality in second kernel, want to dump to more
> > complicated IO stacks and that requires pulling in more drivers,
> > more libraries, more daemons, more user space tools and what not.
> >
> > - Now we use dracut generated initramfs and it has been growing in size.
> > Now systemd has been pulled in too.
> >
> > - Drivers keep on increasing their memory usage.
>
> If the dump file will only be put in one place, why should all the drivers
> for all devices get loaded?
> For example: if the dump will be on a disk behind one SCSI controller, can you
> load only the driver for that controller, and forget about all the other
> storage controller, network, etc. drivers?

We do try to optimize things this way. Only include drivers as needed. But
a single driver might be handling lots of cards and then try to bring
up all the cards in second kernel.

Just fixed some issues with one driver where around 40MB extra memory
was being consumed because of multiqueue support. Just a bunch of allocations
in the driver. And we used the kernel command line to disable it.

Point being, in practice how many software development areas can one
change to keep the memory requirement within a few MBs? I have been doing
it for the last few years and it has been very hard.

Thanks
Vivek

2013-03-11 21:09:19

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 02:03 PM, Vivek Goyal wrote:
>>
>> And the solution to that isn't obvious?
>
> Sorry, I did not understand what you mean by the above.
>
> If you are suggesting that we move away from dracut, it does not work
> in practice. Initially we wrote our custom code to generate custom
> initramfs, and we were always lagging in terms of what dump targets
> can be supported and kept on constantly fixing the issues which had
> been taken care of in dracut one way or other. So it was like
> maintaining a duplicate initramfs generation tool.
>
> So we do not want to use non-standard tools just for kdump. dracut
> generates the initramfs for first kernel and then it should be able
> to for second kernel too.
>
> Another problem is that other user space component developers, they don't
> know that they are supposed to work with 64MB in total too. Same is true for
> anybody who is writing driver code.
>
> And bloated memory usage is detected, after the fact. After that one
> can keep on chasing people, and they say that it is their feature
> requirement. And it is not possible to go and optimize every subsystem
> so that together they can boot and work with 64MB.
>

Your problem is fundamentally that you are using the wrong tool for the
job, simply because it is expedient to you. Arguably dracut & co are
the wrong tool for any job given the enormous amount of bloat it
entails, but at least in the normal kernel case it only affects boot
time as it is jettisoned, but in your case it is not.

>>
>>> - makedumpfile needs more memory to dump large machines.
>>>
>>> There are so many places where memory usage is going up and trying
>>> to keep track of all that has been very hard.
>>
>> Seriously, in particular the O(n) memory requirements you may want to
>> think very very hard about.
>
> Well we now also have a mode in makedumpfile where memory requirement is
> O(1). Just that it takes more cpu and takes much longer to dump. Maybe it
> can be improved further.
>
> I am more worried about kernel drivers, and all the user space we need
> to pull in to initramfs to meet more advanced requirements in kdump
> environment.
>

That seems like a problem you need to deal with, or you will soon be so
bloated that you have substantial performance impact on any system.

-hpa

2013-03-11 21:11:03

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 01:42:37PM -0700, H. Peter Anvin wrote:
> On 03/11/2013 01:38 PM, Eric W. Biederman wrote:
> >
> > It totally makes sense to figure out how to load a kernel high. I am not
> > convinced kexec on panic is the best use of that ability. I would argue
> > that it might be better to figure out how to use a small memory
> > foot-print and try to keep that foot-print from growing.
> >
>
> I could not agree more with this. The whole reason this is a problem is
> because we are talking about hundreds of megabytes of crashkernel. If
> we can fix *that* problem, we are much better off.

This kind of goal is achievable only if we also restrict how much memory
the first kernel can use. Then all the dependent components will be aware of
how severe the boot requirements are. And there is no such requirement. It is
just a goal to keep memory bloat to the minimum. But bloat does not mean
that booting is not allowed. To me kdump environment requirement should be
no different. Otherwise enforcing them becomes a challenge.

Thanks
Vivek

2013-03-11 21:21:04

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/11/2013 02:10 PM, Vivek Goyal wrote:
>
> To me kdump environment requirement should be
> no different.
>

kdump *is* different. Sounds like you need to realize and deal with
that fact.

-hpa

2013-03-11 21:27:17

by Yinghai Lu

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 2:06 PM, Vivek Goyal <[email protected]> wrote:
> On Mon, Mar 11, 2013 at 01:57:41PM -0700, Yinghai Lu wrote:
>> On Mon, Mar 11, 2013 at 1:45 PM, Vivek Goyal <[email protected]> wrote:
>> > In my experience, trying to keep foot-print small has kind of been a
>> > losing battle.
>> >
>> > - People want more functionality in second kernel, want to dump to more
>> > complicated IO stacks and that requires pulling in more drivers,
>> > more libraries, more daemons, more user space tools and what not.
>> >
>> > - Now we use dracut generated initramfs and it has been growing in size.
>> > Now systemd has been pulled in too.
>> >
>> > - Drivers keep on increasing their memory usage.
>>
>> If the dump file will only be put in one place, why should all the drivers
>> for all devices get loaded?
>> For example: if the dump will be on a disk behind one SCSI controller, can you
>> load only the driver for that controller, and forget about all the other
>> storage controller, network, etc. drivers?
>
> We do try to optimize things this way. Only include drivers as needed. But
> a single driver might be handling lots of cards and then try to bring
> up all the cards in second kernel.

Can you disable auto_probe at first, and then use /sys to bind the driver to
a specific device?
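
For illustration, the two sysfs steps could look roughly like this from
user space. The sketch assumes the PCI bus attribute drivers_autoprobe and
the per-driver bind file under /sys/bus/pci; the helper names, and the
sysfs_root parameter (there so the code can be tried against a scratch
directory), are hypothetical.

```c
#include <stdio.h>

/* Write a short string to a file. Returns 0 on success, -1 on failure. */
int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = (fputs(value, f) >= 0);
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/*
 * Turn off automatic driver probing on the PCI bus, then bind exactly one
 * device (given as a PCI address such as "0000:00:1f.2") to one driver.
 * Returns 0 on success, -1 on any failure.
 */
int bind_only_device(const char *sysfs_root, const char *driver,
		     const char *bdf)
{
	char path[512];
	int n;

	n = snprintf(path, sizeof(path), "%s/bus/pci/drivers_autoprobe",
		     sysfs_root);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	if (write_attr(path, "0") != 0)
		return -1;

	n = snprintf(path, sizeof(path), "%s/bus/pci/drivers/%s/bind",
		     sysfs_root, driver);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	return write_attr(path, bdf);
}
```

A call would look like bind_only_device("/sys", "ahci", "0000:00:1f.2"),
where the driver name and address are example values only. Autoprobe has
to be switched off before the driver is loaded for this to have any
effect.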

>
> Just fixed some issues with one driver where around 40MB extra memory
> was being consumed because of multiqueue support. Just a bunch of allocations
> in the driver. And we used the kernel command line to disable it.

Yes, that is one problem. We should have some facility or helpers to reject
those invalid requests, and drivers should reduce their footprint step by step.
For example, if we only have 64M but the first driver wants to allocate 32M,
it should just be rejected.
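
A minimal sketch of such a helper, assuming a fixed overall budget plus a
per-request cap. No such facility is described in this thread;
struct mem_budget, budget_try_charge and the idea of a max_single limit
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting state for a memory-constrained second kernel. */
struct mem_budget {
	size_t total;		/* bytes available overall, e.g. 64M */
	size_t used;		/* bytes already charged, always <= total */
	size_t max_single;	/* largest single request that is allowed */
};

/*
 * Try to charge an allocation of 'bytes' against the budget.
 * Returns true and records the charge if the request is acceptable,
 * or false (leaving the budget untouched) if it must be rejected.
 */
bool budget_try_charge(struct mem_budget *b, size_t bytes)
{
	if (bytes > b->max_single)
		return false;		/* one request wants too much */
	if (bytes > b->total - b->used)
		return false;		/* not enough budget left */

	b->used += bytes;
	return true;
}
```

With total set to 64M and max_single set to, say, 16M, a 32M request from
the first driver is refused while smaller requests still go through.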

2013-03-12 08:35:57

by Ingo Molnar

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically


* Vivek Goyal <[email protected]> wrote:

> On Mon, Mar 11, 2013 at 01:50:21PM -0700, H. Peter Anvin wrote:
> > On 03/11/2013 01:45 PM, Vivek Goyal wrote:
> > >
> > > - Now we use dracut generated initramfs and it has been growing in
> > > size. Now systemd has been pulled in too.
> >
> > And the solution to that isn't obvious?
>
> Sorry, I did not understand what you mean by the above.
>
> If you are suggesting that we move away from dracut, it does not work in
> practice. Initially we wrote our custom code to generate custom
> initramfs, and we were always lagging in terms of what dump targets can
> be supported and kept on constantly fixing the issues which had been
> taken care of in dracut one way or other. So it was like maintaining a
> duplicate initramfs generation tool.

The fundamental design problem is this artificial split of the kernel from
kexec-tools, just to support an arguably exotic feature, which in turn
then tries to support a complex compatibility matrix - making each variant
even more super exotic. There's just not enough usage and not enough
manpower to keep all that tidy ...

If there was tools/kexec/ then many of these constraints and quirks with
old versions would go away: old kernels would come with old kexec tools,
new kernels would come with new kexec tools.

Just look at how tools/perf/ is packaged up with new kernels: you
generally get a new perf with a new kernel version. This alone eliminates
a fair bit of support complexity and makes it easier to keep users
up to date.

[ kexec tooling could go even farther: if included in the initramfs then
it could do away with ABI constraints and compatibility expectations
altogether.

This is one of the cases where it _does_ make sense: kexec tools and in
general kernel image analysis is obviously coupled to the kernel's
current data structures. ]

If this was fixed then kexec could step a whole lot further, not just in
terms of robustness, but also in terms of feature set - and, ultimately,
increased usage by users and kernel developers.

Thanks,

Ingo

2013-03-12 13:47:08

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 02:06:57PM -0700, H. Peter Anvin wrote:
> On 03/11/2013 02:03 PM, Vivek Goyal wrote:
> >>
> >> And the solution to that isn't obvious?
> >
> > Sorry, I did not understand what you mean by the above.
> >
> > If you are suggesting that we move away from dracut, it does not work
> > in practice. Initially we wrote our custom code to generate custom
> > initramfs, and we were always lagging in terms of what dump targets
> > can be supported and kept on constantly fixing the issues which had
> > been taken care of in dracut one way or other. So it was like
> > maintaining a duplicate initramfs generation tool.
> >
> > So we do not want to use non-standard tools just for kdump. dracut
> > generates the initramfs for first kernel and then it should be able
> > to for second kernel too.
> >
> > Another problem is that other user space component developers, they don't
> > know that they are supposed to work with 64MB in total too. Same is true for
> > anybody who is writing driver code.
> >
> > And bloated memory usage is detected, after the fact. After that one
> > can keep on chasing people, and they say that it is their feature
> > requirement. And it is not possible to go and optimize every subsystem
> > so that together they can boot and work with 64MB.
> >
>
> Your problem is fundamentally that you are using the wrong tool for the
> job, simply because it is expedient to you. Arguably dracut & co are
> the wrong tool for any job given the enormous amount of bloat it
> entails, but at least in the normal kernel case it only affects boot
> time as it is jettisoned, but in your case it is not.

Dracut is just one piece of the puzzle. We are optimizing away dracut
for kdump environment. We generate host only initramfs and using command
line options actively exclude the components which we don't need when
generating kdump initramfs.

But as kdump usage grows, customers want more out of kdump both in
terms of features and performance. For example, now we support dumping
to multipath over iscsi targets. For this target we first pack relevant
networking drivers and networking utilities, then pack in iscsi related
utilities, then all the multipath related components go in and on top of
that all the lvm related utilities and components and dependent libraries
go in. And inclusion of more components leads to memory bloat.

I do agree with the optimization part though. That is include minimal
set of components and be vigilant about memory usage of various
components.

One of the things which could help is if we had a way to track module
memory usage. Maybe some /sys interface which could export how much
memory a module is using. We could use that as debug output and
over a period of time we could figure out how module memory usage is
growing, who the worst offenders are, and where the optimization
opportunities are.
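
As a partial starting point, /proc/modules already lists each module's
size in its second field. That covers only the module image itself, not
memory the driver allocates dynamically at run time, so it is not the
interface described above. A minimal sketch of pulling the name and size
out of one line (parse_module_line and MOD_NAME_MAX are made-up names):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical upper bound on the module name length we keep. */
#define MOD_NAME_MAX 64

/*
 * Parse one /proc/modules line of the form
 *   "<name> <size> <refcount> <deps> <state> <address>"
 * and extract the first two fields.
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_module_line(const char *line, char name[MOD_NAME_MAX],
		      unsigned long *size)
{
	/* %63s leaves room for the terminating NUL in name[64]. */
	if (sscanf(line, "%63s %lu", name, size) != 2)
		return -1;

	return 0;
}
```

Logging these values on every kdump boot would at least show how the
static size of each module changes from release to release.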

Thanks
Vivek

2013-03-18 14:46:13

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 11, 2013 at 12:44:15PM -0700, Yinghai Lu wrote:
> On Mon, Mar 11, 2013 at 12:39 PM, Yinghai Lu <[email protected]> wrote:
> > On Mon, Mar 11, 2013 at 12:38 PM, Vivek Goyal <[email protected]> wrote:
> >>> No need to use crashkernel_high, we can just crashkernel=X@Y instead.
> >>
> >> crashkernel=X@Y is little different. It assumes user knows the memory
> >> map and location "Y" is fixed. There might not be any memory at "Y".
> >
> > then use crashkernel=4G?
>
> I re attached -v2 and -v3.
>
> I'm ok that any one is applied or two get applied all together.
>
> only affect 64bit
> -v2: try under 896M, then 4G, then MAXMEM
> -v3: auto set crashkernel_low with swiotlb size.

So, what's the resolution on this issue. Is the verdict that we always
reserve high. Update your kexec-tools to cope with that. For swiotlb
issue, modify kexec-tools to copy low memory in backup region and give
low memory to second kernel for software iotlb?

I am assuming that updating tools will not help with loading and using
of 32bit bzImage. So user needs to use crashkernel=X@Y syntax to cope
with that? (Given the fact that copying code around after crash carries
the greater risk of ongoing DMA on destination location).

Thanks
Vivek

2013-03-18 15:34:04

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 18, 2013 at 10:46:03AM -0400, Vivek Goyal wrote:
> On Mon, Mar 11, 2013 at 12:44:15PM -0700, Yinghai Lu wrote:
> > On Mon, Mar 11, 2013 at 12:39 PM, Yinghai Lu <[email protected]> wrote:
> > > On Mon, Mar 11, 2013 at 12:38 PM, Vivek Goyal <[email protected]> wrote:
> > >>> No need to use crashkernel_high, we can just crashkernel=X@Y instead.
> > >>
> > >> crashkernel=X@Y is little different. It assumes user knows the memory
> > >> map and location "Y" is fixed. There might not be any memory at "Y".
> > >
> > > then use crashkernel=4G?
> >
> > I re attached -v2 and -v3.
> >
> > I'm ok that any one is applied or two get applied all together.
> >
> > only affect 64bit
> > -v2: try under 896M, then 4G, then MAXMEM
> > -v3: auto set crashkernel_low with swiotlb size.
>
> So, what's the resolution on this issue. Is the verdict that we always
> reserve high. Update your kexec-tools to cope with that. For swiotlb
> issue, modify kexec-tools to copy low memory in backup region and give
> low memory to second kernel for software iotlb?
>
> I am assuming that updating tools will not help with loading and using
> of 32bit bzImage. So user needs to use crashkernel=X@Y syntax to cope
> with that? (Given the fact that copying code around after crash carries
> the greater risk of ongoing DMA on destination location).

Thinking more about it, if ongoing DMA is an issue, then setting up
software iotlb in those areas is also prone to being overwritten by
those DMAs. Hence, reserving memory low where no DMA is setup by first
kernel, seems somewhat safer.

Thanks
Vivek

2013-03-18 19:05:26

by Yinghai Lu

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 18, 2013 at 8:33 AM, Vivek Goyal <[email protected]> wrote:
> Thinking more about it, if ongoing DMA is an issue, then setting up
> software iotlb in those areas is also prone to being overwritten by
> those DMAs. Hence, reserving memory low where no DMA is setup by first
> kernel, seems somewhat safer.

So stick with crashkernel_low? I.e., if a system does not support iommu
with kdump, the user needs to append crashkernel_low in the first kernel.

I think that could be a way to force vendors to double-check and find out
why their system does not support iommu with kdump.

BTW, our systems do support iommu with kdump without problem.

Thanks

Yinghai

2013-03-18 19:13:24

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/18/2013 08:33 AM, Vivek Goyal wrote:
>
> Thinking more about it, if ongoing DMA is an issue, then setting up
> software iotlb in those areas is also prone to being overwritten by
> those DMAs. Hence, reserving memory low where no DMA is setup by first
> kernel, seems somewhat safer.
>

Agreed. We really should reserve some memory low.

-hpa

2013-03-18 20:00:57

by Vivek Goyal

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 18, 2013 at 12:10:47PM -0700, H. Peter Anvin wrote:
> On 03/18/2013 08:33 AM, Vivek Goyal wrote:
> >
> > Thinking more about it, if ongoing DMA is an issue, then setting up
> > software iotlb in those areas is also prone to being overwritten by
> > those DMAs. Hence, reserving memory low where no DMA is setup by first
> > kernel, seems somewhat safer.
> >
>
> Agreed. We really should reserve some memory low.

So which approach do you like for reserving some memory low.

- User specifies crashkernel_low=X to reserve some memory. Biggest problem
here is how does user know how much memory is required for setting up
swiotlb.

- Take yinghai's patch where by default low memory for swiotlb is reserved
and a user need to opt out of it using crashkernel_low=0 if system has
iommu enabled.

- crashkernel=X by default first looks for specified memory in low
memory area.


I kind of like yinghai's approach. It is a little wasteful of memory when
memory is reserved high, but at least the user does not have to know how much
memory to reserve low. It works both when memory is reserved low (system does
not have any RAM mapped above 4G) and when memory is reserved high.

Thanks
Vivek

2013-03-18 21:17:12

by H. Peter Anvin

Subject: Re: [PATCH] x86, kdump: Set crashkernel_low automatically

On 03/18/2013 01:00 PM, Vivek Goyal wrote:
> On Mon, Mar 18, 2013 at 12:10:47PM -0700, H. Peter Anvin wrote:
>> On 03/18/2013 08:33 AM, Vivek Goyal wrote:
>>>
>>> Thinking more about it, if ongoing DMA is an issue, then setting up
>>> software iotlb in those areas is also prone to being overwritten by
>>> those DMAs. Hence, reserving memory low where no DMA is setup by first
>>> kernel, seems somewhat safer.
>>>
>>
>> Agreed. We really should reserve some memory low.
>
> So which approach do you like for reserving some memory low.
>
> - User specifies crashkernel_low=X to reserve some memory. Biggest problem
> here is how does user know how much memory is required for setting up
> swiotlb.
>
> - Take yinghai's patch where by default low memory for swiotlb is reserved
> and a user need to opt out of it using crashkernel_low=0 if system has
> iommu enabled.
>
> - crashkernel=X by default first looks for specified memory in low
> memory area.
>
> I kind of like yinghai's approach. It is a little wasteful of memory when
> memory is reserved high, but at least the user does not have to know how much
> memory to reserve low. It works both when memory is reserved low (system does
> not have any RAM mapped above 4G) and when memory is reserved high.
>

I would agree, I think it is the most user friendly.

-hpa

2013-03-18 21:26:34

by Yinghai Lu

Subject: [PATCH v3] x86, kdump: Set crashkernel_low automatically

Current code does not set a low range for crashkernel if the user
does not specify one.

That causes regressions on systems that do not support intel_iommu
properly.

Chao said that his system works well on 3.8 without any extra parameter,
even though iommu does not work with kdump.

Set crashkernel_low automatically if the user does not specify it.

For systems that do support IOMMU with kdump properly, the user can
specify crashkernel_low=0 to save that 72M of low RAM.

-v3: add swiotlb_size_or_default() according to Konrad.

Reported-by: WANG Chao <[email protected]>
Tested-by: WANG Chao <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/kernel/setup.c | 15 ++++++++++++---
include/linux/swiotlb.h | 1 +
lib/swiotlb.c | 19 +++++++++++++++----
3 files changed, 28 insertions(+), 7 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -522,19 +522,28 @@ static void __init reserve_crashkernel_l
unsigned long long low_base = 0, low_size = 0;
unsigned long total_low_mem;
unsigned long long base;
+ bool auto_set = false;
int ret;

total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
ret = parse_crashkernel_low(boot_command_line, total_low_mem,
&low_size, &base);
- if (ret != 0 || low_size <= 0)
- return;
+ if (ret != 0) {
+ /* swiotlb size and etc 8M */
+ low_size = swiotlb_size_or_default() + (8UL<<20);
+ auto_set = true;
+ } else {
+ /* passed with crashkernel_low=0 ? */
+ if (!low_size)
+ return;
+ }

low_base = memblock_find_in_range(low_size, (1ULL<<32),
low_size, alignment);

if (!low_base) {
- pr_info("crashkernel low reservation failed - No suitable area found.\n");
+ if (!auto_set)
+ pr_info("crashkernel low reservation failed - No suitable area found.\n");

return;
}
Index: linux-2.6/include/linux/swiotlb.h
===================================================================
--- linux-2.6.orig/include/linux/swiotlb.h
+++ linux-2.6/include/linux/swiotlb.h
@@ -25,6 +25,7 @@ extern int swiotlb_force;
extern void swiotlb_init(int verbose);
int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
extern unsigned long swiotlb_nr_tbl(void);
+unsigned long swiotlb_size_or_default(void);
extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);

/*
Index: linux-2.6/lib/swiotlb.c
===================================================================
--- linux-2.6.orig/lib/swiotlb.c
+++ linux-2.6/lib/swiotlb.c
@@ -105,9 +105,9 @@ setup_io_tlb_npages(char *str)
if (!strcmp(str, "force"))
swiotlb_force = 1;

- return 1;
+ return 0;
}
-__setup("swiotlb=", setup_io_tlb_npages);
+early_param("swiotlb", setup_io_tlb_npages);
/* make io_tlb_overflow tunable too? */

unsigned long swiotlb_nr_tbl(void)
@@ -115,6 +115,18 @@ unsigned long swiotlb_nr_tbl(void)
return io_tlb_nslabs;
}
EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
+
+/* default to 64MB */
+#define IO_TLB_DEFAULT_SIZE (64UL<<20)
+unsigned long swiotlb_size_or_default(void)
+{
+ unsigned long size;
+
+ size = io_tlb_nslabs << IO_TLB_SHIFT;
+
+ return size ? size : (IO_TLB_DEFAULT_SIZE);
+}
+
/* Note that this doesn't work with highmem page */
static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
volatile void *address)
@@ -188,8 +200,7 @@ int __init swiotlb_init_with_tbl(char *t
void __init
swiotlb_init(int verbose)
{
- /* default to 64MB */
- size_t default_size = 64UL<<20;
+ size_t default_size = IO_TLB_DEFAULT_SIZE;
unsigned char *vstart;
unsigned long bytes;

2013-03-18 22:54:56

by H. Peter Anvin

Subject: Re: [PATCH v3] x86, kdump: Set crashkernel_low automatically

On 03/18/2013 02:25 PM, Yinghai Lu wrote:
> Current code does not set a low range for crashkernel if the user
> does not specify one.
>
> That causes regressions on systems that do not support intel_iommu
> properly.
>
> Chao said that his system works well on 3.8 without any extra parameter,
> even though iommu does not work with kdump.
>
> Set crashkernel_low automatically if the user does not specify it.
>
> For systems that do support IOMMU with kdump properly, the user can
> specify crashkernel_low=0 to save that 72M of low RAM.
>
> -v3: add swiotlb_size_or_default() according to Konrad.
>
> Reported-by: WANG Chao <[email protected]>
> Tested-by: WANG Chao <[email protected]>
> Signed-off-by: Yinghai Lu <[email protected]>

Can we get a bit more of an explanation instead of "and etc 8M"? At
least a hint of what kind of objects would go in there...

-hpa

2013-03-18 23:26:59

by Yinghai Lu

Subject: Re: [PATCH v3] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 18, 2013 at 3:52 PM, H. Peter Anvin <[email protected]> wrote:
> On 03/18/2013 02:25 PM, Yinghai Lu wrote:
>> Current code does not set a low range for crashkernel if the user
>> does not specify one.
>>
>> That causes regressions on systems that do not support intel_iommu
>> properly.
>>
>> Chao said that his system works well on 3.8 without any extra parameter,
>> even though iommu does not work with kdump.
>>
>> Set crashkernel_low automatically if the user does not specify it.
>>
>> For systems that do support IOMMU with kdump properly, the user can
>> specify crashkernel_low=0 to save that 72M of low RAM.
>>
>> -v3: add swiotlb_size_or_default() according to Konrad.
>>
>> Reported-by: WANG Chao <[email protected]>
>> Tested-by: WANG Chao <[email protected]>
>> Signed-off-by: Yinghai Lu <[email protected]>
>
> Can we get a bit more of an explanation instead of "and etc 8M"? At
> least a hint of what kind of objects would go in there...

now only have:
swiotlb overflow buffer

v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
PAGE_ALIGN(io_tlb_overflow));

and
/*
 * When the IOMMU overflows we return a fallback buffer. This sets the size.
 */
static unsigned long io_tlb_overflow = 32*1024;

so it is 32K, and I round it to 8M.
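
Put as arithmetic, the automatic size is the swiotlb size (user-specified
or the 64MB default) plus 8M of headroom, which gives the 72M mentioned in
the commit message. Below is a user-space sketch of that calculation and
of the three cases the patch distinguishes. It takes io_tlb_nslabs as a
parameter and assumes IO_TLB_SHIFT is 11 (2KB slabs); all the *_sketch
names are made up and this is not the kernel code itself.

```c
/* Assumed value of the kernel's IO_TLB_SHIFT: each slab is 2KB. */
#define IO_TLB_SHIFT_SKETCH		11
/* Default swiotlb size from the patch: 64MB. */
#define IO_TLB_DEFAULT_SIZE_SKETCH	(64ULL << 20)
/* Headroom added on top: the 32K overflow buffer rounded up to 8M. */
#define LOW_HEADROOM_SKETCH		(8ULL << 20)

/* swiotlb size in bytes: from the slab count if set, else the default. */
unsigned long long swiotlb_size_or_default_sketch(unsigned long long nslabs)
{
	unsigned long long size = nslabs << IO_TLB_SHIFT_SKETCH;

	return size ? size : IO_TLB_DEFAULT_SIZE_SKETCH;
}

/*
 * Size of the low reservation in bytes, or 0 for "reserve nothing".
 *  - parse_ret != 0: crashkernel_low= was not given, size is picked
 *    automatically.
 *  - parse_ret == 0 and user_size == 0: the user opted out.
 *  - parse_ret == 0 and user_size > 0: the user's value is used as is.
 */
unsigned long long auto_low_size_sketch(int parse_ret,
					unsigned long long user_size,
					unsigned long long nslabs)
{
	if (parse_ret != 0)
		return swiotlb_size_or_default_sketch(nslabs) +
		       LOW_HEADROOM_SKETCH;

	return user_size;
}
```

With no swiotlb= and no crashkernel_low= on the command line this comes
out as 64M + 8M = 72M.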

2013-03-19 00:29:57

by H. Peter Anvin

Subject: Re: [PATCH v3] x86, kdump: Set crashkernel_low automatically

On 03/18/2013 04:26 PM, Yinghai Lu wrote:
>
> now only have:
> swiotlb overflow buffer
>
> v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
> PAGE_ALIGN(io_tlb_overflow));
>
> and
> /*
>  * When the IOMMU overflows we return a fallback buffer. This sets the size.
>  */
> static unsigned long io_tlb_overflow = 32*1024;
>
> so it is 32K, and I round it to 8M.
>

So put that into prose, understandable by someone who hasn't followed
this discussion (say, five years from now), and make that part of the
commit.

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-03-19 01:04:43

by Yinghai Lu

Subject: [PATCH v4] x86, kdump: Set crashkernel_low automatically

Current code does not set a low range for crashkernel if the user
does not specify one.

That causes regressions on systems that do not support intel_iommu
properly.

Chao said that his system works well on 3.8 without any extra parameter,
even though iommu does not work with kdump.

Set crashkernel_low automatically if the user does not specify it.

For systems that do support IOMMU with kdump properly, the user can
specify crashkernel_low=0 to save that 72M of low RAM.

-v3: add swiotlb_size_or_default() according to Konrad.
-v4: add comments on what the 8M is for, according to hpa.
     also update the crashkernel_low= entry in kernel-parameters.txt

Reported-by: WANG Chao <[email protected]>
Tested-by: WANG Chao <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
Documentation/kernel-parameters.txt | 15 ++++++++++++---
arch/x86/kernel/setup.c | 20 +++++++++++++++++---
include/linux/swiotlb.h | 1 +
lib/swiotlb.c | 19 +++++++++++++++----
4 files changed, 45 insertions(+), 10 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -521,19 +521,33 @@ static void __init reserve_crashkernel_l
unsigned long long low_base = 0, low_size = 0;
unsigned long total_low_mem;
unsigned long long base;
+ bool auto_set = false;
int ret;

total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
ret = parse_crashkernel_low(boot_command_line, total_low_mem,
&low_size, &base);
- if (ret != 0 || low_size <= 0)
- return;
+ if (ret != 0) {
+ /*
+ * two parts from lib/swiotlb.c:
+ * swiotlb size: user specified with swiotlb= or default.
+ * swiotlb overflow buffer: now is hardcoded to 32k,
+ * round to 8M to cover more others.
+ */
+ low_size = swiotlb_size_or_default() + (8UL<<20);
+ auto_set = true;
+ } else {
+ /* passed with crashkernel_low=0 ? */
+ if (!low_size)
+ return;
+ }

low_base = memblock_find_in_range(low_size, (1ULL<<32),
low_size, alignment);

if (!low_base) {
- pr_info("crashkernel low reservation failed - No suitable area found.\n");
+ if (!auto_set)
+ pr_info("crashkernel low reservation failed - No suitable area found.\n");

return;
}
Index: linux-2.6/include/linux/swiotlb.h
===================================================================
--- linux-2.6.orig/include/linux/swiotlb.h
+++ linux-2.6/include/linux/swiotlb.h
@@ -25,6 +25,7 @@ extern int swiotlb_force;
extern void swiotlb_init(int verbose);
int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
extern unsigned long swiotlb_nr_tbl(void);
+unsigned long swiotlb_size_or_default(void);
extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);

/*
Index: linux-2.6/lib/swiotlb.c
===================================================================
--- linux-2.6.orig/lib/swiotlb.c
+++ linux-2.6/lib/swiotlb.c
@@ -105,9 +105,9 @@ setup_io_tlb_npages(char *str)
if (!strcmp(str, "force"))
swiotlb_force = 1;

- return 1;
+ return 0;
}
-__setup("swiotlb=", setup_io_tlb_npages);
+early_param("swiotlb", setup_io_tlb_npages);
/* make io_tlb_overflow tunable too? */

unsigned long swiotlb_nr_tbl(void)
@@ -115,6 +115,18 @@ unsigned long swiotlb_nr_tbl(void)
return io_tlb_nslabs;
}
EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
+
+/* default to 64MB */
+#define IO_TLB_DEFAULT_SIZE (64UL<<20)
+unsigned long swiotlb_size_or_default(void)
+{
+ unsigned long size;
+
+ size = io_tlb_nslabs << IO_TLB_SHIFT;
+
+ return size ? size : (IO_TLB_DEFAULT_SIZE);
+}
+
/* Note that this doesn't work with highmem page */
static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
volatile void *address)
@@ -188,8 +200,7 @@ int __init swiotlb_init_with_tbl(char *t
void __init
swiotlb_init(int verbose)
{
- /* default to 64MB */
- size_t default_size = 64UL<<20;
+ size_t default_size = IO_TLB_DEFAULT_SIZE;
unsigned char *vstart;
unsigned long bytes;

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -596,9 +596,6 @@ bytes respectively. Such letter suffixes
is selected automatically. Check
Documentation/kdump/kdump.txt for further details.

- crashkernel_low=size[KMG]
- [KNL, x86] parts under 4G.
-
crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory
in the running system. The syntax of range is
@@ -606,6 +603,18 @@ bytes respectively. Such letter suffixes
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.

+ crashkernel_low=size[KMG]
+ [KNL, x86_64] range under 4G. When crashkernel= is
+ passed, kernel allocate physical memory region
+ above 4G, that cause second kernel crash on system
+ that need swiotlb later. Kernel would try to allocate
+ some region below 4G automatically. This one let
+ user to specify own low range under 4G for second
+ kernel instead.
+ 0: to disable low allocation on systems that do not
+ need swiotlb, that will save 72M low ram in first
+ kernel.
+
cs89x0_dma= [HW,NET]
Format: <dma>

2013-03-19 13:33:45

by Vivek Goyal

Subject: Re: [PATCH v4] x86, kdump: Set crashkernel_low automatically

On Mon, Mar 18, 2013 at 06:04:08PM -0700, Yinghai Lu wrote:
> Current code does not set low range for crashkernel if the user
> does not specify that.

Hi Yinghai,

While we are modifying the changelog, it would also be beneficial to
mention how we ended up in this situation. Can we make the
changelog a little more explanatory, along the following lines?

We have now modified crashkernel=X to allocate memory beyond 4G (if
available). This will cause a regression if iommu is not enabled.
Without iommu, swiotlb needs to be set up in the first 4G, and there is no
low memory available to the second kernel.

thanks
Vivek
>
> That cause regressions on system that does not support intel_iommu
> properly.
>
> Chao said that his system does work well on 3.8 without extra parameter.
> even iommu does not work with kdump.
>
> Set crashkernel_low automatically if the user does not specify that.
>
> For system that does support IOMMU with kdump properly, user could
> specify crashkernel_low=0 to save that 72M low ram.
>
> -v3: add swiotlb_size() according to Konrad.
> -v4: add comments what 8M is for according to hpa.
> also update more crashkernel_low= in kernel-parameters.txt
>
> Reported-by: WANG Chao <[email protected]>
> Tested-by: WANG Chao <[email protected]>
> Signed-off-by: Yinghai Lu <[email protected]>

2013-03-19 15:06:02

by Yinghai Lu

Subject: [PATCH v5] x86, kdump: Set crashkernel_low automatically

Chao said that kdump works well on his system on 3.8
without extra parameters, even though iommu does not work with kdump.
Now he has to append crashkernel_low=Y in the first kernel to make
kdump work.

We have now modified crashkernel=X to allocate memory beyond 4G (if
available) and do not allocate a low range for crashkernel if the user
does not specify one with crashkernel_low=Y. This causes a regression
if iommu is not enabled. Without iommu, swiotlb needs to be set up in
the first 4G and there is no low memory available to the second kernel.

Set crashkernel_low automatically if the user does not specify it.

For systems that do support IOMMU with kdump properly, the user can
specify crashkernel_low=0 to save that 72M of low RAM.

-v3: add swiotlb_size() according to Konrad.
-v4: add comments what 8M is for according to hpa.
also update more crashkernel_low= in kernel-parameters.txt
-v5: update changelog according to Vivek.

Reported-by: WANG Chao <[email protected]>
Tested-by: WANG Chao <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
Documentation/kernel-parameters.txt | 15 ++++++++++++---
arch/x86/kernel/setup.c | 20 +++++++++++++++++---
include/linux/swiotlb.h | 1 +
lib/swiotlb.c | 19 +++++++++++++++----
4 files changed, 45 insertions(+), 10 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -521,19 +521,33 @@ static void __init reserve_crashkernel_l
unsigned long long low_base = 0, low_size = 0;
unsigned long total_low_mem;
unsigned long long base;
+ bool auto_set = false;
int ret;

total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
ret = parse_crashkernel_low(boot_command_line, total_low_mem,
&low_size, &base);
- if (ret != 0 || low_size <= 0)
- return;
+ if (ret != 0) {
+ /*
+ * two parts from lib/swiotlb.c:
+ * swiotlb size: user specified with swiotlb= or default.
+ * swiotlb overflow buffer: now is hardcoded to 32k,
+ * round to 8M to cover more others.
+ */
+ low_size = swiotlb_size_or_default() + (8UL<<20);
+ auto_set = true;
+ } else {
+ /* passed with crashkernel_low=0 ? */
+ if (!low_size)
+ return;
+ }

low_base = memblock_find_in_range(low_size, (1ULL<<32),
low_size, alignment);

if (!low_base) {
- pr_info("crashkernel low reservation failed - No suitable area found.\n");
+ if (!auto_set)
+ pr_info("crashkernel low reservation failed - No suitable area found.\n");

return;
}
Index: linux-2.6/include/linux/swiotlb.h
===================================================================
--- linux-2.6.orig/include/linux/swiotlb.h
+++ linux-2.6/include/linux/swiotlb.h
@@ -25,6 +25,7 @@ extern int swiotlb_force;
extern void swiotlb_init(int verbose);
int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
extern unsigned long swiotlb_nr_tbl(void);
+unsigned long swiotlb_size_or_default(void);
extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);

/*
Index: linux-2.6/lib/swiotlb.c
===================================================================
--- linux-2.6.orig/lib/swiotlb.c
+++ linux-2.6/lib/swiotlb.c
@@ -105,9 +105,9 @@ setup_io_tlb_npages(char *str)
if (!strcmp(str, "force"))
swiotlb_force = 1;

- return 1;
+ return 0;
}
-__setup("swiotlb=", setup_io_tlb_npages);
+early_param("swiotlb", setup_io_tlb_npages);
/* make io_tlb_overflow tunable too? */

unsigned long swiotlb_nr_tbl(void)
@@ -115,6 +115,18 @@ unsigned long swiotlb_nr_tbl(void)
return io_tlb_nslabs;
}
EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
+
+/* default to 64MB */
+#define IO_TLB_DEFAULT_SIZE (64UL<<20)
+unsigned long swiotlb_size_or_default(void)
+{
+ unsigned long size;
+
+ size = io_tlb_nslabs << IO_TLB_SHIFT;
+
+ return size ? size : (IO_TLB_DEFAULT_SIZE);
+}
+
/* Note that this doesn't work with highmem page */
static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
volatile void *address)
@@ -188,8 +200,7 @@ int __init swiotlb_init_with_tbl(char *t
void __init
swiotlb_init(int verbose)
{
- /* default to 64MB */
- size_t default_size = 64UL<<20;
+ size_t default_size = IO_TLB_DEFAULT_SIZE;
unsigned char *vstart;
unsigned long bytes;

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -596,9 +596,6 @@ bytes respectively. Such letter suffixes
is selected automatically. Check
Documentation/kdump/kdump.txt for further details.

- crashkernel_low=size[KMG]
- [KNL, x86] parts under 4G.
-
crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory
in the running system. The syntax of range is
@@ -606,6 +603,18 @@ bytes respectively. Such letter suffixes
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.

+ crashkernel_low=size[KMG]
+ [KNL, x86_64] range under 4G. When crashkernel= is
+ passed, kernel allocate physical memory region
+ above 4G, that cause second kernel crash on system
+ that need swiotlb later. Kernel would try to allocate
+ some region below 4G automatically. This one let
+ user to specify own low range under 4G for second
+ kernel instead.
+ 0: to disable low allocation on systems that do not
+ need swiotlb, that will save 72M low ram in first
+ kernel.
+
cs89x0_dma= [HW,NET]
Format: <dma>
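
The setup.c hunk distinguishes three cases: crashkernel_low= absent (reserve automatically), crashkernel_low=0 (skip the low reservation), and crashkernel_low=size (use the given size). The following user-space sketch models only that branch structure. The enum, struct and function names are hypothetical, and the parser return value is reduced to "zero means the parameter was found and parsed".

```c
#include <stdbool.h>

enum low_source { LOW_DISABLED, LOW_AUTO, LOW_USER };

struct low_request {
	enum low_source source;
	unsigned long long size;	/* bytes to reserve below 4G; 0 when disabled */
};

/*
 * parse_ret:    0 if crashkernel_low= was found and parsed, nonzero otherwise
 * parsed_size:  size given by the user (only meaningful when parse_ret == 0)
 * swiotlb_size: what swiotlb_size_or_default() would return
 */
static struct low_request pick_low_size(int parse_ret,
					unsigned long long parsed_size,
					unsigned long long swiotlb_size)
{
	struct low_request req = { LOW_DISABLED, 0 };

	if (parse_ret != 0) {
		/* Parameter absent: swiotlb size plus 8M, as in the patch. */
		req.source = LOW_AUTO;
		req.size = swiotlb_size + (8ULL << 20);
	} else if (parsed_size != 0) {
		/* Explicit nonzero size from the command line. */
		req.source = LOW_USER;
		req.size = parsed_size;
	}
	/* Otherwise crashkernel_low=0 was passed: leave the request disabled. */
	return req;
}

/* The patch prints the "reservation failed" message only for explicit requests. */
static bool should_log_failure(enum low_source source)
{
	return source == LOW_USER;
}
```

Keeping the automatic case silent on failure matches the auto_set flag in the patch: a system that never asked for a low range does not get a new message in its log.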

2013-03-20 13:09:08

by Vivek Goyal

Subject: Re: [PATCH v5] x86, kdump: Set crashkernel_low automatically

On Tue, Mar 19, 2013 at 08:05:26AM -0700, Yinghai Lu wrote:
> Chao said that kdump does does work well on his system on 3.8
> without extra parameter, even iommu does not work with kdump.
> And now have to append crashkernel_low=Y in first kernel to make
> kdump work.
>
> We have now modified crashkernel=X to allocate memory beyong 4G (if
> available) and do not allocate low range for crashkernel if the user
> does not specify that with crashkernel_low=Y. This causes regression
> if iommu is not enabled. Without iommu, swiotlb needs to be setup in
> first 4G and there is no low memory available to second kernel.
>
> Set crashkernel_low automatically if the user does not specify that.
>
> For system that does support IOMMU with kdump properly, user could
> specify crashkernel_low=0 to save that 72M low ram.

Hi Yinghai,

I have a general question about crashkernel_low. Why does it need to
show up as "Crash kernel low" in /proc/iomem? Would it not be better
if all memory reserved for the crash kernel (whether high or low) showed
up as "Crash kernel", letting kexec-tools decide whether to load the
kernel high or low, etc.?

IOW, there should not be any need to differentiate between "Crash kernel"
and "Crash kernel low". The entries carry address ranges, and looking
at the addresses it is obvious that certain memory is below 4G.

Thanks
Vivek

2013-03-20 15:53:31

by Yinghai Lu

Subject: Re: [PATCH v5] x86, kdump: Set crashkernel_low automatically

On Wed, Mar 20, 2013 at 6:08 AM, Vivek Goyal <[email protected]> wrote:

> Have a general question about crashkernel_low. Why does it need to
> show up as "Crash kernel low" in /proc/iomem. Will it not be better
> that all memory reserved for crashkernel (whether high or low), shows
> as "Crash Kernel" and let kexec-tools decide whether to load kernel
> high or low etc.
>
> IOW, there should not be any need to differentiate between "Crash kernel"
> and "Crash kernel low". There are address ranges associated and looking
> at addresses it is obivious that certain memory is below 4G.

Yes, it is doable,
but:
1. We will need to add more code to extend parse_iomem_single to handle
multiple "Crash kernel" entries in kexec-tools.
2. We already have "crashkernel_low=" on the command line, so it is
good to keep /proc/iomem consistent with it.

Thanks

Yinghai

2013-03-20 16:03:43

by Vivek Goyal

Subject: Re: [PATCH v5] x86, kdump: Set crashkernel_low automatically

On Wed, Mar 20, 2013 at 08:53:29AM -0700, Yinghai Lu wrote:
> On Wed, Mar 20, 2013 at 6:08 AM, Vivek Goyal <[email protected]> wrote:
>
> > Have a general question about crashkernel_low. Why does it need to
> > show up as "Crash kernel low" in /proc/iomem. Will it not be better
> > that all memory reserved for crashkernel (whether high or low), shows
> > as "Crash Kernel" and let kexec-tools decide whether to load kernel
> > high or low etc.
> >
> > IOW, there should not be any need to differentiate between "Crash kernel"
> > and "Crash kernel low". There are address ranges associated and looking
> > at addresses it is obivious that certain memory is below 4G.
>
> yes. it is doable.
> but
> 1. will need to add more code to expand parse_iomem_single to handle
> multiple "Crash kernel" in kexec-tools.
> 2. also we already have "crashkernel_low=" in command line, so it is
> good to keep them consistent in /proc/iomem.

I think the command line and the /proc/iomem output are very different.
crashkernel_low just enforces that the memory is reserved below 4G; the
memory type still remains "Crash kernel".

So to me, /proc/iomem shows ranges and memory types, and both
memory types should be "Crash kernel".

IMHO, we should add code in kexec-tools to deal with it (multiple
entries for memory type "Crash kernel"), instead of special-casing
"Crash kernel low". Who knows, down the line we may end up reserving more
crash kernel memory that is not contiguous. Keeping all reserved
memory of the same type will help then.

Thanks
Vivek
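
The approach argued for above (one resource name, several entries, told apart by address) can be sketched as a small parser over /proc/iomem-style text. This is an illustration only, not the kexec-tools implementation: the struct, the function names and the fixed array bound are hypothetical, and the input is taken as a string so the sketch does not depend on a real /proc/iomem.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CRASH_RANGES 8

struct crash_range {
	unsigned long long start;
	unsigned long long end;		/* inclusive, as printed in /proc/iomem */
};

/*
 * Collect every "Crash kernel" entry from iomem-formatted text
 * ("start-end : name", one entry per line, optionally indented).
 * Returns the number of ranges stored in out[] (at most max).
 */
static int collect_crash_ranges(const char *text, struct crash_range *out, int max)
{
	int n = 0;

	while (*text && n < max) {
		const char *eol = strchr(text, '\n');
		size_t len = eol ? (size_t)(eol - text) : strlen(text);
		unsigned long long start, end;
		char line[128];
		char name[64];

		/* Work on a private copy of the current line only. */
		if (len >= sizeof(line))
			len = sizeof(line) - 1;
		memcpy(line, text, len);
		line[len] = '\0';

		if (sscanf(line, " %llx-%llx : %63[^\n]", &start, &end, name) == 3 &&
		    strcmp(name, "Crash kernel") == 0) {
			out[n].start = start;
			out[n].end = end;
			n++;
		}
		if (!eol)
			break;
		text = eol + 1;
	}
	return n;
}

/* A range is "low" when it lies entirely below 4G. */
static int range_is_below_4g(const struct crash_range *r)
{
	return r->end < (1ULL << 32);
}
```

A caller would pass the contents of /proc/iomem and then use range_is_below_4g() on each result to decide which range can hold the swiotlb and which one the kernel image.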

2013-03-20 16:21:33

by Yinghai Lu

Subject: Re: [PATCH v5] x86, kdump: Set crashkernel_low automatically

On Wed, Mar 20, 2013 at 9:03 AM, Vivek Goyal <[email protected]> wrote:
> On Wed, Mar 20, 2013 at 08:53:29AM -0700, Yinghai Lu wrote:
>> On Wed, Mar 20, 2013 at 6:08 AM, Vivek Goyal <[email protected]> wrote:
>>
>> > Have a general question about crashkernel_low. Why does it need to
>> > show up as "Crash kernel low" in /proc/iomem. Will it not be better
>> > that all memory reserved for crashkernel (whether high or low), shows
>> > as "Crash Kernel" and let kexec-tools decide whether to load kernel
>> > high or low etc.
>> >
>> > IOW, there should not be any need to differentiate between "Crash kernel"
>> > and "Crash kernel low". There are address ranges associated and looking
>> > at addresses it is obivious that certain memory is below 4G.
>>
>> yes. it is doable.
>> but
>> 1. will need to add more code to expand parse_iomem_single to handle
>> multiple "Crash kernel" in kexec-tools.
>> 2. also we already have "crashkernel_low=" in command line, so it is
>> good to keep them consistent in /proc/iomem.
>
> I think command line and /proc/iomem output are very different.
> crashkernel_low is just enforcing that reserve it below 4G and memory
> type still remains "Crash Kernel".
>
> So to me, /proc/iomem is showing ranges and memory type and both the
> memory types should be "Crash Kernel".
>
> IMHO, we should add code in kexec-tools to deal with it (multiple
> entries for memory type "Crash Kernel"), instead of especial casing
> "Crash Kernel Low". Who knows down the line we end up reserving more
> crash kernel memory which is not contiguous. Keeping all reserved
> memory of same type will help then.

OK.

We need to fix kexec-tools first, and then drop "low" in the kernel.

Before v3.9 and kexec-tools 2.0.4?

Thanks

Yinghai

2013-03-20 16:31:50

by Vivek Goyal

Subject: Re: [PATCH v5] x86, kdump: Set crashkernel_low automatically

On Wed, Mar 20, 2013 at 09:21:31AM -0700, Yinghai Lu wrote:
> On Wed, Mar 20, 2013 at 9:03 AM, Vivek Goyal <[email protected]> wrote:
> > On Wed, Mar 20, 2013 at 08:53:29AM -0700, Yinghai Lu wrote:
> >> On Wed, Mar 20, 2013 at 6:08 AM, Vivek Goyal <[email protected]> wrote:
> >>
> >> > Have a general question about crashkernel_low. Why does it need to
> >> > show up as "Crash kernel low" in /proc/iomem. Will it not be better
> >> > that all memory reserved for crashkernel (whether high or low), shows
> >> > as "Crash Kernel" and let kexec-tools decide whether to load kernel
> >> > high or low etc.
> >> >
> >> > IOW, there should not be any need to differentiate between "Crash kernel"
> >> > and "Crash kernel low". There are address ranges associated and looking
> >> > at addresses it is obivious that certain memory is below 4G.
> >>
> >> yes. it is doable.
> >> but
> >> 1. will need to add more code to expand parse_iomem_single to handle
> >> multiple "Crash kernel" in kexec-tools.
> >> 2. also we already have "crashkernel_low=" in command line, so it is
> >> good to keep them consistent in /proc/iomem.
> >
> > I think command line and /proc/iomem output are very different.
> > crashkernel_low is just enforcing that reserve it below 4G and memory
> > type still remains "Crash Kernel".
> >
> > So to me, /proc/iomem is showing ranges and memory type and both the
> > memory types should be "Crash Kernel".
> >
> > IMHO, we should add code in kexec-tools to deal with it (multiple
> > entries for memory type "Crash Kernel"), instead of especial casing
> > "Crash Kernel Low". Who knows down the line we end up reserving more
> > crash kernel memory which is not contiguous. Keeping all reserved
> > memory of same type will help then.
>
> ok.
>
> Need to fix kexec-tools at first, and the drop Low in kernel.
>
> Before v3.9 and kexec-tools 2.0.4?

I think so. We need to do this in 3.9 otherwise it becomes another
backward compatibility issue.

Thanks
Vivek

2013-03-20 19:22:41

by Yinghai Lu

Subject: [PATCH] kexec: use Crash kernel for Crash kernel low

We can extend kexec-tools to support multiple "Crash kernel" in /proc/iomem
instead.

So we can use "Crash kernel" instead of "Crash kernel low" in /proc/iomem.

Suggested-by: Vivek Goyal <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
kernel/kexec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -55,7 +55,7 @@ struct resource crashk_res = {
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
};
struct resource crashk_low_res = {
- .name = "Crash kernel low",
+ .name = "Crash kernel",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM

2013-03-25 19:43:01

by Vivek Goyal

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Wed, Mar 20, 2013 at 12:22:09PM -0700, Yinghai Lu wrote:
> We can extend kexec-tools to support multiple "Crash kernel" in /proc/iomem
> instead.
>
> So we can use "Crash kernel" instead of "Crash kernel low" in /proc/iomem.
>
> Suggested-by: Vivek Goyal <[email protected]>
> Signed-off-by: Yinghai Lu <[email protected]>

Hi Yinghai,

This patch, along with the second version of the kexec-tools patch, works
for me. I have one small concern.

- Older versions of kexec-tools do not work when multiple "Crash kernel"
entries show up in /proc/iomem. They error out with the following:

Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel

I am assuming that the crashkernel=X changes will break older kexec-tools
anyway on most machines (as it reserves memory as high as possible
by default), since older kexec-tools will not be able to load a kernel
that high.

So it is a foregone conclusion that these new kernel changes to
crashkernel=X in 3.9 are incompatible with older kexec-tools and one
needs to upgrade kexec-tools.

The other syntax, crashkernel=X@Y, will continue to work though.

Thanks
Vivek

2013-03-25 21:50:20

by Yinghai Lu

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Mon, Mar 25, 2013 at 12:42 PM, Vivek Goyal <[email protected]> wrote:
> So it is a forgone conclusion that these new kernel changes to
> crashkernel=X in 3.9 are incompatible with older kexec-tools and one
> needs to upgrade kexec-tools.

I thought that you and hpa had agreed that users need to update kexec-tools
with the new v3.9 kernel. Is that still right?

Thanks

Yinghai

2013-03-26 18:15:18

by Vivek Goyal

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Mon, Mar 25, 2013 at 02:50:18PM -0700, Yinghai Lu wrote:
> On Mon, Mar 25, 2013 at 12:42 PM, Vivek Goyal <[email protected]> wrote:
> > So it is a forgone conclusion that these new kernel changes to
> > crashkernel=X in 3.9 are incompatible with older kexec-tools and one
> > needs to upgrade kexec-tools.
>
> I thought that you and hpa all agreed that user need to update kexec-tools with
> new kernel v3.9. It that still right?

I can update kexec-tools and I don't have a problem with that. I am only
concerned about some xyz user complaining that the new kernel stopped
working with old kexec-tools, and then possibly facing a rant from Linus
about breaking user space. :-)

To me, we could maintain backward compatibility by retaining the existing
behavior of crashkernel=X. That is, look for the specified memory below
896M first and then go higher.

And hide the new semantics behind new kernel parameters, or extend the
existing parameter (say crashkernel=X:search_high_first) to specify how
to search for reserved memory.

In both cases we should probably retain the logic of auto-reserving
low memory for software iotlb and let the user opt out if there is no need.

So we don't have a strong reason to break existing kexec-tools, and
I would prefer not to break it.

But I think this is hpa's decision.

Thanks
Vivek
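
The backward-compatible search order proposed above (below 896M first, then higher) can be sketched as a tiered lookup over a list of free ranges. This is a user-space model under stated assumptions, not memblock code: the free_range struct and both function names are hypothetical, alignment is ignored, the second tier (below 4G) is an assumption about what "go higher" would mean in practice, and a return value of 0 stands for "nothing found", mirroring the !low_base check in the patch.

```c
#include <stddef.h>

struct free_range {
	unsigned long long start;	/* inclusive */
	unsigned long long end;		/* exclusive */
};

/*
 * Highest base such that [base, base + size) fits inside one free range
 * and ends at or below limit. Returns 0 if nothing fits.
 */
static unsigned long long find_below(const struct free_range *r, size_t n,
				     unsigned long long size,
				     unsigned long long limit)
{
	unsigned long long best = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long long end = r[i].end < limit ? r[i].end : limit;

		if (end > r[i].start && end - r[i].start >= size &&
		    end - size > best)
			best = end - size;
	}
	return best;
}

/* Try below 896M first, then below 4G, then anywhere. */
static unsigned long long reserve_low_first(const struct free_range *r, size_t n,
					    unsigned long long size)
{
	static const unsigned long long limits[] = {
		896ULL << 20, 1ULL << 32, ~0ULL
	};

	for (size_t i = 0; i < sizeof(limits) / sizeof(limits[0]); i++) {
		unsigned long long base = find_below(r, n, size, limits[i]);

		if (base)
			return base;
	}
	return 0;
}
```

A "search high first" variant, as in the crashkernel=X:search_high_first idea, would walk the same limits array in the opposite order.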

2013-04-01 13:34:43

by Vivek Goyal

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Tue, Mar 26, 2013 at 02:14:18PM -0400, Vivek Goyal wrote:
> On Mon, Mar 25, 2013 at 02:50:18PM -0700, Yinghai Lu wrote:
> > On Mon, Mar 25, 2013 at 12:42 PM, Vivek Goyal <[email protected]> wrote:
> > > So it is a forgone conclusion that these new kernel changes to
> > > crashkernel=X in 3.9 are incompatible with older kexec-tools and one
> > > needs to upgrade kexec-tools.
> >
> > I thought that you and hpa all agreed that user need to update kexec-tools with
> > new kernel v3.9. It that still right?
>
> I can update kexec-tools and I don't have problems with that. I am only
> concerned about some xyz user complaining that new kernel stopped working
> with old kexec-tools and then possibly face the rant from Linus about
> breaking user space. :-)
>
> To me we could maintain backward compatibility by retaining the existing
> behavior of crashkernle=X. That is look for specificied memory below
> 896M first and then go higher.
>
> And hide new semantics behind new kernel parameters or by extending
> existing parameter (say crashkernel=X:search_high_first) to specify how
> to search for reserved memory.
>
> In both the cases we should probably retain the logic of auto reserving
> low memory for software iotlb and let user opt out if there is no need.
>
> So we don't have a strong reason that why we should break existing
> kexec-tools. So I would prefer not to break it.
>
> But I think this is hpa's decision.

hpa,

ping. Any thoughts on this?

Thanks
Vivek

2013-04-01 18:33:38

by H. Peter Anvin

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On 04/01/2013 06:34 AM, Vivek Goyal wrote:
> On Tue, Mar 26, 2013 at 02:14:18PM -0400, Vivek Goyal wrote:
>> On Mon, Mar 25, 2013 at 02:50:18PM -0700, Yinghai Lu wrote:
>>> On Mon, Mar 25, 2013 at 12:42 PM, Vivek Goyal <[email protected]> wrote:
>>>> So it is a forgone conclusion that these new kernel changes to
>>>> crashkernel=X in 3.9 are incompatible with older kexec-tools and one
>>>> needs to upgrade kexec-tools.
>>>
>>> I thought that you and hpa all agreed that user need to update kexec-tools with
>>> new kernel v3.9. It that still right?
>>
>> I can update kexec-tools and I don't have problems with that. I am only
>> concerned about some xyz user complaining that new kernel stopped working
>> with old kexec-tools and then possibly face the rant from Linus about
>> breaking user space. :-)
>>
>> To me we could maintain backward compatibility by retaining the existing
>> behavior of crashkernle=X. That is look for specificied memory below
>> 896M first and then go higher.
>>
>> And hide new semantics behind new kernel parameters or by extending
>> existing parameter (say crashkernel=X:search_high_first) to specify how
>> to search for reserved memory.
>>
>> In both the cases we should probably retain the logic of auto reserving
>> low memory for software iotlb and let user opt out if there is no need.
>>
>> So we don't have a strong reason that why we should break existing
>> kexec-tools. So I would prefer not to break it.
>>
>> But I think this is hpa's decision.
>
> hpa,
>
> ping. Any thoughts on this?

Pardon me while I retch. The whole kdump dependency mess is making me
sick to my stomach.

The fundamental problem you have is that the user has to take an action
that doesn't make any sense, because there isn't any sane method to
backflow the requirements from the crashkernel to the command line.

The only way I can think how to deal with that in a sane way that
doesn't require that the user has to understand a whole bunch of things
about their system that no user should ever have to be burdened with is
to have the packaging system feed back information about the crashkernel
and kexec-tools that will be used. That interface probably needs
serious architecting.

I wouldn't object to crashkernel=<size>,<option>,<option>... being the
format for that, and it has the plus that it breaks less. So
something like "crashkernel=800M,high" (:search_high_first is
inconsistent with other options and completely pointlessly wordy... this
isn't Multics, and the command line space is a limited resource) would
make sense.
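
As a rough illustration of the <size>,<option>,... format, here is a minimal
sketch in Python rather than kernel code. The function names are made up for
this example, only the K/M/G suffixes are handled, and the older range syntax
is not covered at all.

```python
# Hypothetical sketch: split "<size>,<option>,..." into a size and options.
# The option name "high" comes from this thread; nothing here is kernel code.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def parse_size(text):
    """Convert a size such as '800M' into bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return int(text[:-1]) * UNITS[text[-1].upper()]
    return int(text)

def parse_crashkernel(value):
    """Return (size_in_bytes, [options]) for a '<size>,<option>,...' value."""
    size_text, *options = value.split(",")
    return parse_size(size_text), [opt for opt in options if opt]

size, options = parse_crashkernel("800M,high")
print(size, options)
```

New options can be appended without changing how existing values are split,
which is the extensibility argument for this form.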

However, I really strongly suggest you work on a mechanism to take the
information about the crashkernel and utilities that the kernel can't
know and feed it into the bootloader configuration. This can be as
simple as a set of scripts.

-hpa

2013-04-01 19:26:23

by Vivek Goyal

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Mon, Apr 01, 2013 at 11:33:13AM -0700, H. Peter Anvin wrote:
> On 04/01/2013 06:34 AM, Vivek Goyal wrote:
> > On Tue, Mar 26, 2013 at 02:14:18PM -0400, Vivek Goyal wrote:
> >> On Mon, Mar 25, 2013 at 02:50:18PM -0700, Yinghai Lu wrote:
> >>> On Mon, Mar 25, 2013 at 12:42 PM, Vivek Goyal <[email protected]> wrote:
> >>>> So it is a forgone conclusion that these new kernel changes to
> >>>> crashkernel=X in 3.9 are incompatible with older kexec-tools and one
> >>>> needs to upgrade kexec-tools.
> >>>
> >>> I thought that you and hpa all agreed that user need to update kexec-tools with
> >>> new kernel v3.9. It that still right?
> >>
> >> I can update kexec-tools and I don't have problems with that. I am only
> >> concerned about some xyz user complaining that new kernel stopped working
> >> with old kexec-tools and then possibly face the rant from Linus about
> >> breaking user space. :-)
> >>
> >> To me we could maintain backward compatibility by retaining the existing
> >> behavior of crashkernle=X. That is look for specificied memory below
> >> 896M first and then go higher.
> >>
> >> And hide new semantics behind new kernel parameters or by extending
> >> existing parameter (say crashkernel=X:search_high_first) to specify how
> >> to search for reserved memory.
> >>
> >> In both the cases we should probably retain the logic of auto reserving
> >> low memory for software iotlb and let user opt out if there is no need.
> >>
> >> So we don't have a strong reason that why we should break existing
> >> kexec-tools. So I would prefer not to break it.
> >>
> >> But I think this is hpa's decision.
> >
> > hpa,
> >
> > ping. Any thoughts on this?
>
> Pardon me while I retch. The whole kdump dependency mess is making me
> sick to my stomach.
>
> The fundamental problem you have is that the user has to take an action
> that doesn't make any sense, because there isn't any sane method to
> backflow the requirements from the crashkernel to the command line.
>
> The only way I can think how to deal with that in a sane way that
> doesn't require that the user has to understand a whole bunch of things
> about their system that no user should ever have to be burdened with is
> to have the packaging system feed back information about the crashkernel
> and kexec-tools that will be used. That interface probably needs
> serious architecting.
>
> I wouldn't object to crashkernel=<size>,<option>,<option>... being the
> format for that and it has the plus that it doesn't break less. So
> something like "crashkernel=800M,high" (:search_high_first is
> inconsistent with other options and completely pointlessly wordy... this
> isn't Multics, and the command line space is a limited resource) would
> make sense.
>
> However, I really strongly suggest you work on a mechanism to take the
> information about the crashkernel and utilities that the kernel can't
> know and feed it into the bootloader configuration. This can be as
> simple as a set of scripts.

Hi Peter,

I agree that this dependency on crashkernel is creating lots of problems
and there should be a better way to manage it.

Sorry, but I did not fully understand your suggestion on how to handle the
problem. IIUC, you are suggesting that kexec-tools has the memory
requirements and when that package installs, it should automatically
update boot loader command line to communicate that requirement to kernel?
Or are you suggesting something else entirely?

Where to reserve is not entirely tied to kexec-tools version. It also
depends on the environment and run time user options.

For example, the kernel image being loaded and the entry point selected. So
if one decides to take the 32-bit entry point of bzImage, then memory has to
be reserved in the first 4G (despite the fact that kexec-tools is able to
load a 64-bit bzImage and handle higher reserved ranges).

Also whether to reserve memory for software iotlb will depend on whether
we have chosen to enable iommu in second kernel or not.

So sorting out all the memory reservation requirements based on
kexec-tools version and during installation time alone might not work.

To make the user's life easier, we can probably modify the "kdump" service
and provide another option which can calculate the right crashkernel=
command line option based on the user-selected options and system
environment and display it to the user (or possibly propagate it to the
bootloader command line and reboot the system).
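
A minimal sketch of what such a calculation might look like, in Python. The
helper name and its inputs are hypothetical, the ",high" spelling follows the
format suggested earlier in this thread, and the "0M,low" opt-out is an
assumed syntax rather than an existing interface.

```python
# Hypothetical helper: none of these names exist in kexec-tools or kdump.
def suggest_crashkernel(size_mb, use_64bit_entry, iommu_in_second_kernel):
    """Return the crashkernel= parameters a kdump script might propose."""
    if not use_64bit_entry:
        # A 32-bit bzImage entry point needs the reservation in the first 4G,
        # so keep the plain form.
        return [f"crashkernel={size_mb}M"]
    params = [f"crashkernel={size_mb}M,high"]
    if iommu_in_second_kernel:
        # With an IOMMU in the second kernel no swiotlb bounce buffer is
        # needed, so opt out of the low reservation (assumed syntax).
        params.append("crashkernel=0M,low")
    return params

print(" ".join(suggest_crashkernel(128, use_64bit_entry=True,
                                   iommu_in_second_kernel=False)))
```

The size itself is still an input here; the sketch only covers where to
reserve, not how much.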

All this will only address the issue of where to reserve memory. It will
still not solve the issue of how much memory to reserve. We have no way
to know. It is all heuristics.

crashkernel=<size>,<option>,<option>.. and crashkernel=800M,high sound
good to me.

So at least for the 3.9 kernel, shall we hide the new semantics behind
crashkernel=XM,high and by default have crashkernel=XM emulate
crashkernel=XM,low to retain backward compatibility?

Thanks
Vivek

2013-04-01 20:48:20

by H. Peter Anvin

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On 04/01/2013 12:26 PM, Vivek Goyal wrote:
>
> Hi Peter,
>
> I agree that this dependency on crashkernel is creating lots of problems
> and there should be a better way to manage it.
>
> Sorry, but I did not fully understand your suggestion on how to handle the
> problem. IIUC, you are suggesting that kexec-tools has the memory
> requirements and when that package installs, it should automatically
> update boot loader command line to communicate that requirement to kernel?
> Or are you suggesting something else entirely.

Yes. Or rather, kexec-tools should provide a script or utility to
calculate the proper options and let a distribution-specific script add
them to the bootloader configuration.

> Where to reserve is not entirely tied to kexec-tools version. It also
> depends on the environment and run time user options.

Yes, and the user cannot reasonably be expected to know what that should
be. You're basically telling the user "guess and pray".

> For example kernel image being loaded and entry point selected. So if one
> decides to load take 32bit entry point of bzImage then memory has to be
> reserved in first 4G (despite the fact that kexec-tools is able to load 64bit
> bzImage and handle higher reserved ranges).
>
> Also whether to reserve memory for software iotlb will depend on whether
> we have chosen to enable iommu in second kernel or not.
>
> So sorting out all the memory reservation requirements based on
> kexec-tools version and during installation time alone might not work.
>
> To make user's life easier, we can probably modify "kdump" service and
> provide another option which can calculate the right crashkernel= command
> line option based on user selected option and system environment and
> display it to user (or possibly propogate to bootloader command line and
> system is rebooted).

Yes, I think we need to do something.

> All this will only address the issue of where to reserve memory. It will
> still not solve the issue of how much memory to reserve. We have no way
> to know. It is all heuristics.

At least heuristics in a script is better than telling the user "guess
and pray".

> crashkernel=<size>,<option>,<option>.. and crashkernel=800M,high sound
> good to me.
>
> So atleast for 3.9 kernel, shall we hide new semantics behind
> crashkernel=XM,high and by default crashkernel=XM tries to emulate
> crashkernel=XM,low to retain backward compatibility?

Yes, I suspect so.

-hpa



2013-04-01 21:10:46

by Yinghai Lu

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Mon, Apr 1, 2013 at 1:47 PM, H. Peter Anvin <[email protected]> wrote:
> On 04/01/2013 12:26 PM, Vivek Goyal wrote:
>
>> crashkernel=<size>,<option>,<option>.. and crashkernel=800M,high sound
>> good to me.
>>
>> So atleast for 3.9 kernel, shall we hide new semantics behind
>> crashkernel=XM,high and by default crashkernel=XM tries to emulate
>> crashkernel=XM,low to retain backward compatibility?
>
> Yes, I suspect so.

Currently we have:
1. crashkernel=XM
2. crashkernel=XM crashkernel_low=YM

So you want to change to:
1. crashkernel=XM,low or crashkernel=XM
2. crashkernel=XM,high
3. crashkernel=XM,high crashkernel=YM,low

It looks like you changed your mind: now you are agreeing that some
could be low and some could be high.

Yinghai

2013-04-01 22:02:22

by H. Peter Anvin

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On 04/01/2013 02:10 PM, Yinghai Lu wrote:
> On Mon, Apr 1, 2013 at 1:47 PM, H. Peter Anvin <[email protected]> wrote:
>> On 04/01/2013 12:26 PM, Vivek Goyal wrote:
>>
>>> crashkernel=<size>,<option>,<option>.. and crashkernel=800M,high sound
>>> good to me.
>>>
>>> So atleast for 3.9 kernel, shall we hide new semantics behind
>>> crashkernel=XM,high and by default crashkernel=XM tries to emulate
>>> crashkernel=XM,low to retain backward compatibility?
>>
>> Yes, I suspect so.
>
> current we have:
> 1. crashkernel=XM
> 2. crashkernel=XM crashkernel_low=YM
>
> so you want to change to
> 1. crashkernel=XM,low or crashkernel=XM
> 2. crashkernel=XM,high
> 3. crashkernel=XM,high crashkernel=YM,low
>
> looks like you change your mind, now you are agreeing on
> some could low and some could be high.
>

It sounds like the "never DMA'd to memory" notion requires that we have
some low memory for the iommu, no?

Or am I misunderstanding what you are asking here?

-hpa

2013-04-01 22:17:09

by Yinghai Lu

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Mon, Apr 1, 2013 at 3:02 PM, H. Peter Anvin <[email protected]> wrote:
> On 04/01/2013 02:10 PM, Yinghai Lu wrote:
>> On Mon, Apr 1, 2013 at 1:47 PM, H. Peter Anvin <[email protected]> wrote:

> It sounds that the "never DMA'd to memory" notion requires that we have
> some low memory for the iommu, no?
>
> Or am I misunderstanding what you are asking here?

No, it is just about finding where to put the second kernel.

When Vivek raised the problem at the beginning, he suggested:
1. try under 896M first; if that fails, allocate under 4G; and if that
still fails, try above 4G.
2. or keep crashkernel=XM as low only, and add crashkernel_high if someone
really wants it to be high.

And then you said we should stay consistent and keep everything above 4G.

Now Vivek says we should use crashkernel=XM,low and crashkernel=XM,high,
with crashkernel=XM treated as low.

And his last suggestion is just the same as his old second suggestion.

I just checked the code again, and it looks easy to change it to support:
1. crashkernel=XM
2. crashkernel_high=XM
3. crashkernel_high=XM crashkernel_low=YM

Yinghai

2013-04-01 22:21:04

by H. Peter Anvin

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On 04/01/2013 03:17 PM, Yinghai Lu wrote:
>
> And his last suggestion is just as his old second suggestion.
>
> I just check the code again, it looks it is easy to change it to support:
> 1. crashkernel=XM
> 2. crashkernel_high=XM
> 3. crashkernel_high=XM crashkernel_low=YM
>

Yes... my objection that you are giving the user the headache of
dealing with this very much remains, but I don't think we have any good
options. However, the <size>,<options>.... syntax is at least
extensible, which the above syntax is not.

-hpa

2013-04-01 22:40:53

by Yinghai Lu

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Mon, Apr 1, 2013 at 3:20 PM, H. Peter Anvin <[email protected]> wrote:
> On 04/01/2013 03:17 PM, Yinghai Lu wrote:
>>
>> And his last suggestion is just as his old second suggestion.
>>
>> I just check the code again, it looks it is easy to change it to support:
>> 1. crashkernel=XM
>> 2. crashkernel_high=XM
>> 3. crashkernel_high=XM crashkernel_low=YM
>>
>
> Yes... my objections that you are giving the user the headache of
> dealing with this very much remains, but I don't think we have any good
> options. However, the <size>,<options>.... syntax is at least
> extensible, which the above syntax is not.

ok, will check crashkernel=XM,high

2013-04-02 01:11:42

by Yinghai Lu

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Mon, Apr 1, 2013 at 3:40 PM, Yinghai Lu <[email protected]> wrote:
> On Mon, Apr 1, 2013 at 3:20 PM, H. Peter Anvin <[email protected]> wrote:
>> On 04/01/2013 03:17 PM, Yinghai Lu wrote:
>>>
>>> And his last suggestion is just as his old second suggestion.
>>>
>>> I just check the code again, it looks it is easy to change it to support:
>>> 1. crashkernel=XM
>>> 2. crashkernel_high=XM
>>> 3. crashkernel_high=XM crashkernel_low=YM
>>>
>>
>> Yes... my objections that you are giving the user the headache of
>> dealing with this very much remains, but I don't think we have any good
>> options. However, the <size>,<options>.... syntax is at least
>> extensible, which the above syntax is not.
>
> ok, will check crashkernel=XM,high

Please check the attached four patches, which should get into upstream for
3.9. I sent the first and second before.
The other two address old kexec-tools with kdump on a new kernel.

Thanks

Yinghai


Attachments:
fix_crashkernel_low_v3.patch (5.60 kB)
crashkernel_low_remove.patch (823.00 B)
crashkernel_high_low_1.patch (5.01 kB)
crashkernel_high_low_2.patch (4.41 kB)

2013-04-02 13:30:54

by Vivek Goyal

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Mon, Apr 01, 2013 at 01:47:58PM -0700, H. Peter Anvin wrote:

[..]
> > All this will only address the issue of where to reserve memory. It will
> > still not solve the issue of how much memory to reserve. We have no way
> > to know. It is all heuristics.
>
> At least heuristics in a script is better than telling the user "guess
> and pray".

Current heuristics are outside the kernel. That is, we run tests on a bunch
of machines and try to figure out what's a reasonable default amount of
reserved memory (with our kernel and with our initramfs and default
settings).

So we carry a patch in the kernel which supports the crashkernel=auto
option. The installer puts this option on the boot loader command line
during installation.

And in the kernel we have hardcoded the per-arch memory reservation
requirements, and we update these limits if something significant changes
either in the kernel or in user space in terms of memory consumption.

Initially we had tried to guess second kernel's memory usage based on
first kernel's memory usage, but that did not work. There were many
factors.

- First kernel brought up all the cpus and that increases the memory
consumption, while we bring up only one cpu (with nr_cpus=1) in the
kdump kernel.

- We disable memory cgroup controller while first kernel has it enabled
and that can change memory requirement significantly on large machines.

- We don't bring up all the devices in second kernel while we do in
first kernel. So module memory usage can be significantly higher in
first kernel.

So it has been very tricky to come up with some kind of guidelines in
an automated manner.

But I am more than willing to look into it if there are more ideas on
how one can go about figuring out how much memory to reserve and where
to reserve.

Thanks
Vivek

2013-04-02 13:50:34

by Vivek Goyal

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Mon, Apr 01, 2013 at 06:11:38PM -0700, Yinghai Lu wrote:
> On Mon, Apr 1, 2013 at 3:40 PM, Yinghai Lu <[email protected]> wrote:
> > On Mon, Apr 1, 2013 at 3:20 PM, H. Peter Anvin <[email protected]> wrote:
> >> On 04/01/2013 03:17 PM, Yinghai Lu wrote:
> >>>
> >>> And his last suggestion is just as his old second suggestion.
> >>>
> >>> I just check the code again, it looks it is easy to change it to support:
> >>> 1. crashkernel=XM
> >>> 2. crashkernel_high=XM
> >>> 3. crashkernel_high=XM crashkernel_low=YM
> >>>
> >>
> >> Yes... my objections that you are giving the user the headache of
> >> dealing with this very much remains, but I don't think we have any good
> >> options. However, the <size>,<options>.... syntax is at least
> >> extensible, which the above syntax is not.
> >
> > ok, will check crashkernel=XM,high
>
> Please check attached four patches that should get into upstream for 3.9.
> I sent first and second before.
> other two is addressing old kexec-tools with kdump on new kernel.

Hi Yinghai,

I think there is still a little confusion. What does crashkernel=X,high
mean? Currently it seems to mean that memory is allocated from the region
above 4G and, if it is not available, allocation fails.

I thought it would be more useful if it meant that we start the search
for memory from the higher range of addresses and continue down till we
find a suitable memory area.

That means memory could either come from higher memory regions (above
4G) or from low memory regions (below 4G) depending on how much physical
RAM the system has.

Similarly, crashkernel=X or crashkernel=X,low will mean that we start
scanning for free memory from the low memory area first. And if a
sufficient amount of memory is not available below 4G, memory could very
well come from above 4G.

That way a distribution could decide its default memory requirement
(say 128M) and they could simply say, crashkernel=128M or
crashkernel=128M,high (depending on whether they support 64bit bzImage
or not).

To achieve the behavior where we want to enforce that memory comes from
either the low or the high area only, and otherwise allocation fails, we
could probably use:

crashkernel=X,high_only
crashkernel=X,low_only

And crashkernel_low could be replaced with crashkernel=X,low_only

I think it is reasonable to continue to reserve low memory automatically
for swiotlb if crash memory reservation happens above 4G. Users should
be able to opt out of it using crashkernel=0M,low_only.

Thanks
Vivek

2013-04-02 14:17:53

by Vivek Goyal

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Tue, Apr 02, 2013 at 09:50:01AM -0400, Vivek Goyal wrote:

[..]
> To achieve the behavior where we want to enforce that memory either
> comes from low or high area only otherwise allocation fails, we could
> probably use.
>
> crashkernel=X,high_only
> crashkernel=X,low_only

Thinking more about it: we have the following existing syntax.

crashkernel=range1:size1[,range2:size2,...][@offset]

Which uses ',' as delimiter for range:size pairs.
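
For readers unfamiliar with the range form, here is a minimal sketch in
Python (not the kernel's parser) of how a size could be picked from
range:size pairs based on total system RAM. Ranges are assumed to be written
as start-[end], an open end means "no upper bound", and the function names
are made up for this example.

```python
# Illustrative only: picks the size whose range contains the system RAM.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def parse_size(text):
    """Convert a size such as '64M' into bytes."""
    if text[-1].upper() in UNITS:
        return int(text[:-1]) * UNITS[text[-1].upper()]
    return int(text)

def pick_size(spec, system_ram):
    """Return the reservation size for system_ram, or 0 if no range matches."""
    spec = spec.split("@", 1)[0]          # drop an optional @offset
    for entry in spec.split(","):         # ',' separates range:size pairs
        range_text, size_text = entry.split(":")
        start_text, end_text = range_text.split("-")
        start = parse_size(start_text)
        end = parse_size(end_text) if end_text else None
        if system_ram >= start and (end is None or system_ram < end):
            return parse_size(size_text)
    return 0

print(pick_size("512M-2G:64M,2G-:128M", 4 << 30))
```

Since ',' already separates the pairs here, a trailing ",high" would have to
be told apart from a malformed range entry.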

Maybe we can use a different delimiter, say ';'.

crashkernel=<amount_of_memory_to_reserve>;<option1>;<option2>

All the existing crashkernel= options should fall into the category of
<amount_of_memory_to_reserve>.

option1 could specify whether to search for memory from higher addresses
or from lower addresses (valid values are high or low).

option2 could specify the range of memory the search should be performed
in. The syntax to specify a range could be XM:[YM]. So one could possibly
specify:

4G-8G
<4G
>4G:<8G
>8G

Now crashkernel_low=0 could be emulated by

crashkernel=0M;;<4G

If we want to reserve 128MB of memory between 4G and 8G and starting scan
from high, we could say.

crashkernel=128M;high;>4G:<8G

If this is deemed too generic and not worth it, then we could simplify
option 2 to take the values "high_only" and "low_only", which leaves
scope for implementing proper ranges down the line if need be.

So crashkernel_low=0 will be emulated by.

crashkernel=0M;;low_only
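
To make the proposed ';' form concrete, a minimal sketch in Python of
splitting a value into its three fields. This is only an illustration of the
proposal; the function name is hypothetical, and the amount and range fields
are kept as raw strings rather than interpreted.

```python
# Hypothetical parser for "<amount>;<option1>;<option2>" as proposed above.
def parse_semicolon_form(value):
    """Split a value into amount, search direction, and search range."""
    fields = value.split(";")
    if len(fields) > 3:
        raise ValueError("too many ';' separated fields")
    fields += [""] * (3 - len(fields))    # missing options default to empty
    amount, direction, search_range = fields
    if direction not in ("", "high", "low"):
        raise ValueError("option1 must be 'high' or 'low'")
    return {
        "amount": amount,                 # may itself use range:size pairs
        "direction": direction or None,
        "range": search_range or None,
    }

print(parse_semicolon_form("128M;high;>4G:<8G"))
```

An empty option1, as in crashkernel=0M;;low_only, simply leaves the search
direction unspecified.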

Thanks
Vivek

2013-04-02 14:45:40

by Yinghai Lu

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Tue, Apr 2, 2013 at 6:50 AM, Vivek Goyal <[email protected]> wrote:
> On Mon, Apr 01, 2013 at 06:11:38PM -0700, Yinghai Lu wrote:
>> On Mon, Apr 1, 2013 at 3:40 PM, Yinghai Lu <[email protected]> wrote:
>> > On Mon, Apr 1, 2013 at 3:20 PM, H. Peter Anvin <[email protected]> wrote:
>> >> On 04/01/2013 03:17 PM, Yinghai Lu wrote:
>> >>>
>> >>> And his last suggestion is just as his old second suggestion.
>> >>>
>> >>> I just check the code again, it looks it is easy to change it to support:
>> >>> 1. crashkernel=XM
>> >>> 2. crashkernel_high=XM
>> >>> 3. crashkernel_high=XM crashkernel_low=YM
>> >>>
>> >>
>> >> Yes... my objections that you are giving the user the headache of
>> >> dealing with this very much remains, but I don't think we have any good
>> >> options. However, the <size>,<options>.... syntax is at least
>> >> extensible, which the above syntax is not.
>> >
>> > ok, will check crashkernel=XM,high
>>
>> Please check attached four patches that should get into upstream for 3.9.
>> I sent first and second before.
>> other two is addressing old kexec-tools with kdump on new kernel.
>
> Hi Yinghai,
>
> I think there is still little confusion. What does crashkernel=X,high
> mean. Currently it seems to mean that memory is allocated from region
> above 4G and if it is not available, allocation fails.
>
> I thought what would be more useful if it means that we start search
> for memory from higher range of addresses and continue down till we
> find a suitable memory area.
>
> That means memory could either come from higher memory regions (above
> 4G) or from low memory regions (below 4G) depending on how much physical
> RAM system has.
>
> Similary crashkernel=X or crashkernel=X,low will mean that we start
> scanning for free memory from low memory area first. And if sufficient
> amount of memory is not available below 4G, memory could very well
> come from above 4G.
>
> That way a distribution could decide its default memory requirement
> (say 128M) and they could simply say, crashkernel=128M or
> crashkernel=128M,high (depending on whether they support 64bit bzImage
> or not).
>
> To achieve the behavior where we want to enforce that memory either
> comes from low or high area only otherwise allocation fails, we could
> probably use.
>
> crashkernel=X,high_only
> crashkernel=X,low_only
>
> And crashkernel_low could be replaced with crashkernel=X,low_only
>
> I think it is reasonable to continue to reserve low memory automatically
> for swiotlb if crash memory reservation happens above 4G. Users should
> be able to opt out of it using crashkernel=0M,low_only.

No, that makes the logic too complicated.

After those four patches:
if the user still uses old kexec-tools, they stay with
crashkernel=X and nothing is changed.
if the user wants to use crashkernel=X,high, they should update kexec-tools.
when high is used, memblock will search from top to bottom.
if the allocated region is above 4G, the kernel will try to auto-allocate
72M under 4G for swiotlb.
the user could use crashkernel=Y,low to change 72M to another value.
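
The behavior described for crashkernel=X,high can be sketched roughly as
follows, in Python. This is a simplification and not the patch itself: RAM
is modeled as one contiguous range ending at ram_top, alignment is ignored,
"above 4G" is read as the block starting at or above 4G, and all names are
made up for illustration.

```python
# Simplified model of the ",high" behavior described above (not kernel code).
MB = 1 << 20
GB = 1 << 30
FOUR_GB = 4 * GB
DEFAULT_LOW = 72 * MB   # automatic swiotlb reservation size from this thread

def plan_high_reservation(ram_top, size, low_size=None):
    """Place the crash region top-down; add a low region if it lands above 4G."""
    start = ram_top - size
    if start < 0:
        raise ValueError("not enough memory for the crash kernel")
    plan = {"crash": (start, start + size), "low_size": 0}
    if start >= FOUR_GB:
        # Above 4G: reserve low memory for swiotlb unless overridden (Y,low).
        plan["low_size"] = DEFAULT_LOW if low_size is None else low_size
    return plan

print(plan_high_reservation(16 * GB, 512 * MB))
```

In this model a machine with less than 4G of RAM still gets a region (below
4G) and no extra low reservation.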

The whole point is:
1. keep the transition from old kexec-tools to the new one.
2. keep things unified when new kexec-tools is used: always high.

Thanks

Yinghai

2013-04-02 14:50:45

by Yinghai Lu

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Tue, Apr 2, 2013 at 7:17 AM, Vivek Goyal <[email protected]> wrote:
> On Tue, Apr 02, 2013 at 09:50:01AM -0400, Vivek Goyal wrote:
>
> [..]
>> To achieve the behavior where we want to enforce that memory either
>> comes from low or high area only otherwise allocation fails, we could
>> probably use.
>>
>> crashkernel=X,high_only
>> crashkernel=X,low_only
>
> Thinking more about it. We have following existing syntax.
>
> crashkernel=range1:size1[,range2:size2,...][@offset]
>
> Which uses ',' as delimiter for range:size pairs.
>
> May be we can use a different delimiter, say ';'.
>
> crashkernel=<amount_of_memory_to_reserve>;<option1>;<option2>
>
> All the existing crashkernel= options should fall into the category of
> <amount_of_memory_to_reserve>.
>
> option1 could specify whether to search for memory from higher addresses
> or from lower addresses. (valid values are high or low)
>
> option2 could specify the range of memory search should be performed in.
> syntax to specify range could be XM:[YM]. So one could possibly specify.
>
> 4G-8G
> <4G
> >4G:<8G
> >8G
>
> Now crashkernel_low=0 could be emulated by
>
> crashkernel=0M;;<4G
>
> If we want to reserve 128MB of memory between 4G and 8G and starting scan
> from high, we could say.
>
> crashkernel=128M;high;>4G:<8G
>
> If this is deemed too generic and not worth it. Then we could simplify
> option 2 to take values "high_only" and "low_only" and that leaves the
> scope of implementing proper ranges down the line if need be.
>
> So crashkernel_low=0 will be emulated by.
>
> crashkernel=0M;;low_only

Oh, no. The grammar for crashkernel= looks crazy now.

You should not burden the user like this.

Thanks

Yinghai

2013-04-02 14:58:36

by Vivek Goyal

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Tue, Apr 02, 2013 at 07:45:36AM -0700, Yinghai Lu wrote:

[..]
> No, that make the logic too complicated.
>
> After those four patches:
> if the user still use old kexec-tools, they are still with
> crashkernel=X, nothing is changed.
> if the user want to use crashkernel=X,high, they should update kexec-tools.
> when high is used, memblock will search from top to low.
> if the allocated one is above 4G, kernel will try to auto allocate
> 72M under 4G for swiotlb
> user could crashkernel=Y,low to change 72M to other value.

Hm...,

- Ok, so at least use a different delimiter. Otherwise one could specify
range1:size1,range2:size2,high which is confusing.

- I think one can look at above as follows.

crashkernel=<memory_to_reserve>;[<option1>];....

where option1 specifies range of memory where to look for specified amount
of memory. Current valid values are high/low. High means look for memory
above 4G or fail. Low means look for memory below 4G or fail. We can
always extend range syntax later if need be. Similarly we can always
add option2 to emulate where to begin search (high/low). So this still
falls into the generic syntax category and extendable, if need be.

I will test your patches again.

Thanks
Vivek

2013-04-02 15:40:18

by Vivek Goyal

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Tue, Apr 02, 2013 at 07:45:36AM -0700, Yinghai Lu wrote:

[..]
> 2. keep thing unified when new kexec-tools is used: always high.

I think this is wrong. What if the system does not have more than 4G of
memory? crashkernel=x,high will fail. So just because we have a new version
of kexec-tools, it does not mean that one can always use crashkernel=x,high.

I think we should do two things.

- Extend crashkernel=X to search for memory below 896M, 4G, MAXMEM in
that order. To me, this will work best for most of the users.

- For users who really care to not allocate memory in low memory ranges,
they can use crashkernel=X,high. But even there the semantics should be:
look for memory in higher ranges first, otherwise allocate from low
regions.
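
The first point could be sketched like this, in Python. It is only a model:
free memory is given as a list of (start, end) byte ranges, alignment is
ignored, MAXMEM is a placeholder value, and the function names are
hypothetical.

```python
# Illustrative tiered search: below 896M, then below 4G, then below MAXMEM.
MB = 1 << 20
GB = 1 << 30
MAXMEM = 1 << 46   # placeholder upper bound, assumed for this sketch

def find_below(free_regions, size, limit):
    """Return the highest start where size bytes fit below limit, else None."""
    best = None
    for start, end in free_regions:
        end = min(end, limit)             # clip the region at the limit
        if end - start >= size:
            candidate = end - size
            if best is None or candidate > best:
                best = candidate
    return best

def reserve_with_fallback(free_regions, size,
                          limits=(896 * MB, 4 * GB, MAXMEM)):
    """Try each limit in order; return (limit, start) or None."""
    for limit in limits:
        start = find_below(free_regions, size, limit)
        if start is not None:
            return limit, start
    return None

regions = [(16 * MB, 512 * MB), (4 * GB, 8 * GB)]
print(reserve_with_fallback(regions, 128 * MB))
```

A request that does not fit below 896M or 4G falls through to the last
limit instead of failing outright.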

Otherwise it makes life very difficult for a user when there is no
memory available in higher addresses but it is available in low addresses.
How does one automate that?

In its current form, I doubt crashkernel=X,high is going to be very useful.

Thanks
Vivek

2013-04-02 15:44:20

by Yinghai Lu

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Tue, Apr 2, 2013 at 7:58 AM, Vivek Goyal <[email protected]> wrote:
> On Tue, Apr 02, 2013 at 07:45:36AM -0700, Yinghai Lu wrote:
>
> - Ok so atleast use a different delimiter. Otherwise one could specify
> rage1:size1,range2:size2,high which is confusing.
>
> - I think one can look at above as follows.
>
> crashkernel=<memory_to_reserve>;[<option1>];....

ok, please check the attached one, which replaces the last of the four patches.

Thanks

Yinghai


Attachments:
crashkernel_high_low_2_v2.patch (4.46 kB)

2013-04-02 15:46:18

by Yinghai Lu

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Tue, Apr 2, 2013 at 8:39 AM, Vivek Goyal <[email protected]> wrote:
> On Tue, Apr 02, 2013 at 07:45:36AM -0700, Yinghai Lu wrote:
>
> [..]
>> 2. keep thing unified when new kexec-tools is used: always high.
>
> I think this is wrong. What if system does not have more than 4G of
> memory. crashkernel=x,high will fail. So just because we have new version
> of kexec-tools, it does not mean that one can always use crashkernel=x,high.

No, it will not fail.

If the system has less than 4G, memblock will still find the RAM for us.

Thanks

Yinghai

2013-04-02 15:51:15

by Vivek Goyal

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Tue, Apr 02, 2013 at 08:46:16AM -0700, Yinghai Lu wrote:
> On Tue, Apr 2, 2013 at 8:39 AM, Vivek Goyal <[email protected]> wrote:
> > On Tue, Apr 02, 2013 at 07:45:36AM -0700, Yinghai Lu wrote:
> >
> > [..]
> >> 2. keep thing unified when new kexec-tools is used: always high.
> >
> > I think this is wrong. What if system does not have more than 4G of
> > memory. crashkernel=x,high will fail. So just because we have new version
> > of kexec-tools, it does not mean that one can always use crashkernel=x,high.
>
> no, it will not fail.
>
> If the system have less 4G, and memblock will still find the ram for us.

Ok, then the description in kernel-parameters.txt is wrong.

+ crashkernel_high=size[KMG]
+ [KNL, x86_64] range above 4G. kernel allocate physical
+ memory region above 4G.


Can you please inline your patches? Otherwise how is one supposed to give
review comments?

Thanks
Vivek

2013-04-02 17:21:27

by Yinghai Lu

Subject: Re: [PATCH] kexec: use Crash kernel for Crash kernel low

On Tue, Apr 2, 2013 at 8:50 AM, Vivek Goyal <[email protected]> wrote:

> Can you please inline your patches. Otherwise how one is supposed to give
> review comments?

Just sent all four updated patches. Please check them.

Thanks

Yinghai