2012-10-30 14:08:59

by Mitsuhiro Tanino

Subject: [Patch 0/2] Exclude hwpoison page from vmcore dump

Hi All,
Please find a set of patches that introduce a new "-p" option into
"makedumpfile" to exclude hwpoison pages from the vmcore dump.

Details are described below.

Problem
-------
As large-memory systems become more common, failures caused by memory
errors are also becoming more likely.
To deal with this, Linux has a hwpoison feature that can isolate
uncorrectable memory errors reported as SRAO machine checks.

However, when a user captures a core dump using kdump, the dump kernel
does not know which memory has an uncorrectable error (SRAO), so it
touches that memory.
As a result, a fatal machine check occurs and the user fails to get a vmcore.

This problem was previously discussed in the kexec community, with a
proposal for a Slimdump framework (see the mail thread at
http://lists.infradead.org/pipermail/kexec/2011-October/005586.html).

Solution
--------
As Vivek mentioned in the above thread, "makedumpfile" has a
filtering function that can exclude some types of pages, such as
zero pages, free pages and user data, instead of saving the whole dump.
This function checks the "pageflags" of the struct page arrays, and if
the target page has a flag selected by a "makedumpfile" option,
the page is excluded.
Using this function, "makedumpfile" can exclude poisoned pages, which
have the PG_hwpoison flag set.

These patches introduce a new "-p" option into "makedumpfile" to
exclude hwpoison pages from the vmcore.
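
For illustration, the flag-based filtering described above can be sketched
in C roughly as follows. This is not the makedumpfile code: the names
FLAG_NOT_FOUND, page_flag_is_set and exclude_flagged_pages, and the
one-bit-per-page bitmap layout, are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: the dumped kernel did not report this flag. */
#define FLAG_NOT_FOUND (-1L)

/* Return 1 if bit number `bit` is set in `flags`, 0 otherwise.
 * An unknown or out-of-range bit number never matches. */
int page_flag_is_set(long bit, uint64_t flags)
{
    if (bit == FLAG_NOT_FOUND || bit < 0 || bit >= 64)
        return 0;
    return (int)((flags >> bit) & 1u);
}

/* Walk the flag words of `nr_pages` pages. For each page that has `bit`
 * set, clear its bit in `dump_bitmap` (1 = page will be dumped) and count
 * it. Returns the number of pages that carried the flag. */
size_t exclude_flagged_pages(const uint64_t *page_flags, size_t nr_pages,
                             long bit, uint8_t *dump_bitmap)
{
    size_t excluded = 0;

    for (size_t pfn = 0; pfn < nr_pages; pfn++) {
        if (!page_flag_is_set(bit, page_flags[pfn]))
            continue;
        dump_bitmap[pfn / 8] &= (uint8_t)~(1u << (pfn % 8));
        excluded++;
    }
    return excluded;
}
```

If the flag number is not known (FLAG_NOT_FOUND in this sketch), no page
matches and the bitmap is left untouched, so every page is still dumped.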


Test Results
------------
These patches were tested on a 3.6.0-rc6 kernel with makedumpfile-1.5.0,
using software pseudo-MCE injection from a KVM host into a guest.


**** Host OS screen logs (SRAO machine check injection)
Inject a software pseudo-MCE into the guest qemu process.

(1) Load mce-inject module
# modprobe mce-inject

(2) Find the PID of the target qemu-kvm process and a page struct
# ps -C qemu-kvm -o pid=
3612
9392

(3) Edit the software pseudo-MCE data
Choose the offset of a page struct and put it into the ADDR line of mce-file.

# ./page-types -p 3612 -LN -b anon | head
voffset offset flags
8cb 86b98d ___U_lA____Ma_b___________________
8cc 86b8ef ___U_lA____Ma_b___________________
8cd 86ca04 ___U_lA____Ma_b___________________
8cf 86bb11 ___U_lA____Ma_b___________________
8d0 86bac7 ___U_lA____Ma_b___________________
8d2 86b0c4 ___U_lA____Ma_b___________________
8d4 86ab8d ___U_lA____Ma_b___________________
8d7 86c5e1 ___U_lA____Ma_b___________________
8d8 86c5e3 ___U_lA____Ma_b___________________

# vi mce-file
CPU 0 BANK 2
STATUS UNCORRECTED SRAO 0x17a
MCGSTATUS MCIP RIPV
MISC 0x8c
ADDR 0x86b98d000
EOF

(4) Inject MCE
# mce-inject mce-file

Repeat steps (3) and (4) a couple of times.
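
In step (3), the ADDR value 0x86b98d000 appears to be the first "offset"
value printed by page-types (86b98d) shifted left by 12 bits, that is, a
page frame number turned into a physical address. A minimal C sketch of
that conversion, assuming 4 KiB pages; PAGE_SHIFT_4K and pfn_to_phys are
names made up for this illustration:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so a page frame number maps to a physical
 * address by shifting left 12 bits. */
#define PAGE_SHIFT_4K 12

/* Convert a page frame number to the physical address of the page start. */
uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << PAGE_SHIFT_4K;
}
```

With the first row of the page-types output, pfn_to_phys(0x86b98d) gives
0x86b98d000, which matches the ADDR line used in mce-file.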

**** Guest OS screen logs (kdump)
The guest catches the MCE injected through qemu.
Then run "echo c > /proc/sysrq-trigger" to trigger a crash so that kdump runs makedumpfile.
-------------
[root@fedora17x64 ~]# uname -a
Linux fedora17x64 3.6.0+ #3 SMP Sat Sep 29 14:42:23 JST 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@fedora17x64 ~]# [ 245.348147] Disabling lock debugging due to kernel taint
[ 245.348147] mce: [Hardware Error]: Machine check events logged
[ 245.850863] MCE 0xbb706: non LRU page recovery: Ignored
[ 246.348113] mce: [Hardware Error]: Machine check events logged
[ 246.848190] MCE 0xbb709: non LRU page recovery: Ignored
[ 249.847472] MCE 0xbb70a: non LRU page recovery: Ignored
[ 250.336716] MCE 0xbb70b: non LRU page recovery: Ignored
[ 252.847280] MCE 0xb8ff8: clean LRU page recovery: Recovered
[ 253.847251] MCE 0xb8ff9: clean LRU page recovery: Recovered
[ 256.051190] MCE 0xb68e8: clean LRU page recovery: Recovered
[ 257.000764] MCE 0xb68e9: clean LRU page recovery: Recovered

[root@fedora17x64 ~]# [ 276.980192] MCE 0xb66e8: LRU page recovery: Recovered
[ 277.847269] MCE 0xb66e9: corrupted page was clean: dropped without side effects
[ 277.848360] MCE 0xb66e9: clean LRU page recovery: Recovered

[root@fedora17x64 ~]# echo c > /proc/sysrq-trigger
[ 299.612689] SysRq : Trigger a crash
[ 299.613339] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 299.613339] IP: [<ffffffff81373606>] sysrq_handle_crash+0x16/0x20
[ 299.613339] PGD ba732067 PUD babc2067 PMD 0
[ 299.613339] Oops: 0002 [#1] SMP
..............
................
.................
[ 299.613339] Call Trace:
[ 299.613339] [<ffffffff81373d27>] __handle_sysrq+0x127/0x190
[ 299.613339] [<ffffffff81373dda>] write_sysrq_trigger+0x4a/0x50
[ 299.613339] [<ffffffff811dd6d8>] proc_reg_write+0x78/0xb0
[ 299.613339] [<ffffffff8117b83c>] vfs_write+0xac/0x180
[ 299.613339] [<ffffffff8117bb6a>] sys_write+0x4a/0x90
[ 299.613339] [<ffffffff815fa329>] system_call_fastpath+0x16/0x1b
[ 299.613339] Code: 65 2c 75 cd 4c 89 ef e8 89 f7 ff ff eb c3 0f 1f 80 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c7 05 01 a5 ab 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 55 48 89 e5 53 48 83 ec 08 0f 1f
[ 299.613339] RIP [<ffffffff81373606>] sysrq_handle_crash+0x16/0x20
[ 299.613339] RSP <ffff880037a77e38>
[ 299.613339] CR2: 0000000000000000
..............
................
.................
++ KDUMP_PATH=/var/crash
++ CORE_COLLECTOR='makedumpfile -d 31 -c'
++ DEFAULT_ACTION=dump_rootfs
+++ date +%d.%m.%y-%T
++ DATEDIR=29.10.12-15:32:02
++ DUMP_INSTRUCTION=
++ read_kdump_conf
++ local conf_file=/etc/kdump.conf
++ '[' -f /etc/kdump.conf ']'
++ read config_opt config_val
++ case "$config_opt" in
++ read config_opt config_val
++ case "$config_opt" in
++ read config_opt config_val
++ case "$config_opt" in
++ CORE_COLLECTOR='makedumpfile -c -d 30 -p -D --message-level 31'
++ read config_opt config_val
++ '[' -n '' ']'
++ dump_rootfs
++ mount -o remount,rw /sysroot/
[ 1.796062] EXT4-fs (dm-1): re-mounted. Opts: (null)
++ mkdir -p /sysroot//var/crash/29.10.12-15:32:02
++ makedumpfile -c -d 30 -p -D --message-level 31 /proc/vmcore /sysroot//var/crash/29.10.12-15:32:02/vmcore
sadump: does not have partition header
sadump: read dump device as unknown format
sadump: unknown format
..............
................
.................
Excluding free pages : [100 %] STEP [Excluding free pages ] : 0.085096 seconds
Excluding unnecessary pages : [100 %] STEP [Excluding unnecessary pages] : 0.561497 seconds
Excluding free pages : [100 %] STEP [Excluding free pages ] : 0.081891 seconds
Excluding unnecessary pages : [100 %] STEP [Excluding unnecessary pages] : 0.531003 seconds
Copying data : [100 %] STEP [Copying data ] : 5.206374 seconds

Writing erase info...
offset_eraseinfo: 16225d7, size_eraseinfo: 0

Original pages : 0x00000000000b133c
Excluded pages : 0x00000000000a76bf
Pages filled with zero : 0x0000000000000000
Cache pages : 0x0000000000006df7
Cache pages + private : 0x0000000000003451
User process data pages : 0x0000000000002e03
Free pages : 0x000000000009a660
Hwpoison pages : 0x0000000000000014
Remaining pages : 0x0000000000009c7d
(The number of pages is reduced to 5%.)
Memory Hole : 0x000000000000ecc1
--------------------------------------------------
Total pages : 0x00000000000bfffd


The dumpfile is saved to /sysroot//var/crash/29.10.12-15:32:02/vmcore.

makedumpfile Completed.
++ sync
++ reboot -f
Rebooting.
[ 8.176645] Restarting system.
[ 8.177463] reboot: machine restart
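
The page statistics in the report above can be cross-checked by hand. The
sketch below mirrors the arithmetic of print_report() as it appears in
patch 1/2 (excluded = sum of the per-type counters, remaining = original
minus excluded, percentage = remaining * 100 / original). The struct and
function names are made up for this illustration.

```c
#include <stdint.h>

/* Per-type page counters, as printed in the makedumpfile report. */
struct dump_counts {
    uint64_t original;
    uint64_t zero;
    uint64_t cache;
    uint64_t cache_private;
    uint64_t user;
    uint64_t free_pages;
    uint64_t hwpoison;
};

/* Sum of all pages removed by the filters. */
uint64_t excluded_pages(const struct dump_counts *c)
{
    return c->zero + c->cache + c->cache_private
         + c->user + c->free_pages + c->hwpoison;
}

/* Pages that end up in the dump file. */
uint64_t remaining_pages(const struct dump_counts *c)
{
    return c->original - excluded_pages(c);
}

/* Remaining pages as an integer percentage of the original count. */
uint64_t remaining_percent(const struct dump_counts *c)
{
    if (c->original == 0)
        return 0;
    return remaining_pages(c) * 100 / c->original;
}
```

Feeding in the values from the log (original 0xb133c, cache 0x6df7,
cache + private 0x3451, user 0x2e03, free 0x9a660, hwpoison 0x14) yields
0xa76bf excluded pages, 0x9c7d remaining pages and 5 percent, which agrees
with the report.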


2012-10-30 14:37:59

by Vivek Goyal

Subject: Re: [Patch 0/2] Exclude hwpoison page from vmcore dump

On Tue, Oct 30, 2012 at 11:06:43PM +0900, Mitsuhiro Tanino wrote:

[..]
> These patches introduce a new "-p" option into "makedumpfile" to
> exclude hwpoison page from vmcore.

Why introduce this "-p" option? Unless there are serious
side effects, this should be the default functionality, shouldn't it?
Who would want to touch/save poisoned pages and run into an MCE?

Thanks
Vivek

2012-10-31 13:52:44

by Mitsuhiro Tanino

Subject: Re: [Patch 0/2] Exclude hwpoison page from vmcore dump

Hi Vivek,

(2012/10/30 23:37), Vivek Goyal wrote:
> Why introduce this "-p" option? Unless there are serious
> side effects, this should be the default functionality, shouldn't it?
> Who would want to touch/save poisoned pages and run into an MCE?

Thank you for your review of my patch.

As I understand it, hwpoison is only supported on high-end servers,
and most users do not use it.
Therefore, I thought this functionality would be better as an option.

On the other hand, as you say, nobody wants to touch/save poisoned pages
and run into an MCE, and it is desirable for users to have hwpoison pages
excluded automatically. I agree with you.

I will post a fixed patch without the "-p" option. Please help review it.

Subject:
[PATCH 0/2 v2] Exclude hwpoison page from vmcore dump
[PATCH 1/2 v2] makedumpfile: Add a default action to exclude hwpoison page from vmcore
[PATCH 2/2 v2] kexec: Export PG_hwpoison flag into vmcoreinfo

Regards,
Mitsuhiro Tanino ([email protected])

2012-10-31 14:05:04

by Mitsuhiro Tanino

Subject: [PATCH 0/2 v2] Exclude hwpoison page from vmcore dump

Hi All,

Please find a set of patches that introduce a function into
"makedumpfile" to exclude hwpoison pages from the vmcore dump.

Changes from v1 to v2:
Patch1: Remove "-p" option.

Details are described below.

Problem
-------
As large-memory systems become more common, failures caused by memory
errors are also becoming more likely.
To deal with this, Linux has a hwpoison feature that can isolate
uncorrectable memory errors reported as SRAO machine checks.

However, when a user captures a core dump using kdump, the dump kernel
does not know which memory has an uncorrectable error (SRAO), so it
touches that memory.
As a result, a fatal machine check occurs and the user fails to get a vmcore.

This problem was previously discussed in the kexec community, with a
proposal for a Slimdump framework (see the mail thread at
http://lists.infradead.org/pipermail/kexec/2011-October/005586.html).

Solution
--------
As Vivek mentioned in the above thread, "makedumpfile" has a
filtering function that can exclude some types of pages, such as
zero pages, free pages and user data, instead of saving the whole dump.
This function checks the "pageflags" of the struct page arrays, and if
the target page has a flag selected by a "makedumpfile" option,
the page is excluded.
Using this function, "makedumpfile" can exclude poisoned pages, which
have the PG_hwpoison flag set.

These patches introduce a function into "makedumpfile" to
exclude hwpoison pages from the vmcore.


Test Results
------------
These patches were tested on a 3.6.0-rc6 kernel with makedumpfile-1.5.0,
using software pseudo-MCE injection from a KVM host into a guest.


**** Host OS screen logs (SRAO machine check injection)
Inject a software pseudo-MCE into the guest qemu process.

(1) Load mce-inject module
# modprobe mce-inject

(2) Find the PID of the target qemu-kvm process and a page struct
# ps -C qemu-kvm -o pid=
3612
9392

(3) Edit the software pseudo-MCE data
Choose the offset of a page struct and put it into the ADDR line of mce-file.

# ./page-types -p 3612 -LN -b anon | head
voffset offset flags
8cb 86b98d ___U_lA____Ma_b___________________
8cc 86b8ef ___U_lA____Ma_b___________________
8cd 86ca04 ___U_lA____Ma_b___________________
8cf 86bb11 ___U_lA____Ma_b___________________
8d0 86bac7 ___U_lA____Ma_b___________________
8d2 86b0c4 ___U_lA____Ma_b___________________
8d4 86ab8d ___U_lA____Ma_b___________________
8d7 86c5e1 ___U_lA____Ma_b___________________
8d8 86c5e3 ___U_lA____Ma_b___________________

# vi mce-file
CPU 0 BANK 2
STATUS UNCORRECTED SRAO 0x17a
MCGSTATUS MCIP RIPV
MISC 0x8c
ADDR 0x86b98d000
EOF

(4) Inject MCE
# mce-inject mce-file

Repeat steps (3) and (4) a couple of times.

**** Guest OS screen logs (kdump)
The guest catches the MCE injected through qemu.
Then run "echo c > /proc/sysrq-trigger" to trigger a crash so that kdump runs makedumpfile.
-------------
[root@fedora17x64 ~]# uname -a
Linux fedora17x64 3.6.0+ #3 SMP Sat Sep 29 14:42:23 JST 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@fedora17x64 ~]# [ 245.348147] Disabling lock debugging due to kernel taint
[ 245.348147] mce: [Hardware Error]: Machine check events logged
[ 245.850863] MCE 0xbb706: non LRU page recovery: Ignored
[ 246.348113] mce: [Hardware Error]: Machine check events logged
[ 246.848190] MCE 0xbb709: non LRU page recovery: Ignored
[ 249.847472] MCE 0xbb70a: non LRU page recovery: Ignored
[ 250.336716] MCE 0xbb70b: non LRU page recovery: Ignored
[ 252.847280] MCE 0xb8ff8: clean LRU page recovery: Recovered
[ 253.847251] MCE 0xb8ff9: clean LRU page recovery: Recovered
[ 256.051190] MCE 0xb68e8: clean LRU page recovery: Recovered
[ 257.000764] MCE 0xb68e9: clean LRU page recovery: Recovered

[root@fedora17x64 ~]# [ 276.980192] MCE 0xb66e8: LRU page recovery: Recovered
[ 277.847269] MCE 0xb66e9: corrupted page was clean: dropped without side effects
[ 277.848360] MCE 0xb66e9: clean LRU page recovery: Recovered

[root@fedora17x64 ~]# echo c > /proc/sysrq-trigger
[ 299.612689] SysRq : Trigger a crash
[ 299.613339] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 299.613339] IP: [<ffffffff81373606>] sysrq_handle_crash+0x16/0x20
[ 299.613339] PGD ba732067 PUD babc2067 PMD 0
[ 299.613339] Oops: 0002 [#1] SMP
..............
................
.................
[ 299.613339] Call Trace:
[ 299.613339] [<ffffffff81373d27>] __handle_sysrq+0x127/0x190
[ 299.613339] [<ffffffff81373dda>] write_sysrq_trigger+0x4a/0x50
[ 299.613339] [<ffffffff811dd6d8>] proc_reg_write+0x78/0xb0
[ 299.613339] [<ffffffff8117b83c>] vfs_write+0xac/0x180
[ 299.613339] [<ffffffff8117bb6a>] sys_write+0x4a/0x90
[ 299.613339] [<ffffffff815fa329>] system_call_fastpath+0x16/0x1b
[ 299.613339] Code: 65 2c 75 cd 4c 89 ef e8 89 f7 ff ff eb c3 0f 1f 80 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c7 05 01 a5 ab 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 55 48 89 e5 53 48 83 ec 08 0f 1f
[ 299.613339] RIP [<ffffffff81373606>] sysrq_handle_crash+0x16/0x20
[ 299.613339] RSP <ffff880037a77e38>
[ 299.613339] CR2: 0000000000000000
..............
................
.................
++ KDUMP_PATH=/var/crash
++ CORE_COLLECTOR='makedumpfile -d 31 -c'
++ DEFAULT_ACTION=dump_rootfs
+++ date +%d.%m.%y-%T
++ DATEDIR=29.10.12-15:32:02
++ DUMP_INSTRUCTION=
++ read_kdump_conf
++ local conf_file=/etc/kdump.conf
++ '[' -f /etc/kdump.conf ']'
++ read config_opt config_val
++ case "$config_opt" in
++ read config_opt config_val
++ case "$config_opt" in
++ read config_opt config_val
++ case "$config_opt" in
++ CORE_COLLECTOR='makedumpfile -c -d 30 -D --message-level 31'
++ read config_opt config_val
++ '[' -n '' ']'
++ dump_rootfs
++ mount -o remount,rw /sysroot/
[ 1.796062] EXT4-fs (dm-1): re-mounted. Opts: (null)
++ mkdir -p /sysroot//var/crash/29.10.12-15:32:02
++ makedumpfile -c -d 30 -D --message-level 31 /proc/vmcore /sysroot//var/crash/29.10.12-15:32:02/vmcore
sadump: does not have partition header
sadump: read dump device as unknown format
sadump: unknown format
..............
................
.................
Excluding free pages : [100 %] STEP [Excluding free pages ] : 0.085096 seconds
Excluding unnecessary pages : [100 %] STEP [Excluding unnecessary pages] : 0.561497 seconds
Excluding free pages : [100 %] STEP [Excluding free pages ] : 0.081891 seconds
Excluding unnecessary pages : [100 %] STEP [Excluding unnecessary pages] : 0.531003 seconds
Copying data : [100 %] STEP [Copying data ] : 5.206374 seconds

Writing erase info...
offset_eraseinfo: 16225d7, size_eraseinfo: 0

Original pages : 0x00000000000b133c
Excluded pages : 0x00000000000a76bf
Pages filled with zero : 0x0000000000000000
Cache pages : 0x0000000000006df7
Cache pages + private : 0x0000000000003451
User process data pages : 0x0000000000002e03
Free pages : 0x000000000009a660
Hwpoison pages : 0x0000000000000014
Remaining pages : 0x0000000000009c7d
(The number of pages is reduced to 5%.)
Memory Hole : 0x000000000000ecc1
--------------------------------------------------
Total pages : 0x00000000000bfffd


The dumpfile is saved to /sysroot//var/crash/29.10.12-15:32:02/vmcore.

makedumpfile Completed.
++ sync
++ reboot -f
Rebooting.
[ 8.176645] Restarting system.
[ 8.177463] reboot: machine restart

2012-10-31 14:05:09

by Mitsuhiro Tanino

Subject: [PATCH 1/2 v2] makedumpfile: Add a default action to exclude hwpoison page from vmcore

This patch introduces a function which excludes hwpoison pages
from vmcore as a default action for makedumpfile.

Signed-off-by: Mitsuhiro Tanino <[email protected]>
diff -uprN a/makedumpfile.c b/makedumpfile.c
--- a/makedumpfile.c 2012-10-01 15:26:54.510354074 +0900
+++ b/makedumpfile.c 2012-10-29 22:32:24.913057535 +0900
@@ -43,6 +43,7 @@ unsigned long long pfn_cache;
unsigned long long pfn_cache_private;
unsigned long long pfn_user;
unsigned long long pfn_free;
+unsigned long long pfn_hwpoison;

unsigned long long num_dumped;

@@ -969,6 +970,7 @@ get_structure_info(void)
ENUM_NUMBER_INIT(PG_lru, "PG_lru");
ENUM_NUMBER_INIT(PG_private, "PG_private");
ENUM_NUMBER_INIT(PG_swapcache, "PG_swapcache");
+ ENUM_NUMBER_INIT(PG_hwpoison, "PG_hwpoison");

TYPEDEF_SIZE_INIT(nodemask_t, "nodemask_t");

@@ -1371,6 +1373,7 @@ write_vmcoreinfo_data(void)
WRITE_NUMBER("PG_lru", PG_lru);
WRITE_NUMBER("PG_private", PG_private);
WRITE_NUMBER("PG_swapcache", PG_swapcache);
+ WRITE_NUMBER("PG_hwpoison", PG_hwpoison);

/*
* write the source file of 1st kernel
@@ -1659,6 +1662,7 @@ read_vmcoreinfo(void)
READ_NUMBER("PG_lru", PG_lru);
READ_NUMBER("PG_private", PG_private);
READ_NUMBER("PG_swapcache", PG_swapcache);
+ READ_NUMBER("PG_hwpoison", PG_hwpoison);

READ_SRCFILE("pud_t", pud_t);

@@ -3856,6 +3860,13 @@ __exclude_unnecessary_pages(unsigned lon
if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
pfn_user++;
}
+ /*
+ * Exclude the hwpoison page.
+ */
+ else if (isHWPOISON(flags)) {
+ clear_bit_on_2nd_bitmap_for_kernel(pfn);
+ pfn_hwpoison++;
+ }
}
return TRUE;
}
@@ -3914,11 +3925,13 @@ exclude_unnecessary_pages_cyclic(void)
return FALSE;

/*
- * Exclude cache pages, cache private pages, user data pages, and free pages.
+ * Exclude cache pages, cache private pages, user data pages,
+ * free pages and hwpoison pages.
*/
if (info->dump_level & DL_EXCLUDE_CACHE ||
info->dump_level & DL_EXCLUDE_CACHE_PRI ||
- info->dump_level & DL_EXCLUDE_USER_DATA) {
+ info->dump_level & DL_EXCLUDE_USER_DATA ||
+ (NUMBER(PG_hwpoison) != NOT_FOUND_NUMBER)) {

gettimeofday(&tv_start, NULL);

@@ -4018,11 +4031,13 @@ create_2nd_bitmap(void)
}

/*
- * Exclude cache pages, cache private pages, user data pages.
+ * Exclude cache pages, cache private pages, user data pages,
+ * and hwpoison pages.
*/
if (info->dump_level & DL_EXCLUDE_CACHE ||
info->dump_level & DL_EXCLUDE_CACHE_PRI ||
- info->dump_level & DL_EXCLUDE_USER_DATA) {
+ info->dump_level & DL_EXCLUDE_USER_DATA ||
+ (NUMBER(PG_hwpoison) != NOT_FOUND_NUMBER)) {
if (!exclude_unnecessary_pages()) {
ERRMSG("Can't exclude unnecessary pages.\n");
return FALSE;
@@ -5062,7 +5077,8 @@ write_elf_pages_cyclic(struct cache_data
/*
* Reset counter for debug message.
*/
- pfn_zero = pfn_cache = pfn_cache_private = pfn_user = pfn_free = 0;
+ pfn_zero = pfn_cache = pfn_cache_private = 0;
+ pfn_user = pfn_free = pfn_hwpoison = 0;
pfn_memhole = info->max_mapnr;

info->cyclic_start_pfn = 0;
@@ -5902,7 +5918,8 @@ write_kdump_pages_and_bitmap_cyclic(stru
/*
* Reset counter for debug message.
*/
- pfn_zero = pfn_cache = pfn_cache_private = pfn_user = pfn_free = 0;
+ pfn_zero = pfn_cache = pfn_cache_private = 0;
+ pfn_user = pfn_free = pfn_hwpoison = 0;
pfn_memhole = info->max_mapnr;

cd_header->offset
@@ -6687,7 +6704,7 @@ print_report(void)
pfn_original = info->max_mapnr - pfn_memhole;

pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
- + pfn_user + pfn_free;
+ + pfn_user + pfn_free + pfn_hwpoison;
shrinking = (pfn_original - pfn_excluded) * 100;
shrinking = shrinking / pfn_original;

@@ -6700,6 +6717,7 @@ print_report(void)
pfn_cache_private);
REPORT_MSG(" User process data pages : 0x%016llx\n", pfn_user);
REPORT_MSG(" Free pages : 0x%016llx\n", pfn_free);
+ REPORT_MSG(" Hwpoison pages : 0x%016llx\n", pfn_hwpoison);
REPORT_MSG(" Remaining pages : 0x%016llx\n",
pfn_original - pfn_excluded);
REPORT_MSG(" (The number of pages is reduced to %lld%%.)\n",
diff -uprN a/makedumpfile.h b/makedumpfile.h
--- a/makedumpfile.h 2012-10-01 15:26:54.512354076 +0900
+++ b/makedumpfile.h 2012-10-29 20:59:19.723015190 +0900
@@ -107,6 +107,8 @@ test_bit(int nr, unsigned long addr)
#define isLRU(flags) test_bit(NUMBER(PG_lru), flags)
#define isPrivate(flags) test_bit(NUMBER(PG_private), flags)
#define isSwapCache(flags) test_bit(NUMBER(PG_swapcache), flags)
+#define isHWPOISON(flags) (test_bit(NUMBER(PG_hwpoison), flags) \
+ && (NUMBER(PG_hwpoison) != NOT_FOUND_NUMBER))

static inline int
isAnon(unsigned long mapping)
@@ -1244,6 +1246,7 @@ struct number_table {
long PG_lru;
long PG_private;
long PG_swapcache;
+ long PG_hwpoison;
};

struct srcfile_table {

2012-10-31 14:05:26

by Mitsuhiro Tanino

Subject: [PATCH 2/2 v2] kexec: Export PG_hwpoison flag into vmcoreinfo

This patch exports PG_hwpoison into vmcoreinfo when
CONFIG_MEMORY_FAILURE is defined.
"makedumpfile" needs to read memory information such as
'mem_section', 'zone' and 'pageflags' from the vmcore.

We introduce a function into "makedumpfile" to exclude
hwpoison pages from the vmcore dump.
For that function to work, the PG_hwpoison flag has to be
exported into vmcoreinfo.

Signed-off-by: Mitsuhiro Tanino <[email protected]>
---
kernel/kexec.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 0668d58..0d5d6bc 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1513,6 +1513,9 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_NUMBER(PG_lru);
VMCOREINFO_NUMBER(PG_private);
VMCOREINFO_NUMBER(PG_swapcache);
+#ifdef CONFIG_MEMORY_FAILURE
+ VMCOREINFO_NUMBER(PG_hwpoison);
+#endif

arch_crash_save_vmcoreinfo();
update_vmcoreinfo_note();
--
1.7.10.1
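
For context, VMCOREINFO_NUMBER() records each value as a text line of the
form "NUMBER(name)=value" in the vmcoreinfo note, as far as I know. The
sketch below shows how a reader of that text could look up PG_hwpoison and
fall back to a "not found" value on kernels built without
CONFIG_MEMORY_FAILURE. It is not the makedumpfile parser;
find_vmcoreinfo_number and NUMBER_NOT_FOUND are hypothetical names, and
the numbers in the usage note are made up.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sentinel returned when the entry is missing. */
#define NUMBER_NOT_FOUND (-1L)

/* Scan newline-separated vmcoreinfo text for "NUMBER(<name>)=<value>"
 * and return the value, or NUMBER_NOT_FOUND if there is no such line. */
long find_vmcoreinfo_number(const char *text, const char *name)
{
    char key[128];
    int len = snprintf(key, sizeof(key), "NUMBER(%s)=", name);

    if (len < 0 || (size_t)len >= sizeof(key))
        return NUMBER_NOT_FOUND;

    for (const char *line = text; line && *line; ) {
        if (strncmp(line, key, (size_t)len) == 0) {
            long value;

            if (sscanf(line + len, "%ld", &value) == 1)
                return value;
            return NUMBER_NOT_FOUND;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return NUMBER_NOT_FOUND;
}
```

For example, given the text "NUMBER(PG_lru)=5\nNUMBER(PG_hwpoison)=23\n",
looking up "PG_hwpoison" returns 23. Without that line the lookup returns
NUMBER_NOT_FOUND, which is the case the NOT_FOUND_NUMBER checks in patch
1/2 guard against.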

2012-10-31 14:14:32

by Vivek Goyal

Subject: Re: [Patch 0/2] Exclude hwpoison page from vmcore dump

On Wed, Oct 31, 2012 at 10:51:55PM +0900, Mitsuhiro Tanino wrote:
> Hi Vivek,
>
> (2012/10/30 23:37), Vivek Goyal wrote:
> > Why introduce this "-p" option? Unless there are serious
> > side effects, this should be the default functionality, shouldn't it?
> > Who would want to touch/save poisoned pages and run into an MCE?
>
> Thank you for your review of my patch.
>
> As I understand it, hwpoison is only supported on high-end servers,
> and most users do not use it.
> Therefore, I thought this functionality would be better as an option.

If hwpoison functionality is not available in hardware, then the respective
bit will never be set in struct page and the page will be saved by default.
So it should not matter whether the hardware has hwpoison functionality
or not.

>
> On the other hand, as you say, nobody wants to touch/save poisoned pages
> and run into MCE, and it is desirable for users to exclude hwpoison pages
> automatically. I agree with you.
>
> I will post a fixed patch without the "-p" option. Please help review it.

I think that removing hwpoison pages by default makes sense. If somebody
does have a reasonable case for not doing so, then we could either
introduce another filtering level (based on type) or add another command
line option like (--no-hwpoison-filtering) etc.

Thanks
Vivek

2012-11-01 10:32:49

by Mitsuhiro Tanino

Subject: Re: [Patch 0/2] Exclude hwpoison page from vmcore dump

Hi Vivek,

(2012/10/31 23:14), Vivek Goyal wrote:
> If hwpoison functionality is not available in hardware, then the respective
> bit will never be set in struct page and the page will be saved by default.
> So it should not matter whether the hardware has hwpoison functionality
> or not.

Thanks, I understand.

> I think that removing hwpoison pages by default makes sense. If somebody
> does have a reasonable case for not doing so, then we could either
> introduce another filtering level (based on type) or add another command
> line option like (--no-hwpoison-filtering) etc.

I agree with you.
If somebody requests support for disabling the filtering of hwpoison pages,
we can discuss such an option then.

Regards,
Mitsuhiro Tanino ([email protected])

2012-11-01 10:41:06

by Eric W. Biederman

Subject: Re: [Patch 0/2] Exclude hwpoison page from vmcore dump

Mitsuhiro Tanino <[email protected]> writes:

> Hi Vivek,
>
> (2012/10/31 23:14), Vivek Goyal wrote:
>> If hwpoison functionality is not available in hardware, then the respective
>> bit will never be set in struct page and the page will be saved by default.
>> So it should not matter whether the hardware has hwpoison functionality
>> or not.
>
> Thanks, I understand.
>
>> I think that removing hwpoison pages by default makes sense. If somebody
>> does have a reasonable case for not doing so, then we could either
>> introduce another filtering level (based on type) or add another command
>> line option like (--no-hwpoison-filtering) etc.
>
> I agree with you.
> If somebody requests support for disabling the filtering of hwpoison pages,
> we can discuss such an option then.


I agree. If we are performing filtering, filtering out poisoned pages
seems very reasonable. So ack on the basic concept.

Eric

2012-11-05 06:23:09

by Atsushi Kumagai

Subject: Re: [PATCH 1/2 v2] makedumpfile: Add a default action to exclude hwpoison page from vmcore

Hello Tanino-san,

On Wed, 31 Oct 2012 23:05:01 +0900
Mitsuhiro Tanino <[email protected]> wrote:

> This patch introduces a function which excludes hwpoison pages
> from vmcore as a default action for makedumpfile.
>
> Signed-off-by: Mitsuhiro Tanino <[email protected]>

Thank you for your work. I think it's a good feature.

I will merge this patch into makedumpfile-1.5.2 with the small change below.
Of course, I will accept a --no-hwpoison-filtering option if it's needed.


diff --git a/makedumpfile.c b/makedumpfile.c
index 30cf130..fcf42f6 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3864,8 +3864,8 @@ __exclude_unnecessary_pages(unsigned long mem_map,
* Exclude the hwpoison page.
*/
else if (isHWPOISON(flags)) {
- clear_bit_on_2nd_bitmap_for_kernel(pfn);
- pfn_hwpoison++;
+ if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
+ pfn_hwpoison++;
}
}
return TRUE;
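
The effect of the change above can be illustrated with a small C sketch.
It assumes that the helper returns false whenever it could not clear the
page's bit; the thread does not spell out the return semantics of
clear_bit_on_2nd_bitmap_for_kernel(), so clear_dump_mark below is only a
hypothetical stand-in that fails for page frame numbers outside the
bitmap, and count_excluded is likewise a made-up name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for clear_bit_on_2nd_bitmap_for_kernel().
 * Assumption: it reports failure when the bit could not be cleared,
 * modelled here as a pfn that lies outside the bitmap. */
bool clear_dump_mark(bool *marks, size_t nr_marks, size_t pfn)
{
    if (pfn >= nr_marks)
        return false;
    marks[pfn] = false;
    return true;
}

/* Count poisoned pages only when the clear succeeded, following the
 * pattern of the amended hunk. */
size_t count_excluded(bool *marks, size_t nr_marks,
                      const size_t *poisoned, size_t nr_poisoned)
{
    size_t counted = 0;

    for (size_t i = 0; i < nr_poisoned; i++) {
        if (clear_dump_mark(marks, nr_marks, poisoned[i]))
            counted++;
    }
    return counted;
}
```

With four pages marked for dumping and poisoned page numbers 1, 3 and 9,
this counts 2 pages, whereas an unconditional increment would report 3
even though page 9 was never cleared.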


Thanks
Atsushi Kumagai


> diff -uprN a/makedumpfile.c b/makedumpfile.c
> --- a/makedumpfile.c 2012-10-01 15:26:54.510354074 +0900
> +++ b/makedumpfile.c 2012-10-29 22:32:24.913057535 +0900
> @@ -43,6 +43,7 @@ unsigned long long pfn_cache;
> unsigned long long pfn_cache_private;
> unsigned long long pfn_user;
> unsigned long long pfn_free;
> +unsigned long long pfn_hwpoison;
>
> unsigned long long num_dumped;
>
> @@ -969,6 +970,7 @@ get_structure_info(void)
> ENUM_NUMBER_INIT(PG_lru, "PG_lru");
> ENUM_NUMBER_INIT(PG_private, "PG_private");
> ENUM_NUMBER_INIT(PG_swapcache, "PG_swapcache");
> + ENUM_NUMBER_INIT(PG_hwpoison, "PG_hwpoison");
>
> TYPEDEF_SIZE_INIT(nodemask_t, "nodemask_t");
>
> @@ -1371,6 +1373,7 @@ write_vmcoreinfo_data(void)
> WRITE_NUMBER("PG_lru", PG_lru);
> WRITE_NUMBER("PG_private", PG_private);
> WRITE_NUMBER("PG_swapcache", PG_swapcache);
> + WRITE_NUMBER("PG_hwpoison", PG_hwpoison);
>
> /*
> * write the source file of 1st kernel
> @@ -1659,6 +1662,7 @@ read_vmcoreinfo(void)
> READ_NUMBER("PG_lru", PG_lru);
> READ_NUMBER("PG_private", PG_private);
> READ_NUMBER("PG_swapcache", PG_swapcache);
> + READ_NUMBER("PG_hwpoison", PG_hwpoison);
>
> READ_SRCFILE("pud_t", pud_t);
>
> @@ -3856,6 +3860,13 @@ __exclude_unnecessary_pages(unsigned lon
> if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
> pfn_user++;
> }
> + /*
> + * Exclude the hwpoison page.
> + */
> + else if (isHWPOISON(flags)) {
> + clear_bit_on_2nd_bitmap_for_kernel(pfn);
> + pfn_hwpoison++;
> + }
> }
> return TRUE;
> }
> @@ -3914,11 +3925,13 @@ exclude_unnecessary_pages_cyclic(void)
> return FALSE;
>
> /*
> - * Exclude cache pages, cache private pages, user data pages, and free pages.
> + * Exclude cache pages, cache private pages, user data pages,
> + * free pages and hwpoison pages.
> */
> if (info->dump_level & DL_EXCLUDE_CACHE ||
> info->dump_level & DL_EXCLUDE_CACHE_PRI ||
> - info->dump_level & DL_EXCLUDE_USER_DATA) {
> + info->dump_level & DL_EXCLUDE_USER_DATA ||
> + (NUMBER(PG_hwpoison) != NOT_FOUND_NUMBER)) {
>
> gettimeofday(&tv_start, NULL);
>
> @@ -4018,11 +4031,13 @@ create_2nd_bitmap(void)
> }
>
> /*
> - * Exclude cache pages, cache private pages, user data pages.
> + * Exclude cache pages, cache private pages, user data pages,
> + * and hwpoison pages.
> */
> if (info->dump_level & DL_EXCLUDE_CACHE ||
> info->dump_level & DL_EXCLUDE_CACHE_PRI ||
> - info->dump_level & DL_EXCLUDE_USER_DATA) {
> + info->dump_level & DL_EXCLUDE_USER_DATA ||
> + (NUMBER(PG_hwpoison) != NOT_FOUND_NUMBER)) {
> if (!exclude_unnecessary_pages()) {
> ERRMSG("Can't exclude unnecessary pages.\n");
> return FALSE;
> @@ -5062,7 +5077,8 @@ write_elf_pages_cyclic(struct cache_data
> /*
> * Reset counter for debug message.
> */
> - pfn_zero = pfn_cache = pfn_cache_private = pfn_user = pfn_free = 0;
> + pfn_zero = pfn_cache = pfn_cache_private = 0;
> + pfn_user = pfn_free = pfn_hwpoison = 0;
> pfn_memhole = info->max_mapnr;
>
> info->cyclic_start_pfn = 0;
> @@ -5902,7 +5918,8 @@ write_kdump_pages_and_bitmap_cyclic(stru
> /*
> * Reset counter for debug message.
> */
> - pfn_zero = pfn_cache = pfn_cache_private = pfn_user = pfn_free = 0;
> + pfn_zero = pfn_cache = pfn_cache_private = 0;
> + pfn_user = pfn_free = pfn_hwpoison = 0;
> pfn_memhole = info->max_mapnr;
>
> cd_header->offset
> @@ -6687,7 +6704,7 @@ print_report(void)
> pfn_original = info->max_mapnr - pfn_memhole;
>
> pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
> - + pfn_user + pfn_free;
> + + pfn_user + pfn_free + pfn_hwpoison;
> shrinking = (pfn_original - pfn_excluded) * 100;
> shrinking = shrinking / pfn_original;
>
> @@ -6700,6 +6717,7 @@ print_report(void)
> pfn_cache_private);
> REPORT_MSG(" User process data pages : 0x%016llx\n", pfn_user);
> REPORT_MSG(" Free pages : 0x%016llx\n", pfn_free);
> + REPORT_MSG(" Hwpoison pages : 0x%016llx\n", pfn_hwpoison);
> REPORT_MSG(" Remaining pages : 0x%016llx\n",
> pfn_original - pfn_excluded);
> REPORT_MSG(" (The number of pages is reduced to %lld%%.)\n",
> diff -uprN a/makedumpfile.h b/makedumpfile.h
> --- a/makedumpfile.h 2012-10-01 15:26:54.512354076 +0900
> +++ b/makedumpfile.h 2012-10-29 20:59:19.723015190 +0900
> @@ -107,6 +107,8 @@ test_bit(int nr, unsigned long addr)
> #define isLRU(flags) test_bit(NUMBER(PG_lru), flags)
> #define isPrivate(flags) test_bit(NUMBER(PG_private), flags)
> #define isSwapCache(flags) test_bit(NUMBER(PG_swapcache), flags)
> +#define isHWPOISON(flags) (test_bit(NUMBER(PG_hwpoison), flags) \
> + && (NUMBER(PG_hwpoison) != NOT_FOUND_NUMBER))
>
> static inline int
> isAnon(unsigned long mapping)
> @@ -1244,6 +1246,7 @@ struct number_table {
> long PG_lru;
> long PG_private;
> long PG_swapcache;
> + long PG_hwpoison;
> };
>
> struct srcfile_table {
>
>