Introduce the capability to dynamically configure the maximum file
note size for ELF core dumps via sysctl.
Why is this being done?
We have observed that during a crash when there are more than 65k mmaps
in memory, the existing fixed limit on the size of the ELF notes section
becomes a bottleneck. The notes section quickly reaches its capacity,
leading to incomplete memory segment information in the resulting coredump.
This truncation compromises the utility of the coredumps, as crucial
information about the memory state at the time of the crash might be
omitted.
This enhancement removes the previous static limit of 4MB, allowing
system administrators to adjust the size based on system-specific
requirements or constraints.
Eg:
$ sysctl -a | grep core_file_note_size_min
kernel.core_file_note_size_max = 4194304
$ sysctl -n kernel.core_file_note_size_min
4194304
$echo 519304 > /proc/sys/kernel/core_file_note_size_min
$sysctl -n kernel.core_file_note_size_min
519304
Attempting to write beyond the ceiling value of 16MB
$echo 17194304 > /proc/sys/kernel/core_file_note_size_min
bash: echo: write error: Invalid argument
Signed-off-by: Vijay Nag <[email protected]>
Signed-off-by: Allen Pais <[email protected]>
---
Changes in v4:
- Rename core_file_note_size_max to core_file_note_size_min [kees]
- Rename core_file_note_size_max to MAX_FILE_NOTE_SIZE to
CORE_FILE_NOTE_SIZE_DEFAULT and MAX_ALLOWED_NOTE_SIZE to
CORE_FILE_NOTE_SIZE_MAX [Kees]
- change core_file_note_size_allowed to static and const [Kees]
Changes in v3:
- Fix commit message to reflect the correct sysctl knob [Kees]
- Add a ceiling for maximum pssible note size(16M) [Allen]
- Add a pr_warn_once() [Kees]
Changes in v2:
- Move new sysctl to fs/coredump.c [Luis & Kees]
- rename max_file_note_size to core_file_note_size_max [kees]
- Capture "why this is being done?" int he commit message [Luis & Kees]
---
fs/binfmt_elf.c | 7 +++++--
fs/coredump.c | 15 +++++++++++++++
include/linux/coredump.h | 1 +
3 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 5397b552fbeb..4dc7eb265a97 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1564,7 +1564,6 @@ static void fill_siginfo_note(struct memelfnote *note, user_siginfo_t *csigdata,
fill_note(note, "CORE", NT_SIGINFO, sizeof(*csigdata), csigdata);
}
-#define MAX_FILE_NOTE_SIZE (4*1024*1024)
/*
* Format of NT_FILE note:
*
@@ -1592,8 +1591,12 @@ static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm
names_ofs = (2 + 3 * count) * sizeof(data[0]);
alloc:
- if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */
+ /* paranoia check */
+ if (size >= core_file_note_size_min) {
+ pr_warn_once("coredump Note size too large: %u (does kernel.core_file_note_size_min sysctl need adjustment?\n",
+ size);
return -EINVAL;
+ }
size = round_up(size, PAGE_SIZE);
/*
* "size" can be 0 here legitimately.
diff --git a/fs/coredump.c b/fs/coredump.c
index be6403b4b14b..20807c3c5477 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -56,10 +56,16 @@
static bool dump_vma_snapshot(struct coredump_params *cprm);
static void free_vma_snapshot(struct coredump_params *cprm);
+#define CORE_FILE_NOTE_SIZE_DEFAULT (4*1024*1024)
+/* Define a reasonable max cap */
+#define CORE_FILE_NOTE_SIZE_MAX (16*1024*1024)
+
static int core_uses_pid;
static unsigned int core_pipe_limit;
static char core_pattern[CORENAME_MAX_SIZE] = "core";
static int core_name_size = CORENAME_MAX_SIZE;
+static const unsigned int core_file_note_size_max = CORE_FILE_NOTE_SIZE_MAX;
+unsigned int core_file_note_size_min = CORE_FILE_NOTE_SIZE_DEFAULT;
struct core_name {
char *corename;
@@ -1020,6 +1026,15 @@ static struct ctl_table coredump_sysctls[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
+ {
+ .procname = "core_file_note_size_min",
+ .data = &core_file_note_size_min,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_douintvec_minmax,
+ .extra1 = &core_file_note_size_min,
+ .extra2 = (unsigned int *) &core_file_note_size_max,
+ },
};
static int __init init_fs_coredump_sysctls(void)
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index d3eba4360150..f6be9fd2aea7 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -46,6 +46,7 @@ static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
#endif
#if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL)
+extern unsigned int core_file_note_size_min;
extern void validate_coredump_safety(void);
#else
static inline void validate_coredump_safety(void) {}
--
2.17.1
Hi Allen,
kernel test robot noticed the following build errors:
[auto build test ERROR on kees/for-next/execve]
[also build test ERROR on brauner-vfs/vfs.all linus/master v6.9-rc7 next-20240507]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Allen-Pais/fs-coredump-Enable-dynamic-configuration-of-max-file-note-size/20240507-033907
base: https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/execve
patch link: https://lore.kernel.org/r/20240506193700.7884-1-apais%40linux.microsoft.com
patch subject: [PATCH v4] fs/coredump: Enable dynamic configuration of max file note size
config: loongarch-randconfig-001-20240507 (https://download.01.org/0day-ci/archive/20240507/[email protected]/config)
compiler: loongarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240507/[email protected]/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All errors (new ones prefixed by >>):
fs/binfmt_elf.c: In function 'fill_files_note':
>> fs/binfmt_elf.c:1598:21: error: 'core_file_note_size_min' undeclared (first use in this function)
1598 | if (size >= core_file_note_size_min) {
| ^~~~~~~~~~~~~~~~~~~~~~~
fs/binfmt_elf.c:1598:21: note: each undeclared identifier is reported only once for each function it appears in
vim +/core_file_note_size_min +1598 fs/binfmt_elf.c
1569
1570 /*
1571 * Format of NT_FILE note:
1572 *
1573 * long count -- how many files are mapped
1574 * long page_size -- units for file_ofs
1575 * array of [COUNT] elements of
1576 * long start
1577 * long end
1578 * long file_ofs
1579 * followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...
1580 */
1581 static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm)
1582 {
1583 unsigned count, size, names_ofs, remaining, n;
1584 user_long_t *data;
1585 user_long_t *start_end_ofs;
1586 char *name_base, *name_curpos;
1587 int i;
1588
1589 /* *Estimated* file count and total data size needed */
1590 count = cprm->vma_count;
1591 if (count > UINT_MAX / 64)
1592 return -EINVAL;
1593 size = count * 64;
1594
1595 names_ofs = (2 + 3 * count) * sizeof(data[0]);
1596 alloc:
1597 /* paranoia check */
> 1598 if (size >= core_file_note_size_min) {
1599 pr_warn_once("coredump Note size too large: %u (does kernel.core_file_note_size_min sysctl need adjustment?\n",
1600 size);
1601 return -EINVAL;
1602 }
1603 size = round_up(size, PAGE_SIZE);
1604 /*
1605 * "size" can be 0 here legitimately.
1606 * Let it ENOMEM and omit NT_FILE section which will be empty anyway.
1607 */
1608 data = kvmalloc(size, GFP_KERNEL);
1609 if (ZERO_OR_NULL_PTR(data))
1610 return -ENOMEM;
1611
1612 start_end_ofs = data + 2;
1613 name_base = name_curpos = ((char *)data) + names_ofs;
1614 remaining = size - names_ofs;
1615 count = 0;
1616 for (i = 0; i < cprm->vma_count; i++) {
1617 struct core_vma_metadata *m = &cprm->vma_meta[i];
1618 struct file *file;
1619 const char *filename;
1620
1621 file = m->file;
1622 if (!file)
1623 continue;
1624 filename = file_path(file, name_curpos, remaining);
1625 if (IS_ERR(filename)) {
1626 if (PTR_ERR(filename) == -ENAMETOOLONG) {
1627 kvfree(data);
1628 size = size * 5 / 4;
1629 goto alloc;
1630 }
1631 continue;
1632 }
1633
1634 /* file_path() fills at the end, move name down */
1635 /* n = strlen(filename) + 1: */
1636 n = (name_curpos + remaining) - filename;
1637 remaining = filename - name_curpos;
1638 memmove(name_curpos, filename, n);
1639 name_curpos += n;
1640
1641 *start_end_ofs++ = m->start;
1642 *start_end_ofs++ = m->end;
1643 *start_end_ofs++ = m->pgoff;
1644 count++;
1645 }
1646
1647 /* Now we know exact count of files, can store it */
1648 data[0] = count;
1649 data[1] = PAGE_SIZE;
1650 /*
1651 * Count usually is less than mm->map_count,
1652 * we need to move filenames down.
1653 */
1654 n = cprm->vma_count - count;
1655 if (n != 0) {
1656 unsigned shift_bytes = n * 3 * sizeof(data[0]);
1657 memmove(name_base - shift_bytes, name_base,
1658 name_curpos - name_base);
1659 name_curpos -= shift_bytes;
1660 }
1661
1662 size = name_curpos - (char *)data;
1663 fill_note(note, "CORE", NT_FILE, size, data);
1664 return 0;
1665 }
1666
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on kees/for-next/execve]
> [also build test ERROR on brauner-vfs/vfs.all linus/master v6.9-rc7 next-20240507]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Allen-Pais/fs-coredump-Enable-dynamic-configuration-of-max-file-note-size/20240507-033907
> base: https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/execve
> patch link: https://lore.kernel.org/r/20240506193700.7884-1-apais%40linux.microsoft.com
> patch subject: [PATCH v4] fs/coredump: Enable dynamic configuration of max file note size
> config: loongarch-randconfig-001-20240507 (https://download.01.org/0day-ci/archive/20240507/[email protected]/config)
> compiler: loongarch64-linux-gcc (GCC) 13.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240507/[email protected]/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <[email protected]>
> | Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
>
Thanks for reporting. The kernel builds fine with
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git
for-next/execve.
The issue is with loongarch-randconfig, which has CONFIG_SYSCTL set to
"n". It is needed for this patch.
Thanks,
Allen
> All errors (new ones prefixed by >>):
>
> fs/binfmt_elf.c: In function 'fill_files_note':
> >> fs/binfmt_elf.c:1598:21: error: 'core_file_note_size_min' undeclared (first use in this function)
> 1598 | if (size >= core_file_note_size_min) {
> | ^~~~~~~~~~~~~~~~~~~~~~~
> fs/binfmt_elf.c:1598:21: note: each undeclared identifier is reported only once for each function it appears in
>
>
> vim +/core_file_note_size_min +1598 fs/binfmt_elf.c
>
> 1569
> 1570 /*
> 1571 * Format of NT_FILE note:
> 1572 *
> 1573 * long count -- how many files are mapped
> 1574 * long page_size -- units for file_ofs
> 1575 * array of [COUNT] elements of
> 1576 * long start
> 1577 * long end
> 1578 * long file_ofs
> 1579 * followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...
> 1580 */
> 1581 static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm)
> 1582 {
> 1583 unsigned count, size, names_ofs, remaining, n;
> 1584 user_long_t *data;
> 1585 user_long_t *start_end_ofs;
> 1586 char *name_base, *name_curpos;
> 1587 int i;
> 1588
> 1589 /* *Estimated* file count and total data size needed */
> 1590 count = cprm->vma_count;
> 1591 if (count > UINT_MAX / 64)
> 1592 return -EINVAL;
> 1593 size = count * 64;
> 1594
> 1595 names_ofs = (2 + 3 * count) * sizeof(data[0]);
> 1596 alloc:
> 1597 /* paranoia check */
> > 1598 if (size >= core_file_note_size_min) {
> 1599 pr_warn_once("coredump Note size too large: %u (does kernel.core_file_note_size_min sysctl need adjustment?\n",
> 1600 size);
> 1601 return -EINVAL;
> 1602 }
> 1603 size = round_up(size, PAGE_SIZE);
> 1604 /*
> 1605 * "size" can be 0 here legitimately.
> 1606 * Let it ENOMEM and omit NT_FILE section which will be empty anyway.
> 1607 */
> 1608 data = kvmalloc(size, GFP_KERNEL);
> 1609 if (ZERO_OR_NULL_PTR(data))
> 1610 return -ENOMEM;
> 1611
> 1612 start_end_ofs = data + 2;
> 1613 name_base = name_curpos = ((char *)data) + names_ofs;
> 1614 remaining = size - names_ofs;
> 1615 count = 0;
> 1616 for (i = 0; i < cprm->vma_count; i++) {
> 1617 struct core_vma_metadata *m = &cprm->vma_meta[i];
> 1618 struct file *file;
> 1619 const char *filename;
> 1620
> 1621 file = m->file;
> 1622 if (!file)
> 1623 continue;
> 1624 filename = file_path(file, name_curpos, remaining);
> 1625 if (IS_ERR(filename)) {
> 1626 if (PTR_ERR(filename) == -ENAMETOOLONG) {
> 1627 kvfree(data);
> 1628 size = size * 5 / 4;
> 1629 goto alloc;
> 1630 }
> 1631 continue;
> 1632 }
> 1633
> 1634 /* file_path() fills at the end, move name down */
> 1635 /* n = strlen(filename) + 1: */
> 1636 n = (name_curpos + remaining) - filename;
> 1637 remaining = filename - name_curpos;
> 1638 memmove(name_curpos, filename, n);
> 1639 name_curpos += n;
> 1640
> 1641 *start_end_ofs++ = m->start;
> 1642 *start_end_ofs++ = m->end;
> 1643 *start_end_ofs++ = m->pgoff;
> 1644 count++;
> 1645 }
> 1646
> 1647 /* Now we know exact count of files, can store it */
> 1648 data[0] = count;
> 1649 data[1] = PAGE_SIZE;
> 1650 /*
> 1651 * Count usually is less than mm->map_count,
> 1652 * we need to move filenames down.
> 1653 */
> 1654 n = cprm->vma_count - count;
> 1655 if (n != 0) {
> 1656 unsigned shift_bytes = n * 3 * sizeof(data[0]);
> 1657 memmove(name_base - shift_bytes, name_base,
> 1658 name_curpos - name_base);
> 1659 name_curpos -= shift_bytes;
> 1660 }
> 1661
> 1662 size = name_curpos - (char *)data;
> 1663 fill_note(note, "CORE", NT_FILE, size, data);
> 1664 return 0;
> 1665 }
> 1666
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
--
- Allen
Hi Allen,
kernel test robot noticed the following build warnings:
[auto build test WARNING on kees/for-next/execve]
[also build test WARNING on brauner-vfs/vfs.all linus/master v6.9-rc7 next-20240507]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Allen-Pais/fs-coredump-Enable-dynamic-configuration-of-max-file-note-size/20240507-033907
base: https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/execve
patch link: https://lore.kernel.org/r/20240506193700.7884-1-apais%40linux.microsoft.com
patch subject: [PATCH v4] fs/coredump: Enable dynamic configuration of max file note size
config: arm64-randconfig-r054-20240507 (https://download.01.org/0day-ci/archive/20240508/[email protected]/config)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240508/[email protected]/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All warnings (new ones prefixed by >>):
>> fs/coredump.c:67:27: warning: 'core_file_note_size_max' defined but not used [-Wunused-const-variable=]
67 | static const unsigned int core_file_note_size_max = CORE_FILE_NOTE_SIZE_MAX;
| ^~~~~~~~~~~~~~~~~~~~~~~
vim +/core_file_note_size_max +67 fs/coredump.c
62
63 static int core_uses_pid;
64 static unsigned int core_pipe_limit;
65 static char core_pattern[CORENAME_MAX_SIZE] = "core";
66 static int core_name_size = CORENAME_MAX_SIZE;
> 67 static const unsigned int core_file_note_size_max = CORE_FILE_NOTE_SIZE_MAX;
68 unsigned int core_file_note_size_min = CORE_FILE_NOTE_SIZE_DEFAULT;
69
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Mon, 06 May 2024 19:37:00 +0000, Allen Pais wrote:
> Introduce the capability to dynamically configure the maximum file
> note size for ELF core dumps via sysctl.
>
> Why is this being done?
> We have observed that during a crash when there are more than 65k mmaps
> in memory, the existing fixed limit on the size of the ELF notes section
> becomes a bottleneck. The notes section quickly reaches its capacity,
> leading to incomplete memory segment information in the resulting coredump.
> This truncation compromises the utility of the coredumps, as crucial
> information about the memory state at the time of the crash might be
> omitted.
>
> [...]
I adjusted file names, but put it in -next. I had given some confusing
feedback on v3, but I didn't realize until later; apologies for that! The
end result is the sysctl is named kernel.core_file_note_size_limit and
the internal const min/max variables have the _min and _max suffixes.
Applied to for-next/execve, thanks!
[1/1] fs/coredump: Enable dynamic configuration of max file note size
https://git.kernel.org/kees/c/81e238b1299e
Take care,
--
Kees Cook
> On Mon, 06 May 2024 19:37:00 +0000, Allen Pais wrote:
> > Introduce the capability to dynamically configure the maximum file
> > note size for ELF core dumps via sysctl.
> >
> > Why is this being done?
> > We have observed that during a crash when there are more than 65k mmaps
> > in memory, the existing fixed limit on the size of the ELF notes section
> > becomes a bottleneck. The notes section quickly reaches its capacity,
> > leading to incomplete memory segment information in the resulting coredump.
> > This truncation compromises the utility of the coredumps, as crucial
> > information about the memory state at the time of the crash might be
> > omitted.
> >
> > [...]
>
> I adjusted file names, but put it in -next. I had given some confusing
> feedback on v3, but I didn't realize until later; apologies for that! The
> end result is the sysctl is named kernel.core_file_note_size_limit and
> the internal const min/max variables have the _min and _max suffixes.
>
> Applied to for-next/execve, thanks!
>
> [1/1] fs/coredump: Enable dynamic configuration of max file note size
> https://git.kernel.org/kees/c/81e238b1299e
>
I should have put some thought into the feedback.
Thank you for reviewing and fixing the patch.
--
- Allen