/proc/kallsyms is very useful for tracers and other tools that need to
map kernel symbols to addresses.
It would be useful if there were a mapping between kernel symbol and module
name that only changed when the kernel source code is changed. This mapping
should not change simply because a module becomes built into the kernel, so
that it's not broken by changes in user configuration. (DTrace for Linux
already uses the approach in this patch for this purpose.)
It might also be useful if there were reliable symbol size information to
determine whether an address is within a symbol or outside it, especially
given that there could be huge gaps between symbols.
Fix this by introducing a new config parameter CONFIG_KALLMODSYMS, which
when set results in output in /proc/kallmodsyms that looks like this:
ffffffff8b013d20 409 t pt_buffer_setup_aux
ffffffff8b014130 11f T intel_pt_interrupt
ffffffff8b014250 2d T cpu_emergency_stop_pt
ffffffff8b014280 13a t rapl_pmu_event_init [intel_rapl_perf]
ffffffff8b0143c0 bb t rapl_event_update [intel_rapl_perf]
ffffffff8b014480 10 t rapl_pmu_event_read [intel_rapl_perf]
ffffffff8b014490 a3 t rapl_cpu_offline [intel_rapl_perf]
ffffffff8b014540 24 t __rapl_event_show [intel_rapl_perf]
ffffffff8b014570 f2 t rapl_pmu_event_stop [intel_rapl_perf]
This is emitted even if intel_rapl_perf is built into the kernel.
Further down, we see what happens when object files are reused by
multiple modules, all of which are built in to the kernel:
ffffffffa22b3aa0 ab t handle_timestamp [liquidio]
ffffffffa22b3b50 4a t free_netbuf [liquidio]
ffffffffa22b3ba0 8d t liquidio_ptp_settime [liquidio]
ffffffffa22b3c30 b3 t liquidio_ptp_adjfreq [liquidio]
[...]
ffffffffa22b9490 203 t lio_vf_rep_create [liquidio]
ffffffffa22b96a0 16b t lio_vf_rep_destroy [liquidio]
ffffffffa22b9810 1f t lio_vf_rep_modinit [liquidio]
ffffffffa22b9830 1f t lio_vf_rep_modexit [liquidio]
ffffffffa22b9850 d2 t lio_ethtool_get_channels [liquidio] [liquidio_vf]
ffffffffa22b9930 9c t lio_ethtool_get_ringparam [liquidio] [liquidio_vf]
ffffffffa22b99d0 11 t lio_get_msglevel [liquidio] [liquidio_vf]
ffffffffa22b99f0 11 t lio_vf_set_msglevel [liquidio] [liquidio_vf]
ffffffffa22b9a10 2b t lio_get_pauseparam [liquidio] [liquidio_vf]
ffffffffa22b9a40 738 t lio_get_ethtool_stats [liquidio] [liquidio_vf]
ffffffffa22ba180 368 t lio_vf_get_ethtool_stats [liquidio] [liquidio_vf]
ffffffffa22ba4f0 37 t lio_get_regs_len [liquidio] [liquidio_vf]
ffffffffa22ba530 18 t lio_get_priv_flags [liquidio] [liquidio_vf]
ffffffffa22ba550 2e t lio_set_priv_flags [liquidio] [liquidio_vf]
ffffffffa22ba580 69 t lio_set_fecparam [liquidio] [liquidio_vf]
ffffffffa22ba5f0 92 t lio_get_fecparam [liquidio] [liquidio_vf]
[...]
ffffffffa22cbd10 175 t liquidio_set_mac [liquidio_vf]
ffffffffa22cbe90 ab t handle_timestamp [liquidio_vf]
ffffffffa22cbf40 4a t free_netbuf [liquidio_vf]
ffffffffa22cbf90 2b t octnet_link_status_change [liquidio_vf]
ffffffffa22cbfc0 7e t liquidio_vxlan_port_command.constprop.0 [liquidio_vf]
Much more detail, including on the (nearly nonexistent) memory usage
impact, is given below.
We have to do several things to make this work: figure out which
object files are in which modules, then which address ranges correspond
to these object files, then turn this into per-symbol output.
First, generate a file "modules_thick.builtin" that maps from the thin
archives that make up built-in modules to their constituent object files.
(This reintroduces the machinery that used to be used to generate
modules.builtin. I am not wedded to this mechanism: if someone can figure
out a mechanism that does not require recursing over the entire build tree,
I'm happy to use it, but I suspect that no such mechanism exists, since the
only place the mapping from object file to module exists is in the makefiles
themselves. Regardless, this is fairly cheap, adding less than a second to
a typical hot-cache build of a large enterprise kernel. This is true even
though it needs to be run unconditionally whenever the .config changes.)
Generate a linker map ".tmp_vmlinux.map", converting it into a new file
".tmp_vmlinux.ranges", mapping address ranges to object files.
Have scripts/kallsyms read these two new files to map symbol addresses
to built-in-module names and then write a mapping from object file
address to module name to the *.s output file.
The mapping consists of three new symbols:
- kallsyms_module_addresses/kallsyms_module_offsets encodes the
address/offset of each object file (derived from the linker map), in
exactly the same way as kallsyms_addresses/kallsyms_offsets does
for symbols. There is no size: instead, the object files are
assumed to tile the address space. (This is slightly more
space-efficient than using a size). Non-text-section addresses are
skipped: for now, all the users of this interface only need
module/non-module information for instruction pointer addresses, not
absolute-addressed symbols and the like. This restriction can
easily be lifted in future. (For why this isn't called
kallsyms_objfiles, see two entries below.)
- kallsyms_module_names encodes the name of each module in a modified
form of strtab: notably, if an object file appears in *multiple*
modules, all of which are built in, this is encoded via a zero byte,
a one-byte module count, then a series of that many null-terminated
strings. Object files which appear in only one module in such a
multi-module list are redirected to point inside that list, so that
modules which contain some object files shared with other modules
and some object files exclusive to them do not double up the module
name. (There might still be some duplication between multiple
multi-module lists, but this is an extremely marginal size effect,
and resolving it would require an extra layer of lookup tables which
would be even more complex, and incompressible to boot). As a
special case, the table starts with a single zero byte which does
*not* represent the start of a multi-module list.
- kallsyms_modules connects the two, encoding a table associated 1:1
with kallsyms_module_addresses / kallsyms_module_offsets, pointing
at an offset in kallsyms_module_names describing which module (or
modules, for a multi-module list) the code occupying this address
range is part of. If an address range is part of no module (always
built-in) it points at 0 (the null byte at the start of the
kallsyms_module_names list). Entries in this list that would
contain the same value are fused together, along with their
corresponding kallsyms_module_addresses/offsets entries. Due to
this fusion process, and because object files can be split apart into
multiple parts by the linker for hot/cold partitioning and the like,
entries in here do not really correspond to an object file, but more
to some contiguous range of addresses which are guaranteed to belong
to a single built-in module: so it seems best to call the symbols
kallsyms_modules*. (The generator has a data structure that does
correspond more closely to object files, from which kallsyms_modules
is generated, and that does use 'objfiles' terminology.)
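As an illustration of the kallsyms_module_names encoding described above,
here is a minimal userspace sketch of decoding one entry. It is not the
code in this series: format_module_entry() and the example_names table are
hypothetical names invented for the example, and the table contents are
made up to exercise the single-module, multi-module and "pointing inside a
list" cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical table: leading zero byte ("no module"), then "ext4", then a
 * multi-module list: zero byte, count byte (2), "liquidio", "liquidio_vf".
 * Offsets: 1 = ext4, 6 = the list, 8 = liquidio, 17 = liquidio_vf.
 */
static const char example_names[] = "\0ext4\0\0\002liquidio\0liquidio_vf";

/*
 * Format the entry at offset `off` of the table `names` (length `len`) as
 * "[mod1] [mod2]" into `out`.  Returns the number of modules written, or 0
 * for "no module" (offset 0) or an out-of-range offset.
 */
static size_t format_module_entry(const char *names, size_t len, size_t off,
				  char *out, size_t outlen)
{
	size_t count = 1, i, used = 0;
	const char *walk;

	assert(outlen > 0);
	out[0] = '\0';
	if (off == 0 || off >= len)
		return 0;

	walk = names + off;
	if (*walk == '\0') {
		/* Multi-module list: \0, one count byte, then the names. */
		if (off + 2 >= len)
			return 0;
		count = (unsigned char)walk[1];
		walk += 2;
	}

	for (i = 0; i < count; i++) {
		int n = snprintf(out + used, outlen - used, "%s[%s]",
				 i ? " " : "", walk);

		if (n < 0 || (size_t)n >= outlen - used)
			break;	/* output buffer too small: stop early */
		used += (size_t)n;
		walk += strlen(walk) + 1;
	}
	return i;
}
```

An object file used only by liquidio_vf would store offset 17 in
kallsyms_modules, reusing the name inside the list, while one shared by
both modules would store offset 6.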
Emit a new /proc/kallmodsyms file akin to /proc/kallsyms but with built-in
module names, using a new kallsyms_builtin_module_address() almost identical
to kallsyms_sym_address() to get the address corresponding to a given
.kallsyms_modules index, and a new get_builtin_module_idx quite similar to
get_symbol_pos to determine the index in the .kallsyms_modules array that
relates to a given address. Save a little time by exploiting the fact that
all callers will only ever traverse this list from start to end by allowing
them to pass in the previous index returned from this function as a hint:
thus very few bsearches are actually needed. (In theory this could change
to just walk straight down kallsyms_module_addresses/offsets and not bother
bsearching at all, but doing it this way is hardly any slower and much more
robust.)
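The hinted lookup can be sketched in userspace as follows. This is a
simplified illustration rather than the kernel function: find_range() and
NO_RANGE are hypothetical names, the addresses live in a plain array
instead of being decoded from kallsyms_module_addresses/offsets, and the
hint test is slightly more permissive than the one in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NO_RANGE ((size_t)-1)

/*
 * `starts` holds n ascending range start addresses.  The last entry is an
 * end marker, so range i covers [starts[i], starts[i + 1]).  `hint` is the
 * index returned by the previous call, or NO_RANGE if there is none.
 */
static size_t find_range(const unsigned long *starts, size_t n,
			 unsigned long addr, size_t hint)
{
	size_t low = 0, high;

	assert(n >= 2);
	high = n - 1;

	/* Fast path: the address is in the same range as last time. */
	if (hint < high && addr >= starts[hint] && addr < starts[hint + 1])
		return hint;

	if (addr < starts[low] || addr >= starts[high])
		return NO_RANGE;

	/* Slow path: binary search for the last start <= addr. */
	while (high - low > 1) {
		size_t mid = low + (high - low) / 2;

		if (starts[mid] <= addr)
			low = mid;
		else
			high = mid;
	}
	return low;
}
```

A caller walking symbols in address order passes each result back in as
the next hint, so the search only runs when a range boundary is crossed.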
The display process is complicated a little by the weird format of the
.kallsyms_module_names table: we have to look for multimodule entries
and print them as space-separated lists of module names.
Like /proc/kallsyms, the output is driven by address, so keeps the
curious property of /proc/kallsyms that symbols (like free_netbuf above)
may appear repeatedly with different addresses: but now, unlike in
/proc/kallsyms, we can see that those symbols appear repeatedly because
they are *different symbols* that ultimately belong to different
modules, all of which are built in to the kernel.
Those symbols that come from object files that are genuinely reused and
that appear only once in memory get a /proc/kallmodsyms line with
[multiple] [modules] on it: consumers will have to be ready to handle
such lines.
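For consumers, a parser that copes with such lines might look roughly like
the sketch below. It is an assumption-laden illustration, not part of this
series: struct kallmodsyms_line, parse_kallmodsyms_line(), parse_hex() and
the MAX_MODS cap are all hypothetical. It counts the fields before the
first bracketed token, so it accepts lines both with and without the size
column added by the final RFC patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MODS 8	/* arbitrary cap chosen for this sketch */

struct kallmodsyms_line {
	unsigned long long addr;
	unsigned long long size;	/* 0 when there is no size column */
	char type;
	char name[256];
	char mods[MAX_MODS][64];
	int nmods;
};

/* Parse a hex field that must fill the whole token.  Returns 0 on success. */
static int parse_hex(const char *tok, unsigned long long *val)
{
	char *end;

	*val = strtoull(tok, &end, 16);
	return (end == tok || *end != '\0') ? -1 : 0;
}

/* Returns 0 on success, -1 if the line is not in a recognised form. */
static int parse_kallmodsyms_line(const char *line, struct kallmodsyms_line *out)
{
	char tok[4][256], word[256];
	int ntok = 0, n;

	assert(line && out);
	memset(out, 0, sizeof(*out));

	/* Split on whitespace; bracketed tokens are module names. */
	while (sscanf(line, " %255s%n", word, &n) == 1) {
		size_t len = strlen(word);

		line += n;
		if (word[0] == '[') {
			if (len < 3 || word[len - 1] != ']' ||
			    out->nmods == MAX_MODS)
				return -1;
			word[len - 1] = '\0';
			snprintf(out->mods[out->nmods++],
				 sizeof(out->mods[0]), "%s", word + 1);
		} else {
			if (out->nmods > 0 || ntok == 4)
				return -1;
			strcpy(tok[ntok++], word);
		}
	}

	/* Three fields: addr type name.  Four: addr size type name. */
	if (ntok != 3 && ntok != 4)
		return -1;
	if (parse_hex(tok[0], &out->addr) < 0)
		return -1;
	if (ntok == 4 && parse_hex(tok[1], &out->size) < 0)
		return -1;
	if (strlen(tok[ntok - 2]) != 1)
		return -1;
	out->type = tok[ntok - 2][0];
	snprintf(out->name, sizeof(out->name), "%s", tok[ntok - 1]);
	return 0;
}
```

Counting fields rather than guessing from their contents matters because a
type letter such as 'd' or 'b' is also a valid hex number.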
Also, kernel symbols for built-in modules will probably appear
interspersed with other symbols that are part of different modules and
non-modular always-built-in symbols, which, as usual, have no
square-bracketed module denotation.
As with /proc/kallsyms, non-root usage produces addresses that are
all zero.
I am open to changing the name and/or format of /proc/kallmodsyms, but felt
it best to split it out of /proc/kallsyms to avoid breaking existing
kallsyms parsers. Another possible syntax might be to use {curly brackets}
or something to denote built-in modules: it might be possible to drop
/proc/kallmodsyms and make /proc/kallsyms emit things in this format.
(Equally, now that kallmodsyms data uses very little space, the
CONFIG_KALLMODSYMS config option might be something people don't want to
bother with.)
The size impact of all of this is minimal: for the case above, the
kallsyms2.S file went from 14107772 to 14137245 bytes, a gain of 29743
bytes, or 0.16%: vmlinux gained 10824 bytes, a gain of .017%, and the
compressed vmlinux only 7552 bytes, a gain of .08%: though the latter two
values are very configuration-dependent, they seem likely to scale roughly
with the kernel they are part of.
The last patch is an RFC to see if the idea is considered to be worth
spending more time optimizing the representation, which adds a new
kallsyms_sizes section that gives the size of each symbol, and uses this
info to report reliable symbol sizes to in-kernel users, and (via a new
column in /proc/kallmodsyms) to out-of-kernel users too. Having reliable
size info lets us identify inter-symbol gaps and sort symbols so that
start/end-marker and overlapping symbols are consistently ordered with
respect to the symbols they overlap. This certainly uses too much space
right now, 200KiB--1MiB: a better representation is certainly needed. One
that springs to mind is making the table sparse (pairs of symbol
index/size), and recording explicit sizes only for those symbols that
are not immediately followed by a subsequent symbol.
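To make the sparse idea concrete, here is one way such a lookup could
work. This is purely a sketch of the speculation above and is not
implemented anywhere in this series: struct size_entry and symbol_size()
are hypothetical, and a real table would presumably be compressed further.

```c
#include <assert.h>
#include <stddef.h>

/* One explicit size record; the table is sorted by ascending idx. */
struct size_entry {
	unsigned int idx;	/* symbol index */
	unsigned int size;	/* size in bytes */
};

/*
 * Return the size of symbol `idx`.  Symbols with an explicit record use it;
 * all others are assumed to run right up to the next symbol's address.
 */
static unsigned long long symbol_size(const unsigned long long *addrs,
				      size_t nsyms,
				      const struct size_entry *sparse,
				      size_t nsparse, size_t idx)
{
	size_t low = 0, high = nsparse;

	assert(idx < nsyms);

	/* Binary search for the first record with .idx >= idx. */
	while (low < high) {
		size_t mid = low + (high - low) / 2;

		if (sparse[mid].idx < idx)
			low = mid + 1;
		else
			high = mid;
	}
	if (low < nsparse && sparse[low].idx == idx)
		return sparse[low].size;

	/* No record: the symbol is immediately followed by the next one. */
	if (idx + 1 < nsyms)
		return addrs[idx + 1] - addrs[idx];
	return 0;	/* the last symbol would need an explicit record */
}
```

In the intel_rapl_perf output shown earlier, rapl_pmu_event_read
(ffffffff8b014480, size 10) ends exactly where rapl_cpu_offline begins, so
it would need no record, while its neighbours are followed by padding and
would. How much this saves depends on how many symbols are padded.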
Differences from v6, November:
- Adjust for rewrite of confdata machinery in v5.16 (tristate.conf
handling is now more of a rewrite than a reversion)
Differences from v5, October:
- Fix generation of mapfiles under UML
Differences from v4, September:
- Fix building of tristate.conf if missing (usually concealed by the
syncconfig being run for other reasons, but not always: the kernel
test robot spotted it).
- Forward-port atop v5.15-rc3.
Differences from v3, August:
- Fix a kernel test robot warning in get_ksymbol_core (possible
use of uninitialized variable if kallmodsyms was wanted but
kallsyms_module_offsets was not present, which is most unlikely).
Differences from v2, June:
- Split the series up. In particular, the size impact of the table
optimizer is now quantified, and the symbol-size patch is split out and
turned into an RFC patch, with the /proc/kallmodsyms format before that
patch lacking a size column. Some speculation on how to make the symbol
sizes less space-wasteful is added (but not yet implemented).
- Drop a couple of unnecessary #includes, one unnecessarily exported
symbol, and a needless de-staticing.
Differences from v1, a year or so back:
- Move from a straight symbol->module name mapping to a mapping from
address-range to TU to module name list, bringing major space savings
over the previous approach and support for object files used by many
built-in modules at the same time, at the cost of a slightly more complex
approach (unavoidably so, I think, given that we have to merge three data
sources together: the link map in .tmp_vmlinux.ranges, the nm output on
stdin, and the mapping from TU name to module names in
modules_thick.builtin).
We do opportunistic merging of TUs if they cite the same modules and
reuse module names where doing so is simple: see optimize_obj2mod
below. I considered more extensive searches for mergeable entries and
more intricate encodings of the module name list allowing TUs that are
used by overlapping sets of modules to share their names, but such
modules are rare enough (and such overlapping sharings are vanishingly
rare) that it seemed likely to save only a few bytes at the cost of much
more hard-to-test code. This is doubly true now that the tables needed
are only a few kilobytes in length.
Signed-off-by: Nick Alcock <[email protected]>
Signed-off-by: Eugene Loh <[email protected]>
Reviewed-by: Kris Van Hees <[email protected]>
Nick Alcock (7):
kbuild: bring back tristate.conf
kbuild: add modules_thick.builtin
kbuild: generate an address ranges map at vmlinux link time
kallsyms: introduce sections needed to map symbols to built-in modules
kallsyms: optimize .kallsyms_modules*
kallsyms: add /proc/kallmodsyms
kallsyms: add reliable symbol size info
.gitignore | 1 +
Documentation/dontdiff | 1 +
Documentation/kbuild/kconfig.rst | 5 +
Makefile | 23 +-
include/linux/module.h | 7 +-
init/Kconfig | 8 +
kernel/kallsyms.c | 304 ++++++++++++---
kernel/module.c | 4 +-
scripts/Kbuild.include | 6 +
scripts/Makefile | 6 +
scripts/Makefile.modbuiltin | 56 +++
scripts/kallsyms.c | 642 ++++++++++++++++++++++++++++++-
scripts/kconfig/confdata.c | 36 +-
scripts/link-vmlinux.sh | 22 +-
scripts/modules_thick.c | 200 ++++++++++
scripts/modules_thick.h | 48 +++
16 files changed, 1298 insertions(+), 71 deletions(-)
create mode 100644 scripts/Makefile.modbuiltin
create mode 100644 scripts/modules_thick.c
create mode 100644 scripts/modules_thick.h
--
2.34.0.258.gc900572c39
This emits a new file, .tmp_vmlinux.ranges, which maps address
range/size pairs in vmlinux to the object files which make them up,
e.g., in part:
0x0000000000000000 0x30 arch/x86/kernel/cpu/common.o
0x0000000000001000 0x1000 arch/x86/events/intel/ds.o
0x0000000000002000 0x4000 arch/x86/kernel/irq_64.o
0x0000000000006000 0x5000 arch/x86/kernel/process.o
0x000000000000b000 0x1000 arch/x86/kernel/cpu/common.o
0x000000000000c000 0x5000 arch/x86/mm/cpu_entry_area.o
0x0000000000011000 0x10 arch/x86/kernel/espfix_64.o
0x0000000000011010 0x2 arch/x86/kernel/cpu/common.o
[...]
In my simple tests this seems to work with clang too, but I'm not
sure how stable the format of clang's linker mapfiles is: if it turns
out not to work in some versions, the mapfile-massaging awk script added
here might need some adjustment.
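For reference, a consumer of this file could read one line roughly as
sketched here. This is an illustration only and is not the reader used by
scripts/kallsyms: struct obj_range, parse_range_line() and range_contains()
are hypothetical names, and the path buffer size is arbitrary.

```c
#include <assert.h>
#include <stdio.h>

struct obj_range {
	unsigned long long start;
	unsigned long long size;
	char path[256];
};

/* Parse one "start size path" line.  Returns 0 on success, -1 otherwise. */
static int parse_range_line(const char *line, struct obj_range *r)
{
	assert(line && r);
	/* %llx accepts the leading "0x" that the map file emits. */
	if (sscanf(line, "%llx %llx %255s", &r->start, &r->size, r->path) != 3)
		return -1;
	return 0;
}

/* Nonzero if `addr` falls within [start, start + size). */
static int range_contains(const struct obj_range *r, unsigned long long addr)
{
	return addr >= r->start && addr - r->start < r->size;
}
```

Note that the same object file can appear on several lines, as common.o
does above, so a reader cannot assume one range per object.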
Signed-off-by: Nick Alcock <[email protected]>
---
Notes:
v6: use ${wl} where appropriate to avoid failure on UML
scripts/link-vmlinux.sh | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 5cdd9bc5c385..5301f3e77116 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -196,7 +196,7 @@ vmlinux_link()
${ld} ${ldflags} -o ${output} \
${wl}--whole-archive ${objs} ${wl}--no-whole-archive \
${wl}--start-group ${libs} ${wl}--end-group \
- $@ ${ldlibs}
+ ${wl}-Map=.tmp_vmlinux.map $@ ${ldlibs}
}
# generate .BTF typeinfo from DWARF debuginfo
@@ -239,6 +239,19 @@ kallsyms()
{
local kallsymopt;
+ # read the linker map to identify ranges of addresses:
+ # - for each *.o file, report address, size, pathname
+ # - most such lines will have four fields
+ # - but sometimes there is a line break after the first field
+ # - start reading at "Linker script and memory map"
+ # - stop reading at ".brk"
+ ${AWK} '
+ /\.o$/ && start==1 { print $(NF-2), $(NF-1), $NF }
+ /^Linker script and memory map/ { start = 1 }
+ /^\.brk/ { exit(0) }
+ ' .tmp_vmlinux.map | sort > .tmp_vmlinux.ranges
+
+ # get kallsyms options
if [ -n "${CONFIG_KALLSYMS_ALL}" ]; then
kallsymopt="${kallsymopt} --all-symbols"
fi
--
2.34.0.258.gc900572c39
Use the tables added in the previous commits to introduce a new
/proc/kallmodsyms, in which [module names] are also given for things
that *could* have been modular had they not been built in to the kernel.
So symbols that are part of, say, ext4 are reported as [ext4] even if
ext4 happens to be built in to the kernel in this configuration.
Symbols that are part of multiple modules at the same time are shown
with [multiple] [module names]: consumers will have to be ready to
handle such lines. Also, kernel symbols for built-in modules will be
sorted by address, as usual for the core kernel, so will probably appear
interspersed with other symbols that are part of different modules and
non-modular always-built-in symbols, which, as usual, have no
square-bracketed module denotation. This differs from /proc/kallsyms,
where all symbols associated with a module will always appear in a group
(and randomly ordered).
The result looks like this:
ffffffff8b013d20 t pt_buffer_setup_aux
ffffffff8b014130 T intel_pt_interrupt
ffffffff8b014250 T cpu_emergency_stop_pt
ffffffff8b014280 t rapl_pmu_event_init [intel_rapl_perf]
ffffffff8b0143c0 t rapl_event_update [intel_rapl_perf]
ffffffff8b014480 t rapl_pmu_event_read [intel_rapl_perf]
ffffffff8b014490 t rapl_cpu_offline [intel_rapl_perf]
ffffffff8b014540 t __rapl_event_show [intel_rapl_perf]
ffffffff8b014570 t rapl_pmu_event_stop [intel_rapl_perf]
This is emitted even if intel_rapl_perf is built into the kernel (but,
obviously, not if it's not in the .config at all, or is in a module that
is not loaded).
Further down, we see what happens when object files are reused by
multiple modules, all of which are built in to the kernel:
ffffffffa22b3aa0 t handle_timestamp [liquidio]
ffffffffa22b3b50 t free_netbuf [liquidio]
ffffffffa22b3ba0 t liquidio_ptp_settime [liquidio]
ffffffffa22b3c30 t liquidio_ptp_adjfreq [liquidio]
[...]
ffffffffa22b9490 t lio_vf_rep_create [liquidio]
ffffffffa22b96a0 t lio_vf_rep_destroy [liquidio]
ffffffffa22b9810 t lio_vf_rep_modinit [liquidio]
ffffffffa22b9830 t lio_vf_rep_modexit [liquidio]
ffffffffa22b9850 t lio_ethtool_get_channels [liquidio] [liquidio_vf]
ffffffffa22b9930 t lio_ethtool_get_ringparam [liquidio] [liquidio_vf]
ffffffffa22b99d0 t lio_get_msglevel [liquidio] [liquidio_vf]
ffffffffa22b99f0 t lio_vf_set_msglevel [liquidio] [liquidio_vf]
ffffffffa22b9a10 t lio_get_pauseparam [liquidio] [liquidio_vf]
ffffffffa22b9a40 t lio_get_ethtool_stats [liquidio] [liquidio_vf]
ffffffffa22ba180 t lio_vf_get_ethtool_stats [liquidio] [liquidio_vf]
ffffffffa22ba4f0 t lio_get_regs_len [liquidio] [liquidio_vf]
ffffffffa22ba530 t lio_get_priv_flags [liquidio] [liquidio_vf]
ffffffffa22ba550 t lio_set_priv_flags [liquidio] [liquidio_vf]
ffffffffa22ba580 t lio_set_fecparam [liquidio] [liquidio_vf]
ffffffffa22ba5f0 t lio_get_fecparam [liquidio] [liquidio_vf]
[...]
ffffffffa22cbd10 t liquidio_set_mac [liquidio_vf]
ffffffffa22cbe90 t handle_timestamp [liquidio_vf]
ffffffffa22cbf40 t free_netbuf [liquidio_vf]
ffffffffa22cbf90 t octnet_link_status_change [liquidio_vf]
ffffffffa22cbfc0 t liquidio_vxlan_port_command.constprop.0 [liquidio_vf]
Like /proc/kallsyms, the output is driven by address, so keeps the
curious property of /proc/kallsyms that symbols (like free_netbuf above)
may appear repeatedly with different addresses: but now, unlike in
/proc/kallsyms, we can see that those symbols appear repeatedly because
they are *different symbols* that ultimately belong to different
modules, all of which are built in to the kernel.
As with /proc/kallsyms, non-root usage produces addresses that are
all zero.
I am not wedded to the name or format of /proc/kallmodsyms, but felt it
best to split it out of /proc/kallsyms to avoid breaking existing
kallsyms parsers. Another possible syntax might be to use {curly
brackets} or something to denote built-in modules: it might be possible
to drop /proc/kallmodsyms and make /proc/kallsyms emit things in this
format. (Equally, now that kallmodsyms data uses very little space, the
CONFIG_KALLMODSYMS config option might be something people don't want to
bother with.)
Internally, this uses a new kallsyms_builtin_module_address() almost
identical to kallsyms_sym_address() to get the address corresponding to
a given .kallsyms_modules index, and a new get_builtin_module_idx quite
similar to get_symbol_pos to determine the index in the
.kallsyms_modules array that relates to a given address. Save a little
time by exploiting the fact that all callers will only ever traverse
this list from start to end by allowing them to pass in the previous
index returned from this function as a hint: thus very few bsearches are
actually needed. (In theory this could change to just walk straight
down kallsyms_module_addresses/offsets and not bother bsearching at all,
but doing it this way is hardly any slower and much more robust.)
The display process is complicated a little by the weird format of the
.kallsyms_module_names table: we have to look for multimodule entries
and print them as space-separated lists of module names.
Signed-off-by: Nick Alcock <[email protected]>
---
kernel/kallsyms.c | 242 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 227 insertions(+), 15 deletions(-)
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 3011bc33a5ba..c81610ffc4ba 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -48,8 +48,18 @@ __section(".rodata") __attribute__((weak));
extern const unsigned long kallsyms_relative_base
__section(".rodata") __attribute__((weak));
+extern const unsigned long kallsyms_num_modules
+__section(".rodata") __attribute__((weak));
+
+extern const unsigned long kallsyms_module_names_len
+__section(".rodata") __attribute__((weak));
+
extern const char kallsyms_token_table[] __weak;
extern const u16 kallsyms_token_index[] __weak;
+extern const unsigned long kallsyms_module_addresses[] __weak;
+extern const int kallsyms_module_offsets[] __weak;
+extern const u32 kallsyms_modules[] __weak;
+extern const char kallsyms_module_names[] __weak;
extern const unsigned int kallsyms_markers[] __weak;
@@ -205,6 +215,25 @@ static bool cleanup_symbol_name(char *s)
return false;
}
+#ifdef CONFIG_KALLMODSYMS
+static unsigned long kallsyms_builtin_module_address(int idx)
+{
+ if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
+ return kallsyms_module_addresses[idx];
+
+ /* values are unsigned offsets if --absolute-percpu is not in effect */
+ if (!IS_ENABLED(CONFIG_KALLSYMS_ABSOLUTE_PERCPU))
+ return kallsyms_relative_base + (u32)kallsyms_module_offsets[idx];
+
+ /* ...otherwise, positive offsets are absolute values */
+ if (kallsyms_module_offsets[idx] >= 0)
+ return kallsyms_module_offsets[idx];
+
+ /* ...and negative offsets are relative to kallsyms_relative_base - 1 */
+ return kallsyms_relative_base - 1 - kallsyms_module_offsets[idx];
+}
+#endif
+
/* Lookup the address for this symbol. Returns 0 if not found. */
unsigned long kallsyms_lookup_name(const char *name)
{
@@ -308,6 +337,54 @@ static unsigned long get_symbol_pos(unsigned long addr,
return low;
}
+/*
+ * The caller passes in an address, and we return an index to the corresponding
+ * builtin module index in .kallsyms_modules, or (unsigned long) -1 if none
+ * match.
+ *
+ * The hint_idx, if set, is a hint as to the possible return value, to handle
+ * the common case in which consecutive runs of addresses relate to the same
+ * index.
+ */
+#ifdef CONFIG_KALLMODSYMS
+static unsigned long get_builtin_module_idx(unsigned long addr, unsigned long hint_idx)
+{
+ unsigned long low, high, mid;
+
+ if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
+ BUG_ON(!kallsyms_module_addresses);
+ else
+ BUG_ON(!kallsyms_module_offsets);
+
+ /*
+ * Do a binary search on the sorted kallsyms_modules array. The last
+ * entry in this array indicates the end of the text section, not an
+ * object file.
+ */
+ low = 0;
+ high = kallsyms_num_modules - 1;
+
+ if (hint_idx > low && hint_idx < (high - 1) &&
+ addr >= kallsyms_builtin_module_address(hint_idx) &&
+ addr < kallsyms_builtin_module_address(hint_idx + 1))
+ return hint_idx;
+
+ if (addr >= kallsyms_builtin_module_address(low)
+ && addr < kallsyms_builtin_module_address(high)) {
+ while (high - low > 1) {
+ mid = low + (high - low) / 2;
+ if (kallsyms_builtin_module_address(mid) <= addr)
+ low = mid;
+ else
+ high = mid;
+ }
+ return low;
+ }
+
+ return (unsigned long) -1;
+}
+#endif
+
/*
* Lookup an address but don't bother to find any names.
*/
@@ -579,6 +656,8 @@ struct kallsym_iter {
char type;
char name[KSYM_NAME_LEN];
char module_name[MODULE_NAME_LEN];
+ const char *builtin_module_names;
+ unsigned long hint_builtin_module_idx;
int exported;
int show_value;
};
@@ -609,6 +688,8 @@ static int get_ksymbol_mod(struct kallsym_iter *iter)
&iter->value, &iter->type,
iter->name, iter->module_name,
&iter->exported);
+ iter->builtin_module_names = NULL;
+
if (ret < 0) {
iter->pos_mod_end = iter->pos;
return 0;
@@ -628,6 +709,8 @@ static int get_ksymbol_ftrace_mod(struct kallsym_iter *iter)
&iter->value, &iter->type,
iter->name, iter->module_name,
&iter->exported);
+ iter->builtin_module_names = NULL;
+
if (ret < 0) {
iter->pos_ftrace_mod_end = iter->pos;
return 0;
@@ -642,6 +725,7 @@ static int get_ksymbol_bpf(struct kallsym_iter *iter)
strlcpy(iter->module_name, "bpf", MODULE_NAME_LEN);
iter->exported = 0;
+ iter->builtin_module_names = NULL;
ret = bpf_get_kallsym(iter->pos - iter->pos_ftrace_mod_end,
&iter->value, &iter->type,
iter->name);
@@ -662,23 +746,53 @@ static int get_ksymbol_kprobe(struct kallsym_iter *iter)
{
strlcpy(iter->module_name, "__builtin__kprobes", MODULE_NAME_LEN);
iter->exported = 0;
+ iter->builtin_module_names = NULL;
return kprobe_get_kallsym(iter->pos - iter->pos_bpf_end,
&iter->value, &iter->type,
iter->name) < 0 ? 0 : 1;
}
/* Returns space to next name. */
-static unsigned long get_ksymbol_core(struct kallsym_iter *iter)
+static unsigned long get_ksymbol_core(struct kallsym_iter *iter, int kallmodsyms)
{
unsigned off = iter->nameoff;
- iter->module_name[0] = '\0';
+ iter->exported = 0;
iter->value = kallsyms_sym_address(iter->pos);
iter->type = kallsyms_get_symbol_type(off);
+ iter->module_name[0] = '\0';
+ iter->builtin_module_names = NULL;
+
off = kallsyms_expand_symbol(off, iter->name, ARRAY_SIZE(iter->name));
+#ifdef CONFIG_KALLMODSYMS
+ if (kallmodsyms) {
+ unsigned long mod_idx = (unsigned long) -1;
+
+ if (kallsyms_module_offsets)
+ mod_idx =
+ get_builtin_module_idx(iter->value,
+ iter->hint_builtin_module_idx);
+ /*
+ * This is a built-in module iff the tables of built-in modules
+ * (address->module name mappings) and module names are known,
+ * and if the address was found there, and if the corresponding
+ * module index is nonzero. All other cases mean off the end of
+ * the binary or in a non-modular range in between one or more
+ * modules. (Also guard against a corrupt kallsyms_modules
+ * array pointing off the end of kallsyms_module_names.)
+ */
+ if (kallsyms_modules != NULL && kallsyms_module_names != NULL &&
+ mod_idx != (unsigned long) -1 &&
+ kallsyms_modules[mod_idx] != 0 &&
+ kallsyms_modules[mod_idx] < kallsyms_module_names_len)
+ iter->builtin_module_names =
+ &kallsyms_module_names[kallsyms_modules[mod_idx]];
+ iter->hint_builtin_module_idx = mod_idx;
+ }
+#endif
return off - iter->nameoff;
}
@@ -724,7 +838,7 @@ static int update_iter_mod(struct kallsym_iter *iter, loff_t pos)
}
/* Returns false if pos at or past end of file. */
-static int update_iter(struct kallsym_iter *iter, loff_t pos)
+static int update_iter(struct kallsym_iter *iter, loff_t pos, int kallmodsyms)
{
/* Module symbols can be accessed randomly. */
if (pos >= kallsyms_num_syms)
@@ -734,7 +848,7 @@ static int update_iter(struct kallsym_iter *iter, loff_t pos)
if (pos != iter->pos)
reset_iter(iter, pos);
- iter->nameoff += get_ksymbol_core(iter);
+ iter->nameoff += get_ksymbol_core(iter, kallmodsyms);
iter->pos++;
return 1;
@@ -744,14 +858,14 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
{
(*pos)++;
- if (!update_iter(m->private, *pos))
+ if (!update_iter(m->private, *pos, 0))
return NULL;
return p;
}
static void *s_start(struct seq_file *m, loff_t *pos)
{
- if (!update_iter(m->private, *pos))
+ if (!update_iter(m->private, *pos, 0))
return NULL;
return m->private;
}
@@ -760,7 +874,7 @@ static void s_stop(struct seq_file *m, void *p)
{
}
-static int s_show(struct seq_file *m, void *p)
+static int s_show_internal(struct seq_file *m, void *p, int kallmodsyms)
{
void *value;
struct kallsym_iter *iter = m->private;
@@ -771,23 +885,67 @@ static int s_show(struct seq_file *m, void *p)
value = iter->show_value ? (void *)iter->value : NULL;
- if (iter->module_name[0]) {
+ /*
+ * Real module, or built-in module and /proc/kallmodsyms being shown.
+ */
+ if (iter->module_name[0] != '\0' ||
+ (iter->builtin_module_names != NULL && kallmodsyms != 0)) {
char type;
/*
- * Label it "global" if it is exported,
- * "local" if not exported.
+ * Label it "global" if it is exported, "local" if not exported.
*/
type = iter->exported ? toupper(iter->type) :
tolower(iter->type);
- seq_printf(m, "%px %c %s\t[%s]\n", value,
- type, iter->name, iter->module_name);
+#ifdef CONFIG_KALLMODSYMS
+ if (kallmodsyms) {
+ /*
+ * /proc/kallmodsyms, built as a module.
+ */
+ if (iter->builtin_module_names == NULL)
+ seq_printf(m, "%px %c %s\t[%s]\n", value,
+ type, iter->name,
+ iter->module_name);
+ /*
+ * /proc/kallmodsyms, single-module symbol.
+ */
+ else if (*iter->builtin_module_names != '\0')
+ seq_printf(m, "%px %c %s\t[%s]\n", value,
+ type, iter->name,
+ iter->builtin_module_names);
+ /*
+ * /proc/kallmodsyms, multimodule symbol. Formatted
+ * as \0MODULE_COUNTmodule-1\0module-2\0, where
+ * MODULE_COUNT is a single byte, 2 or higher.
+ */
+ else {
+ size_t i = *(char *)(iter->builtin_module_names + 1);
+ const char *walk = iter->builtin_module_names + 2;
+
+ seq_printf(m, "%px %c %s\t[%s]", value,
+ type, iter->name, walk);
+
+ while (--i > 0) {
+ walk += strlen(walk) + 1;
+ seq_printf (m, " [%s]", walk);
+ }
+ seq_printf(m, "\n");
+ }
+ } else /* !kallmodsyms */
+#endif /* CONFIG_KALLMODSYMS */
+ seq_printf(m, "%px %c %s\t[%s]\n", value,
+ type, iter->name, iter->module_name);
} else
seq_printf(m, "%px %c %s\n", value,
iter->type, iter->name);
return 0;
}
+static int s_show(struct seq_file *m, void *p)
+{
+ return s_show_internal(m, p, 0);
+}
+
static const struct seq_operations kallsyms_op = {
.start = s_start,
.next = s_next,
@@ -795,6 +953,35 @@ static const struct seq_operations kallsyms_op = {
.show = s_show
};
+#ifdef CONFIG_KALLMODSYMS
+static int s_mod_show(struct seq_file *m, void *p)
+{
+ return s_show_internal(m, p, 1);
+}
+static void *s_mod_next(struct seq_file *m, void *p, loff_t *pos)
+{
+ (*pos)++;
+
+ if (!update_iter(m->private, *pos, 1))
+ return NULL;
+ return p;
+}
+
+static void *s_mod_start(struct seq_file *m, loff_t *pos)
+{
+ if (!update_iter(m->private, *pos, 1))
+ return NULL;
+ return m->private;
+}
+
+static const struct seq_operations kallmodsyms_op = {
+ .start = s_mod_start,
+ .next = s_mod_next,
+ .stop = s_stop,
+ .show = s_mod_show
+};
+#endif
+
static inline int kallsyms_for_perf(void)
{
#ifdef CONFIG_PERF_EVENTS
@@ -830,7 +1017,8 @@ bool kallsyms_show_value(const struct cred *cred)
}
}
-static int kallsyms_open(struct inode *inode, struct file *file)
+static int kallsyms_open_internal(struct inode *inode, struct file *file,
+ const struct seq_operations *ops)
{
/*
* We keep iterator in m->private, since normal case is to
@@ -838,7 +1026,7 @@ static int kallsyms_open(struct inode *inode, struct file *file)
* using get_symbol_offset for every symbol.
*/
struct kallsym_iter *iter;
- iter = __seq_open_private(file, &kallsyms_op, sizeof(*iter));
+ iter = __seq_open_private(file, ops, sizeof(*iter));
if (!iter)
return -ENOMEM;
reset_iter(iter, 0);
@@ -851,6 +1039,18 @@ static int kallsyms_open(struct inode *inode, struct file *file)
return 0;
}
+static int kallsyms_open(struct inode *inode, struct file *file)
+{
+ return kallsyms_open_internal(inode, file, &kallsyms_op);
+}
+
+#ifdef CONFIG_KALLMODSYMS
+static int kallmodsyms_open(struct inode *inode, struct file *file)
+{
+ return kallsyms_open_internal(inode, file, &kallmodsyms_op);
+}
+#endif
+
#ifdef CONFIG_KGDB_KDB
const char *kdb_walk_kallsyms(loff_t *pos)
{
@@ -861,7 +1061,7 @@ const char *kdb_walk_kallsyms(loff_t *pos)
reset_iter(&kdb_walk_kallsyms_iter, 0);
}
while (1) {
- if (!update_iter(&kdb_walk_kallsyms_iter, *pos))
+ if (!update_iter(&kdb_walk_kallsyms_iter, *pos, 0))
return NULL;
++*pos;
/* Some debugging symbols have no name. Ignore them. */
@@ -878,9 +1078,21 @@ static const struct proc_ops kallsyms_proc_ops = {
.proc_release = seq_release_private,
};
+#ifdef CONFIG_KALLMODSYMS
+static const struct proc_ops kallmodsyms_proc_ops = {
+ .proc_open = kallmodsyms_open,
+ .proc_read = seq_read,
+ .proc_lseek = seq_lseek,
+ .proc_release = seq_release_private,
+};
+#endif
+
static int __init kallsyms_init(void)
{
proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
+#ifdef CONFIG_KALLMODSYMS
+ proc_create("kallmodsyms", 0444, NULL, &kallmodsyms_proc_ops);
+#endif
return 0;
}
device_initcall(kallsyms_init);
--
2.34.0.258.gc900572c39
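For readers following the multimodule branch of s_show_internal in the patch above, here is a minimal userspace sketch of how the "\0MODULE_COUNTmodule-1\0module-2\0" encoding can be walked. The function name format_module_names and its buffer-based interface are assumptions made for illustration; the patch itself prints straight to the seq_file. The sketch reads the count byte as unsigned, which is a deliberate difference from the patch's plain char read.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the module list for one symbol into buf as "[a] [b] ...".
 * 'names' is either a plain NUL-terminated module name, or the multimodule
 * form: a leading NUL, a one-byte count (2 or higher), then that many
 * NUL-terminated names back to back.
 * Returns the number of module names written.
 */
static size_t format_module_names(const char *names, char *buf, size_t buflen)
{
	size_t count = 1;
	const char *walk = names;
	size_t used = 0;
	size_t i;

	if (buflen == 0)
		return 0;
	buf[0] = '\0';

	if (names[0] == '\0') {
		/* Read the count as unsigned so values above 127 stay positive. */
		count = (unsigned char)names[1];
		walk = names + 2;
	}

	for (i = 0; i < count; i++) {
		int n = snprintf(buf + used, buflen - used, "%s[%s]",
				 i ? " " : "", walk);

		if (n < 0 || (size_t)n >= buflen - used)
			break;	/* output would be truncated: stop early */
		used += (size_t)n;
		walk += strlen(walk) + 1;	/* step over the name and its NUL */
	}
	return i;
}
```

Called on the encoded form of the liquidio example from the cover letter, this should produce "[liquidio] [liquidio_vf]".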
The existing mechanism in get_symbol_pos for determining the end of a
symbol is an inaccurate heuristic. By passing nm -S output into
scripts/kallsyms.c and writing the symbol sizes to a new .kallsyms_sizes
section, we can get accurate sizes and sort the symbols accordingly,
reliably sorting zero-size symbols first (on the grounds that they are
usually e.g. section markers, and other symbols at the same address are
conceptually contained within them and should be sorted after them),
then larger symbols before smaller ones (so that overlapping symbols
print the containing symbol first, before its containees). We can
also use this to improve aliased symbol detection.
Emit the size info as an extra column in /proc/kallmodsyms (since its
format is not yet set in stone), and export it to iterator consumers.
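As an illustration of what a consumer of the extra column might look like, here is a minimal userspace sketch that parses one /proc/kallmodsyms line in the "address size type name [modules]" layout. The struct kms_entry type, the parse_kallmodsyms_line name and the 512-byte name buffer are assumptions made for this sketch, not part of the patch. It only handles the kallmodsyms layout: a /proc/kallsyms line has no size column and would be misparsed or rejected.

```c
#include <stdio.h>
#include <string.h>

#define KMS_NAME_BUF 512	/* assumed upper bound on symbol name length */

struct kms_entry {
	unsigned long long addr;
	unsigned long long size;
	char type;
	char name[KMS_NAME_BUF];
	const char *modules;	/* points at "[mod] ..." inside the line, or NULL */
};

/* Returns 0 on success, -1 if the line does not have the expected fields. */
static int parse_kallmodsyms_line(const char *line, struct kms_entry *e)
{
	const char *tab;

	if (sscanf(line, "%llx %llx %c %511s",
		   &e->addr, &e->size, &e->type, e->name) != 4)
		return -1;

	/* The module list, if any, follows a tab after the symbol name. */
	tab = strchr(line, '\t');
	e->modules = (tab && tab[1] == '[') ? tab + 1 : NULL;
	return 0;
}
```

The modules pointer aliases the input line, so the caller has to keep the line buffer alive for as long as the entry is in use.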
The notable downside of this is that the new .kallsyms_sizes section is
pretty big: a PTR per symbol, so vmlinux.o grows by almost a megabyte,
though it compresses pretty well, so bzImage grows by only a fraction of
that.
I'm not sure how to reduce this (perhaps using an array with elements
sized to be no larger than needed for the contents, so that almost
always two-byte entries would do? except that in my test kernel two
symbols are bigger than this: sme_workarea, at 400K, and __log_buf, at
100K: the latter seems often likely to be larger than 64K). A simple
scheme to reduce this would be to split the sizes array into several
arrays with differently-sized elements, and run-length-compress away the
zero bytes -- but that's not implemented yet, and might never be if
people think the whole idea of this is pointless.
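To make the size trade-off concrete, here is a sketch of one hypothetical variant of the "smaller elements" idea: a two-byte entry per symbol, with an escape value that redirects the rare oversized symbols to a small overflow table. This is not what the patch implements and not exactly the split-array scheme described above (there is no run-length compression here); the names compact_sizes, big_size and SIZE_ESCAPE are invented for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SIZE_ESCAPE 0xffffu	/* marks "look in the overflow table" */

struct big_size {
	uint32_t index;		/* symbol index */
	uint64_t size;		/* its real size (0xffff or larger) */
};

struct compact_sizes {
	const uint16_t *small;		/* one two-byte entry per symbol */
	const struct big_size *big;	/* oversized symbols, sorted by index */
	size_t nbig;
};

static uint64_t compact_size_lookup(const struct compact_sizes *cs,
				    uint32_t idx)
{
	size_t lo = 0, hi = cs->nbig;

	if (cs->small[idx] != SIZE_ESCAPE)
		return cs->small[idx];

	/* Binary search the (rare) oversized entries by symbol index. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (cs->big[mid].index < idx)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo < cs->nbig && cs->big[lo].index == idx)
		return cs->big[lo].size;
	return 0;	/* malformed table: escape without an overflow entry */
}
```

With only a handful of symbols such as sme_workarea and __log_buf needing the overflow table, the per-symbol cost would drop from a pointer-sized word to two bytes, at the price of a slower lookup for the oversized entries.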
In the absence of a way to shrink things, this should probably be hidden
behind a new config symbol if exposed at all, and kallmodsyms just shows
zero sizes if it's configured out (but this is enough of an RFC that
that's not yet done: possibly the benefits of this are too marginal to
be worth it, even if they do let kall(mod)syms consumers distinguish
symbols from padding, which was previously impossible).
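As a sketch of what "distinguishing symbols from padding" could look like on the consumer side, the following userspace function takes a table of (address, size) pairs sorted by address, as a kallmodsyms reader might build, and reports whether an address lies inside any symbol. The struct ksym type and the function name are assumptions for illustration, and this is not the kernel's get_symbol_pos. The backward walk is linear in the worst case, which keeps the sketch short but would need bounding in real use.

```c
#include <stddef.h>

struct ksym {
	unsigned long long addr;
	unsigned long long size;
};

/*
 * Return the index of a symbol whose [addr, addr + size) range contains
 * 'target', or -1 if the address falls in padding between symbols.
 * 'syms' must be sorted by ascending address.
 */
static long find_containing_symbol(const struct ksym *syms, size_t n,
				   unsigned long long target)
{
	size_t lo = 0, hi = n;

	/* Find the first entry that starts after 'target'. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (syms[mid].addr <= target)
			lo = mid + 1;
		else
			hi = mid;
	}

	/*
	 * Walk back over the candidates: a nested symbol may end before
	 * 'target' while an enclosing symbol listed earlier still covers it.
	 * Zero-size markers never match, since no offset is below zero.
	 */
	while (lo-- > 0) {
		if (target - syms[lo].addr < syms[lo].size)
			return (long)lo;
	}
	return -1;
}
```

Using the sample output from the cover letter, an address just past the end of pt_buffer_setup_aux (0xffffffff8b013d20 + 0x409) but before intel_pt_interrupt would be reported as padding.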
Signed-off-by: Nick Alcock <[email protected]>
Signed-off-by: Eugene Loh <[email protected]>
---
include/linux/module.h | 7 ++--
kernel/kallsyms.c | 74 ++++++++++++++++++++++-------------------
kernel/module.c | 4 ++-
scripts/kallsyms.c | 29 +++++++++++++---
scripts/link-vmlinux.sh | 7 +++-
5 files changed, 77 insertions(+), 44 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h
index c9f1200b2312..b58f2de48957 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -590,7 +590,8 @@ struct module *find_module(const char *name);
/* Returns 0 and fills in value, defined and namebuf, or -ERANGE if
symnum out of range. */
int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
- char *name, char *module_name, int *exported);
+ char *name, char *module_name, unsigned long *size,
+ int *exported);
/* Look for this name: can be of form module:name. */
unsigned long module_kallsyms_lookup_name(const char *name);
@@ -768,8 +769,8 @@ static inline int lookup_module_symbol_attrs(unsigned long addr, unsigned long *
}
static inline int module_get_kallsym(unsigned int symnum, unsigned long *value,
- char *type, char *name,
- char *module_name, int *exported)
+ char *type, char *name, char *module_name,
+ unsigned long *size, int *exported)
{
return -ERANGE;
}
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index c81610ffc4ba..e234c659dfe9 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -36,6 +36,7 @@
*/
extern const unsigned long kallsyms_addresses[] __weak;
extern const int kallsyms_offsets[] __weak;
+extern const unsigned long kallsyms_sizes[] __weak;
extern const u8 kallsyms_names[] __weak;
/*
@@ -277,12 +278,24 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
}
#endif /* CONFIG_LIVEPATCH */
+/*
+ * The caller passes in an address, and we return an index to the symbol --
+ * potentially also size and offset information.
+ * But an address might map to multiple symbols because:
+ * - some symbols might have zero size
+ * - some symbols might be aliases of one another
+ * - some symbols might span (encompass) others
+ * The symbols should already be ordered so that, for a particular address,
+ * we first have the zero-size ones, then the biggest, then the smallest.
+ * So we find the index by:
+ * - finding the last symbol with the target address
+ * - backing the index up so long as both the address and size are unchanged
+ */
static unsigned long get_symbol_pos(unsigned long addr,
unsigned long *symbolsize,
unsigned long *offset)
{
- unsigned long symbol_start = 0, symbol_end = 0;
- unsigned long i, low, high, mid;
+ unsigned long low, high, mid;
/* This kernel should never had been booted. */
if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
@@ -303,36 +316,17 @@ static unsigned long get_symbol_pos(unsigned long addr,
}
/*
- * Search for the first aliased symbol. Aliased
- * symbols are symbols with the same address.
+ * Search for the first aliased symbol.
*/
- while (low && kallsyms_sym_address(low-1) == kallsyms_sym_address(low))
+ while (low
+ && kallsyms_sym_address(low-1) == kallsyms_sym_address(low)
+ && kallsyms_sizes[low-1] == kallsyms_sizes[low])
--low;
- symbol_start = kallsyms_sym_address(low);
-
- /* Search for next non-aliased symbol. */
- for (i = low + 1; i < kallsyms_num_syms; i++) {
- if (kallsyms_sym_address(i) > symbol_start) {
- symbol_end = kallsyms_sym_address(i);
- break;
- }
- }
-
- /* If we found no next symbol, we use the end of the section. */
- if (!symbol_end) {
- if (is_kernel_inittext(addr))
- symbol_end = (unsigned long)_einittext;
- else if (IS_ENABLED(CONFIG_KALLSYMS_ALL))
- symbol_end = (unsigned long)_end;
- else
- symbol_end = (unsigned long)_etext;
- }
-
if (symbolsize)
- *symbolsize = symbol_end - symbol_start;
+ *symbolsize = kallsyms_sizes[low];
if (offset)
- *offset = addr - symbol_start;
+ *offset = addr - kallsyms_sym_address(low);
return low;
}
@@ -653,6 +647,7 @@ struct kallsym_iter {
loff_t pos_bpf_end;
unsigned long value;
unsigned int nameoff; /* If iterating in core kernel symbols. */
+ unsigned long size;
char type;
char name[KSYM_NAME_LEN];
char module_name[MODULE_NAME_LEN];
@@ -687,7 +682,7 @@ static int get_ksymbol_mod(struct kallsym_iter *iter)
int ret = module_get_kallsym(iter->pos - iter->pos_arch_end,
&iter->value, &iter->type,
iter->name, iter->module_name,
- &iter->exported);
+ &iter->size, &iter->exported);
iter->builtin_module_names = NULL;
if (ret < 0) {
@@ -760,6 +755,7 @@ static unsigned long get_ksymbol_core(struct kallsym_iter *iter, int kallmodsyms
iter->exported = 0;
iter->value = kallsyms_sym_address(iter->pos);
+ iter->size = kallsyms_sizes[iter->pos];
iter->type = kallsyms_get_symbol_type(off);
iter->module_name[0] = '\0';
@@ -878,12 +874,14 @@ static int s_show_internal(struct seq_file *m, void *p, int kallmodsyms)
{
void *value;
struct kallsym_iter *iter = m->private;
+ unsigned long size;
/* Some debugging symbols have no name. Ignore them. */
if (!iter->name[0])
return 0;
value = iter->show_value ? (void *)iter->value : NULL;
+ size = iter->show_value ? iter->size : 0;
/*
* Real module, or built-in module and /proc/kallsyms being shown.
@@ -903,15 +901,15 @@ static int s_show_internal(struct seq_file *m, void *p, int kallmodsyms)
* /proc/kallmodsyms, built as a module.
*/
if (iter->builtin_module_names == NULL)
- seq_printf(m, "%px %c %s\t[%s]\n", value,
- type, iter->name,
+ seq_printf(m, "%px %lx %c %s\t[%s]\n", value,
+ size, type, iter->name,
iter->module_name);
/*
* /proc/kallmodsyms, single-module symbol.
*/
else if (*iter->builtin_module_names != '\0')
- seq_printf(m, "%px %c %s\t[%s]\n", value,
- type, iter->name,
+ seq_printf(m, "%px %lx %c %s\t[%s]\n", value,
+ size, type, iter->name,
iter->builtin_module_names);
/*
* /proc/kallmodsyms, multimodule symbol. Formatted
@@ -922,8 +920,8 @@ static int s_show_internal(struct seq_file *m, void *p, int kallmodsyms)
size_t i = *(char *)(iter->builtin_module_names + 1);
const char *walk = iter->builtin_module_names + 2;
- seq_printf(m, "%px %c %s\t[%s]", value,
- type, iter->name, walk);
+ seq_printf(m, "%px %lx %c %s\t[%s]", value,
+ size, type, iter->name, walk);
while (--i > 0) {
walk += strlen(walk) + 1;
@@ -935,7 +933,13 @@ static int s_show_internal(struct seq_file *m, void *p, int kallmodsyms)
#endif /* CONFIG_KALLMODSYMS */
seq_printf(m, "%px %c %s\t[%s]\n", value,
type, iter->name, iter->module_name);
- } else
+ /*
+ * Non-modular, /proc/kallmodsyms -> print size.
+ */
+ } else if (kallmodsyms)
+ seq_printf(m, "%px %lx %c %s\n", value, size,
+ iter->type, iter->name);
+ else
seq_printf(m, "%px %c %s\n", value,
iter->type, iter->name);
return 0;
diff --git a/kernel/module.c b/kernel/module.c
index 84a9141a5e15..311eaa8fd21c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4405,7 +4405,8 @@ int lookup_module_symbol_attrs(unsigned long addr, unsigned long *size,
}
int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
- char *name, char *module_name, int *exported)
+ char *name, char *module_name, unsigned long *size,
+ int *exported)
{
struct module *mod;
@@ -4424,6 +4425,7 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
strlcpy(name, kallsyms_symbol_name(kallsyms, symnum), KSYM_NAME_LEN);
strlcpy(module_name, mod->name, MODULE_NAME_LEN);
*exported = is_exported(name, *value, mod);
+ *size = kallsyms->symtab[symnum].st_size;
preempt_enable();
return 0;
}
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 93fdf0dcf587..fcb1d706809c 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -5,7 +5,7 @@
* This software may be used and distributed according to the terms
* of the GNU General Public License, incorporated herein by reference.
*
- * Usage: nm -n vmlinux
+ * Usage: nm -n -S vmlinux
* | scripts/kallsyms [--all-symbols] [--absolute-percpu]
* [--base-relative] [--builtin=modules_thick.builtin]
* > symbols.S
@@ -38,6 +38,7 @@
struct sym_entry {
unsigned long long addr;
+ unsigned long long size;
unsigned int len;
unsigned int start_pos;
unsigned int percpu_absolute;
@@ -394,6 +395,7 @@ static bool is_ignored_symbol(const char *name, char type)
"kallsyms_addresses",
"kallsyms_offsets",
"kallsyms_relative_base",
+ "kallsyms_sizes",
"kallsyms_num_syms",
"kallsyms_num_modules",
"kallsyms_names",
@@ -507,10 +509,11 @@ static struct sym_entry *read_symbol(FILE *in)
unsigned long long addr;
unsigned int len;
struct sym_entry *sym;
- int rc;
+ int rc = 0;
+ unsigned long long size;
- rc = fscanf(in, "%llx %c %499s\n", &addr, &type, name);
- if (rc != 3) {
+ rc = fscanf(in, "%llx %llx %c %499s\n", &addr, &size, &type, name);
+ if (rc != 4) {
if (rc != EOF && fgets(name, 500, in) == NULL)
fprintf(stderr, "Read error or end of file.\n");
return NULL;
@@ -548,6 +551,7 @@ static struct sym_entry *read_symbol(FILE *in)
sym->sym[0] = type;
strcpy(sym_name(sym), name);
sym->percpu_absolute = 0;
+ sym->size = size;
return sym;
}
@@ -932,6 +936,11 @@ static void write_src(void)
printf("\n");
}
+ output_label("kallsyms_sizes");
+ for (i = 0; i < table_cnt; i++)
+ printf("\tPTR\t%#llx\n", table[i]->size);
+ printf("\n");
+
#ifdef CONFIG_KALLMODSYMS
output_kallmodsyms_modules();
output_kallmodsyms_objfiles();
@@ -1189,6 +1198,18 @@ static int compare_symbols(const void *a, const void *b)
if (sa->addr < sb->addr)
return -1;
+ /* zero-size markers before nonzero-size symbols */
+ if (sa->size > 0 && sb->size == 0)
+ return 1;
+ if (sa->size == 0 && sb->size > 0)
+ return -1;
+
+ /* sort by size (large size preceding symbols it encompasses) */
+ if (sa->size < sb->size)
+ return 1;
+ if (sa->size > sb->size)
+ return -1;
+
/* sort by "weakness" type */
wa = (sa->sym[0] == 'w') || (sa->sym[0] == 'W');
wb = (sb->sym[0] == 'w') || (sb->sym[0] == 'W');
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 5301f3e77116..55815937399b 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -265,7 +265,12 @@ kallsyms()
fi
info KSYMS ${2}
- ${NM} -n ${1} | scripts/kallsyms ${kallsymopt} > ${2}
+ # "nm -S" does not print symbol size when size is 0
+ # Therefore use awk to regularize the data:
+ # - when there are only three fields, add an explicit "0"
+ # - when there are already four fields, pass through as is
+ ${NM} -n -S ${1} | ${AWK} 'NF==3 {print $1, 0, $2, $3}; NF==4' | \
+ scripts/kallsyms ${kallsymopt} > ${2}
}
# Perform one step in kallsyms generation, including temporary linking of
--
2.34.0.258.gc900572c39
Greeting,
FYI, we noticed the following commit (built with gcc-9):
commit: a42fff4e29ff06e7c8e7f2e505787138b765976d ("[PATCH v7 7/7] kallsyms: add reliable symbol size info")
url: https://github.com/0day-ci/linux/commits/Nick-Alcock/kbuild-bring-back-tristate-conf/20211217-051935
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 136057256686de39cc3a07c2e39ef6bc43003ff6
patch link: https://lore.kernel.org/linux-modules/[email protected]
in testcase: leaking-addresses
version: leaking-addresses-x86_64-cf2a85e-1_20211208
with following parameters:
ucode: 0x28
on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 16G memory
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>
2021-12-18 07:48:44 ./leaking_addresses.pl --output-raw result/scan.out
2021-12-18 07:49:21 ./leaking_addresses.pl --input-raw result/scan.out --squash-by-filename
Total number of results from scan (incl dmesg): 331273
dmesg output:
[ 2.159320] mapped IOAPIC to ffffffffff5fb000 (fec00000)
Results squashed by filename (excl dmesg). Displaying [<number of results> <filename>], <example result>
...
[164752 kallmodsyms] ffffffff81000000 0 T _stext
...
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang
On Thu, Dec 16, 2021 at 08:19:12PM +0000, Nick Alcock wrote:
> /proc/kallsyms is very useful for tracers and other tools that need to
> map kernel symbols to addresses.
>
> It would be useful
It took me digging through the archives to see *who* this is useful to.
The short answer seems to be dtrace. Can you work on getting use
of this for something (I don't know, maybe kernelshark?) that does
not taint the kernel? Last I checked using dtrace on linux taints the
kernel.
Without valid upstream users I see no need to add more complexity to the
kernel. And complexity added by tainting modules or not upstream modules
just implies maintaining something for someone who is not working
upstream. I don't want to add more code or "features" to create a
maintenance burden for code not upstream or code that taints the kernel.
module.c is already the second largest file in the kernel/ directory and
I want to ensure we keep it clean, not add fluff for speculated features
which no proper non-tainting Linux tool is using.
Without a valid non-tainting user being made very clear with a value-add,
I will have to ignore this.
Luis
On 12 Jan 2022, Luis Chamberlain stated:
> On Thu, Dec 16, 2021 at 08:19:12PM +0000, Nick Alcock wrote:
>> /proc/kallsyms is very useful for tracers and other tools that need to
>> map kernel symbols to addresses.
>>
>> It would be useful
>
> It took me digging through the archives to see *who* this is useful to.
> The short answer seems to be dtrace. Can you work on getting use
> of this for something (I don't know, maybe kernelshark?) that does
> not taint the kernel? Last I checked using dtrace on linux taints the
> kernel.
It hasn't tainted the kernel for at least four years :) v1 (with a
kernel module) has been GPLv2 since 2017; v2 is pure-BPF and has no
DTrace-specific kernel modules, just using some new things we have to
add to the kernel, most of which seem plausibly useful to others too
(kallmodsyms, waitfd pro tem until pidfd supports ptracers, and CTF).
This is not a DTrace-specific feature in any case: all my submissions
have noted that it seems likely to be useful to anyone who wants a
stable reference to modules that doesn't change whenever the kernel
config changes, which probably means most tracers with support for
kernel modules which implement anything like a programming language.
> Without valid upstream users I see no need to add more complexity to the
> kernel. And complexity added by tainting modules or not upstream modules
We don't need any of those any more :) Even CTF is now generated by GCC
(once GCC 12 is released) and deduplicated by GNU ld: the CTF patch will
be only a few hundred lines long once GCC 12 is out and I drop the
DWARF->CTF translator.
> Without a valid non-tainting user being made very clear with a value-add,
> I will have to ignore this.
I hope this gives you a reason to not ignore it! Have some links:
DTrace v1 (maintenance mode, fairly hefty GPL kernel module, UPL
userspace; fully-functional including fbt, kernel side will shrink):
https://github.com/oracle/dtrace-linux-kernel v1/5.15
https://github.com/oracle/dtrace-utils 1.x-branch-dev
DTrace v2 based on BPF, in progress, some features still missing (UPL
userspace and a few GPL kernel patches, including this one: needs a BPF
cross-compiler, which is a new GCC 12 target):
https://github.com/oracle/dtrace-linux-kernel v2/5.14.9
https://github.com/oracle/dtrace-utils 2.0-branch
(I'm going to respin all of these kernel branches against 5.17-rc once
the merge window closes, and bring the things both kernel trees have in
common into sync. I'll drop you a line once that's done.)
Config-wise both of these need kernels with CONFIG_KALLMODSYMS,
CONFIG_WAITFD and CONFIG_CTF turned on, and a kernel built with a 'make
ctf' done after 'make', and the kernel source tree available when DTrace
proper is built.
--
NULL && (void)
Cc'ing Steven Rostedt as well since he was part of the original discussion
on the merits of this kallsyms enhancement.
On Tue, Feb 01, 2022 at 07:09:23PM -0800, Luis Chamberlain wrote:
> CC'ing bpftrace folks for feedback.
>
> I'm pretty reluctant to merge any of this unless we have wide community
> desire to see this. I'm not quite seeing that yet.
>
> On Wed, Jan 12, 2022 at 04:59:45PM +0000, Nick Alcock wrote:
> > On 12 Jan 2022, Luis Chamberlain stated:
> >
> > > On Thu, Dec 16, 2021 at 08:19:12PM +0000, Nick Alcock wrote:
> > >> /proc/kallsyms is very useful for tracers and other tools that need to
> > >> map kernel symbols to addresses.
> > >>
> > >> It would be useful
> > >
> > > It took me digging through the archives to see *who* this is useful to.
> > > The short answer seems to be dtrace. Can you work on getting use
> > > of this for something (I don't know, maybe kernelshark?) that does
> > > not taint the kernel? Last I checked using dtrace on linux taints the
> > > kernel.
> >
> > It hasn't tainted the kernel for at least four years :) v1 (with a
> > kernel module) has been GPLv2 since 2017; v2 is pure-BPF and has no
> > DTrace-specific kernel modules,
>
> I google for dtrace Linux and I end up here:
>
> https://www.oracle.com/linux/downloads/linux-dtrace.html
>
> It then has documentation dating back to year 2020, and I can't
> apt-get install any of these "dtrace-utils" or anything with dtrace.
>
> How do I get running with dtrace on debian? Typically this is a flag
> that it has some funky license. You mentioned dtrace is GPLv2 since 2017, does
> the same apply to the pure-BPF stuff?
>
> Note I see a bpftrace effort, can that be made to use your changes?
> At *least* I can install that on a regular distro. And it notes
> "The bpftrace language is inspired by awk and C, and predecessor tracers
> such as DTrace and SystemTap."
>
> I see on that page it says:
>
> Note that DTrace requires the Unbreakable Enterprise Kernel (UEK)
> release 5 or higher.
>
> > just using some new things we have to
> > add to the kernel, most of which seem plausibly useful to others too
> > (kallmodsyms, waitfd pro tem until pidfd supports ptracers, and CTF).
>
> All sounds nice, but I'd like to give this all a spin, but I can't
> find anything remotely close to anything sensible to try it out.
> I don't want to run any Oracle kernel. I want to run things upstream.
>
> > This is not a DTrace-specific feature in any case: all my submissions
> > have noted that it seems likely to be useful to anyone who wants a
> > stable reference to modules that doesn't change whenever the kernel
> > config changes, which probably means most tracers with support for
> > kernel modules which implement anything like a programming language.
>
> Great! But I'd like things to have tools
>
> > > Without valid upstream users I see no need to add more complexity to the
> > > kernel. And complexity added by tainting modules or not upstream modules
> >
> > We don't need any of those any more :) Even CTF is now generated by GCC
> > (once GCC 12 is released) and deduplicated by GNU ld: the CTF patch will
> > be only a few hundred lines long once GCC 12 is out and I drop the
> > DWARF->CTF translator.
>
> Great!
>
> > > Without a valid non-tainting user being made very clear with a value-add,
> > > I will have to ignore this.
> >
> > I hope this gives you a reason to not ignore it! Have some links:
> >
> > DTrace v1 (maintenance mode, fairly hefty GPL kernel module, UPL
> > userspace; fully-functional including fbt, kernel side will shrink):
> >
> > https://github.com/oracle/dtrace-linux-kernel v1/5.15
> > https://github.com/oracle/dtrace-utils 1.x-branch-dev
> >
> > DTrace v2 based on BPF, in progress, some features still missing (UPL
> > userspace and a few GPL kernel patches, including this one: needs a BPF
> > cross-compiler, which is a new GCC 12 target):
> >
> > https://github.com/oracle/dtrace-linux-kernel v2/5.14.9
> > https://github.com/oracle/dtrace-utils 2.0-branch
>
> The "Universal Permissive License (UPL)"? Really? Anyway it seems
> to be at least GPL compatible. I'm curious why no distro has picked up
> any of this work?
>
> I don't see much traction based on what you have said on dtrace
> on anything other than Oracle Linux stuff, it would be nice if bpftrace
> folks were excited about your changes and we had support for that
> there.
>
> > (I'm going to respin all of these kernel branches against 5.17-rc once
> > the merge window closes, and bring the things both kernel trees have in
> > common into sync. I'll drop you a line once that's done.)
>
> Nice.
>
> > Config-wise both of these need kernels with CONFIG_KALLMODSYMS,
> > CONFIG_WAITFD and CONFIG_CTF turned on, and a kernel built with a 'make
> > ctf' done after 'make', and the kernel source tree available when DTrace
> > proper is built.
>
> Thanks for the heads up.
>
> Luis
CC'ing bpftrace folks for feedback.
I'm pretty reluctant to merge any of this unless we have wide community
desire to see this. I'm not quite seeing that yet.
On Wed, Jan 12, 2022 at 04:59:45PM +0000, Nick Alcock wrote:
> On 12 Jan 2022, Luis Chamberlain stated:
>
> > On Thu, Dec 16, 2021 at 08:19:12PM +0000, Nick Alcock wrote:
> >> /proc/kallsyms is very useful for tracers and other tools that need to
> >> map kernel symbols to addresses.
> >>
> >> It would be useful
> >
> > It took me digging through the archives to see *who* this is useful to.
> > The short answer seems to be dtrace. Can you work on getting use
> > of this for something (I don't know, maybe kernelshark?) that does
> > not taint the kernel? Last I checked using dtrace on linux taints the
> > kernel.
>
> It hasn't tainted the kernel for at least four years :) v1 (with a
> kernel module) has been GPLv2 since 2017; v2 is pure-BPF and has no
> DTrace-specific kernel modules,
I google for dtrace Linux and I end up here:
https://www.oracle.com/linux/downloads/linux-dtrace.html
It then has documentation dating back to year 2020, and I can't
apt-get install any of these "dtrace-utils" or anything with dtrace.
How do I get running with dtrace on debian? Typically this is a flag
that it has some funky license. You mentioned dtrace is GPLv2 since 2017, does
the same apply to the pure-BPF stuff?
Note I see a bpftrace effort, can that be made to use your changes?
At *least* I can install that on a regular distro. And it notes
"The bpftrace language is inspired by awk and C, and predecessor tracers
such as DTrace and SystemTap."
I see on that page it says:
Note that DTrace requires the Unbreakable Enterprise Kernel (UEK)
release 5 or higher.
> just using some new things we have to
> add to the kernel, most of which seem plausibly useful to others too
> (kallmodsyms, waitfd pro tem until pidfd supports ptracers, and CTF).
All sounds nice, but I'd like to give this all a spin, but I can't
find anything remotely close to anything sensible to try it out.
I don't want to run any Oracle kernel. I want to run things upstream.
> This is not a DTrace-specific feature in any case: all my submissions
> have noted that it seems likely to be useful to anyone who wants a
> stable reference to modules that doesn't change whenever the kernel
> config changes, which probably means most tracers with support for
> kernel modules which implement anything like a programming language.
Great! But I'd like things to have tools
> > Without valid upstream users I see no need to add more complexity to the
> > kernel. And complexity added by tainting modules or not upstream modules
>
> We don't need any of those any more :) Even CTF is now generated by GCC
> (once GCC 12 is released) and deduplicated by GNU ld: the CTF patch will
> be only a few hundred lines long once GCC 12 is out and I drop the
> DWARF->CTF translator.
Great!
> > Without a valid non-tainting user being made very clear with a value-add,
> > I will have to ignore this.
>
> I hope this gives you a reason to not ignore it! Have some links:
>
> DTrace v1 (maintenance mode, fairly hefty GPL kernel module, UPL
> userspace; fully-functional including fbt, kernel side will shrink):
>
> https://github.com/oracle/dtrace-linux-kernel v1/5.15
> https://github.com/oracle/dtrace-utils 1.x-branch-dev
>
> DTrace v2 based on BPF, in progress, some features still missing (UPL
> userspace and a few GPL kernel patches, including this one: needs a BPF
> cross-compiler, which is a new GCC 12 target):
>
> https://github.com/oracle/dtrace-linux-kernel v2/5.14.9
> https://github.com/oracle/dtrace-utils 2.0-branch
The "Universal Permissive License (UPL)"? Really? Anyway it seems
to be at least GPL compatible. I'm curious why no distro has picked up
any of this work?
I don't see much traction based on what you have said on dtrace
on anything other than Oracle Linux stuff, it would be nice if bpftrace
folks were excited about your changes and we had support for that
there.
> (I'm going to respin all of these kernel branches against 5.17-rc once
> the merge window closes, and bring the things both kernel trees have in
> common into sync. I'll drop you a line once that's done.)
Nice.
> Config-wise both of these need kernels with CONFIG_KALLMODSYMS,
> CONFIG_WAITFD and CONFIG_CTF turned on, and a kernel built with a 'make
> ctf' done after 'make', and the kernel source tree available when DTrace
> proper is built.
Thanks for the heads up.
Luis
Hi Luis, Nick,
On Tue, Feb 01, 2022 at 07:09:23PM -0800, Luis Chamberlain wrote:
[...]
>
> I don't see much traction based on what you have said on dtrace
> on anything other than Oracle Linux stuff, it would be nice if bpftrace
> folks were excited about your changes and we had support for that
> there.
I took a quick look at the v7 cover letter (I'll take a look at
discussion from previous versions later if I get time) and it's not
immediately obvious to me why a stable mapping is beneficial.
Nick, could you elaborate why it's beneficial for dtrace to have a
stable mapping?
For what it's worth, bpftrace uses /proc/kallsyms rather rarely.
bpftrace relies on perf_event_open()'s config1 parameter to resolve
kernel symbol name to address for kprobe attachment. /proc/kallsyms is
mostly used to resolve kaddr() calls in bpftrace scripts.
Kernel symbol size information would be useful, though. bpftrace
currently uses the vmlinux ELF to acquire that information.
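For context on the /proc/kallsyms use case mentioned above, here is a minimal sketch of resolving a kernel symbol name to its address from a kallsyms-format stream. It is not bpftrace's actual implementation; the function name kallsyms_lookup_addr and the buffer sizes are assumptions for illustration. Note that on systems where kptr_restrict hides addresses from the reader, the returned value will simply be zero.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a stream in /proc/kallsyms format ("address type name [module]")
 * for 'name'. Returns 0 and stores the address on success, -1 if the
 * symbol is not found. The first match wins if a name appears twice.
 */
static int kallsyms_lookup_addr(FILE *in, const char *name,
				unsigned long long *addr)
{
	char line[1024], sym[512], type;
	unsigned long long a;

	while (fgets(line, sizeof(line), in)) {
		if (sscanf(line, "%llx %c %511s", &a, &type, sym) != 3)
			continue;	/* skip lines that do not parse */
		if (strcmp(sym, name) == 0) {
			*addr = a;
			return 0;
		}
	}
	return -1;
}
```

A caller would typically pass the result of fopen("/proc/kallsyms", "r"); the size of the symbol is not available from this format, which is the gap the size column in /proc/kallmodsyms is meant to fill.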
[...]
Thanks,
Daniel
Adding a few more folks that might be interested in this.
On Tue, Feb 01, 2022 at 07:09:23PM -0800, Luis Chamberlain wrote:
> CC'ing bpftrace folks for feedback.
>
> I'm pretty reluctant to merge any of this unless we have wide community
> desire to see this. I'm not quite seeing that yet.
>
> On Wed, Jan 12, 2022 at 04:59:45PM +0000, Nick Alcock wrote:
> > On 12 Jan 2022, Luis Chamberlain stated:
> >
> > > On Thu, Dec 16, 2021 at 08:19:12PM +0000, Nick Alcock wrote:
> > >> /proc/kallsyms is very useful for tracers and other tools that need to
> > >> map kernel symbols to addresses.
> > >>
> > >> It would be useful
> > >
> > > It took me digging through the archives to see *who* this is useful to.
> > > The short answer seems to be dtrace. Can you work on getting use
> > > of this for something (I don't know, maybe kernelshark?) that does
> > > not taint the kernel? Last I checked using dtrace on linux taints the
> > > kernel.
> >
> > It hasn't tainted the kernel for at least four years :) v1 (with a
> > kernel module) has been GPLv2 since 2017; v2 is pure-BPF and has no
> > DTrace-specific kernel modules,
>
> I google for dtrace Linux and I end up here:
>
> https://www.oracle.com/linux/downloads/linux-dtrace.html
>
> It then has documentation dating back to year 2020, and I can't
> apt-get install any of these "dtrace-utils" or anything with dtrace.
>
> How do I get running with dtrace on debian? Typically this is a flag
> that it has some funky license. You mentioned dtrace is GPLv2 since 2017, does
> the same apply to the pure-BPF stuff?
>
> Note I see a bpftrace effort, can that be made to use your changes?
> At *least* I can install that on a regular distro. And it notes
> "The bpftrace language is inspired by awk and C, and predecessor tracers
> such as DTrace and SystemTap."
>
> I see on that page it says:
>
> Note that DTrace requires the Unbreakable Enterprise Kernel (UEK)
> release 5 or higher.
>
> > just using some new things we have to
> > add to the kernel, most of which seem plausibly useful to others too
> > (kallmodsyms, waitfd pro tem until pidfd supports ptracers, and CTF).
>
> All sounds nice, but I'd like to give this all a spin, but I can't
> find anything remotely close to anything sensible to try it out.
> I don't want to run any Oracle kernel. I want to run things upstream.
>
> > This is not a DTrace-specific feature in any case: all my submissions
> > have noted that it seems likely to be useful to anyone who wants a
> > stable reference to modules that doesn't change whenever the kernel
> > config changes, which probably means most tracers with support for
> > kernel modules which implement anything like a programming language.
>
> Great! But I'd like things to have tools
>
> > > Without valid upstream users I see no need to add more complexity to the
> > > kernel. And complexity added by tainting modules or not upstream modules
> >
> > We don't need any of those any more :) Even CTF is now generated by GCC
> > (once GCC 12 is released) and deduplicated by GNU ld: the CTF patch will
> > be only a few hundred lines long once GCC 12 is out and I drop the
> > DWARF->CTF translator.
>
> Great!
>
> > > Without a valid non-tainting user being made very clear with a value-add,
> > > I will have to ignore this.
> >
> > I hope this gives you a reason to not ignore it! Have some links:
> >
> > DTrace v1 (maintenance mode, fairly hefty GPL kernel module, UPL
> > userspace; fully-functional including fbt, kernel side will shrink):
> >
> > https://github.com/oracle/dtrace-linux-kernel v1/5.15
> > https://github.com/oracle/dtrace-utils 1.x-branch-dev
> >
> > DTrace v2 based on BPF, in progress, some features still missing (UPL
> > userspace and a few GPL kernel patches, including this one: needs a BPF
> > cross-compiler, which is a new GCC 12 target):
> >
> > https://github.com/oracle/dtrace-linux-kernel v2/5.14.9
> > https://github.com/oracle/dtrace-utils 2.0-branch
>
> The "The Universal Permissive License (UPL)"? Really? Anyway it seems
> to be at least GPL compatible. I'm curious why no distro has picked up
> any of this work?
>
> I don't see much traction based on what you have said on dtrace
> on anything other than Oracle Linux stuff, it would be nice if bpftrace
> folks were excited about your changes and we had support for that
> there.
>
> > (I'm going to respin all of these kernel branches against 5.17-rc once
> > the merge window closes, and bring the things both kernel trees have in
> > common into sync. I'll drop you a line once that's done.)
>
> Nice.
>
> > Config-wise both of these need kernels with CONFIG_KALLMODSYMS,
> > CONFIG_WAITFD and CONFIG_CTF turned on, and a kernel built with a 'make
> > ctf' done after 'make', and the kernel source tree available when DTrace
> > proper is built.
>
> Thanks for the heads up.
>
> Luis
On 2 Feb 2022, Daniel Xu told this:
> I took a quick look at the v7 cover letter (I'll take a look at
> discussion from previous versions later if I get time) and it's not
> immediately obvious to me why a stable mapping is beneficial.
(FYI: I'm updating these patches for 5.17-rc2 right now, and will be
mailing them out once I've given them a spin. There are a couple of
bugfixes too.)
> Nick, could you elaborate why it's beneficial for dtrace to have a
> stable mapping?
Simply because when a symbol name appears in both a module and the core
kernel, users can specify which symbol they mean via the
module`symbol syntax (the core kernel is of course called vmlinux`).
There are thousands of duplicates, so this can and does come up.
DTrace goes to some lengths to make D scripts portable not just across
.config's but across kernel releases (the whole translator mechanism
exists to translate kernel data structures into a release-independent
and as far as possible operating-system-independent form): it would be
rather silly if we could handle task_struct changing (which we can) but
not handle someone taking ext4.ko and changing .config so that it was
built in without having to review all their D scripts for references to
ext4. It would be even sillier if they suddenly found that a symbol
they were referencing in D scripts when ext4 was built as a module was
suddenly un-referenceable when it was built-in because there are already
symbols with that name in other built-in modules: in /proc/kallsyms you
can't tell such symbols apart: they differ only by address, while in
/proc/kallmodsyms you can at least tell that they came from different
modules when they were built into the core kernel.
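To make the disambiguation concrete, here is a minimal sketch (not part of
the patch series, and not DTrace's actual implementation) of how a tool
might resolve module`symbol references from /proc/kallmodsyms-style lines.
It assumes the line format shown in the cover letter's sample output
(address, size, type, name, then zero or more [module] annotations); the
function names parse_kallmodsyms and resolve are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the cover letter's sample output:
#   <hex address> <hex size> <type> <symbol> [module] [module] ...
LINE_RE = re.compile(
    r"^([0-9a-f]+)\s+([0-9a-f]+)\s+(\S)\s+(\S+)((?:\s+\[[^\]]+\])*)\s*$"
)

def parse_kallmodsyms(lines):
    """Return a dict mapping (module, symbol) -> list of addresses."""
    table = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a symbol line
        addr = int(m.group(1), 16)
        name = m.group(4)
        # Symbols with no [module] annotation are attributed to "vmlinux",
        # the name the thread uses for the core kernel.
        modules = re.findall(r"\[([^\]]+)\]", m.group(5)) or ["vmlinux"]
        for mod in modules:
            table[(mod, name)].append(addr)
    return table

def resolve(table, spec):
    """Resolve a 'module`symbol' string to its list of addresses."""
    mod, _, sym = spec.partition("`")
    return table.get((mod, sym), [])
```

With this table, the same reference keeps resolving whether the module is
loadable or built in, because the [module] annotation is present either way.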
(In fact this isn't even going far enough: in the long term, I'd like to
arrange to have *no duplicates at all* in /proc/kallmodsyms, but that
would mean that clashing symbols in different TUs in the same module
would need some sort of per-translation-unit markup, and I'm not sure
what syntax to use for that yet. It would be very cheap if we used the
same approach we're using here, literally one copy of each TU name and
one pointer for each.)
> For what it's worth, bpftrace uses /proc/kallsyms rather rarely.
> bpftrace relies on perf_event_open()'s config1 parameter to resolve
> kernel symbol name to address for kprobe attachment. /proc/kallsyms is
> mostly used to resolve kaddr() calls in bpftrace scripts.
>
> Kernel symbol size information would be useful, though. bpftrace
> currently uses the vmlinux ELF to acquire that information.
Yeah, that's a perfectly reasonable place to get that from. I'll have to
see if we can do the same thing, since courtesy of /proc/kall(mod)syms
we have access to the symbol index. This would obviate the symbol size
patch in this series, which is the only one with a nontrivial space cost
and the only one I'm unhappy with (it needs 4 bytes/symbol rather than a
few bytes per translation unit full of symbols).
I don't see any way to get the kallmodsyms per-builtin-module thing the
same way (also, it seems to me it would be much less convenient than
having it available directly in /proc almost for free).
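As an illustration of what the size information buys, whichever source it
comes from, here is a minimal sketch of checking whether an address falls
inside a symbol or in a gap between symbols. The helper name find_symbol
and the (start, size, name) tuple layout are assumptions made for this
example only.

```python
import bisect

def find_symbol(symbols, addr):
    """Return the name of the symbol containing addr, or None.

    symbols: list of (start, size, name) tuples sorted by start address.
    """
    starts = [s[0] for s in symbols]
    # Index of the last symbol whose start address is <= addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, size, name = symbols[i]
    # Without the size we could not tell a hit from an address in a gap.
    return name if addr < start + size else None

# Two adjacent entries from the cover letter's sample output.
syms = [
    (0xffffffff8b013d20, 0x409, "pt_buffer_setup_aux"),
    (0xffffffff8b014130, 0x11f, "intel_pt_interrupt"),
]
```

Here pt_buffer_setup_aux ends at 0xffffffff8b014129, so addresses from
there up to 0xffffffff8b01412f belong to no symbol at all.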
--
NULL && (void)