What?
These commits set things up so we can start removing the sentinel elements.
They modify sysctl and net_sysctl internals so that registering a ctl_table
that contains a sentinel gives the same result as passing a table_size
calculated from the ctl_table array without a sentinel. We accomplish this by
introducing a table_size argument in the same place where procname is checked
for NULL. The idea is for the traversal to keep stopping when it hits
->procname == NULL while the sentinel is still present. Once the sentinel is
removed, it will stop on the table_size (thx to [email protected] for the discussion
that led to this). This allows us to remove sentinels from one (or several)
files at a time.
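The dual stopping rule described above can be sketched in plain C. This is
only an illustration, not the kernel code: struct ctl_table is reduced to a
stand-in with two fields, ARRAY_SIZE is a simplified userspace version, and
count_entries() and check_both_layouts() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct ctl_table (assumption). */
struct ctl_table {
	const char *procname;
	int mode;
};

/* Userspace approximation of the kernel's ARRAY_SIZE (assumption). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Legacy layout: the last element is the empty sentinel. */
static struct ctl_table with_sentinel[] = {
	{ .procname = "foo", .mode = 0644 },
	{ .procname = "bar", .mode = 0444 },
	{ .procname = NULL }
};

/* Target layout: no sentinel, only the size bounds the array. */
static struct ctl_table without_sentinel[] = {
	{ .procname = "foo", .mode = 0644 },
	{ .procname = "bar", .mode = 0444 },
};

/* Stop on table_size or on the first NULL procname, whichever comes first. */
size_t count_entries(const struct ctl_table *table, size_t table_size)
{
	size_t i;

	for (i = 0; i < table_size && table[i].procname; i++)
		;
	return i;
}

/* Both layouts are expected to yield the same number of usable entries. */
void check_both_layouts(void)
{
	assert(count_entries(with_sentinel, ARRAY_SIZE(with_sentinel)) == 2);
	assert(count_entries(without_sentinel, ARRAY_SIZE(without_sentinel)) == 2);
}
```

With the sentinel present the size is one too large, but the NULL procname
ends the walk first; without it, the size alone ends the walk.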
These commits are part of a bigger set containing the removal of the ctl_table sentinels
(https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V3).
The idea is to make the review process easier by chunking the 65+ commits into
manageable pieces.
My idea is to send out one chunk at a time so it can be reviewed separately
from the others without the noise from parallel related sets. After this first
chunk will come 6 that remove the sentinel element from arch/*, drivers/*,
fs/*, kernel/*, net/* and miscellaneous, and then a final one that removes the
->procname == NULL check. You can see all commits here
(https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V3).
Why?
This is a preparation patch set that will make it easier for us to apply
subsequent patches that will remove the sentinel element (last empty element)
in the ctl_table arrays.
In itself, it does not remove any sentinels, but it is needed to realize the
advantages of the removal: reducing the kernel's build-time size and run-time
memory bloat by about 64 bytes per declared ctl_table array. It also ensures
that future moves of sysctl arrays out of kernel/sysctl.c to their own
subsystems won't be penalized with a larger kernel build size or run-time
memory consumption. Without this patch set we would have to put everything
into one big commit, making the review process that much longer and harder for
everyone.
Since it is so closely related to the removal of the sentinel element, it's
worthwhile to give a bit of context on this:
* Good summary from Luis about why we want to remove the sentinels.
https://lore.kernel.org/all/[email protected]/
* This is a patch set that replaces register_sysctl_table with register_sysctl
https://lore.kernel.org/all/[email protected]/
* Patch set to deprecate register_sysctl_paths()
https://lore.kernel.org/all/[email protected]/
* Here there is an explicit expectation for the removal of the sentinel element.
https://lore.kernel.org/all/[email protected]
* The "ARRAY_SIZE" approach was mentioned (proposed?) in this thread
https://lore.kernel.org/all/[email protected]
Commits in this chunk:
* Preparation commits:
start : sysctl: Prefer ctl_table_header in proc_sysctl
end : sysctl: Add size argument to init_header
These are preparation commits that make sure that we have the
ctl_table_header where we need the array size.
* Add size to __register_sysctl_table, __register_sysctl_init and register_sysctl
start : sysctl: Add a size arg to __register_sysctl_table
end : sysctl: Add size arg to __register_sysctl_init
Here we replace the existing register functions with macros that add the
ARRAY_SIZE automatically. Unfortunately these macros cannot be used for the
register calls that pass a pointer; in this situation we add register
functions with a table_size argument (thx to [email protected] for bringing
this to my attention)
* Add size to register_net_sysctl
start : sysctl: Add size to register_net_sysctl function
end : sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl
register_net_sysctl is an indirection function to the sysctl registrations
and needed several commits to add table_size to all its callers. We
temporarily use SIZE_MAX to avoid compiler warnings while we change from
register_net_sysctl to register_net_sysctl_sz; we remove it with the
penultimate patch of this set. Finally, we make sure to adjust the calculated
size every time there is a check for unprivileged users.
* Add size as additional stopping criteria
commit : sysctl: Use ctl_table_size as stopping criteria for list macro
We add a table_size check in the main macro within proc_sysctl.c. This commit
allows the removal of the sentinel element in chunks.
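As a rough illustration of the macro/function split described in the list
above, here is a userspace sketch. All demo_* names are hypothetical, and
ARRAY_SIZE is a simplified version; the kernel's ARRAY_SIZE additionally
refuses to build when handed a pointer, which is why pointer callers need the
explicit-size function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct ctl_table (assumption). */
struct ctl_table {
	const char *procname;
};

/* Userspace approximation; it does not reject pointers like the kernel's. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical registration backend: it only reports how many entries it
 * would register, honouring both the size and a possible sentinel.
 */
size_t demo_register_sz(const char *path, struct ctl_table *table,
			size_t table_size)
{
	size_t n = 0;

	(void)path;
	while (n < table_size && table[n].procname)
		n++;
	return n;
}

/* Convenience macro for callers that pass a real array. */
#define demo_register(path, table) \
	demo_register_sz(path, table, ARRAY_SIZE(table))

static struct ctl_table demo_table[] = { { "a" }, { "b" }, { "c" } };

/* Array caller: the macro computes the size automatically. */
size_t register_from_array(void)
{
	return demo_register("demo", demo_table);
}

/*
 * Pointer caller: ARRAY_SIZE(tbl) would divide the size of a pointer by
 * the size of an element, so the size has to be passed explicitly.
 */
size_t register_from_pointer(struct ctl_table *tbl, size_t n)
{
	return demo_register_sz("demo", tbl, n);
}
```

In this sketch register_from_array() sees all three entries, while a pointer
caller decides for itself how many elements the backend may look at.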
Testing:
* Ran sysctl selftests (./tools/testing/selftests/sysctl/sysctl.sh)
* Successfully ran this through 0-day
Size saving estimates:
A consequence of eventually removing all the sentinels (64 bytes per sentinel)
is the bytes we save. These are *not* numbers that we will get after this patch
set; these are the numbers that we will get after removing all the sentinels. I
included them here because they are relevant and to get an idea of just how
much memory we are talking about.
* bloat-o-meter:
The "yesall" configuration results save 9158 bytes (you can see the output here
https://lore.kernel.org/all/[email protected]/).
The "tiny" configuration + CONFIG_SYSCTL saves 1215 bytes (you can see the
output here [2]).
* memory usage:
As we no longer need the sentinel element within proc_sysctl.c, we save some
bytes in main memory as well. In my testing kernel I measured a difference of
6720 bytes. I include the way to measure this in [1]
Comments/feedback greatly appreciated
V3:
* Updated tags from mailing list
* Corrected an off-by-one error in
https://lore.kernel.org/all/[email protected]
* Fixed a bug where we would have erroneously registered ctl_table to
unprivileged ipv6 users
* Rebased on v6.5-rc5
* Rebased the bigger patchset located at
https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V3 on
top of this version
V2:
* Dropped moving mpls_table up the af_mpls.c file. We don't need it any longer
as it is not really used before its current location.
* Added/Clarified the why in several commit messages that were missing it.
* Clarified the why in the cover letter to be "to make it easier to apply
subsequent patches that will remove the sentinels"
* Added documentation for table_size
* Added suggested by tags (Greg and Jani) to relevant commits
Best
Joel
[1]
To measure the in memory savings apply this patch on top of
https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V1
"
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 5f413bfd6271..9aa8374c0ef1 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -975,6 +975,7 @@ static struct ctl_dir *new_dir(struct ctl_table_set *set,
table[0].procname = new_name;
table[0].mode = S_IFDIR|S_IRUGO|S_IXUGO;
init_header(&new->header, set->dir.header.root, set, node, table, 1);
+ printk("%ld sysctl saved mem kzalloc \n", sizeof(struct ctl_table));
return new;
}
@@ -1202,6 +1203,7 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, struct ctl_table_
head->ctl_table_size);
links->nreg = head->ctl_table_size;
+ printk("%ld sysctl saved mem kzalloc \n", sizeof(struct ctl_table));
return links;
}
"
and then run the following bash script in the kernel:
accum=0
for n in $(dmesg | grep kzalloc | awk '{print $3}') ; do
echo $n
accum=$(calc "$accum + $n")
done
echo $accum
[2]
bloat-o-meter with "tiny" config:
add/remove: 0/2 grow/shrink: 33/24 up/down: 470/-1685 (-1215)
Function old new delta
insert_header 831 966 +135
__register_sysctl_table 971 1092 +121
get_links 177 226 +49
put_links 167 186 +19
erase_header 55 66 +11
sysctl_init_bases 59 69 +10
setup_sysctl_set 65 73 +8
utsname_sysctl_init 26 31 +5
sld_mitigate_sysctl_init 33 38 +5
setup_userns_sysctls 158 163 +5
sched_rt_sysctl_init 33 38 +5
sched_fair_sysctl_init 33 38 +5
sched_dl_sysctl_init 33 38 +5
random_sysctls_init 33 38 +5
page_writeback_init 122 127 +5
oom_init 73 78 +5
kernel_panic_sysctls_init 33 38 +5
kernel_exit_sysctls_init 33 38 +5
init_umh_sysctls 33 38 +5
init_signal_sysctls 33 38 +5
init_pipe_fs 94 99 +5
init_fs_sysctls 33 38 +5
init_fs_stat_sysctls 33 38 +5
init_fs_namespace_sysctls 33 38 +5
init_fs_namei_sysctls 33 38 +5
init_fs_inode_sysctls 33 38 +5
init_fs_exec_sysctls 33 38 +5
init_fs_dcache_sysctls 33 38 +5
register_sysctl 22 25 +3
__register_sysctl_init 9 12 +3
user_namespace_sysctl_init 149 151 +2
sched_core_sysctl_init 38 40 +2
register_sysctl_mount_point 13 15 +2
vm_table 1344 1280 -64
vm_page_writeback_sysctls 512 448 -64
vm_oom_kill_table 256 192 -64
uts_kern_table 448 384 -64
usermodehelper_table 192 128 -64
user_table 576 512 -64
sld_sysctls 128 64 -64
signal_debug_table 128 64 -64
sched_rt_sysctls 256 192 -64
sched_fair_sysctls 128 64 -64
sched_dl_sysctls 192 128 -64
sched_core_sysctls 64 - -64
root_table 128 64 -64
random_table 448 384 -64
namei_sysctls 320 256 -64
kern_table 1792 1728 -64
kern_panic_table 128 64 -64
kern_exit_table 128 64 -64
inodes_sysctls 192 128 -64
fs_stat_sysctls 256 192 -64
fs_shared_sysctls 192 128 -64
fs_pipe_sysctls 256 192 -64
fs_namespace_sysctls 128 64 -64
fs_exec_sysctls 128 64 -64
fs_dcache_sysctls 128 64 -64
init_header 85 - -85
Total: Before=1877669, After=1876454, chg -0.06%
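The totals in the table above can be cross-checked from its rows: 25 ctl_table
arrays each shrink by one 64-byte sentinel, init_header (85 bytes) disappears,
and the growth column sums to 470 bytes. A small sketch of that arithmetic
follows; the demo_* names are hypothetical and the constants are taken from
the listing.

```c
#include <assert.h>

enum {
	SENTINEL_BYTES = 64,	/* size of one ctl_table element in the listing */
	SHRUNK_TABLES = 25,	/* ctl_table arrays that lost their sentinel */
	INIT_HEADER_BYTES = 85,	/* init_header no longer appears as a symbol */
	GROWTH_BYTES = 470	/* sum of the positive deltas */
};

/* Bytes removed: one sentinel per table plus the dropped init_header. */
long demo_tiny_shrink(void)
{
	return (long)SHRUNK_TABLES * SENTINEL_BYTES + INIT_HEADER_BYTES;
}

/* Net change: growth minus shrink, matching the "(-1215)" summary. */
long demo_tiny_net(void)
{
	return GROWTH_BYTES - demo_tiny_shrink();
}
```

The result also agrees with the Before/After line: 1877669 - 1876454 = 1215.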
base: fdf0eaf11452
Joel Granados (14):
sysctl: Prefer ctl_table_header in proc_sysctl
sysctl: Use ctl_table_header in list_for_each_table_entry
sysctl: Add ctl_table_size to ctl_table_header
sysctl: Add size argument to init_header
sysctl: Add a size arg to __register_sysctl_table
sysctl: Add size to register_sysctl
sysctl: Add size arg to __register_sysctl_init
sysctl: Add size to register_net_sysctl function
ax.25: Update to register_net_sysctl_sz
netfilter: Update to register_net_sysctl_sz
networking: Update to register_net_sysctl_sz
vrf: Update to register_net_sysctl_sz
sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl
sysctl: Use ctl_table_size as stopping criteria for list macro
arch/arm64/kernel/armv8_deprecated.c | 2 +-
arch/s390/appldata/appldata_base.c | 2 +-
drivers/net/vrf.c | 3 +-
fs/proc/proc_sysctl.c | 90 +++++++++++++------------
include/linux/sysctl.h | 31 +++++++--
include/net/ipv6.h | 2 +
include/net/net_namespace.h | 10 +--
ipc/ipc_sysctl.c | 4 +-
ipc/mq_sysctl.c | 4 +-
kernel/ucount.c | 5 +-
net/ax25/sysctl_net_ax25.c | 3 +-
net/bridge/br_netfilter_hooks.c | 3 +-
net/core/neighbour.c | 8 ++-
net/core/sysctl_net_core.c | 3 +-
net/ieee802154/6lowpan/reassembly.c | 8 ++-
net/ipv4/devinet.c | 3 +-
net/ipv4/ip_fragment.c | 3 +-
net/ipv4/route.c | 8 ++-
net/ipv4/sysctl_net_ipv4.c | 3 +-
net/ipv4/xfrm4_policy.c | 3 +-
net/ipv6/addrconf.c | 3 +-
net/ipv6/icmp.c | 5 ++
net/ipv6/netfilter/nf_conntrack_reasm.c | 3 +-
net/ipv6/reassembly.c | 3 +-
net/ipv6/route.c | 9 +++
net/ipv6/sysctl_net_ipv6.c | 16 +++--
net/ipv6/xfrm6_policy.c | 3 +-
net/mpls/af_mpls.c | 6 +-
net/mptcp/ctrl.c | 3 +-
net/netfilter/ipvs/ip_vs_ctl.c | 8 ++-
net/netfilter/ipvs/ip_vs_lblc.c | 10 ++-
net/netfilter/ipvs/ip_vs_lblcr.c | 10 ++-
net/netfilter/nf_conntrack_standalone.c | 4 +-
net/netfilter/nf_log.c | 7 +-
net/rds/tcp.c | 3 +-
net/sctp/sysctl.c | 4 +-
net/smc/smc_sysctl.c | 3 +-
net/sysctl_net.c | 26 ++++---
net/unix/sysctl_net_unix.c | 3 +-
net/xfrm/xfrm_sysctl.c | 8 ++-
40 files changed, 222 insertions(+), 113 deletions(-)
--
2.30.2
This is a preparation commit to make it easy to remove the sentinel
elements (empty end markers) from the ctl_table arrays. It both allows
the systematic removal of the sentinels and adds the ctl_table_size
variable to the stopping criteria of the list_for_each_table_entry macro
that traverses all ctl_table arrays. Once all the sentinels are removed
by subsequent commits, ctl_table_size will become the only stopping
criteria in the macro. We don't actually remove any elements in this
commit, but it sets things up for the removal process to take place.
By adding header->ctl_table_size as an additional stopping criteria for
the list_for_each_table_entry macro, it will execute until it finds an
"empty" ->procname or until the size runs out. Therefore, if a ctl_table
array with a sentinel is passed, its size will be too big (by one
element) but the traversal will stop on the sentinel. On the other hand, if a
ctl_table array without a sentinel is passed, its size will be just right
and there will be no need for a sentinel.
Signed-off-by: Joel Granados <[email protected]>
Suggested-by: Jani Nikula <[email protected]>
---
fs/proc/proc_sysctl.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 817bc51c58d8..504e847c2a3a 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -19,8 +19,9 @@
#include <linux/kmemleak.h>
#include "internal.h"
-#define list_for_each_table_entry(entry, header) \
- for ((entry) = (header->ctl_table); (entry)->procname; (entry)++)
+#define list_for_each_table_entry(entry, header) \
+ entry = header->ctl_table; \
+ for (size_t i = 0 ; i < header->ctl_table_size && entry->procname; ++i, entry++)
static const struct dentry_operations proc_sys_dentry_operations;
static const struct file_operations proc_sys_file_operations;
--
2.30.2
Move from register_net_sysctl to register_net_sysctl_sz for all the
netfilter related files. Do this while making sure to mirror the NULL
assignments with a table_size of zero for the unprivileged users.
We need to move to the new function in preparation for when we change
SIZE_MAX to ARRAY_SIZE() in the register_net_sysctl macro. Failing to do
so would erroneously allow ARRAY_SIZE() to be called on a pointer. We
hold off the SIZE_MAX to ARRAY_SIZE change until we have migrated all
the relevant net sysctl registering functions to register_net_sysctl_sz
in subsequent commits.
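The "mirror the NULL assignment with a table_size of zero" pattern can be
sketched in userspace C as follows. This is a simplified illustration, not the
netfilter code: demo_vars and demo_prepare_table() are hypothetical, malloc
stands in for kmemdup, and the two booleans stand in for the init_net and
init_user_ns checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct ctl_table (assumption). */
struct ctl_table {
	const char *procname;
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Template table; it still carries its sentinel during the transition. */
static struct ctl_table demo_vars[] = {
	{ "expiration" },
	{ NULL }
};

/*
 * Hypothetical helper: returns the table to register and stores the size
 * to pass along with it. Non-init namespaces get a private copy, and
 * unprivileged ones get an empty table expressed in both ways at once.
 */
struct ctl_table *demo_prepare_table(bool is_init_net, bool privileged,
				     size_t *table_size)
{
	struct ctl_table *tbl = demo_vars;

	*table_size = ARRAY_SIZE(demo_vars);
	if (!is_init_net) {
		tbl = malloc(sizeof(demo_vars));
		if (!tbl)
			return NULL;
		memcpy(tbl, demo_vars, sizeof(demo_vars));

		/* Don't export sysctls to unprivileged users */
		if (!privileged) {
			tbl[0].procname = NULL;	/* sentinel-based stop */
			*table_size = 0;	/* size-based stop */
		}
	}
	return tbl;
}
```

Keeping both assignments means the table stays empty for unprivileged users
whether the traversal stops on the sentinel or on the size.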
Acked-by: Julian Anastasov <[email protected]>
Signed-off-by: Joel Granados <[email protected]>
---
net/bridge/br_netfilter_hooks.c | 3 ++-
net/ipv6/netfilter/nf_conntrack_reasm.c | 3 ++-
net/netfilter/ipvs/ip_vs_ctl.c | 8 ++++++--
net/netfilter/ipvs/ip_vs_lblc.c | 10 +++++++---
net/netfilter/ipvs/ip_vs_lblcr.c | 10 +++++++---
net/netfilter/nf_conntrack_standalone.c | 4 +++-
net/netfilter/nf_log.c | 7 ++++---
7 files changed, 31 insertions(+), 14 deletions(-)
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 1a801fab9543..15186247b59a 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -1135,7 +1135,8 @@ static int br_netfilter_sysctl_init_net(struct net *net)
br_netfilter_sysctl_default(brnet);
- brnet->ctl_hdr = register_net_sysctl(net, "net/bridge", table);
+ brnet->ctl_hdr = register_net_sysctl_sz(net, "net/bridge", table,
+ ARRAY_SIZE(brnf_table));
if (!brnet->ctl_hdr) {
if (!net_eq(net, &init_net))
kfree(table);
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index d13240f13607..b2dd48911c8d 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -87,7 +87,8 @@ static int nf_ct_frag6_sysctl_register(struct net *net)
table[2].data = &nf_frag->fqdir->high_thresh;
table[2].extra1 = &nf_frag->fqdir->low_thresh;
- hdr = register_net_sysctl(net, "net/netfilter", table);
+ hdr = register_net_sysctl_sz(net, "net/netfilter", table,
+ ARRAY_SIZE(nf_ct_frag6_sysctl_table));
if (hdr == NULL)
goto err_reg;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 62606fb44d02..8d69e4c2d822 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -4266,6 +4266,7 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
struct net *net = ipvs->net;
struct ctl_table *tbl;
int idx, ret;
+ size_t ctl_table_size = ARRAY_SIZE(vs_vars);
atomic_set(&ipvs->dropentry, 0);
spin_lock_init(&ipvs->dropentry_lock);
@@ -4282,8 +4283,10 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
return -ENOMEM;
/* Don't export sysctls to unprivileged users */
- if (net->user_ns != &init_user_ns)
+ if (net->user_ns != &init_user_ns) {
tbl[0].procname = NULL;
+ ctl_table_size = 0;
+ }
} else
tbl = vs_vars;
/* Initialize sysctl defaults */
@@ -4353,7 +4356,8 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
#endif
ret = -ENOMEM;
- ipvs->sysctl_hdr = register_net_sysctl(net, "net/ipv4/vs", tbl);
+ ipvs->sysctl_hdr = register_net_sysctl_sz(net, "net/ipv4/vs", tbl,
+ ctl_table_size);
if (!ipvs->sysctl_hdr)
goto err;
ipvs->sysctl_tbl = tbl;
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index 1b87214d385e..cf78ba4ce5ff 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -550,6 +550,7 @@ static struct ip_vs_scheduler ip_vs_lblc_scheduler = {
static int __net_init __ip_vs_lblc_init(struct net *net)
{
struct netns_ipvs *ipvs = net_ipvs(net);
+ size_t vars_table_size = ARRAY_SIZE(vs_vars_table);
if (!ipvs)
return -ENOENT;
@@ -562,16 +563,19 @@ static int __net_init __ip_vs_lblc_init(struct net *net)
return -ENOMEM;
/* Don't export sysctls to unprivileged users */
- if (net->user_ns != &init_user_ns)
+ if (net->user_ns != &init_user_ns) {
ipvs->lblc_ctl_table[0].procname = NULL;
+ vars_table_size = 0;
+ }
} else
ipvs->lblc_ctl_table = vs_vars_table;
ipvs->sysctl_lblc_expiration = DEFAULT_EXPIRATION;
ipvs->lblc_ctl_table[0].data = &ipvs->sysctl_lblc_expiration;
- ipvs->lblc_ctl_header =
- register_net_sysctl(net, "net/ipv4/vs", ipvs->lblc_ctl_table);
+ ipvs->lblc_ctl_header = register_net_sysctl_sz(net, "net/ipv4/vs",
+ ipvs->lblc_ctl_table,
+ vars_table_size);
if (!ipvs->lblc_ctl_header) {
if (!net_eq(net, &init_net))
kfree(ipvs->lblc_ctl_table);
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index ad8f5fea6d3a..9eddf118b40e 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -736,6 +736,7 @@ static struct ip_vs_scheduler ip_vs_lblcr_scheduler =
static int __net_init __ip_vs_lblcr_init(struct net *net)
{
struct netns_ipvs *ipvs = net_ipvs(net);
+ size_t vars_table_size = ARRAY_SIZE(vs_vars_table);
if (!ipvs)
return -ENOENT;
@@ -748,15 +749,18 @@ static int __net_init __ip_vs_lblcr_init(struct net *net)
return -ENOMEM;
/* Don't export sysctls to unprivileged users */
- if (net->user_ns != &init_user_ns)
+ if (net->user_ns != &init_user_ns) {
ipvs->lblcr_ctl_table[0].procname = NULL;
+ vars_table_size = 0;
+ }
} else
ipvs->lblcr_ctl_table = vs_vars_table;
ipvs->sysctl_lblcr_expiration = DEFAULT_EXPIRATION;
ipvs->lblcr_ctl_table[0].data = &ipvs->sysctl_lblcr_expiration;
- ipvs->lblcr_ctl_header =
- register_net_sysctl(net, "net/ipv4/vs", ipvs->lblcr_ctl_table);
+ ipvs->lblcr_ctl_header = register_net_sysctl_sz(net, "net/ipv4/vs",
+ ipvs->lblcr_ctl_table,
+ vars_table_size);
if (!ipvs->lblcr_ctl_header) {
if (!net_eq(net, &init_net))
kfree(ipvs->lblcr_ctl_table);
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 169e16fc2bce..0ee98ce5b816 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -1106,7 +1106,9 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
table[NF_SYSCTL_CT_BUCKETS].mode = 0444;
}
- cnet->sysctl_header = register_net_sysctl(net, "net/netfilter", table);
+ cnet->sysctl_header = register_net_sysctl_sz(net, "net/netfilter",
+ table,
+ ARRAY_SIZE(nf_ct_sysctl_table));
if (!cnet->sysctl_header)
goto out_unregister_netfilter;
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 8a29290149bd..8cc52d2bd31b 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -487,9 +487,10 @@ static int netfilter_log_sysctl_init(struct net *net)
for (i = NFPROTO_UNSPEC; i < NFPROTO_NUMPROTO; i++)
table[i].extra2 = net;
- net->nf.nf_log_dir_header = register_net_sysctl(net,
- "net/netfilter/nf_log",
- table);
+ net->nf.nf_log_dir_header = register_net_sysctl_sz(net,
+ "net/netfilter/nf_log",
+ table,
+ ARRAY_SIZE(nf_log_sysctl_table));
if (!net->nf.nf_log_dir_header)
goto err_reg;
--
2.30.2
Move from register_net_sysctl to register_net_sysctl_sz and pass the
ARRAY_SIZE of the ctl_table array that was used to create the table
variable. We need to move to the new function in preparation for when we
change SIZE_MAX to ARRAY_SIZE() in the register_net_sysctl macro.
Failing to do so would erroneously allow ARRAY_SIZE() to be called on a
pointer. We hold off the SIZE_MAX to ARRAY_SIZE change until we have
migrated all the relevant net sysctl registering functions to
register_net_sysctl_sz in subsequent commits.
Signed-off-by: Joel Granados <[email protected]>
---
net/ax25/sysctl_net_ax25.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ax25/sysctl_net_ax25.c b/net/ax25/sysctl_net_ax25.c
index 2154d004d3dc..db66e11e7fe8 100644
--- a/net/ax25/sysctl_net_ax25.c
+++ b/net/ax25/sysctl_net_ax25.c
@@ -159,7 +159,8 @@ int ax25_register_dev_sysctl(ax25_dev *ax25_dev)
table[k].data = &ax25_dev->values[k];
snprintf(path, sizeof(path), "net/ax25/%s", ax25_dev->dev->name);
- ax25_dev->sysheader = register_net_sysctl(&init_net, path, table);
+ ax25_dev->sysheader = register_net_sysctl_sz(&init_net, path, table,
+ ARRAY_SIZE(ax25_param_table));
if (!ax25_dev->sysheader) {
kfree(table);
return -ENOMEM;
--
2.30.2
The new ctl_table_size element will hold the size of the ctl_table
arrays contained in the ctl_table_header. This value should eventually
be passed by the callers to the sysctl register infrastructure. And
while this commit introduces the variable, it neither sets nor uses it
because that requires case by case considerations for each caller.
It provides two important things: (1) A place to put the
result of the ctl_table array calculation when it gets introduced for
each caller. And (2) the size that will be used as the additional
stopping criteria in the list_for_each_table_entry macro (to be added
when all the callers are migrated).
Signed-off-by: Joel Granados <[email protected]>
---
include/linux/sysctl.h | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 59d451f455bf..33252ad58ebe 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -159,12 +159,22 @@ struct ctl_node {
struct ctl_table_header *header;
};
-/* struct ctl_table_header is used to maintain dynamic lists of
- struct ctl_table trees. */
+/**
+ * struct ctl_table_header - maintains dynamic lists of struct ctl_table trees
+ * @ctl_table: pointer to the first element in ctl_table array
+ * @ctl_table_size: number of elements pointed by @ctl_table
+ * @used: The entry will never be touched when equal to 0.
+ * @count: Upped every time something is added to @inodes and downed every time
+ * something is removed from inodes
+ * @nreg: When nreg drops to 0 the ctl_table_header will be unregistered.
+ * @rcu: Delays the freeing of the inode. Introduced with "unfuck proc_sysctl ->d_compare()"
+ *
+ */
struct ctl_table_header {
union {
struct {
struct ctl_table *ctl_table;
+ int ctl_table_size;
int used;
int count;
int nreg;
--
2.30.2
We make these changes in order to prepare __register_sysctl_table and
its callers for when we remove the sentinel element (empty element at
the end of ctl_table arrays). We don't actually remove any sentinels in
this commit, but we *do* make sure to use ARRAY_SIZE so the table_size
is available when the removal occurs.
We add a table_size argument to __register_sysctl_table and adjust
callers, all of which pass ctl_table pointers and need an explicit call
to ARRAY_SIZE. We implement a size calculation in register_net_sysctl in
order to forward the size of the array pointer received from the network
register calls.
The new table_size argument does not yet have any effect in the
init_header call which is still dependent on the sentinel's presence.
table_size *does* however drive the `kzalloc` allocation in
__register_sysctl_table with no adverse effects as the allocated memory
is either one element greater than the calculated ctl_table array (for
the calls in ipc_sysctl.c, mq_sysctl.c and ucount.c) or the exact size
of the calculated ctl_table array (for the call from sysctl_net.c and
register_sysctl). This approach will allow us to "just" remove the
sentinel without further changes to __register_sysctl_table as
table_size will represent the exact size for all the callers at that
point.
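To illustrate why sizing the allocation from table_size is harmless, here is a
userspace sketch of the "header followed by one node per element" layout. The
demo_* types and helpers are hypothetical and calloc stands in for kzalloc;
only the shape of the size computation is meant to match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct ctl_node and struct ctl_table_header. */
struct demo_node {
	void *header;
};

struct demo_header {
	size_t table_size;
};

/* Bytes needed for one header plus table_size trailing nodes. */
size_t demo_alloc_bytes(size_t table_size)
{
	return sizeof(struct demo_header) +
	       sizeof(struct demo_node) * table_size;
}

/* Zeroed allocation sized from table_size instead of a counted walk. */
struct demo_header *demo_alloc_header(size_t table_size)
{
	struct demo_header *h = calloc(1, demo_alloc_bytes(table_size));

	if (!h)
		return NULL;
	h->table_size = table_size;
	return h;
}

/* The node array starts right after the header. */
struct demo_node *demo_nodes(struct demo_header *h)
{
	return (struct demo_node *)(h + 1);
}
```

A caller that still counts its sentinel in table_size only gets one unused,
zeroed node slot; once the sentinel is gone the allocation is exact.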
Signed-off-by: Joel Granados <[email protected]>
---
fs/proc/proc_sysctl.c | 23 ++++++++++++-----------
include/linux/sysctl.h | 2 +-
ipc/ipc_sysctl.c | 4 +++-
ipc/mq_sysctl.c | 4 +++-
kernel/ucount.c | 3 ++-
net/sysctl_net.c | 8 +++++++-
6 files changed, 28 insertions(+), 16 deletions(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index fa1438f1a355..b8dd78e344ff 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1312,6 +1312,7 @@ static struct ctl_dir *sysctl_mkdir_p(struct ctl_dir *dir, const char *path)
* should not be free'd after registration. So it should not be
* used on stack. It can either be a global or dynamically allocated
* by the caller and free'd later after sysctl unregistration.
+ * @table_size : The number of elements in table
*
* Register a sysctl table hierarchy. @table should be a filled in ctl_table
* array. A completely 0 filled entry terminates the table.
@@ -1354,27 +1355,20 @@ static struct ctl_dir *sysctl_mkdir_p(struct ctl_dir *dir, const char *path)
*/
struct ctl_table_header *__register_sysctl_table(
struct ctl_table_set *set,
- const char *path, struct ctl_table *table)
+ const char *path, struct ctl_table *table, size_t table_size)
{
struct ctl_table_root *root = set->dir.header.root;
struct ctl_table_header *header;
- struct ctl_table_header h_tmp;
struct ctl_dir *dir;
- struct ctl_table *entry;
struct ctl_node *node;
- int nr_entries = 0;
-
- h_tmp.ctl_table = table;
- list_for_each_table_entry(entry, (&h_tmp))
- nr_entries++;
header = kzalloc(sizeof(struct ctl_table_header) +
- sizeof(struct ctl_node)*nr_entries, GFP_KERNEL_ACCOUNT);
+ sizeof(struct ctl_node)*table_size, GFP_KERNEL_ACCOUNT);
if (!header)
return NULL;
node = (struct ctl_node *)(header + 1);
- init_header(header, root, set, node, table, nr_entries);
+ init_header(header, root, set, node, table, table_size);
if (sysctl_check_table(path, header))
goto fail;
@@ -1423,8 +1417,15 @@ struct ctl_table_header *__register_sysctl_table(
*/
struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table)
{
+ int count = 0;
+ struct ctl_table *entry;
+ struct ctl_table_header t_hdr;
+
+ t_hdr.ctl_table = table;
+ list_for_each_table_entry(entry, (&t_hdr))
+ count++;
return __register_sysctl_table(&sysctl_table_root.default_set,
- path, table);
+ path, table, count);
}
EXPORT_SYMBOL(register_sysctl);
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 33252ad58ebe..0495c858989f 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -226,7 +226,7 @@ extern void retire_sysctl_set(struct ctl_table_set *set);
struct ctl_table_header *__register_sysctl_table(
struct ctl_table_set *set,
- const char *path, struct ctl_table *table);
+ const char *path, struct ctl_table *table, size_t table_size);
struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table);
void unregister_sysctl_table(struct ctl_table_header * table);
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index ef313ecfb53a..8c62e443f78b 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -259,7 +259,9 @@ bool setup_ipc_sysctls(struct ipc_namespace *ns)
tbl[i].data = NULL;
}
- ns->ipc_sysctls = __register_sysctl_table(&ns->ipc_set, "kernel", tbl);
+ ns->ipc_sysctls = __register_sysctl_table(&ns->ipc_set,
+ "kernel", tbl,
+ ARRAY_SIZE(ipc_sysctls));
}
if (!ns->ipc_sysctls) {
kfree(tbl);
diff --git a/ipc/mq_sysctl.c b/ipc/mq_sysctl.c
index fbf6a8b93a26..ebb5ed81c151 100644
--- a/ipc/mq_sysctl.c
+++ b/ipc/mq_sysctl.c
@@ -109,7 +109,9 @@ bool setup_mq_sysctls(struct ipc_namespace *ns)
tbl[i].data = NULL;
}
- ns->mq_sysctls = __register_sysctl_table(&ns->mq_set, "fs/mqueue", tbl);
+ ns->mq_sysctls = __register_sysctl_table(&ns->mq_set,
+ "fs/mqueue", tbl,
+ ARRAY_SIZE(mq_sysctls));
}
if (!ns->mq_sysctls) {
kfree(tbl);
diff --git a/kernel/ucount.c b/kernel/ucount.c
index ee8e57fd6f90..2b80264bb79f 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -104,7 +104,8 @@ bool setup_userns_sysctls(struct user_namespace *ns)
for (i = 0; i < UCOUNT_COUNTS; i++) {
tbl[i].data = &ns->ucount_max[i];
}
- ns->sysctls = __register_sysctl_table(&ns->set, "user", tbl);
+ ns->sysctls = __register_sysctl_table(&ns->set, "user", tbl,
+ ARRAY_SIZE(user_table));
}
if (!ns->sysctls) {
kfree(tbl);
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 4b45ed631eb8..8ee4b74bc009 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -163,10 +163,16 @@ static void ensure_safe_net_sysctl(struct net *net, const char *path,
struct ctl_table_header *register_net_sysctl(struct net *net,
const char *path, struct ctl_table *table)
{
+ int count = 0;
+ struct ctl_table *entry;
+
if (!net_eq(net, &init_net))
ensure_safe_net_sysctl(net, path, table);
- return __register_sysctl_table(&net->sysctls, path, table);
+ for (entry = table; entry->procname; entry++)
+ count++;
+
+ return __register_sysctl_table(&net->sysctls, path, table, count);
}
EXPORT_SYMBOL_GPL(register_net_sysctl);
--
2.30.2
Move from register_net_sysctl to register_net_sysctl_sz and pass the
ARRAY_SIZE of the ctl_table array that was used to create the table
variable. We need to move to the new function in preparation for when we
change SIZE_MAX to ARRAY_SIZE() in the register_net_sysctl macro.
Failing to do so would erroneously allow ARRAY_SIZE() to be called on a
pointer. The actual change from SIZE_MAX to ARRAY_SIZE will take place
in subsequent commits.
Signed-off-by: Joel Granados <[email protected]>
---
drivers/net/vrf.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 6043e63b42f9..6801f15ac609 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1979,7 +1979,8 @@ static int vrf_netns_init_sysctl(struct net *net, struct netns_vrf *nn_vrf)
/* init the extra1 parameter with the reference to current netns */
table[0].extra1 = net;
- nn_vrf->ctl_hdr = register_net_sysctl(net, "net/vrf", table);
+ nn_vrf->ctl_hdr = register_net_sysctl_sz(net, "net/vrf", table,
+ ARRAY_SIZE(vrf_table));
if (!nn_vrf->ctl_hdr) {
kfree(table);
return -ENOMEM;
--
2.30.2
Replace SIZE_MAX with ARRAY_SIZE in the register_net_sysctl macro. Now
that all the callers of register_net_sysctl pass actual arrays, we can
call ARRAY_SIZE() without any compilation warnings. By calculating the
actual array size, this commit makes sure that register_net_sysctl
and all its callers forward the table_size into the sysctl backend for when
the sentinel elements in the ctl_table arrays (last empty markers) are
removed. Without it, the removal would fail for lack of a stopping criteria
for traversing the ctl_table arrays.
The stopping condition continues to be based on both table size and the
procname null test. This is needed in order to allow for the systematic
removal of the sentinel elements in subsequent commits: before removing the
sentinel, the stopping criteria will be the last null element. When the
sentinel is removed, the (correct) size will take over.
Signed-off-by: Joel Granados <[email protected]>
Suggested-by: Jani Nikula <[email protected]>
---
include/net/net_namespace.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index e4e5fe75a281..75dba309e043 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -470,7 +470,7 @@ void unregister_pernet_device(struct pernet_operations *);
struct ctl_table;
#define register_net_sysctl(net, path, table) \
- register_net_sysctl_sz(net, path, table, SIZE_MAX)
+ register_net_sysctl_sz(net, path, table, ARRAY_SIZE(table))
#ifdef CONFIG_SYSCTL
int net_sysctl_init(void);
struct ctl_table_header *register_net_sysctl_sz(struct net *net, const char *path,
--
2.30.2
Move from register_net_sysctl to register_net_sysctl_sz for all the
networking-related files. While doing so, mirror the NULL assignments
with a table_size of zero for the unprivileged users.
We need to move to the new function in preparation for when we change
SIZE_MAX to ARRAY_SIZE() in the register_net_sysctl macro. Failing to do
so would erroneously allow ARRAY_SIZE() to be called on a pointer. We
hold off the SIZE_MAX to ARRAY_SIZE change until we have migrated all
the relevant net sysctl registering functions to register_net_sysctl_sz
in subsequent commits.
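A minimal userspace sketch of the pattern used in the per-netns
callers, under stated assumptions: demo_entry, demo_template and
demo_prepare() are hypothetical names, and malloc()/memcpy() stand in
for kmemdup(). The size is taken from the template array, never from
the duplicated pointer, and the NULL assignment for unprivileged users
is mirrored by a size of zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for struct ctl_table. */
struct demo_entry {
	const char *procname;
};

/* Template array, still carrying its sentinel as the last element. */
static const struct demo_entry demo_template[] = {
	{ "high_thresh" }, { "low_thresh" }, { "timeout" }, { NULL }
};

/* Duplicate the template and report the size to pass at registration. */
static struct demo_entry *demo_prepare(bool privileged, size_t *table_size)
{
	struct demo_entry *table;

	assert(table_size != NULL);
	table = malloc(sizeof(demo_template));
	if (!table)
		return NULL;
	memcpy(table, demo_template, sizeof(demo_template));

	/* Size of the array, not of the pointer to the copy. */
	*table_size = DEMO_ARRAY_SIZE(demo_template);

	/* Hide everything from unprivileged users, by NULL and by size. */
	if (!privileged) {
		table[0].procname = NULL;
		*table_size = 0;
	}
	return table;
}
```

Note that while the sentinel is still present, the array size counts it
as an element; the procname NULL test keeps it from being registered.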
An additional size function was added to the following files in order to
calculate the size of an array that is defined in another file:
include/net/ipv6.h
net/ipv6/icmp.c
net/ipv6/route.c
net/ipv6/sysctl_net_ipv6.c
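The size helpers can be pictured roughly as follows. This is a
single-file sketch with hypothetical names (demo_route_template,
demo_route_table_size()); in the patch the array lives in one file and
the helper is what lets another file learn its size. The early return
of 1 mirrors what ipv6_route_sysctl_table_size() does in the diff for a
non-init user namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for struct ctl_table. */
struct demo_entry {
	const char *procname;
};

/* Imagine this array is static in a different source file. */
static struct demo_entry demo_route_template[] = {
	{ "flush" }, { "gc_thresh" }, { "max_size" }, { NULL }
};

/*
 * Accessor exposing the array size to code that only sees a pointer
 * to (a copy of) the table.
 */
static size_t demo_route_table_size(bool privileged)
{
	/* The truncated size must never exceed the real array. */
	assert(DEMO_ARRAY_SIZE(demo_route_template) >= 1);

	/* Unprivileged users only get the first entry. */
	if (!privileged)
		return 1;

	return DEMO_ARRAY_SIZE(demo_route_template);
}
```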
Signed-off-by: Joel Granados <[email protected]>
---
include/net/ipv6.h | 2 ++
net/core/neighbour.c | 8 ++++++--
net/core/sysctl_net_core.c | 3 ++-
net/ieee802154/6lowpan/reassembly.c | 8 ++++++--
net/ipv4/devinet.c | 3 ++-
net/ipv4/ip_fragment.c | 3 ++-
net/ipv4/route.c | 8 ++++++--
net/ipv4/sysctl_net_ipv4.c | 3 ++-
net/ipv4/xfrm4_policy.c | 3 ++-
net/ipv6/addrconf.c | 3 ++-
net/ipv6/icmp.c | 5 +++++
net/ipv6/reassembly.c | 3 ++-
net/ipv6/route.c | 9 +++++++++
net/ipv6/sysctl_net_ipv6.c | 16 +++++++++++-----
net/ipv6/xfrm6_policy.c | 3 ++-
net/mpls/af_mpls.c | 6 ++++--
net/mptcp/ctrl.c | 3 ++-
net/rds/tcp.c | 3 ++-
net/sctp/sysctl.c | 4 +++-
net/smc/smc_sysctl.c | 3 ++-
net/unix/sysctl_net_unix.c | 3 ++-
net/xfrm/xfrm_sysctl.c | 8 ++++++--
22 files changed, 82 insertions(+), 28 deletions(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 2acc4c808d45..a704831753ff 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1270,7 +1270,9 @@ static inline int snmp6_unregister_dev(struct inet6_dev *idev) { return 0; }
#ifdef CONFIG_SYSCTL
struct ctl_table *ipv6_icmp_sysctl_init(struct net *net);
+size_t ipv6_icmp_sysctl_table_size(void);
struct ctl_table *ipv6_route_sysctl_init(struct net *net);
+size_t ipv6_route_sysctl_table_size(struct net *net);
int ipv6_sysctl_register(void);
void ipv6_sysctl_unregister(void);
#endif
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index ddd0f32de20e..6b76cd103195 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -3779,6 +3779,7 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
const char *dev_name_source;
char neigh_path[ sizeof("net//neigh/") + IFNAMSIZ + IFNAMSIZ ];
char *p_name;
+ size_t neigh_vars_size;
t = kmemdup(&neigh_sysctl_template, sizeof(*t), GFP_KERNEL_ACCOUNT);
if (!t)
@@ -3790,11 +3791,13 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
t->neigh_vars[i].extra2 = p;
}
+ neigh_vars_size = ARRAY_SIZE(t->neigh_vars);
if (dev) {
dev_name_source = dev->name;
/* Terminate the table early */
memset(&t->neigh_vars[NEIGH_VAR_GC_INTERVAL], 0,
sizeof(t->neigh_vars[NEIGH_VAR_GC_INTERVAL]));
+ neigh_vars_size = NEIGH_VAR_BASE_REACHABLE_TIME_MS + 1;
} else {
struct neigh_table *tbl = p->tbl;
dev_name_source = "default";
@@ -3841,8 +3844,9 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
snprintf(neigh_path, sizeof(neigh_path), "net/%s/neigh/%s",
p_name, dev_name_source);
- t->sysctl_header =
- register_net_sysctl(neigh_parms_net(p), neigh_path, t->neigh_vars);
+ t->sysctl_header = register_net_sysctl_sz(neigh_parms_net(p),
+ neigh_path, t->neigh_vars,
+ neigh_vars_size);
if (!t->sysctl_header)
goto free;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 782273bb93c2..03f1edb948d7 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -712,7 +712,8 @@ static __net_init int sysctl_core_net_init(struct net *net)
tmp->data += (char *)net - (char *)&init_net;
}
- net->core.sysctl_hdr = register_net_sysctl(net, "net/core", tbl);
+ net->core.sysctl_hdr = register_net_sysctl_sz(net, "net/core", tbl,
+ ARRAY_SIZE(netns_core_table));
if (net->core.sysctl_hdr == NULL)
goto err_reg;
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index a91283d1e5bf..6dd960ec558c 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -360,6 +360,7 @@ static int __net_init lowpan_frags_ns_sysctl_register(struct net *net)
struct ctl_table_header *hdr;
struct netns_ieee802154_lowpan *ieee802154_lowpan =
net_ieee802154_lowpan(net);
+ size_t table_size = ARRAY_SIZE(lowpan_frags_ns_ctl_table);
table = lowpan_frags_ns_ctl_table;
if (!net_eq(net, &init_net)) {
@@ -369,8 +370,10 @@ static int __net_init lowpan_frags_ns_sysctl_register(struct net *net)
goto err_alloc;
/* Don't export sysctls to unprivileged users */
- if (net->user_ns != &init_user_ns)
+ if (net->user_ns != &init_user_ns) {
table[0].procname = NULL;
+ table_size = 0;
+ }
}
table[0].data = &ieee802154_lowpan->fqdir->high_thresh;
@@ -379,7 +382,8 @@ static int __net_init lowpan_frags_ns_sysctl_register(struct net *net)
table[1].extra2 = &ieee802154_lowpan->fqdir->high_thresh;
table[2].data = &ieee802154_lowpan->fqdir->timeout;
- hdr = register_net_sysctl(net, "net/ieee802154/6lowpan", table);
+ hdr = register_net_sysctl_sz(net, "net/ieee802154/6lowpan", table,
+ table_size);
if (hdr == NULL)
goto err_reg;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 5deac0517ef7..89087844ea6e 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2720,7 +2720,8 @@ static __net_init int devinet_init_net(struct net *net)
goto err_reg_dflt;
err = -ENOMEM;
- forw_hdr = register_net_sysctl(net, "net/ipv4", tbl);
+ forw_hdr = register_net_sysctl_sz(net, "net/ipv4", tbl,
+ ARRAY_SIZE(ctl_forward_entry));
if (!forw_hdr)
goto err_reg_ctl;
net->ipv4.forw_hdr = forw_hdr;
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 69c00ffdcf3e..a4941f53b523 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -615,7 +615,8 @@ static int __net_init ip4_frags_ns_ctl_register(struct net *net)
table[2].data = &net->ipv4.fqdir->timeout;
table[3].data = &net->ipv4.fqdir->max_dist;
- hdr = register_net_sysctl(net, "net/ipv4", table);
+ hdr = register_net_sysctl_sz(net, "net/ipv4", table,
+ ARRAY_SIZE(ip4_frags_ns_ctl_table));
if (!hdr)
goto err_reg;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 92fede388d52..24f55dbb8901 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3592,6 +3592,7 @@ static struct ctl_table ipv4_route_netns_table[] = {
static __net_init int sysctl_route_net_init(struct net *net)
{
struct ctl_table *tbl;
+ size_t table_size = ARRAY_SIZE(ipv4_route_netns_table);
tbl = ipv4_route_netns_table;
if (!net_eq(net, &init_net)) {
@@ -3603,8 +3604,10 @@ static __net_init int sysctl_route_net_init(struct net *net)
/* Don't export non-whitelisted sysctls to unprivileged users */
if (net->user_ns != &init_user_ns) {
- if (tbl[0].procname != ipv4_route_flush_procname)
+ if (tbl[0].procname != ipv4_route_flush_procname) {
tbl[0].procname = NULL;
+ table_size = 0;
+ }
}
/* Update the variables to point into the current struct net
@@ -3615,7 +3618,8 @@ static __net_init int sysctl_route_net_init(struct net *net)
}
tbl[0].extra1 = net;
- net->ipv4.route_hdr = register_net_sysctl(net, "net/ipv4/route", tbl);
+ net->ipv4.route_hdr = register_net_sysctl_sz(net, "net/ipv4/route",
+ tbl, table_size);
if (!net->ipv4.route_hdr)
goto err_reg;
return 0;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 2afb0870648b..6ac890b4073f 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1519,7 +1519,8 @@ static __net_init int ipv4_sysctl_init_net(struct net *net)
}
}
- net->ipv4.ipv4_hdr = register_net_sysctl(net, "net/ipv4", table);
+ net->ipv4.ipv4_hdr = register_net_sysctl_sz(net, "net/ipv4", table,
+ ARRAY_SIZE(ipv4_net_table));
if (!net->ipv4.ipv4_hdr)
goto err_reg;
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 9403bbaf1b61..57ea394ffa8c 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -178,7 +178,8 @@ static __net_init int xfrm4_net_sysctl_init(struct net *net)
table[0].data = &net->xfrm.xfrm4_dst_ops.gc_thresh;
}
- hdr = register_net_sysctl(net, "net/ipv4", table);
+ hdr = register_net_sysctl_sz(net, "net/ipv4", table,
+ ARRAY_SIZE(xfrm4_policy_table));
if (!hdr)
goto err_reg;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 94cec2075eee..2426cf3255ea 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -7091,7 +7091,8 @@ static int __addrconf_sysctl_register(struct net *net, char *dev_name,
snprintf(path, sizeof(path), "net/ipv6/conf/%s", dev_name);
- p->sysctl_header = register_net_sysctl(net, path, table);
+ p->sysctl_header = register_net_sysctl_sz(net, path, table,
+ ARRAY_SIZE(addrconf_sysctl));
if (!p->sysctl_header)
goto free;
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 65fa5014bc85..a76b01b41b57 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -1229,4 +1229,9 @@ struct ctl_table * __net_init ipv6_icmp_sysctl_init(struct net *net)
}
return table;
}
+
+size_t ipv6_icmp_sysctl_table_size(void)
+{
+ return ARRAY_SIZE(ipv6_icmp_table_template);
+}
#endif
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 5bc8a28e67f9..5ebc47da1000 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -470,7 +470,8 @@ static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
table[1].extra2 = &net->ipv6.fqdir->high_thresh;
table[2].data = &net->ipv6.fqdir->timeout;
- hdr = register_net_sysctl(net, "net/ipv6", table);
+ hdr = register_net_sysctl_sz(net, "net/ipv6", table,
+ ARRAY_SIZE(ip6_frags_ns_ctl_table));
if (!hdr)
goto err_reg;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 56a55585eb79..6a13609e1427 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6456,6 +6456,15 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
return table;
}
+
+size_t ipv6_route_sysctl_table_size(struct net *net)
+{
+ /* Don't export sysctls to unprivileged users */
+ if (net->user_ns != &init_user_ns)
+ return 1;
+
+ return ARRAY_SIZE(ipv6_route_table_template);
+}
#endif
static int __net_init ip6_route_net_init(struct net *net)
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 94a0a294c6a1..888676163e90 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -275,17 +275,23 @@ static int __net_init ipv6_sysctl_net_init(struct net *net)
if (!ipv6_icmp_table)
goto out_ipv6_route_table;
- net->ipv6.sysctl.hdr = register_net_sysctl(net, "net/ipv6", ipv6_table);
+ net->ipv6.sysctl.hdr = register_net_sysctl_sz(net, "net/ipv6",
+ ipv6_table,
+ ARRAY_SIZE(ipv6_table_template));
if (!net->ipv6.sysctl.hdr)
goto out_ipv6_icmp_table;
- net->ipv6.sysctl.route_hdr =
- register_net_sysctl(net, "net/ipv6/route", ipv6_route_table);
+ net->ipv6.sysctl.route_hdr = register_net_sysctl_sz(net,
+ "net/ipv6/route",
+ ipv6_route_table,
+ ipv6_route_sysctl_table_size(net));
if (!net->ipv6.sysctl.route_hdr)
goto out_unregister_ipv6_table;
- net->ipv6.sysctl.icmp_hdr =
- register_net_sysctl(net, "net/ipv6/icmp", ipv6_icmp_table);
+ net->ipv6.sysctl.icmp_hdr = register_net_sysctl_sz(net,
+ "net/ipv6/icmp",
+ ipv6_icmp_table,
+ ipv6_icmp_sysctl_table_size());
if (!net->ipv6.sysctl.icmp_hdr)
goto out_unregister_route_table;
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index eecc5e59da17..8f931e46b460 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -205,7 +205,8 @@ static int __net_init xfrm6_net_sysctl_init(struct net *net)
table[0].data = &net->xfrm.xfrm6_dst_ops.gc_thresh;
}
- hdr = register_net_sysctl(net, "net/ipv6", table);
+ hdr = register_net_sysctl_sz(net, "net/ipv6", table,
+ ARRAY_SIZE(xfrm6_policy_table));
if (!hdr)
goto err_reg;
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index bf6e81d56263..1af29af65388 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1419,7 +1419,8 @@ static int mpls_dev_sysctl_register(struct net_device *dev,
snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
- mdev->sysctl = register_net_sysctl(net, path, table);
+ mdev->sysctl = register_net_sysctl_sz(net, path, table,
+ ARRAY_SIZE(mpls_dev_table));
if (!mdev->sysctl)
goto free;
@@ -2689,7 +2690,8 @@ static int mpls_net_init(struct net *net)
for (i = 0; i < ARRAY_SIZE(mpls_table) - 1; i++)
table[i].data = (char *)net + (uintptr_t)table[i].data;
- net->mpls.ctl = register_net_sysctl(net, "net/mpls", table);
+ net->mpls.ctl = register_net_sysctl_sz(net, "net/mpls", table,
+ ARRAY_SIZE(mpls_table));
if (net->mpls.ctl == NULL) {
kfree(table);
return -ENOMEM;
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index ae20b7d92e28..43e540328a52 100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -150,7 +150,8 @@ static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet)
table[4].data = &pernet->stale_loss_cnt;
table[5].data = &pernet->pm_type;
- hdr = register_net_sysctl(net, MPTCP_SYSCTL_PATH, table);
+ hdr = register_net_sysctl_sz(net, MPTCP_SYSCTL_PATH, table,
+ ARRAY_SIZE(mptcp_sysctl_table));
if (!hdr)
goto err_reg;
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index c5b86066ff66..2dba7505b414 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -565,7 +565,8 @@ static __net_init int rds_tcp_init_net(struct net *net)
}
tbl[RDS_TCP_SNDBUF].data = &rtn->sndbuf_size;
tbl[RDS_TCP_RCVBUF].data = &rtn->rcvbuf_size;
- rtn->rds_tcp_sysctl = register_net_sysctl(net, "net/rds/tcp", tbl);
+ rtn->rds_tcp_sysctl = register_net_sysctl_sz(net, "net/rds/tcp", tbl,
+ ARRAY_SIZE(rds_tcp_sysctl_table));
if (!rtn->rds_tcp_sysctl) {
pr_warn("could not register sysctl\n");
err = -ENOMEM;
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index a7a9136198fd..f65d6f92afcb 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -612,7 +612,9 @@ int sctp_sysctl_net_register(struct net *net)
table[SCTP_PF_RETRANS_IDX].extra2 = &net->sctp.ps_retrans;
table[SCTP_PS_RETRANS_IDX].extra1 = &net->sctp.pf_retrans;
- net->sctp.sysctl_header = register_net_sysctl(net, "net/sctp", table);
+ net->sctp.sysctl_header = register_net_sysctl_sz(net, "net/sctp",
+ table,
+ ARRAY_SIZE(sctp_net_table));
if (net->sctp.sysctl_header == NULL) {
kfree(table);
return -ENOMEM;
diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
index b6f79fabb9d3..3ab2d8eefc55 100644
--- a/net/smc/smc_sysctl.c
+++ b/net/smc/smc_sysctl.c
@@ -81,7 +81,8 @@ int __net_init smc_sysctl_net_init(struct net *net)
table[i].data += (void *)net - (void *)&init_net;
}
- net->smc.smc_hdr = register_net_sysctl(net, "net/smc", table);
+ net->smc.smc_hdr = register_net_sysctl_sz(net, "net/smc", table,
+ ARRAY_SIZE(smc_table));
if (!net->smc.smc_hdr)
goto err_reg;
diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
index 500129aa710c..3e84b31c355a 100644
--- a/net/unix/sysctl_net_unix.c
+++ b/net/unix/sysctl_net_unix.c
@@ -36,7 +36,8 @@ int __net_init unix_sysctl_register(struct net *net)
table[0].data = &net->unx.sysctl_max_dgram_qlen;
}
- net->unx.ctl = register_net_sysctl(net, "net/unix", table);
+ net->unx.ctl = register_net_sysctl_sz(net, "net/unix", table,
+ ARRAY_SIZE(unix_table));
if (net->unx.ctl == NULL)
goto err_reg;
diff --git a/net/xfrm/xfrm_sysctl.c b/net/xfrm/xfrm_sysctl.c
index 0c6c5ef65f9d..7fdeafc838a7 100644
--- a/net/xfrm/xfrm_sysctl.c
+++ b/net/xfrm/xfrm_sysctl.c
@@ -44,6 +44,7 @@ static struct ctl_table xfrm_table[] = {
int __net_init xfrm_sysctl_init(struct net *net)
{
struct ctl_table *table;
+ size_t table_size = ARRAY_SIZE(xfrm_table);
__xfrm_sysctl_init(net);
@@ -56,10 +57,13 @@ int __net_init xfrm_sysctl_init(struct net *net)
table[3].data = &net->xfrm.sysctl_acq_expires;
/* Don't export sysctls to unprivileged users */
- if (net->user_ns != &init_user_ns)
+ if (net->user_ns != &init_user_ns) {
table[0].procname = NULL;
+ table_size = 0;
+ }
- net->xfrm.sysctl_hdr = register_net_sysctl(net, "net/core", table);
+ net->xfrm.sysctl_hdr = register_net_sysctl_sz(net, "net/core", table,
+ table_size);
if (!net->xfrm.sysctl_hdr)
goto out_register;
return 0;
--
2.30.2
This commit adds table_size to register_sysctl in preparation for the
removal of the sentinel elements in the ctl_table arrays (last empty
markers). Although we do *not* remove any sentinels in this commit, we
set things up by either passing the table_size explicitly or using
ARRAY_SIZE on the ctl_table arrays.
We replace the register_sysctl function with a macro that passes the
ARRAY_SIZE of the table to the new register_sysctl_sz function. In this
way the callers that already pass an array of ctl_table structs do not
change. The callers that pass a pointer to a ctl_table array call
register_sysctl_sz directly with an explicit table_size instead of
using the macro.
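The macro wrapper can be sketched in userspace C as below. All demo_*
names are hypothetical; demo_register_sz() merely returns the number of
entries it would register, whereas the real register_sysctl_sz()
returns a struct ctl_table_header pointer.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for struct ctl_table. */
struct demo_entry {
	const char *procname;
};

/* Stand-in for register_sysctl_sz(): report how many entries it sees. */
static size_t demo_register_sz(const char *path,
			       const struct demo_entry *table,
			       size_t table_size)
{
	size_t n = 0;

	assert(path != NULL && table != NULL);
	while (n < table_size && table[n].procname)
		n++;
	return n;
}

/*
 * Array callers keep the two-argument form; the macro supplies the
 * size. It must only be used on real arrays, never on pointers.
 */
#define demo_register(path, table) \
	demo_register_sz(path, table, DEMO_ARRAY_SIZE(table))
```

A caller holding only a pointer would call demo_register_sz() with an
explicit count, as the arm64 and s390 hunks do with a size of 1.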
Signed-off-by: Joel Granados <[email protected]>
Suggested-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm64/kernel/armv8_deprecated.c | 2 +-
arch/s390/appldata/appldata_base.c | 2 +-
fs/proc/proc_sysctl.c | 30 +++++++++++++++-------------
include/linux/sysctl.h | 10 ++++++++--
kernel/ucount.c | 2 +-
net/sysctl_net.c | 2 +-
6 files changed, 28 insertions(+), 20 deletions(-)
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
index 1febd412b4d2..e459cfd33711 100644
--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -569,7 +569,7 @@ static void __init register_insn_emulation(struct insn_emulation *insn)
sysctl->extra2 = &insn->max;
sysctl->proc_handler = emulation_proc_handler;
- register_sysctl("abi", sysctl);
+ register_sysctl_sz("abi", sysctl, 1);
}
}
diff --git a/arch/s390/appldata/appldata_base.c b/arch/s390/appldata/appldata_base.c
index bbefe5e86bdf..3b0994625652 100644
--- a/arch/s390/appldata/appldata_base.c
+++ b/arch/s390/appldata/appldata_base.c
@@ -365,7 +365,7 @@ int appldata_register_ops(struct appldata_ops *ops)
ops->ctl_table[0].proc_handler = appldata_generic_handler;
ops->ctl_table[0].data = ops;
- ops->sysctl_header = register_sysctl(appldata_proc_name, ops->ctl_table);
+ ops->sysctl_header = register_sysctl_sz(appldata_proc_name, ops->ctl_table, 1);
if (!ops->sysctl_header)
goto out;
return 0;
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index b8dd78e344ff..80d3e2f61947 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -43,7 +43,7 @@ static struct ctl_table sysctl_mount_point[] = {
*/
struct ctl_table_header *register_sysctl_mount_point(const char *path)
{
- return register_sysctl(path, sysctl_mount_point);
+ return register_sysctl_sz(path, sysctl_mount_point, 0);
}
EXPORT_SYMBOL(register_sysctl_mount_point);
@@ -1399,7 +1399,7 @@ struct ctl_table_header *__register_sysctl_table(
}
/**
- * register_sysctl - register a sysctl table
+ * register_sysctl_sz - register a sysctl table
* @path: The path to the directory the sysctl table is in. If the path
* doesn't exist we will create it for you.
* @table: the table structure. The calller must ensure the life of the @table
@@ -1409,25 +1409,20 @@ struct ctl_table_header *__register_sysctl_table(
* to call unregister_sysctl_table() and can instead use something like
* register_sysctl_init() which does not care for the result of the syctl
* registration.
+ * @table_size: The number of elements in table.
*
* Register a sysctl table. @table should be a filled in ctl_table
* array. A completely 0 filled entry terminates the table.
*
* See __register_sysctl_table for more details.
*/
-struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table)
+struct ctl_table_header *register_sysctl_sz(const char *path, struct ctl_table *table,
+ size_t table_size)
{
- int count = 0;
- struct ctl_table *entry;
- struct ctl_table_header t_hdr;
-
- t_hdr.ctl_table = table;
- list_for_each_table_entry(entry, (&t_hdr))
- count++;
return __register_sysctl_table(&sysctl_table_root.default_set,
- path, table, count);
+ path, table, table_size);
}
-EXPORT_SYMBOL(register_sysctl);
+EXPORT_SYMBOL(register_sysctl_sz);
/**
* __register_sysctl_init() - register sysctl table to path
@@ -1452,10 +1447,17 @@ EXPORT_SYMBOL(register_sysctl);
void __init __register_sysctl_init(const char *path, struct ctl_table *table,
const char *table_name)
{
- struct ctl_table_header *hdr = register_sysctl(path, table);
+ int count = 0;
+ struct ctl_table *entry;
+ struct ctl_table_header t_hdr, *hdr;
+
+ t_hdr.ctl_table = table;
+ list_for_each_table_entry(entry, (&t_hdr))
+ count++;
+ hdr = register_sysctl_sz(path, table, count);
if (unlikely(!hdr)) {
- pr_err("failed when register_sysctl %s to %s\n", table_name, path);
+ pr_err("failed when register_sysctl_sz %s to %s\n", table_name, path);
return;
}
kmemleak_not_leak(hdr);
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 0495c858989f..b1168ae281c9 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -215,6 +215,9 @@ struct ctl_path {
const char *procname;
};
+#define register_sysctl(path, table) \
+ register_sysctl_sz(path, table, ARRAY_SIZE(table))
+
#ifdef CONFIG_SYSCTL
void proc_sys_poll_notify(struct ctl_table_poll *poll);
@@ -227,7 +230,8 @@ extern void retire_sysctl_set(struct ctl_table_set *set);
struct ctl_table_header *__register_sysctl_table(
struct ctl_table_set *set,
const char *path, struct ctl_table *table, size_t table_size);
-struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table);
+struct ctl_table_header *register_sysctl_sz(const char *path, struct ctl_table *table,
+ size_t table_size);
void unregister_sysctl_table(struct ctl_table_header * table);
extern int sysctl_init_bases(void);
@@ -262,7 +266,9 @@ static inline struct ctl_table_header *register_sysctl_mount_point(const char *p
return NULL;
}
-static inline struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table)
+static inline struct ctl_table_header *register_sysctl_sz(const char *path,
+ struct ctl_table *table,
+ size_t table_size)
{
return NULL;
}
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 2b80264bb79f..4aa6166cb856 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -365,7 +365,7 @@ static __init int user_namespace_sysctl_init(void)
* default set so that registrations in the child sets work
* properly.
*/
- user_header = register_sysctl("user", empty);
+ user_header = register_sysctl_sz("user", empty, 0);
kmemleak_ignore(user_header);
BUG_ON(!user_header);
BUG_ON(!setup_userns_sysctls(&init_user_ns));
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 8ee4b74bc009..d9cbbb51b143 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -101,7 +101,7 @@ __init int net_sysctl_init(void)
* registering "/proc/sys/net" as an empty directory not in a
* network namespace.
*/
- net_header = register_sysctl("net", empty);
+ net_header = register_sysctl_sz("net", empty, 0);
if (!net_header)
goto out;
ret = register_pernet_subsys(&sysctl_pernet_ops);
--
2.30.2
We replace the ctl_table pointer with the ctl_table_header pointer in
list_for_each_table_entry, which is the macro responsible for
traversing the ctl_table arrays. This is a preparation commit that will
make it easier to add the ctl_table array size (which will be added to
ctl_table_header in subsequent commits) to the already existing loop
logic based on empty ctl_table elements (so-called sentinels).
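A userspace sketch of the header-based traversal, assuming hypothetical
demo_entry and demo_header types in place of ctl_table and
ctl_table_header. Unlike the macro in the diff, this version
parenthesizes its header argument, so callers need no extra
parentheses around an address-of expression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct ctl_table and ctl_table_header. */
struct demo_entry {
	const char *procname;
};

struct demo_header {
	struct demo_entry *ctl_table;
};

/* Walk the table reachable from the header until the sentinel. */
#define demo_for_each_entry(entry, header) \
	for ((entry) = (header)->ctl_table; (entry)->procname; (entry)++)

/* Count entries through the header rather than the bare table. */
static size_t demo_count(struct demo_header *head)
{
	struct demo_entry *entry;
	size_t n = 0;

	assert(head != NULL);
	demo_for_each_entry(entry, head)
		n++;
	return n;
}
```

Because the loop now starts from the header, a size field stored there
later can be added to the loop condition without touching any caller.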
Signed-off-by: Joel Granados <[email protected]>
---
fs/proc/proc_sysctl.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 94d71446da39..884460b0385b 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -19,8 +19,8 @@
#include <linux/kmemleak.h>
#include "internal.h"
-#define list_for_each_table_entry(entry, table) \
- for ((entry) = (table); (entry)->procname; (entry)++)
+#define list_for_each_table_entry(entry, header) \
+ for ((entry) = (header->ctl_table); (entry)->procname; (entry)++)
static const struct dentry_operations proc_sys_dentry_operations;
static const struct file_operations proc_sys_file_operations;
@@ -204,7 +204,7 @@ static void init_header(struct ctl_table_header *head,
if (node) {
struct ctl_table *entry;
- list_for_each_table_entry(entry, table) {
+ list_for_each_table_entry(entry, head) {
node->header = head;
node++;
}
@@ -215,7 +215,7 @@ static void erase_header(struct ctl_table_header *head)
{
struct ctl_table *entry;
- list_for_each_table_entry(entry, head->ctl_table)
+ list_for_each_table_entry(entry, head)
erase_entry(head, entry);
}
@@ -242,7 +242,7 @@ static int insert_header(struct ctl_dir *dir, struct ctl_table_header *header)
err = insert_links(header);
if (err)
goto fail_links;
- list_for_each_table_entry(entry, header->ctl_table) {
+ list_for_each_table_entry(entry, header) {
err = insert_entry(header, entry);
if (err)
goto fail;
@@ -1129,7 +1129,7 @@ static int sysctl_check_table(const char *path, struct ctl_table_header *header)
{
struct ctl_table *entry;
int err = 0;
- list_for_each_table_entry(entry, header->ctl_table) {
+ list_for_each_table_entry(entry, header) {
if ((entry->proc_handler == proc_dostring) ||
(entry->proc_handler == proc_dobool) ||
(entry->proc_handler == proc_dointvec) ||
@@ -1169,7 +1169,7 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, struct ctl_table_
name_bytes = 0;
nr_entries = 0;
- list_for_each_table_entry(entry, head->ctl_table) {
+ list_for_each_table_entry(entry, head) {
nr_entries++;
name_bytes += strlen(entry->procname) + 1;
}
@@ -1188,7 +1188,7 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, struct ctl_table_
link_name = (char *)&link_table[nr_entries + 1];
link = link_table;
- list_for_each_table_entry(entry, head->ctl_table) {
+ list_for_each_table_entry(entry, head) {
int len = strlen(entry->procname) + 1;
memcpy(link_name, entry->procname, len);
link->procname = link_name;
@@ -1211,7 +1211,7 @@ static bool get_links(struct ctl_dir *dir,
struct ctl_table *entry, *link;
/* Are there links available for every entry in table? */
- list_for_each_table_entry(entry, header->ctl_table) {
+ list_for_each_table_entry(entry, header) {
const char *procname = entry->procname;
link = find_entry(&tmp_head, dir, procname, strlen(procname));
if (!link)
@@ -1224,7 +1224,7 @@ static bool get_links(struct ctl_dir *dir,
}
/* The checks passed. Increase the registration count on the links */
- list_for_each_table_entry(entry, header->ctl_table) {
+ list_for_each_table_entry(entry, header) {
const char *procname = entry->procname;
link = find_entry(&tmp_head, dir, procname, strlen(procname));
tmp_head->nreg++;
@@ -1356,12 +1356,14 @@ struct ctl_table_header *__register_sysctl_table(
{
struct ctl_table_root *root = set->dir.header.root;
struct ctl_table_header *header;
+ struct ctl_table_header h_tmp;
struct ctl_dir *dir;
struct ctl_table *entry;
struct ctl_node *node;
int nr_entries = 0;
- list_for_each_table_entry(entry, table)
+ h_tmp.ctl_table = table;
+ list_for_each_table_entry(entry, (&h_tmp))
nr_entries++;
header = kzalloc(sizeof(struct ctl_table_header) +
@@ -1471,7 +1473,7 @@ static void put_links(struct ctl_table_header *header)
if (IS_ERR(core_parent))
return;
- list_for_each_table_entry(entry, header->ctl_table) {
+ list_for_each_table_entry(entry, header) {
struct ctl_table_header *link_head;
struct ctl_table *link;
const char *name = entry->procname;
--
2.30.2