While backporting Michael's "pipe: fix limit handling" patchset to a
distro kernel, Mikulas noticed that current upstream pipe limit handling
contains a few problems:
1 - procfs signed wrap: echo'ing a large number into
/proc/sys/fs/pipe-max-size and then cat'ing it back out shows a
negative value.
2 - round_pipe_size() nr_pages overflow on 32-bit: this would
subsequently try roundup_pow_of_two(0), which is undefined.
3 - visible non-rounded pipe-max-size value: there is no mutual
exclusion or protection between the time pipe_max_size is assigned
a raw value from proc_dointvec_minmax() and when it is rounded.
v1 (differences from initial rfc):
- Re-arrange patchset order, push smaller fixes to the front
- Add a check so that round_pipe_size(size < pipe_min_size) will round
up to round_pipe_size(pipe_min_size) as per man page [RD]
- Add new procfs proc_dopipe_max_size() and helpers to consolidate user
space read / type validation / rounding / assignment [MP]
Testing
=======
Tests run on both 32 and 64-bit kernels.
Patch 1 - procfs signed wrap
----------------------------
Before:
% echo 2147483647 >/proc/sys/fs/pipe-max-size
% cat /proc/sys/fs/pipe-max-size
-2147483648
After:
% echo 2147483647 >/proc/sys/fs/pipe-max-size
% cat /proc/sys/fs/pipe-max-size
2147483648
Patch 2 - 32-bit overflow
-------------------------
From userspace:
fcntl(fd, F_SETPIPE_SZ, 0xffffffff);
- Before: returned 4096 (due to overflow) and the pipe size was set to 4096
- After: returns -1 and sets errno EINVAL, pipe size remains untouched
Patch 3 - non-rounded pipe-max-size value
-----------------------------------------
Keep plugging in values that need to be rounded:
while (true); do echo 1048570 > /proc/sys/fs/pipe-max-size; done
and in another terminal, loop around reading the value:
time (while (true); do SIZE=$(cat /proc/sys/fs/pipe-max-size); [[ $(( $SIZE % 4096 )) -ne 0 ]] && break; done; echo "$SIZE")
1048570
real 0m46.213s
user 0m29.688s
sys 0m20.042s
- Before: found a non-rounded value within a few minutes
- After: never encountered a non-page-rounded value
Joe Lawrence (3):
pipe: match pipe_max_size data type with procfs
pipe: avoid round_pipe_size() nr_pages overflow on 32-bit
pipe: add proc_dopipe_max_size() to safely assign pipe_max_size
fs/pipe.c | 21 +++++++++++----------
include/linux/pipe_fs_i.h | 1 +
include/linux/sysctl.h | 3 +++
kernel/sysctl.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 59 insertions(+), 11 deletions(-)
--
1.8.3.1
pipe_max_size is defined as an unsigned int:
unsigned int pipe_max_size = 1048576;
but its procfs/sysctl representation is an integer:
static struct ctl_table fs_table[] = {
...
{
.procname = "pipe-max-size",
.data = &pipe_max_size,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &pipe_proc_fn,
.extra1 = &pipe_min_size,
},
...
that is signed:
int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
size_t *lenp, loff_t *ppos)
{
...
ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
This leads to signed results via procfs for large values of
pipe_max_size:
% echo 2147483647 >/proc/sys/fs/pipe-max-size
% cat /proc/sys/fs/pipe-max-size
-2147483648
Use unsigned operations on this variable to avoid such negative values.
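The negative read-back follows from the rounding: writing 2147483647
stores a page-rounded value of 2^31 bytes, and that bit pattern does not
fit in a signed 32-bit int. The userspace sketch below models this,
assuming 4 KiB pages (PAGE_SHIFT == 12); model_round_pipe_size() and
as_signed() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT == 12), as on x86. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (UINT64_C(1) << MODEL_PAGE_SHIFT)

/* Hypothetical model of round_pipe_size(): round a byte count up to a
 * power-of-two number of pages. 64-bit intermediates keep the sum from
 * wrapping; size must be nonzero. */
static uint32_t model_round_pipe_size(uint32_t size)
{
	uint64_t nr_pages = (size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
	uint64_t pow2 = 1;

	assert(nr_pages != 0);
	while (pow2 < nr_pages)
		pow2 <<= 1;
	/* 2147483647 bytes -> 2^19 pages -> 2^31 bytes, which fits in 32 bits. */
	return (uint32_t)(pow2 << MODEL_PAGE_SHIFT);
}

/* Reinterpret the stored 32-bit pattern as a signed int, which is what a
 * signed (proc_dointvec-style) handler effectively prints. */
static int32_t as_signed(uint32_t stored)
{
	int32_t s;

	memcpy(&s, &stored, sizeof(s));
	return s;
}
```

With these helpers, model_round_pipe_size(2147483647) yields 2147483648,
and as_signed() of that value is INT32_MIN (-2147483648), matching the
"Before" output; printing the same value unsigned gives 2147483648.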
Reported-by: Mikulas Patocka <[email protected]>
Signed-off-by: Joe Lawrence <[email protected]>
---
fs/pipe.c | 2 +-
kernel/sysctl.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/pipe.c b/fs/pipe.c
index 97e5be897753..a21ad26de557 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1124,7 +1124,7 @@ int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
{
int ret;
- ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
+ ret = proc_douintvec_minmax(table, write, buf, lenp, ppos);
if (ret < 0 || !write)
return ret;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6648fbbb8157..c976719bf37a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1825,7 +1825,7 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
{
.procname = "pipe-max-size",
.data = &pipe_max_size,
- .maxlen = sizeof(int),
+ .maxlen = sizeof(pipe_max_size),
.mode = 0644,
.proc_handler = &pipe_proc_fn,
.extra1 = &pipe_min_size,
The round_pipe_size() function contains a right-bit-shift expression
whose addition, size + PAGE_SIZE - 1, may overflow; that would cause
undefined results in a subsequent roundup_pow_of_two() call.
static inline unsigned int round_pipe_size(unsigned int size)
{
unsigned long nr_pages;
nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
return roundup_pow_of_two(nr_pages) << PAGE_SHIFT;
}
PAGE_SIZE is defined as (1UL << PAGE_SHIFT), so the sum is computed as
an unsigned long, which is:
- 4 bytes wide on 32-bit (0 to 0xffffffff)
- 8 bytes wide on 64-bit (0 to 0xffffffffffffffff)
That means that on 32-bit, nr_pages in round_pipe_size() may overflow to 0:
size=0x00000000 nr_pages=0x0
size=0x00000001 nr_pages=0x1
size=0xfffff000 nr_pages=0xfffff
size=0xfffff001 nr_pages=0x0 << !
size=0xffffffff nr_pages=0x0 << !
This is bad because roundup_pow_of_two(n) is undefined when n == 0!
64-bit is not a problem: the unsigned int size is still 4 bytes wide (as
on 32-bit), but the 8-byte-wide unsigned long is large enough to hold the
largest value of the bit shift expression:
size=0xffffffff nr_pages=0x100000
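The table above can be reproduced in userspace by computing the sum in
64 bits and, for the 32-bit case, masking it to 32 bits the way a 32-bit
unsigned long would wrap. This is a sketch under the assumption of 4 KiB
pages; model_nr_pages() and its long_bits parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PAGE_SHIFT is 12 (4 KiB pages). */
#define MODEL_PAGE_SHIFT 12

/* Model nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT when unsigned
 * long is long_bits wide (32 or 64). */
static uint64_t model_nr_pages(uint32_t size, unsigned int long_bits)
{
	uint64_t page_size = UINT64_C(1) << MODEL_PAGE_SHIFT;
	uint64_t sum;

	assert(long_bits == 32 || long_bits == 64);
	sum = (uint64_t)size + page_size - 1;
	if (long_bits == 32)
		sum &= UINT32_MAX;	/* a 32-bit unsigned long wraps modulo 2^32 */
	return sum >> MODEL_PAGE_SHIFT;
}
```

For example, model_nr_pages(0xfffff001, 32) is 0, while
model_nr_pages(0xffffffff, 64) is 0x100000.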
Modify round_pipe_size() to return 0 if nr_pages == 0 and update its
callers to treat that as an error (-EINVAL).
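A 32-bit userspace model of the patched behaviour is sketched below. It
assumes 4 KiB pages and pipe_min_size == PAGE_SIZE; the model_* names are
hypothetical stand-ins for round_pipe_size(), roundup_pow_of_two() and
the pipe_set_size() caller, with uint32_t playing the role of a 32-bit
unsigned long.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumptions: 4 KiB pages and pipe_min_size == PAGE_SIZE. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (UINT32_C(1) << MODEL_PAGE_SHIFT)
static const uint32_t model_pipe_min_size = MODEL_PAGE_SIZE;

/* Smallest power of two >= n; like the kernel helper, n must not be 0. */
static uint32_t model_roundup_pow_of_two(uint32_t n)
{
	uint32_t p = 1;

	assert(n != 0 && n <= UINT32_C(0x80000000));
	while (p < n)
		p <<= 1;
	return p;
}

/* 32-bit model of the patched round_pipe_size(): returns 0 on error. */
static uint32_t model_round_pipe_size(uint32_t size)
{
	uint32_t nr_pages;

	if (size < model_pipe_min_size)
		size = model_pipe_min_size;

	/* Wraps modulo 2^32, as a 32-bit unsigned long would. */
	nr_pages = (uint32_t)(size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
	if (nr_pages == 0)
		return 0;

	return (uint32_t)(model_roundup_pow_of_two(nr_pages) << MODEL_PAGE_SHIFT);
}

/* Caller pattern from pipe_set_size(): a 0 result becomes -EINVAL. */
static long model_set_size(unsigned long arg)
{
	uint32_t size = model_round_pipe_size((uint32_t)arg);

	if (size == 0)
		return -EINVAL;
	return (long)size;
}
```

Here model_set_size(0xffffffff) returns -EINVAL instead of silently
picking 4096, and sizes below the minimum are clamped to one page.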
Reported-by: Mikulas Patocka <[email protected]>
Signed-off-by: Joe Lawrence <[email protected]>
---
fs/pipe.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/fs/pipe.c b/fs/pipe.c
index a21ad26de557..8cbc97d97753 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1017,13 +1017,19 @@ static int fifo_open(struct inode *inode, struct file *filp)
/*
* Currently we rely on the pipe array holding a power-of-2 number
- * of pages.
+ * of pages. Returns 0 on error.
*/
static inline unsigned int round_pipe_size(unsigned int size)
{
unsigned long nr_pages;
+ if (size < pipe_min_size)
+ size = pipe_min_size;
+
nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ if (nr_pages == 0)
+ return 0;
+
return roundup_pow_of_two(nr_pages) << PAGE_SHIFT;
}
@@ -1039,6 +1045,8 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
long ret = 0;
size = round_pipe_size(arg);
+ if (size == 0)
+ return -EINVAL;
nr_pages = size >> PAGE_SHIFT;
if (!nr_pages)
@@ -1122,13 +1130,18 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
size_t *lenp, loff_t *ppos)
{
+ unsigned int rounded_pipe_max_size;
int ret;
ret = proc_douintvec_minmax(table, write, buf, lenp, ppos);
if (ret < 0 || !write)
return ret;
- pipe_max_size = round_pipe_size(pipe_max_size);
+ rounded_pipe_max_size = round_pipe_size(pipe_max_size);
+ if (rounded_pipe_max_size == 0)
+ return -EINVAL;
+
+ pipe_max_size = rounded_pipe_max_size;
return ret;
}
pipe_max_size is assigned directly via procfs sysctl:
static struct ctl_table fs_table[] = {
...
{
.procname = "pipe-max-size",
.data = &pipe_max_size,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &pipe_proc_fn,
.extra1 = &pipe_min_size,
},
...
int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
size_t *lenp, loff_t *ppos)
{
...
ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
...
and then rounded in place a few statements later:
...
pipe_max_size = round_pipe_size(pipe_max_size);
...
This leaves a window of time between initial assignment and rounding
that may be visible to other threads. (For example, one thread sets a
non-rounded value to pipe_max_size while another reads its value.)
Similar reads of pipe_max_size are potentially racy:
pipe.c :: alloc_pipe_info()
pipe.c :: pipe_set_size()
Add a new proc_dopipe_max_size() function that consolidates reading the
new value from the user buffer, verifying bounds, and calling
round_pipe_size() with a single assignment to pipe_max_size.
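The write side of such a converter can be sketched in userspace as
"validate, round into a local, then store once". The helper names and
the 4 KiB page size are assumptions. One deliberate difference from the
posted do_proc_dopipe_max_size_conv(): this sketch rejects values above
UINT_MAX before narrowing them for rounding, whereas the patch calls
round_pipe_size(*lvalp) first and checks the bound afterwards. It does
not model concurrent readers; it only shows that the shared variable is
written once, with a value that is already rounded.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for round_pipe_size(): at least one page,
 * rounded to a power-of-two page count; 0 if it does not fit. */
static unsigned int model_round_pipe_size(unsigned int size)
{
	unsigned long long page = 1ULL << MODEL_PAGE_SHIFT;
	unsigned long long nr_pages = (size + page - 1) >> MODEL_PAGE_SHIFT;
	unsigned long long bytes = page;

	while ((bytes >> MODEL_PAGE_SHIFT) < nr_pages)
		bytes <<= 1;
	return bytes > UINT_MAX ? 0 : (unsigned int)bytes;
}

/* Model of the converter's write path; *valp is only updated on success. */
static int model_pipe_max_size_conv(unsigned long lval, unsigned int *valp,
				    const unsigned int *min)
{
	unsigned int val;

	assert(valp != NULL);
	if (lval > UINT_MAX)	/* reject before narrowing to unsigned int */
		return -EINVAL;

	val = model_round_pipe_size((unsigned int)lval);
	if (val == 0)
		return -EINVAL;
	if (min && *min > val)
		return -ERANGE;

	*valp = val;		/* one store of an already-rounded value */
	return 0;
}
```

Writing 5000 stores 8192 directly, so a reader never observes the raw
5000; a rejected write leaves the previous value untouched.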
Reported-by: Mikulas Patocka <[email protected]>
Signed-off-by: Joe Lawrence <[email protected]>
---
fs/pipe.c | 16 ++--------------
include/linux/pipe_fs_i.h | 1 +
include/linux/sysctl.h | 3 +++
kernel/sysctl.c | 43 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 49 insertions(+), 14 deletions(-)
diff --git a/fs/pipe.c b/fs/pipe.c
index 8cbc97d97753..4db3cd2d139c 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1019,7 +1019,7 @@ static int fifo_open(struct inode *inode, struct file *filp)
* Currently we rely on the pipe array holding a power-of-2 number
* of pages. Returns 0 on error.
*/
-static inline unsigned int round_pipe_size(unsigned int size)
+unsigned int round_pipe_size(unsigned int size)
{
unsigned long nr_pages;
@@ -1130,19 +1130,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
size_t *lenp, loff_t *ppos)
{
- unsigned int rounded_pipe_max_size;
- int ret;
-
- ret = proc_douintvec_minmax(table, write, buf, lenp, ppos);
- if (ret < 0 || !write)
- return ret;
-
- rounded_pipe_max_size = round_pipe_size(pipe_max_size);
- if (rounded_pipe_max_size == 0)
- return -EINVAL;
-
- pipe_max_size = rounded_pipe_max_size;
- return ret;
+ return proc_dopipe_max_size(table, write, buf, lenp, ppos);
}
/*
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index e7497c9dde7f..485cf7a7aa8f 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -190,5 +190,6 @@ static inline int pipe_buf_steal(struct pipe_inode_info *pipe,
struct pipe_inode_info *get_pipe_info(struct file *file);
int create_pipe_files(struct file **, int);
+unsigned int round_pipe_size(unsigned int size);
#endif
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 1d4dba490fb6..ba24ca72800c 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -50,6 +50,9 @@ extern int proc_dointvec_minmax(struct ctl_table *, int,
extern int proc_douintvec_minmax(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
+extern int proc_dopipe_max_size(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos);
extern int proc_dointvec_jiffies(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
extern int proc_dointvec_userhz_jiffies(struct ctl_table *, int,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c976719bf37a..7a2913c5546e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -67,6 +67,7 @@
#include <linux/kexec.h>
#include <linux/bpf.h>
#include <linux/mount.h>
+#include <linux/pipe_fs_i.h>
#include <linux/uaccess.h>
#include <asm/processor.h>
@@ -2631,6 +2632,47 @@ int proc_douintvec_minmax(struct ctl_table *table, int write,
do_proc_douintvec_minmax_conv, &param);
}
+struct do_proc_dopipe_max_size_conv_param {
+ unsigned int *min;
+};
+
+static int do_proc_dopipe_max_size_conv(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data)
+{
+ struct do_proc_dopipe_max_size_conv_param *param = data;
+
+ if (write) {
+ unsigned int val = round_pipe_size(*lvalp);
+
+ if (val == 0)
+ return -EINVAL;
+
+ if (param->min && *param->min > val)
+ return -ERANGE;
+
+ if (*lvalp > UINT_MAX)
+ return -EINVAL;
+
+ *valp = val;
+ } else {
+ unsigned int val = *valp;
+ *lvalp = (unsigned long) val;
+ }
+
+ return 0;
+}
+
+int proc_dopipe_max_size(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct do_proc_dopipe_max_size_conv_param param = {
+ .min = (unsigned int *) table->extra1,
+ };
+ return do_proc_douintvec(table, write, buffer, lenp, ppos,
+ do_proc_dopipe_max_size_conv, &param);
+}
+
static void validate_coredump_safety(void)
{
#ifdef CONFIG_COREDUMP
@@ -3179,6 +3221,7 @@ int proc_doulongvec_ms_jiffies_minmax(struct ctl_table *table, int write,
EXPORT_SYMBOL(proc_dointvec_jiffies);
EXPORT_SYMBOL(proc_dointvec_minmax);
EXPORT_SYMBOL_GPL(proc_douintvec_minmax);
+EXPORT_SYMBOL_GPL(proc_dopipe_max_size);
EXPORT_SYMBOL(proc_dointvec_userhz_jiffies);
EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
EXPORT_SYMBOL(proc_dostring);
Hi Joe,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.14-rc2 next-20170928]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Joe-Lawrence/A-few-round_pipe_size-and-pipe-max-size-fixups/20170929-205234
config: i386-tinyconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
fs/pipe.o: In function `pipe_proc_fn':
>> pipe.c:(.text+0xe90): undefined reference to `proc_dopipe_max_size'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Joe,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.14-rc2 next-20170929]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Joe-Lawrence/A-few-round_pipe_size-and-pipe-max-size-fixups/20170929-205234
config: h8300-h8300h-sim_defconfig (attached as .config)
compiler: h8300-linux-gcc (GCC) 6.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=h8300
All errors (new ones prefixed by >>):
fs/pipe.o: In function `pipe_proc_fn':
>> fs/pipe.c:1133: undefined reference to `proc_dopipe_max_size'
vim +1133 fs/pipe.c
1125
1126 /*
1127 * This should work even if CONFIG_PROC_FS isn't set, as proc_dointvec_minmax
1128 * will return an error.
1129 */
1130 int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
1131 size_t *lenp, loff_t *ppos)
1132 {
> 1133 return proc_dopipe_max_size(table, write, buf, lenp, ppos);
1134 }
1135