Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751652AbdIOOIZ (ORCPT ); Fri, 15 Sep 2017 10:08:25 -0400 Received: from mx1.redhat.com ([209.132.183.28]:54060 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751435AbdIOOIX (ORCPT ); Fri, 15 Sep 2017 10:08:23 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 4D30E285D0 Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=joe.lawrence@redhat.com Subject: Re: [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex To: Mikulas Patocka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Alexander Viro , "Luis R. Rodriguez" , Kees Cook , Michael Kerrisk , Randy Dunlap References: <1504622676-2992-1-git-send-email-joe.lawrence@redhat.com> <1504622676-2992-3-git-send-email-joe.lawrence@redhat.com> From: Joe Lawrence Organization: Red Hat Message-ID: <29f65845-40e5-839d-0bc7-79a5c1fa6f4b@redhat.com> Date: Fri, 15 Sep 2017 10:08:13 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Fri, 15 Sep 2017 14:08:23 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5511 Lines: 166 On 09/14/2017 07:09 PM, Mikulas Patocka wrote: > On Tue, 5 Sep 2017, Joe Lawrence wrote: >> pipe_max_size is assigned directly via procfs sysctl: >> >> static struct ctl_table fs_table[] = { >> ... >> { >> .procname = "pipe-max-size", >> .data = &pipe_max_size, >> .maxlen = sizeof(int), >> .mode = 0644, >> .proc_handler = &pipe_proc_fn, >> .extra1 = &pipe_min_size, >> }, >> ... >> >> int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf, >> size_t *lenp, loff_t *ppos) >> { >> ... >> ret = proc_dointvec_minmax(table, write, buf, lenp, ppos) >> ... >> >> and then later rounded in-place a few statements later: >> >> ... >> pipe_max_size = round_pipe_size(pipe_max_size); >> ... >> >> This leaves a window of time between initial assignment and rounding >> that may be visible to other threads. (For example, one thread sets a >> non-rounded value to pipe_max_size while another reads its value.) >> >> Similar reads of pipe_max_size are potentially racey: >> >> pipe.c :: alloc_pipe_info() >> pipe.c :: pipe_set_size() >> >> Protect them and the procfs sysctl assignment with a mutex. >> >> Reported-by: Mikulas Patocka >> Signed-off-by: Joe Lawrence >> --- >> fs/pipe.c | 28 ++++++++++++++++++++++++---- >> 1 file changed, 24 insertions(+), 4 deletions(-) >> >> diff --git a/fs/pipe.c b/fs/pipe.c >> index fa28910b3c59..33bb11b0d78e 100644 >> --- a/fs/pipe.c >> +++ b/fs/pipe.c >> @@ -35,6 +35,11 @@ >> unsigned int pipe_max_size = 1048576; >> >> /* >> + * Provide mutual exclusion around access to pipe_max_size >> + */ >> +static DEFINE_MUTEX(pipe_max_mutex); >> + >> +/* >> * Minimum pipe size, as required by POSIX >> */ >> unsigned int pipe_min_size = PAGE_SIZE; >> @@ -623,13 +628,18 @@ struct pipe_inode_info *alloc_pipe_info(void) >> unsigned long pipe_bufs = PIPE_DEF_BUFFERS; >> struct user_struct *user = get_current_user(); >> unsigned long user_bufs; >> + unsigned int max_size; >> >> pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT); >> if (pipe == NULL) >> goto out_free_uid; >> >> - if (pipe_bufs * PAGE_SIZE > pipe_max_size && !capable(CAP_SYS_RESOURCE)) >> - pipe_bufs = pipe_max_size >> PAGE_SHIFT; >> + mutex_lock(&pipe_max_mutex); >> + max_size = pipe_max_size; >> + mutex_unlock(&pipe_max_mutex); >> + >> + if (pipe_bufs * PAGE_SIZE > max_size && !capable(CAP_SYS_RESOURCE)) >> + pipe_bufs = max_size >> PAGE_SHIFT; >> >> user_bufs = account_pipe_buffers(user, 0, pipe_bufs); >> >> @@ -1039,6 +1049,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg) >> struct pipe_buffer *bufs; >> unsigned int size, nr_pages; >> unsigned long user_bufs; >> + unsigned int max_size; >> long ret = 0; >> >> size = round_pipe_size(arg); >> @@ -1056,8 +1067,11 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg) >> * Decreasing the pipe capacity is always permitted, even >> * if the user is currently over a limit. >> */ >> + mutex_lock(&pipe_max_mutex); >> + max_size = pipe_max_size; >> + mutex_unlock(&pipe_max_mutex); >> if (nr_pages > pipe->buffers && >> - size > pipe_max_size && !capable(CAP_SYS_RESOURCE)) >> + size > max_size && !capable(CAP_SYS_RESOURCE)) >> return -EPERM; >> >> user_bufs = account_pipe_buffers(pipe->user, pipe->buffers, nr_pages); >> @@ -1131,18 +1145,24 @@ int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf, >> unsigned int rounded_pipe_max_size; >> int ret; >> >> + mutex_lock(&pipe_max_mutex); >> orig_pipe_max_size = pipe_max_size; >> ret = proc_dointvec_minmax(table, write, buf, lenp, ppos); >> - if (ret < 0 || !write) >> + if (ret < 0 || !write) { >> + mutex_unlock(&pipe_max_mutex); >> return ret; >> + } >> >> rounded_pipe_max_size = round_pipe_size(pipe_max_size); >> if (rounded_pipe_max_size == 0) { >> pipe_max_size = orig_pipe_max_size; >> + mutex_unlock(&pipe_max_mutex); >> return -EINVAL; >> } >> >> pipe_max_size = rounded_pipe_max_size; >> + mutex_unlock(&pipe_max_mutex); >> + >> return ret; >> } >> >> -- >> 1.8.3.1 >> > I think this mutex is too heavy - if multiple processes simultaneously > create a pipe, the mutex would cause performance degradation. > > You can call do_proc_dointvec with a custom callback "conv" function that > does the rounding of the pipe size value. > > Mikulas > Hi Mikulas, I'm not strong when it comes to memory barriers, but one of the side-effects of using the mutex is that pipe_set_size() and alloc_pipe_info() should have a consistent view of pipe_max_size. If I remove the mutex (and assume that I implement a custom do_proc_dointvec "conv" callback), is it safe for these routines to directly use pipe_max_size as they had done before? If not, is it safe to alias through a temporary stack variable (ie, could the compiler re-read pipe_max_size multiple times in the function)? Would READ_ONCE() help in any way? The mutex covered up some confusion on my part here. OTOH, since pipe_max_size is read-only for pipe_set_size() and alloc_pipe_info() and only updated occasionally by pipe_proc_fn(), would rw_semaphore or RCU be a fit here? Regards, -- Joe