Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754739AbbDTUuB (ORCPT ); Mon, 20 Apr 2015 16:50:01 -0400 Received: from mail-ie0-f169.google.com ([209.85.223.169]:33653 "EHLO mail-ie0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754494AbbDTUty (ORCPT ); Mon, 20 Apr 2015 16:49:54 -0400 Message-ID: <1429562991.7346.290.camel@edumazet-glaptop2.roam.corp.google.com> Subject: Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install From: Eric Dumazet To: Mateusz Guzik Cc: Al Viro , Andrew Morton , "Paul E. McKenney" , Yann Droneaud , Konstantin Khlebnikov , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 20 Apr 2015 13:49:51 -0700 In-Reply-To: <1429550126.7346.268.camel@edumazet-glaptop2.roam.corp.google.com> References: <20150416121628.GA20615@mguzik> <1429307216.7346.255.camel@edumazet-glaptop2.roam.corp.google.com> <20150417221646.GA15589@mguzik> <20150417230252.GE889@ZenIV.linux.org.uk> <20150420130633.GA2513@mguzik> <20150420134326.GC2513@mguzik> <20150420151054.GD2513@mguzik> <1429550126.7346.268.camel@edumazet-glaptop2.roam.corp.google.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.10.4-0ubuntu2 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3287 Lines: 110 On Mon, 2015-04-20 at 10:15 -0700, Eric Dumazet wrote: > On Mon, 2015-04-20 at 17:10 +0200, Mateusz Guzik wrote: > > > Sorry for spam but I came up with another hack. :) > > > > The idea is that we can have a variable which would signify the that > > given thread is playing with fd table in fd_install (kind of a lock > > embedded into task_struct). We would also have a flag in files struct > > indicating that a thread would like to resize it. > > > > expand_fdtable would set the flag and iterate over all threads waiting > > for all of them to have the var set to 0. > > The opposite : you have to block them in some way and add a rcu_sched() > or something. Here is the patch I cooked here but not yet tested. diff --git a/fs/file.c b/fs/file.c index 93c5f89c248b..d72bdcacd4df 100644 --- a/fs/file.c +++ b/fs/file.c @@ -145,17 +145,23 @@ static int expand_fdtable(struct files_struct *files, int nr) { struct fdtable *new_fdt, *cur_fdt; + files->resize_in_progress++; spin_unlock(&files->file_lock); new_fdt = alloc_fdtable(nr); + /* make sure all __fd_install() have seen resize_in_progress */ + synchronize_sched(); spin_lock(&files->file_lock); - if (!new_fdt) + if (!new_fdt) { + files->resize_in_progress--; return -ENOMEM; + } /* * extremely unlikely race - sysctl_nr_open decreased between the check in * caller and alloc_fdtable(). Cheaper to catch it here... */ if (unlikely(new_fdt->max_fds <= nr)) { __free_fdtable(new_fdt); + files->resize_in_progress--; return -EMFILE; } /* @@ -173,6 +179,9 @@ static int expand_fdtable(struct files_struct *files, int nr) /* Somebody else expanded, so undo our attempt */ __free_fdtable(new_fdt); } + /* coupled with smp_rmb() in __fd_install() */ + smp_wmb(); + files->resize_in_progress--; return 1; } @@ -256,6 +265,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp) atomic_set(&newf->count, 1); spin_lock_init(&newf->file_lock); + newf->resize_in_progress = 0; newf->next_fd = 0; new_fdt = &newf->fdtab; new_fdt->max_fds = NR_OPEN_DEFAULT; @@ -553,11 +563,21 @@ void __fd_install(struct files_struct *files, unsigned int fd, struct file *file) { struct fdtable *fdt; - spin_lock(&files->file_lock); + + rcu_read_lock_sched(); + + while (unlikely(READ_ONCE(files->resize_in_progress))) { + rcu_read_unlock_sched(); + yield(); + rcu_read_lock_sched(); + } + /* coupled with smp_wmb() in expand_fdtable() */ + smp_rmb(); fdt = files_fdtable(files); BUG_ON(fdt->fd[fd] != NULL); rcu_assign_pointer(fdt->fd[fd], file); - spin_unlock(&files->file_lock); + + rcu_read_unlock_sched(); } void fd_install(unsigned int fd, struct file *file) diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 230f87bdf5ad..7496a0d73a7c 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -47,6 +47,7 @@ struct files_struct { * read mostly part */ atomic_t count; + int resize_in_progress; struct fdtable __rcu *fdt; struct fdtable fdtab; /* -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/