Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751357AbbEZSMI (ORCPT ); Tue, 26 May 2015 14:12:08 -0400 Received: from mail-ie0-f178.google.com ([209.85.223.178]:36252 "EHLO mail-ie0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750962AbbEZSMF (ORCPT ); Tue, 26 May 2015 14:12:05 -0400 MIME-Version: 1.0 In-Reply-To: <20150526114356.609107918@infradead.org> References: <20150526114356.609107918@infradead.org> Date: Tue, 26 May 2015 11:12:04 -0700 X-Google-Sender-Auth: HIClrXo9vZhVKE3ACnVq-qe3-08 Message-ID: Subject: Re: [RFC][PATCH 0/5] Optimize percpu-rwsem From: Linus Torvalds To: Peter Zijlstra Cc: Oleg Nesterov , Paul McKenney , Tejun Heo , Ingo Molnar , Linux Kernel Mailing List , der.herr@hofr.at, Davidlohr Bueso Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2907 Lines: 79 On Tue, May 26, 2015 at 4:43 AM, Peter Zijlstra wrote: > > This is a derived work of the cpu hotplug lock rework I did in 2013 which never > really went anywhere because Linus didn't like it. > > This applies those same optimizations to the percpu-rwsem. Seeing how we did > all the work it seemed a waste to not use it at all. So I *still* don't like it. We literally have one single percpu-rwsem IN THE WHOLE KERNEL TREE. One. Not "a couple". Not "hundreds". ONE. And that single use that you are optimizing for isn't even explained. It's not really even clear that that thing is needed: fork() already takes the mmap_sem for writing, and I'm not at all convinced that the uprobes code couldn't just use mmap_sem for every mm it traverses, rather than use this global percpu lock that we use absolutely nowhere else. So I'm not convinced that the right thing to do is to try to optimize that lock at ALL. I'd rather get rid of it entirely. Is there some new use that I don't know about? Have people really looked at that uprobes code deeply? OF COURSE global locks will have problems, I'm not at all convinced that "let's make that global lock really complicated and clever" is the proper solution. For example, the write lock is only ever taken when registering an uprobe. Not exactly likely to be timing-critical. Is there some reason why we couldn't get rid of the magical per-cpu rwsem entirely, and replace it with something truly simple like a hashed rmsem, where the readers (fork), will just read-lock one hashed rwsem, and the writer will just write-lock every hashed rwsem. The hash might be "pick the currently executing CPU at the time of fork(), modulo 16". That one doesn't even need to disable preemption, since it doesn't actually matter that the CPU stays constant, it's just a trivial heurisitic to avoid cacheline pingpongs under very heavy fork() load. So we'd have #define NR_UPROBE_RWSEM 16 // One cacheline per rwsem. struct cache_line_aligned_rw_semaphore uprobe_rwsem[NR_UPROBE_RWSEM]; and fork() would basically do int idx = raw_smp_processor_id() & (NR_UPROBE_RWSEM-1); struct rw_sem *usem = &uprobe_rwsem[idx].rwsem; .. down_read(usem); ... do the dup_mmap() here .. up_read(usem); and the uprobe register thing could just do for (i = 0; i < NR_UPROBE_RWSEM; i++) down_write(&uprobe_rwsem[i].rwsem); ... for (i = 0; ... unlock them all ..) which doesn't look any slower than what we do now (I suspect the fork() part would be faster), and is much simpler, and would get rid of the percpu-rwsem entirely. Hmm? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/