Date: Mon, 21 Jul 2014 23:27:40 +0200
From: Peter Zijlstra
To: Thomas Gleixner
Cc: Darren Hart, Andi Kleen, Waiman Long, Ingo Molnar, Davidlohr Bueso,
    Heiko Carstens, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org, Jason Low, Scott J Norton
Subject: Re: [RFC PATCH 0/5] futex: introduce an optimistic spinning futex
Message-ID: <20140721212740.GS3935@laptop>
References: <1405956271-34339-1-git-send-email-Waiman.Long@hp.com>
    <8761iq3bp3.fsf@tassilo.jf.intel.com> <871tte3bjw.fsf@tassilo.jf.intel.com>

On Mon, Jul 21, 2014 at 10:16:37PM +0200, Thomas Gleixner wrote:
> On Mon, 21 Jul 2014, Darren Hart wrote:
> > We observed some significant improvements under some very specific use
> > cases, but a more thorough dive into performance impact in the other cases
> > as well as security implications with the vdso is still wanting.
>
> The security implication is that the feature can only be available for
> process private futexes. There is no way to expose information which
> crosses the process spaces.
>
> But the way worse issue is storage.
>
> While you can cache the namespace specific TID of a thread in the
> task_struct, you still need an O(1) zero overhead mechanism to update
> the thread state (only on/off cpu is interesting) in a per process
> shared data structure from the guts of schedule()
>
> For that you have basically two choices:
>
> 1) cpu_thread_id[NR_CPUS]
>
>    Simple to update from the scheduler, and a halfway moderate
>    storage size (NR_CPUS * 4 bytes) in the worst case, i.e. 16k
>    today. Set to 0 on scheduling out and to the namespace specific TID
>    on scheduling in.
>
>    But that requires a linear search in the user space spin loop. And
>    that's required for every iteration of the loop. Can you imagine
>    how well that works performance wise?
>
> 2) Bitmap threads_on_cpu
>
>    Again, simple to update from the scheduler, cache line bouncing
>    issues aside. Clear the bit on schedule out and set it on schedule
>    in.
>
>    But the bitmap needs the size of PID_MAX_LIMIT, which is a whopping
>    512k per process in the worst case.
>
> Anything else would involve search/lookup schemes which are just
> overkill in both the scheduler and the user space loop.
>
> Now for enhanced fun you need immutable pages for that storage, as you
> can't have pagefaults in the guts of schedule().
>
> So once you found a way to make that opt-in, as you don't want to
> inflict any of this on all processes by default, it might be a
> worthwhile optimization. So the probably tolerable impact on schedule()
> would be
>
> schedule_out()
>     if (curr->threads_on_cpu)
>         clear_bit(curr->ns_tid, curr->threads_on_cpu);
>
> and
>
> schedule_in()
>     if (curr->threads_on_cpu)
>         set_bit(curr->ns_tid, curr->threads_on_cpu);
>
> Anything more complex is just going to defeat the whole purpose.

All this is predicated on the fact that syscalls are 'expensive'.
Weren't syscalls only 100s of cycles? All this bitmap mucking is far
more expensive due to cacheline misses, which, due to the size of the
things, are almost guaranteed.
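
For reference, the two storage layouts quoted above can be sketched in
plain user-space C. This is only an illustration, not kernel code: the
constants are assumptions chosen to match the figures in the quoted mail
(NR_CPUS = 4096 gives 16k, PID_MAX_LIMIT = 4M bits gives 512k), and the
helper names sched_in(), sched_out(), owner_running_array() and
owner_running_bitmap() are hypothetical. The updates are deliberately
non-atomic, and the immutable-page requirement is ignored.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, picked to reproduce the 16k and 512k figures. */
#define NR_CPUS        4096u
#define PID_MAX_LIMIT  (4u * 1024u * 1024u)
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS   (PID_MAX_LIMIT / BITS_PER_WORD)

/* Option 1: one TID slot per CPU; 0 means none of our threads runs there. */
static uint32_t cpu_thread_id[NR_CPUS];

/* Option 2: one bit per possible TID; set while that thread is on a CPU. */
static unsigned long threads_on_cpu[BITMAP_WORDS];

/* Hypothetical stand-in for the schedule-in hook (non-atomic sketch). */
static void sched_in(unsigned int cpu, uint32_t tid)
{
	assert(cpu < NR_CPUS && tid != 0 && tid < PID_MAX_LIMIT);
	cpu_thread_id[cpu] = tid;
	threads_on_cpu[tid / BITS_PER_WORD] |= 1UL << (tid % BITS_PER_WORD);
}

/* Hypothetical stand-in for the schedule-out hook (non-atomic sketch). */
static void sched_out(unsigned int cpu, uint32_t tid)
{
	assert(cpu < NR_CPUS && tid != 0 && tid < PID_MAX_LIMIT);
	cpu_thread_id[cpu] = 0;
	threads_on_cpu[tid / BITS_PER_WORD] &= ~(1UL << (tid % BITS_PER_WORD));
}

/* Option 1 lookup: a linear scan, repeated on every spin iteration. */
static bool owner_running_array(uint32_t tid)
{
	for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_thread_id[cpu] == tid)
			return true;
	return false;
}

/* Option 2 lookup: a single word load and bit test, O(1). */
static bool owner_running_bitmap(uint32_t tid)
{
	assert(tid < PID_MAX_LIMIT);
	return (threads_on_cpu[tid / BITS_PER_WORD] >> (tid % BITS_PER_WORD)) & 1UL;
}
```

A spin loop would call one of the owner_running_*() helpers with the
lock owner's TID on each iteration. The array keeps storage small but
pays for the scan; the bitmap makes the test cheap but touches a 512k
region, which is where the cacheline-miss objection applies.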