Date: Mon, 21 Jul 2014 22:16:37 +0200 (CEST)
From: Thomas Gleixner
To: Darren Hart
Cc: Andi Kleen, Waiman Long, Ingo Molnar, Peter Zijlstra, Davidlohr Bueso,
    Heiko Carstens, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org, Jason Low, Scott J Norton
Subject: Re: [RFC PATCH 0/5] futex: introduce an optimistic spinning futex
References: <1405956271-34339-1-git-send-email-Waiman.Long@hp.com>
    <8761iq3bp3.fsf@tassilo.jf.intel.com> <871tte3bjw.fsf@tassilo.jf.intel.com>

On Mon, 21 Jul 2014, Darren Hart wrote:

> We observed some significant improvements under some very specific use
> cases, but a more thorough dive into performance impact in the other cases
> as well as security implications with the vdso is still wanting.

The security implication is that the feature can only be available for
process private futexes. There is no way to expose information which
crosses the process spaces.

But the way worse issue is storage.

While you can cache the namespace specific TID of a thread in the
task_struct, you still need an O(1) zero overhead mechanism to update the
thread state (only on/off cpu is interesting) in a per process shared data
structure from the guts of schedule().

For that you have basically two choices:

1) cpu_thread_id[NR_CPUS]

   Simple to update from the scheduler, and a halfway moderate storage
   size (NR_CPUS * 4 bytes) in the worst case, i.e. 16k today. Set to 0
   on scheduling out and to the namespace specific TID on scheduling in.

   But that requires a linear search in the user space spin loop. And
   that's required for every iteration of the loop. Can you imagine how
   well that works performance wise?

2) Bitmap threads_on_cpu

   Again, simple to update from the scheduler, cache line bouncing
   issues aside. Clear the bit on schedule out and set it on schedule
   in.

   But the bitmap needs the size of PID_MAX_LIMIT bits, which is a
   whopping 512k per process in the worst case.

Anything else would involve search/lookup schemes which are just overkill
in both the scheduler and the user space loop.

Now for enhanced fun you need immutable pages for that storage, as you
can't have pagefaults in the guts of schedule().

So once you have found a way to make that opt-in, as you don't want to
inflict any of this on all processes by default, it might be a worthwhile
optimization.

So the probably tolerable impact on schedule() would be

schedule_out()
	if (curr->threads_on_cpu)
		clear_bit(curr->ns_tid, curr->threads_on_cpu);

and

schedule_in()
	if (curr->threads_on_cpu)
		set_bit(curr->ns_tid, curr->threads_on_cpu);

Anything more complex is just going to defeat the whole purpose.

Thanks,

	tglx
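
The two storage choices above can be modelled in plain user space C to make
the sizes and the lookup cost concrete. This is only a rough sketch, not
kernel code. The limits (4096 CPUs, 4M TIDs) are assumptions picked because
they reproduce the 16k and 512k figures in the mail, and every struct and
function name here is hypothetical. The updates are also plain non-atomic
stores, whereas anything living in schedule() would need atomic bit
operations and pinned pages.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limits, chosen to match the 16k and 512k worst cases in the mail. */
#define MODEL_NR_CPUS        4096u
#define MODEL_PID_MAX_LIMIT  (4u * 1024u * 1024u)

#define BITS_PER_WORD        (sizeof(unsigned long) * CHAR_BIT)

/* Choice 1: one 4-byte slot per CPU, holding the TID running there (0 = none). */
struct cpu_tid_table {
	uint32_t tid[MODEL_NR_CPUS];
};

/* Choice 2: one bit per possible TID, set while that thread is on a CPU. */
struct tid_bitmap {
	unsigned long bits[MODEL_PID_MAX_LIMIT / BITS_PER_WORD];
};

/* Scheduler side of choice 1: a single store either way. */
static inline void table_sched_in(struct cpu_tid_table *t, unsigned int cpu,
				  uint32_t tid)
{
	t->tid[cpu] = tid;
}

static inline void table_sched_out(struct cpu_tid_table *t, unsigned int cpu)
{
	t->tid[cpu] = 0;
}

/* Spinner side of choice 1: a linear scan on every loop iteration. */
static inline bool table_owner_on_cpu(const struct cpu_tid_table *t,
				      uint32_t owner)
{
	if (owner == 0)		/* 0 marks an empty slot, never a real owner */
		return false;
	for (size_t cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (t->tid[cpu] == owner)
			return true;
	return false;
}

/* Scheduler side of choice 2: set on schedule in, clear on schedule out. */
static inline void bitmap_sched_in(struct tid_bitmap *b, uint32_t tid)
{
	b->bits[tid / BITS_PER_WORD] |= 1UL << (tid % BITS_PER_WORD);
}

static inline void bitmap_sched_out(struct tid_bitmap *b, uint32_t tid)
{
	b->bits[tid / BITS_PER_WORD] &= ~(1UL << (tid % BITS_PER_WORD));
}

/* Spinner side of choice 2: a single word load and bit test. */
static inline bool bitmap_owner_on_cpu(const struct tid_bitmap *b,
				       uint32_t owner)
{
	return (b->bits[owner / BITS_PER_WORD] >> (owner % BITS_PER_WORD)) & 1UL;
}
```

With these assumed limits, sizeof(struct cpu_tid_table) comes to 16k and
sizeof(struct tid_bitmap) to 512k. The sketch shows the trade-off: the table
is small but the spinner scans it on every iteration, while the bitmap gives
an O(1) test at the cost of the much larger worst case footprint.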