Date: Fri, 23 Feb 2007 18:17:12 -0800
From: "Michael K. Edwards"
To: Alan
Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Cc: "Ingo Molnar", "Evgeniy Polyakov", "Ulrich Drepper",
    linux-kernel@vger.kernel.org, "Linus Torvalds", "Arjan van de Ven",
    "Christoph Hellwig", "Andrew Morton", "Zach Brown", "David S. Miller",
    "Suparna Bhattacharya", "Davide Libenzi", "Jens Axboe", "Thomas Gleixner"
References: <20070221211355.GA7302@elte.hu>
    <20070222113148.GA3781@2ka.mipt.ru>
    <20070222125931.GB25788@elte.hu>
    <20070223003018.0d244576@lxorguk.ukuu.org.uk>
    <20070223123718.54c9670e@lxorguk.ukuu.org.uk>
    <20070224010824.5cf6c0ac@lxorguk.ukuu.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

I wrote:
> (On a pre-EABI ARM, there is even a substantial
> cache-related penalty for encoding the syscall number in the syscall
> opcode, because you have to peek back at the text segment to see it,
> which costs you a D-cache stall.)

Before you say it, I'm aware that this is not directly relevant to TLS
switch costs, except insofar as the "arch-dependent syscalls"
introduced for certain parts of ARM TLS handling carry the same
overhead as any other syscall. My point is that the system impact of
seemingly benign operations is not always predictable even to the arch
experts, and therefore one should be "parsimonious" (to use Kahan's
word) in defining what semantics programmers may rely on in
performance-critical situations.

If you arrange things so that threadlets are scheduled as much as
possible in bursts that share the same processor context (process
context, location in program text, TLS arena, FPU state -- basically
everything other than stack and integer registers), you are giving
yourself and future designers the maximum opportunity for exploiting
hardware optimizations. This would be a good thing if you want
threadlets to be performance-competitive with state machine designs.

If you still allow application programmers to _use_ shared processor
state, in the knowledge that it will be clobbered on threadlet switch,
then threadlets can use most of the coding style with which
programmers of event-driven frameworks are familiar. This would be a
good thing if you want threadlets to get wider use than the innards of
three or four databases and web servers.

Cheers,
- Michael
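
As an aside on the pre-EABI ARM remark quoted above, here is a minimal Python sketch of why the old ABI forces a read of program text. It is a toy model, not kernel code: the `text` and `regs` dictionaries and all helper names are hypothetical stand-ins. The assumptions behind it are that the old ABI encodes the call as `swi #(0x900000 + nr)`, so the number must be recovered from the 24-bit immediate of the instruction word, while EABI issues `swi 0` and passes the number in r7.

```python
# Toy model of ARM syscall-number lookup; names are illustrative only.

SWI_ALWAYS = 0xEF000000         # condition "always" + SWI opcode bits
SWI_IMM_MASK = 0x00FFFFFF       # 24-bit immediate field of the SWI word
OABI_SYSCALL_BASE = 0x900000    # base added to the number under the old ABI


def encode_oabi_swi(nr):
    """Build the instruction word for 'swi #(0x900000 + nr)'."""
    return SWI_ALWAYS | (OABI_SYSCALL_BASE + nr)


def oabi_syscall_nr(text, return_addr):
    """Old ABI: load the SWI word back from program text and decode it.

    'text' maps addresses to instruction words. The lookup below stands
    in for a data load from the text segment, which is the step the
    quoted remark blames for the D-cache stall.
    """
    insn = text[return_addr - 4]
    return (insn & SWI_IMM_MASK) - OABI_SYSCALL_BASE


def eabi_syscall_nr(regs):
    """EABI: the number is already in a register, so text is not read."""
    return regs["r7"]


GETPID = 20  # getpid's syscall number on ARM Linux
text = {0x8000: encode_oabi_swi(GETPID)}

assert oabi_syscall_nr(text, 0x8004) == GETPID
assert eabi_syscall_nr({"r7": GETPID}) == GETPID
```

On real hardware with split caches, the equivalent of `text[return_addr - 4]` goes through the data side for a line that would otherwise be needed only on the instruction side, which is presumably where the stall comes from; the register-based convention avoids that load entirely.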