Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261256AbTEHKR4 (ORCPT ); Thu, 8 May 2003 06:17:56 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261259AbTEHKR4 (ORCPT ); Thu, 8 May 2003 06:17:56 -0400 Received: from 237.oncolt.com ([213.86.99.237]:9424 "EHLO warthog.warthog") by vger.kernel.org with ESMTP id S261256AbTEHKRy (ORCPT ); Thu, 8 May 2003 06:17:54 -0400 To: "Richard B. Johnson" cc: Linux kernel Subject: Re: top stack (l)users for 2.5.69 In-Reply-To: User-Agent: EMH/1.14.1 SEMI/1.14.4 (Hosorogi) FLIM/1.14.4 (=?ISO-8859-4?Q?Kashiharajing=FE-mae?=) APEL/10.4 Emacs/21.2 (i386-redhat-linux-gnu) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.4 - "Hosorogi") Content-Type: text/plain; charset=US-ASCII Date: Thu, 08 May 2003 11:29:58 +0100 Message-ID: <3433.1052389798@warthog.warthog> From: David Howells Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3352 Lines: 83 > The kernel stack, at least for ix86, is only one, set upon startup > at 8192 bytes above a label called init_task_unit. The kernel must > have a separate stack and, contrary to what I've been reading on > this list, it can't have more kernel stacks than CPUs and, I don't > see a separate stack allocated for different CPUs. Not so... Look into asm-i386/thread_info.h on 2.5 kernels. Each process/thread has a chunk of memory of THREAD_SIZE size (8K on i386) with a small structure (struct thread_info) lurking at the bottom and its own personal kernel stack lurking at the top. The number of CPUs doesn't have anything to do with it. Unless you mean "kernel stack pointer"? > The context of a task (see entry.S) is completely defined by > its registers, including the hidden part of the segments > (selectors) that define priviledge. No, it's not. It's also defined by, for instance, all the waitqueues it's on, and these bits of information are frequently stored on the kernel stack, and all the internal function information in the call chain leading to whoever called schedule(). > But no! Not at all. The context of a user does not need to be saved > on the stack, and in fact, isn't. It's saved in a task structure > that was created when the original task was born. The pointer to > that task structure is called 'current' in the kernel. It's in > the kernel's data space, and everything necessary to put that > task back together is in that structure. > > Context switching is usually not done by pushing all the registers > onto a stack, then later popping them back. That's not the way > it works. Yes it is. The context saved on the kernel stack belonging to the process that was executing at the time. entry.S uses SAVE_ALL to build a stack frame including all the values of all the data registers that were in use at the time. The pt_regs structure defines the layout of this. The pt_regs that holds the userspace context for any particular process is pointed to by current->thread.esp0 or its equivalent on other archs (look at arch/i386/ptrace.c). What you get in the thread info for any process is: +------------------+ <--- highest addr | PT_REGS (userspace) +------------------+ | function call chain +------------------+ | PT_REGS (kernel) <--- MMU exception maybe +------------------+ | function call chain +------------------+ | PT_REGS (kernel) <--- timer interrupt maybe +------------------+ | function call chain +------------------+ <--- addr in kernel stack pointer | : | +------------------+ <--- limit of stack | THREAD_INFO +------------------+ <--- lowest addr The lowest address always resides on an 8Kb boundary, and so the address of THREAD_INFO can be found by ESP&~8191. That said, some registers are saved in current->thread, but only the minimum possible. This is done by switch_to() which is called from the scheduler. Furthermore, interrupt handlers aren't allowed to call schedule (the scheduler barfs on them if they do), so stacks will only be switched if the top stack frame belongs to the process and not to an interrupt. David - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/