2005-04-02 17:46:39

by ooyama eiichi

Subject: kernel stack size

Hi all,

How can I know the remaining size of the kernel stack?
(in my kernel driver)
Thanks.


2005-04-02 17:54:01

by Chris Wedgwood

Subject: Re: kernel stack size

On Sun, Apr 03, 2005 at 02:46:34AM +0900, ooyama eiichi wrote:

> How can I know the remaining size of the kernel stack?

you can't in a platform-independent way

> (in my kernel driver)

*why* do you want to do this?

2005-04-02 18:16:32

by ooyama eiichi

Subject: Re: kernel stack size

Thanks for your reply.

> On Sun, Apr 03, 2005 at 02:46:34AM +0900, ooyama eiichi wrote:
>
> > How can I know the remaining size of the kernel stack?
>
> you can't in a platform-independent way

in i386 and ia64.

>
> > (in my kernel driver)
>
> *why* do you want to do this?
>

because my driver hangs the machine on a certain ioctl,
and it seems to me there is nothing wrong in the code corresponding to
the ioctl, except that it is using large auto variables.
(some functions are using ~1KB autos)

2005-04-02 18:24:44

by Chris Wedgwood

Subject: Re: kernel stack size

On Sun, Apr 03, 2005 at 03:15:42AM +0900, ooyama eiichi wrote:

> in i386 and ia64.

search for CONFIG_DEBUG_STACKOVERFLOW in arch/i386/kernel/irq.c

ia64 has fairly large stacks so you probably won't need to check there
if you get the above working

> because my driver hangs the machine on a certain ioctl, and it
> seems to me there is nothing wrong in the code corresponding to the
> ioctl, except that it is using large auto variables. (some functions
> are using ~1KB autos)

don't do that; even if you make it 'apparently' work for you, it will
just end up being a problem later on, or for someone else

2005-04-02 18:27:06

by Brian Gerst

Subject: Re: kernel stack size

ooyama eiichi wrote:
> Thanks for your reply.
>
>
>>On Sun, Apr 03, 2005 at 02:46:34AM +0900, ooyama eiichi wrote:
>>
>>
>>>How can I know the remaining size of the kernel stack?
>>
>>you can't in a platform-independent way
>
>
> in i386 and ia64.
>
>
>>>(in my kernel driver)
>>
>>*why* do you want to do this?
>>
>
>
> because my driver hangs the machine on a certain ioctl,
> and it seems to me there is nothing wrong in the code corresponding to
> the ioctl, except that it is using large auto variables.
> (some functions are using ~1KB autos)

That's your problem. Use kmalloc instead of large local variables.

--
Brian Gerst

2005-04-02 18:49:04

by ooyama eiichi

Subject: Re: kernel stack size

> On Sun, Apr 03, 2005 at 03:15:42AM +0900, ooyama eiichi wrote:
>
> > in i386 and ia64.
>
> search for CONFIG_DEBUG_STACKOVERFLOW in arch/i386/kernel/irq.c

Oh, very good information for me.

>
> ia64 has fairly large stacks so you probably won't need to check there
> if you get the above working

on ia64, it works properly.

>
> > because my driver hangs the machine on a certain ioctl, and it
> > seems to me there is nothing wrong in the code corresponding to the
> > ioctl, except that it is using large auto variables. (some functions
> > are using ~1KB autos)
>
> don't do that; even if you make it 'apparently' work for you, it will
> just end up being a problem later on, or for someone else
>

I changed these to use kmalloc().
(but I have not yet confirmed that my driver works properly)

Thanks very much.

2005-04-02 19:04:23

by Steven Rostedt

Subject: Re: kernel stack size

On Sun, 2005-04-03 at 03:48 +0900, ooyama eiichi wrote:

> > > because my driver hangs the machine on a certain ioctl, and it
> > > seems to me there is nothing wrong in the code corresponding to the
> > > ioctl, except that it is using large auto variables. (some functions
> > > are using ~1KB autos)
> >
> > don't do that; even if you make it 'apparently' work for you, it will
> > just end up being a problem later on, or for someone else
> >
>
> I changed these to use kmalloc().
> (but I have not yet confirmed that my driver works properly)

You can also use global static variables. But this makes for
non-reentrant code.

Sometimes I don't feel that a kmalloc is worth it, and if the function
in question for the driver would seldom have problems with reentrancy,
I use a statically defined global and protect it with spinlocks. If
these can also be used in interrupt context, you need to use the
spin_lock_irqsave variants. But don't do this if the critical section
has long latencies.

-- Steve


2005-04-02 19:37:10

by Al Viro

Subject: Re: kernel stack size

On Sat, Apr 02, 2005 at 02:04:11PM -0500, Steven Rostedt wrote:
> You can also use global static variables. But this makes for
> non-reentrant code.
>
> Sometimes I don't feel that a kmalloc is worth it, and if the function
> in question for the driver would seldom have problems with reentrancy,
> I use a statically defined global and protect it with spinlocks. If
> these can also be used in interrupt context, you need to use the
> spin_lock_irqsave variants. But don't do this if the critical section
> has long latencies.

... and the first time copy_from_user() blocks under your spinlock
you will get a nice shiny deadlock.

2005-04-02 19:52:32

by Steven Rostedt

Subject: Re: kernel stack size

On Sat, 2005-04-02 at 20:37 +0100, Al Viro wrote:
> On Sat, Apr 02, 2005 at 02:04:11PM -0500, Steven Rostedt wrote:
> > You can also use global static variables. But this makes for
> > non-reentrant code.
> >
> > Sometimes I don't feel that a kmalloc is worth it, and if the function
> > in question for the driver would seldom have problems with reentrancy,
> > I use a statically defined global and protect it with spinlocks. If
> > these can also be used in interrupt context, you need to use the
> > spin_lock_irqsave variants. But don't do this if the critical section
> > has long latencies.
>
> ... and the first time copy_from_user() blocks under your spinlock
> you will get a nice shiny deadlock.

I forgot that he mentioned that this was for ioctls. In that case I use
semaphores if I need to access userspace. But if it just needs to modify
data in areas that only the kernel uses, without access to userspace,
then I use spinlocks.

I admit you really need to know what you're doing to use this method. If
I believe that a kmalloc would be too expensive, then I use the locking
of static variables. But each situation is different and I try to use
the best method for the occasion.

-- Steve


2005-04-02 20:14:55

by Manfred Spraul

Subject: Re: kernel stack size

Steven Rostedt wrote:

>I admit you really need to know what you're doing to use this method. If
>I believe that a kmalloc would be too expensive, then I use the locking
>of static variables. But each situation is different and I try to use
>the best method for the occasion.
>
>
Have you benchmarked your own memory manager?
kmalloc(1024, GFP_KERNEL) is something like 17 instructions on i386
uniprocessor.

--
Manfred

2005-04-02 22:19:48

by Steven Rostedt

Subject: Re: kernel stack size

On Sat, 2005-04-02 at 22:14 +0200, Manfred Spraul wrote:
> Steven Rostedt wrote:
>
> >I admit you really need to know what you're doing to use this method. If
> >I believe that a kmalloc would be too expensive, then I use the locking
> >of static variables. But each situation is different and I try to use
> >the best method for the occasion.
> >
> >
> Have you benchmarked your own memory manager?
> kmalloc(1024, GFP_KERNEL) is something like 17 instructions on i386
> uniprocessor.

Where did you get that? I'm looking at the assembly of it right now and
it's much larger than 17 instructions. Not to mention that it calls the
slab functions which might have to invoke the buddy system.

Also, I don't use my own memory manager. My memory manager would be the
statically allocated globals (allocated automatically when the kernel
loads at boot up) and spin_locks (which are much smaller than kmalloc)
or sems. Now if kmalloc didn't have a free slab available, and needed to
go to the buddy list, this gets expensive, especially if you have to
contend with other processes doing the same.

With the static global variable method, you only have to worry about
processes (and interrupts) that are contending for your data. This can
be very efficient, especially if the data IS shared with an interrupt
handler. And if you want to be more efficient, just use the normal
spin_lock after disabling just your interrupt. Now you don't stop other
interrupts coming in, and still can work with your own global data.

Since the original poster was talking about local data, and I'm talking
about global, I sometimes use global variables for just local use, but
you need to lock the data so that on SMP or PREEMPT you don't have to
worry about reentrancy. I haven't clocked the speed of sem compared to
kmalloc. But I would think that the sem functions are still quicker.

Like I mentioned before, each case is different. I do use kmalloc when
I find that there will be too much contention with the data, or that I
would need to lock the data for long periods of time. Then again, a sem
may work too.

-- Steve


2005-04-03 07:10:37

by Manfred Spraul

Subject: Re: kernel stack size

Steven Rostedt wrote:

>>Have you benchmarked your own memory manager?
>>kmalloc(1024, GFP_KERNEL) is something like 17 instructions on i386
>>uniprocessor.
>>
>>
>
>Where did you get that? I'm looking at the assembly of it right now and
>it's much larger than 17 instructions. Not to mention that it calls the
>slab functions which might have to invoke the buddy system.
>
>
>
Have you looked at kmem_cache_alloc? kmalloc(1024, GFP_KERNEL) is
compile-time replaced with the appropriate kmem_cache_alloc call. And
the fast path within kmem_cache_alloc is 17 instructions long. Best
case: uniprocessor, no regparams. Unfortunately with cli and popfd, thus
something like 35 cpu cycles on an Athlon 64.

> I haven't clocked the speed of sem compared to kmalloc.
>But I would think that the sem functions are still quicker.
>
>
>
Yes - sem or spin locks are quicker as long as no cache line transfers
are necessary. If the semaphore is accessed by multiple cpus, then
kmalloc would be faster: slab tries hard to avoid taking global locks.
I'm not speaking about contention, just the cache line ping pong for
acquiring a free semaphore.

--
Manfred

2005-04-03 18:01:57

by Steven Rostedt

Subject: Re: kernel stack size

On Sun, 2005-04-03 at 09:10 +0200, Manfred Spraul wrote:

> Yes - sem or spin locks are quicker as long as no cache line transfers
> are necessary. If the semaphore is accessed by multiple cpus, then
> kmalloc would be faster: slab tries hard to avoid taking global locks.
> I'm not speaking about contention, just the cache line ping pong for
> acquiring a free semaphore.

Without contention, is there still a problem with cache line ping pong
of acquiring a free semaphore?

I mean, say only one task is using a given semaphore. Is there still
going to be cache line transfers for acquiring it, even if the task in
question stays on one CPU? Is the "LOCK" prefix on an instruction that
expensive even if the other CPUs haven't accessed that location of
memory?

Sorry for my ignorance, I don't know all the inner workings of the cache
on SMP systems. Are there any good references on the Internet? I
definitely want to know so that my coding practices for SMP improve.

Thanks,

-- Steve


2005-04-03 19:23:45

by Manfred Spraul

Subject: Re: kernel stack size

Steven Rostedt wrote:

>On Sun, 2005-04-03 at 09:10 +0200, Manfred Spraul wrote:
>
>
>
>>Yes - sem or spin locks are quicker as long as no cache line transfers
>>are necessary. If the semaphore is accessed by multiple cpus, then
>>kmalloc would be faster: slab tries hard to avoid taking global locks.
>>I'm not speaking about contention, just the cache line ping pong for
>>acquiring a free semaphore.
>>
>>
>
>Without contention, is there still a problem with cache line ping pong
>of acquiring a free semaphore?
>
>I mean, say only one task is using a given semaphore. Is there still
>going to be cache line transfers for acquiring it, even if the task in
>question stays on one CPU? Is the "LOCK" prefix on an instruction that
>expensive even if the other CPUs haven't accessed that location of
>memory?
>
>
>
No. If everything is cpu-local, then there are obviously no cache line
transfers. LOCK is not that expensive. On a Pentium 3, it was 20 cpu
cycles. On an Athlon 64, it's virtually free.

--
Manfred