Hi,
I was looking at stack usage in drivers/media/video/v4l1-compat.c::
v4l_compat_translate_ioctl(). It's 0x924 bytes on my peecee.
It's fairly straightforward to change the stack space to
kmalloc() space, but some of these functions (ioctls) are
speed-critical, so I was wondering if the changes should still
be done, or done only in some cases and not others, or wait
until an oops is reported here....
Comments?
Thanks,
~Randy
"Randy.Dunlap" <[email protected]> wrote:
>
> Hi,
>
> I was looking at stack usage in drivers/media/video/v4l1-compat.c::
> v4l_compat_translate_ioctl(). It's 0x924 bytes on my peecee.
>
> It's fairly straightforward to change the stack space to
> kmalloc() space, but some of these functions (ioctls) are
> speed-critical, so I was wondering if the changes should still
> be done, or done only in some cases and not others, or wait
> until an oops is reported here....
>
Well the function is doing a memset() of 2k's worth of memory already,
so it presumably doesn't care much.
Your kmalloc will be a small cost compared with that.
"Randy.Dunlap" <[email protected]> writes:
> Hi,
>
> I was looking at stack usage in drivers/media/video/v4l1-compat.c::
> v4l_compat_translate_ioctl(). It's 0x924 bytes on my peecee.
Whoops. Hmm. I assumed having the variable declarations within the
case blocks where they are actually used helps to keep the stack size
small, i.e. use this ...
do_ioctl()
{
switch (cmd)
case FOO:
{
struct foo;
...
}
case BAR:
{
struct bar;
...
}
...
}
... instead of this:
do_ioctl()
{
struct foo;
struct bar;
switch (cmd) {
...
}
}
But when looking at the disasm output it is obvious that it isn't true
(at least with gcc 3.2). On the other hand it is common practice in
many drivers, there must be a reason for that, no? Any chance this
used to work with older gcc versions?
> It's fairly straightforward to change the stack space to
> kmalloc() space, but some of these functions (ioctls) are
> speed-critical, so I was wondering if the changes should still
> be done, or done only in some cases and not others, or wait
> until an oops is reported here....
Although v4l_compat_translate_ioctl() is probably one of the worst
offenders, this issue likely exists for lots of other ioctl functions
too: Lot of stack space is allocated but never used because the
variables for all cases are allocated but only one of the switch cases
actually runs ...
Not sure what is the best idea to fix that. Don't like the kmalloc
idea that much. The individual structs are not huge, the real problem
is that many of them are allocated and only few are needed. Any
chance to tell gcc that it should allocate block-local variables at
the start block not at the start of the function?
Gerd
--
/join #zonenkinder
On Wed, Mar 05, 2003 at 10:15:52AM +0100, Gerd Knorr wrote:
> But when looking at the disasm output it is obvious that it isn't true
> (at least with gcc 3.2). On the other hand it is common practice in
> many drivers, there must be a reason for that, no? Any chance this
> used to work with older gcc versions?
I don't believe so - I seem to remember looking at gcc 2.95 and finding
the same annoying behaviour.
> Not sure what is the best idea to fix that. Don't like the kmalloc
> idea that much. The individual structs are not huge, the real problem
> is that many of them are allocated and only few are needed. Any
> chance to tell gcc that it should allocate block-local variables at
> the start block not at the start of the function?
Not a particularly clean idea, but maybe creating a union of the
structures and putting that on the stack? (ie, doing what GCC should
be doing in the first place.)
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
On the topic of v4l, Ive had complete lockups on recent 2.5.x kernels
when loading a v4l application Zapping. I havent tried anyother v4l
applications just yet, have you seen this in action? should i compile
debuging support to show it? /var/log/messages currently doesnt show it.
Running Debian Sid/Gcc 3.2.3 20030228
--
mdew <[email protected]>
mdew <[email protected]> writes:
> On the topic of v4l, Ive had complete lockups on recent 2.5.x kernels
> when loading a v4l application Zapping.
There are a few known issues, pushing updates to Linus is on my todo list.
If it still happens with the latest bits from http://bytesex.org/snapshot/
I'd like to know.
Gerd
--
/join #zonenkinder
On Wed, 5 Mar 2003 09:35:34 +0000
Russell King <[email protected]> wrote:
| On Wed, Mar 05, 2003 at 10:15:52AM +0100, Gerd Knorr wrote:
| > But when looking at the disasm output it is obvious that it isn't true
| > (at least with gcc 3.2). On the other hand it is common practice in
| > many drivers, there must be a reason for that, no? Any chance this
| > used to work with older gcc versions?
|
| I don't believe so - I seem to remember looking at gcc 2.95 and finding
| the same annoying behaviour.
Yes, that's what I'm seeing with 2.96 as well.
| > Not sure what is the best idea to fix that. Don't like the kmalloc
| > idea that much. The individual structs are not huge, the real problem
| > is that many of them are allocated and only few are needed. Any
| > chance to tell gcc that it should allocate block-local variables at
| > the start block not at the start of the function?
|
| Not a particularly clean idea, but maybe creating a union of the
| structures and putting that on the stack? (ie, doing what GCC should
| be doing in the first place.)
That's an idea. Or make separate called functions for each ioctl and declare
the structures inside them?
--
~Randy
Here are some v4l structure sizes from gcc 2.96 on a P4:
sizeof video_audio: 40
sizeof video_buffer: 20
sizeof video_capability: 60
sizeof video_channel: 48
sizeof video_mbuf: 136
sizeof video_mmap: 16
sizeof video_picture: 14
sizeof video_tuner: 52
sizeof video_window: 32
sizeof v4l2_audio: 52
sizeof v4l2_buffer: 68
sizeof v4l2_capability: 104
sizeof v4l2_control: 8
sizeof v4l2_fmtdesc: 64
sizeof v4l2_format: 204
sizeof v4l2_framebuffer: 44
sizeof v4l2_frequency: 44
sizeof v4l2_input: 76
sizeof v4l2_queryctrl: 68
sizeof v4l2_requestbuffers: 20
sizeof v4l2_tuner: 84
On Wed, 5 March 2003 07:34:37 -0800, Randy.Dunlap wrote:
> Russell King <[email protected]> wrote:
> | On Wed, Mar 05, 2003 at 10:15:52AM +0100, Gerd Knorr wrote:
>
> | > Not sure what is the best idea to fix that. Don't like the kmalloc
> | > idea that much. The individual structs are not huge, the real problem
> | > is that many of them are allocated and only few are needed. Any
> | > chance to tell gcc that it should allocate block-local variables at
> | > the start block not at the start of the function?
> |
> | Not a particularly clean idea, but maybe creating a union of the
> | structures and putting that on the stack? (ie, doing what GCC should
> | be doing in the first place.)
>
> That's an idea. Or make separate called functions for each ioctl and declare
> the structures inside them?
As far as I have seen, at least ds_ioctl uses the union trick. And the
three marked (*) ioctl-functions appear to be suffering from the same
gcc inability.
How much complaining is necessary for the gcc folks to worry about
this?
0xc0ad420b <snd_emu10k1_fx8010_ioctl+3/5ec>: sub $0x814,%esp
0xc06dbd77 <v4l_compat_translate_ioctl+3/159c>: sub $0x804,%esp
0xc0521e43 <ida_ioctl+3/388>: sub $0x528,%esp
* 0xc01ed733 <presto_ioctl+3/13c4>: sub $0x4d0,%esp
* 0xc0bfbfd3 <br_ioctl_device+3/454>: sub $0x474,%esp
0xc07968bb <megadev_ioctl+3/c94>: sub $0x394,%esp
* 0xc068ff5b <ray_dev_ioctl+3/8b8>: sub $0x2e0,%esp
0xc0583a63 <netdev_ethtool_ioctl+3/e64>: sub $0x2c8,%esp
0xc0865963 <ds_ioctl+3/6b0>: sub $0x2ac,%esp
0xc04370c3 <vt_ioctl+3/1a84>: sub $0x29c,%esp
0xc054c5eb <e1000_ethtool_ioctl+3/6f8>: sub $0x290,%esp
0xc0299b23 <ncp_ioctl+3/13b0>: sub $0x278,%esp
0xc0532e67 <DAC960_UserIOCTL+3/17c0>: sub $0x230,%esp
J?rn
--
Optimizations always bust things, because all optimizations are, in
the long haul, a form of cheating, and cheaters eventually get caught.
-- Larry Wall
> On Wed, 5 Mar 2003 09:35:34 +0000
> Russell King <[email protected]> wrote:
>
> | On Wed, Mar 05, 2003 at 10:15:52AM +0100, Gerd Knorr wrote:
> | > But when looking at the disasm output it is obvious that it isn't true |
>> (at least with gcc 3.2). On the other hand it is common practice in | >
> many drivers, there must be a reason for that, no? Any chance this | > used
> to work with older gcc versions?
> |
> | I don't believe so - I seem to remember looking at gcc 2.95 and finding |
> the same annoying behaviour.
AFAIK we have no reason to believe that this will work and we have
counter-examples of it continuing to be a problem.
> Yes, that's what I'm seeing with 2.96 as well.
>
> | > Not sure what is the best idea to fix that. Don't like the kmalloc | >
> idea that much. The individual structs are not huge, the real problem | >
> is that many of them are allocated and only few are needed. Any | > chance
> to tell gcc that it should allocate block-local variables at | > the start
> block not at the start of the function?
Not that anyone has mentioned here.
> | Not a particularly clean idea, but maybe creating a union of the |
> structures and putting that on the stack? (ie, doing what GCC should | be
> doing in the first place.)
I looked into this. The various structures are used in a variety
of combinations, so this method looks bad IMO.
> That's an idea. Or make separate called functions for each ioctl and
> declare the structures inside them?
and this method looks like a possibility.
Gerd, Andrew says that the kmalloc() time will be small compared to
the memset()'s that the function is already doing.
Do you want to do anything about this, or want me to, or want me
to drop it?
> --
>
> Here are some v4l structure sizes from gcc 2.96 on a P4:
>
> sizeof v4l2_audio: 52
> sizeof v4l2_buffer: 68
> sizeof v4l2_capability: 104
> sizeof v4l2_fmtdesc: 64
> sizeof v4l2_format: 204
> sizeof v4l2_framebuffer: 44
> sizeof v4l2_frequency: 44
> sizeof v4l2_input: 76
> sizeof v4l2_queryctrl: 68
> sizeof v4l2_requestbuffers: 20
> sizeof v4l2_tuner: 84
> -
And here's a summary of how they are used together:
sizeof v4l2_audio: 52 + queryctrl:68 + tuner:84 / +tuner:84
sizeof v4l2_buffer: 68 + format:204 / alone
sizeof v4l2_capability: 104 + framebuffer:44
sizeof v4l2_framebuffer: 44 + capability:104 / + format:204
sizeof v4l2_frequency: 44 alone
sizeof v4l2_fmtdesc: 64 + format:204
sizeof v4l2_format: 204 + fmtdesc:64 / + buffer:68
sizeof v4l2_input: 76 alone / alone
sizeof v4l2_queryctrl: 68 + audio:52 + tuner:84
sizeof v4l2_tuner: 84 alone / + audio:52 + queryctrl:68 / + audio:52
~Randy
> > That's an idea. Or make separate called functions for each ioctl and
> > declare the structures inside them?
>
> and this method looks like a possibility.
>
> Gerd, Andrew says that the kmalloc() time will be small compared to
> the memset()'s that the function is already doing.
That is wrong, at least the 2k memset/call mentioned by Andrew. There
are lots of memset() calls, but they all are within the case switches
for the ioctls and zero out only the structs which are used in that code
path, so it is actually much smaller (~50 -> ~300 bytes maybe, depending
on the ioctl).
> Do you want to do anything about this, or want me to, or want me
> to drop it?
It is on my todo list, just havn't found yet the time to do that.
Probably I'll split it into smaller functions, this looks like the
best way to me.
Gerd
On Tue, 2003-03-11 at 09:19, Gerd Knorr wrote
> That is wrong, at least the 2k memset/call mentioned by Andrew. There
> are lots of memset() calls, but they all are within the case switches
> for the ioctls and zero out only the structs which are used in that code
> path, so it is actually much smaller (~50 -> ~300 bytes maybe, depending
> on the ioctl).
gcc sometimes does things like allocate all the objects in case
statements at entry time. I assume its a performance win to do so.
Alan
On Tue, Mar 11, 2003 at 04:29:30PM +0000, Alan Cox wrote:
> On Tue, 2003-03-11 at 09:19, Gerd Knorr wrote
>
> > That is wrong, at least the 2k memset/call mentioned by Andrew. There
> > are lots of memset() calls, but they all are within the case switches
> > for the ioctls and zero out only the structs which are used in that code
> > path, so it is actually much smaller (~50 -> ~300 bytes maybe, depending
> > on the ioctl).
>
> gcc sometimes does things like allocate all the objects in case
> statements at entry time. I assume its a performance win to do so.
No, it's more likely a known GCC bug to do so. See PR middle-end/9997
if you're really curious.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer