2010-02-25 23:17:35

by Steven J. Magnani

[permalink] [raw]
Subject: Buggy variable-length array code...or compiler?

When I run a memcpy dmatest with a Microblaze 2.6.33 noMMU kernel, the
system crashes after about 400 iterations. After much head scratching, I
believe I've narrowed the problem to this fragment of code in
drivers/dma/dmatest.c:

static int dmatest_func(void *data)
{
struct dmatest_thread *thread = data;
...
unsigned int total_tests = 0;
int src_cnt;
int dst_cnt;

...
if (thread->type == DMA_MEMCPY)
src_cnt = dst_cnt = 1;
...

while (!kthread_should_stop()
&& !(iterations && total_tests >= iterations)) {
struct dma_device *dev = chan->device;
struct dma_async_tx_descriptor *tx = NULL;
dma_addr_t dma_srcs[src_cnt];
dma_addr_t dma_dsts[dst_cnt];

...
total_tests++;

/* CODE ADDED BY ME FOR DEBUG */
printk("dmatest: Iteration %d, dma_srcs = %p\n",
total_tests, dma_srcs);

...
}

With this code I get output like this:

dmatest: Iteration 1, dma_srcs = 2c963ee8
dmatest: Iteration 2, dma_srcs = 2c963ed8
dmatest: Iteration 3, dma_srcs = 2c963ec8
dmatest: Iteration 4, dma_srcs = 2c963eb8
...
dmatest: Iteration 420, dma_srcs = 2c9624b8

...and then the stack detonates and the kernel crashes with some strange
error or other.

Are there any language lawyers in the house who'd care to weigh in on
which of these possibilities is the right one?

1. There is a coding error in dmatest
2. There is a bug specific to Microblaze gcc compiler(s) [mine is 4.1.2]
3. There is a bug generic to specific versions of gcc compilers
4. There is a bug generic to all gcc compilers

Obviously, the options get more disturbing the higher you go. I don't
know if VLAs are used elsewhere in the kernel; a 'smatch' search might
be helpful.

Regards,
------------------------------------------------------------------------
Steven J. Magnani "I claim this network for MARS!
http://www.digidescorp.com Earthling, return my space modulator!"

#include <standard.disclaimer>



2010-02-25 23:23:49

by Samuel Thibault

[permalink] [raw]
Subject: Re: Buggy variable-length array code...or compiler?

IIRC C99 allows VLAs to not be freed immediately, but at function
termination, which makes things way simpler to implement for compilers.
I've noticed it on Solaris and cell's spugcc.

Steven J. Magnani, le Thu 25 Feb 2010 17:17:29 -0600, a ?crit :
> 1. There is a coding error in dmatest

It needs to get out of the function from times to times to free the
VLAs.

> 2. There is a bug specific to Microblaze gcc compiler(s) [mine is 4.1.2]
> 3. There is a bug generic to specific versions of gcc compilers
> 4. There is a bug generic to all gcc compilers

I've usually seen gcc free VLAs as soon as possible, so I believe it's
only a few targets.

Samuel

2010-02-25 23:46:58

by David Rientjes

[permalink] [raw]
Subject: Re: Buggy variable-length array code...or compiler?

On Thu, 25 Feb 2010, Steven J. Magnani wrote:

> When I run a memcpy dmatest with a Microblaze 2.6.33 noMMU kernel, the
> system crashes after about 400 iterations. After much head scratching, I
> believe I've narrowed the problem to this fragment of code in
> drivers/dma/dmatest.c:
>
> static int dmatest_func(void *data)
> {
> struct dmatest_thread *thread = data;
> ...
> unsigned int total_tests = 0;
> int src_cnt;
> int dst_cnt;
>
> ...
> if (thread->type == DMA_MEMCPY)
> src_cnt = dst_cnt = 1;
> ...
>
> while (!kthread_should_stop()
> && !(iterations && total_tests >= iterations)) {
> struct dma_device *dev = chan->device;
> struct dma_async_tx_descriptor *tx = NULL;
> dma_addr_t dma_srcs[src_cnt];
> dma_addr_t dma_dsts[dst_cnt];
>
> ...
> total_tests++;
>
> /* CODE ADDED BY ME FOR DEBUG */
> printk("dmatest: Iteration %d, dma_srcs = %p\n",
> total_tests, dma_srcs);
>
> ...
> }
>
> With this code I get output like this:
>
> dmatest: Iteration 1, dma_srcs = 2c963ee8
> dmatest: Iteration 2, dma_srcs = 2c963ed8
> dmatest: Iteration 3, dma_srcs = 2c963ec8
> dmatest: Iteration 4, dma_srcs = 2c963eb8
> ...
> dmatest: Iteration 420, dma_srcs = 2c9624b8
>
> ...and then the stack detonates and the kernel crashes with some strange
> error or other.
>

This could probably become the first kernel user of the flexible array
library (see Documentation/flexible-arrays.txt). Dan?

2010-02-26 00:46:29

by J.A. Magallón

[permalink] [raw]
Subject: Re: Buggy variable-length array code...or compiler?

On Thu, 25 Feb 2010 17:17:29 -0600, "Steven J. Magnani" <[email protected]> wrote:

> When I run a memcpy dmatest with a Microblaze 2.6.33 noMMU kernel, the
> system crashes after about 400 iterations. After much head scratching, I
> believe I've narrowed the problem to this fragment of code in
> drivers/dma/dmatest.c:
>
> static int dmatest_func(void *data)
> {
> struct dmatest_thread *thread = data;
> ...
> unsigned int total_tests = 0;
> int src_cnt;
> int dst_cnt;
>
> ...
> if (thread->type == DMA_MEMCPY)
> src_cnt = dst_cnt = 1;
> ...
>
> while (!kthread_should_stop()
> && !(iterations && total_tests >= iterations)) {
> struct dma_device *dev = chan->device;
> struct dma_async_tx_descriptor *tx = NULL;
> dma_addr_t dma_srcs[src_cnt];
> dma_addr_t dma_dsts[dst_cnt];
>
> ...
> total_tests++;
>
> /* CODE ADDED BY ME FOR DEBUG */
> printk("dmatest: Iteration %d, dma_srcs = %p\n",
> total_tests, dma_srcs);
>
> ...
> }
>
> With this code I get output like this:
>
> dmatest: Iteration 1, dma_srcs = 2c963ee8
> dmatest: Iteration 2, dma_srcs = 2c963ed8
> dmatest: Iteration 3, dma_srcs = 2c963ec8
> dmatest: Iteration 4, dma_srcs = 2c963eb8
> ...
> dmatest: Iteration 420, dma_srcs = 2c9624b8
>
> ...and then the stack detonates and the kernel crashes with some strange
> error or other.
>
> Are there any language lawyers in the house who'd care to weigh in on
> which of these possibilities is the right one?
>
> 1. There is a coding error in dmatest
> 2. There is a bug specific to Microblaze gcc compiler(s) [mine is 4.1.2]
> 3. There is a bug generic to specific versions of gcc compilers
> 4. There is a bug generic to all gcc compilers
>
> Obviously, the options get more disturbing the higher you go. I don't
> know if VLAs are used elsewhere in the kernel; a 'smatch' search might
> be helpful.

Can you try this in userspace ? I compiled in CentOS gcc 4.1.2 (just the
same), and the addresses are always the same:

#include <stdio.h>

int main()
{
int c = 2;
while (1)
{
int a[c];
int b[c];
a[0] = b[0];
printf("%p %p\n",a,b);
}
}

Could you post the full contents of the while loop ? Perhaps there's a
buglet in other piece of code that leaves something on the stack.
Which is the size of dma_addr_t ? Does it match the difference of 16 bytes
on each iteration ? cnt's are always 1, isn't it ?
Can you switch the size to a fixed '1' to see if this hangs again ?

TIA

--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free

2010-02-26 10:12:12

by Dan Carpenter

[permalink] [raw]
Subject: Re: Buggy variable-length array code...or compiler?

On Thu, Feb 25, 2010 at 05:17:29PM -0600, Steven J. Magnani wrote:
> When I run a memcpy dmatest with a Microblaze 2.6.33 noMMU kernel, the
> system crashes after about 400 iterations. After much head scratching, I
> believe I've narrowed the problem to this fragment of code in
> drivers/dma/dmatest.c:
>
> static int dmatest_func(void *data)
> {
> struct dmatest_thread *thread = data;
> ...
> unsigned int total_tests = 0;
> int src_cnt;
> int dst_cnt;
>
> ...
> if (thread->type == DMA_MEMCPY)
> src_cnt = dst_cnt = 1;
> ...
>
> while (!kthread_should_stop()
> && !(iterations && total_tests >= iterations)) {
> struct dma_device *dev = chan->device;
> struct dma_async_tx_descriptor *tx = NULL;
> dma_addr_t dma_srcs[src_cnt];
> dma_addr_t dma_dsts[dst_cnt];
>
> ...
> total_tests++;
>
> /* CODE ADDED BY ME FOR DEBUG */
> printk("dmatest: Iteration %d, dma_srcs = %p\n",
> total_tests, dma_srcs);
>
> ...
> }
>
> With this code I get output like this:
>
> dmatest: Iteration 1, dma_srcs = 2c963ee8
> dmatest: Iteration 2, dma_srcs = 2c963ed8
> dmatest: Iteration 3, dma_srcs = 2c963ec8
> dmatest: Iteration 4, dma_srcs = 2c963eb8
> ...
> dmatest: Iteration 420, dma_srcs = 2c9624b8
>
> ...and then the stack detonates and the kernel crashes with some strange
> error or other.
>
> Are there any language lawyers in the house who'd care to weigh in on
> which of these possibilities is the right one?
>
> 1. There is a coding error in dmatest
> 2. There is a bug specific to Microblaze gcc compiler(s) [mine is 4.1.2]
> 3. There is a bug generic to specific versions of gcc compilers
> 4. There is a bug generic to all gcc compilers
>
> Obviously, the options get more disturbing the higher you go. I don't
> know if VLAs are used elsewhere in the kernel; a 'smatch' search might
> be helpful.
>

Sparse complains about variable length arrays.

drivers/dma/dmatest.c:293:37: error: bad constant expression

In my allmodconfig with staging:
~/progs/kernel/devel$ grep "bad constant expression" warns.txt | sort -u | wc -l
133

regards,
dan carpenter


> Regards,
> ------------------------------------------------------------------------
> Steven J. Magnani "I claim this network for MARS!
> http://www.digidescorp.com Earthling, return my space modulator!"
>
> #include <standard.disclaimer>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2010-02-26 10:28:03

by Dan Carpenter

[permalink] [raw]
Subject: Re: Buggy variable-length array code...or compiler?

On Thu, Feb 25, 2010 at 03:46:48PM -0800, David Rientjes wrote:
> On Thu, 25 Feb 2010, Steven J. Magnani wrote:
>
> > When I run a memcpy dmatest with a Microblaze 2.6.33 noMMU kernel, the
> > system crashes after about 400 iterations. After much head scratching, I
> > believe I've narrowed the problem to this fragment of code in
> > drivers/dma/dmatest.c:
> >
> > static int dmatest_func(void *data)
> > {
> > struct dmatest_thread *thread = data;
> > ...
> > unsigned int total_tests = 0;
> > int src_cnt;
> > int dst_cnt;
> >
> > ...
> > if (thread->type == DMA_MEMCPY)
> > src_cnt = dst_cnt = 1;
> > ...
> >
> > while (!kthread_should_stop()
> > && !(iterations && total_tests >= iterations)) {
> > struct dma_device *dev = chan->device;
> > struct dma_async_tx_descriptor *tx = NULL;
> > dma_addr_t dma_srcs[src_cnt];
> > dma_addr_t dma_dsts[dst_cnt];
> >
> > ...
> > total_tests++;
> >
> > /* CODE ADDED BY ME FOR DEBUG */
> > printk("dmatest: Iteration %d, dma_srcs = %p\n",
> > total_tests, dma_srcs);
> >
> > ...
> > }
> >
> > With this code I get output like this:
> >
> > dmatest: Iteration 1, dma_srcs = 2c963ee8
> > dmatest: Iteration 2, dma_srcs = 2c963ed8
> > dmatest: Iteration 3, dma_srcs = 2c963ec8
> > dmatest: Iteration 4, dma_srcs = 2c963eb8
> > ...
> > dmatest: Iteration 420, dma_srcs = 2c9624b8
> >
> > ...and then the stack detonates and the kernel crashes with some strange
> > error or other.
> >
>
> This could probably become the first kernel user of the flexible array
> library (see Documentation/flexible-arrays.txt). Dan?
> --

I think the max that src_cnt can be is 3 and the most dst_cnt can be is 2.
We could just put that there.

regards,
dan carpenter

2010-02-26 17:43:24

by Samuel Thibault

[permalink] [raw]
Subject: Re: Buggy variable-length array code...or compiler?

J.A. Magall?n, le Fri 26 Feb 2010 01:46:23 +0100, a ?crit :
> int main()
> {
> int c = 2;
> while (1)
> {
> int a[c];
> int b[c];

c is not variable here.

Samuel

2010-02-26 18:52:11

by Steven J. Magnani

[permalink] [raw]
Subject: Re: Buggy variable-length array code...or compiler?

On Fri, 2010-02-26 at 01:46 +0100, J.A. Magallón wrote:

> Can you try this in userspace ? I compiled in CentOS gcc 4.1.2 (just the
> same), and the addresses are always the same:
>
> #include <stdio.h>
>
> int main()
> {
> int c = 2;
> while (1)
> {
> int a[c];
> int b[c];
> a[0] = b[0];
> printf("%p %p\n",a,b);
> }
> }

On the Microblaze:

0x2d711f1c 0x2d711f10
0x2d711f04 0x2d711ef8
0x2d711eec 0x2d711ee0
0x2d711ed4 0x2d711ec8

This happens no matter what optimization setting I compile with.

>
> Could you post the full contents of the while loop ?

It's the stock drivers/dma/dmatest.c in 2.6.33.

> Which is the size of dma_addr_t ?
Microblaze is a 32-bit arch; dma_addr_t is 32 bits.

> Does it match the difference of 16 bytes on each iteration ?
No. It appears that the data going on the stack each iteration are:

dma_srcs[0]: iteration n
total_tests: [n-2]
X
dst_off: [iteration n-1]
dma_srcs[0]: iteration n-1
total_tests [n-3]
X

> cnt's are always 1, isn't it ?
For the memcpy test, yes/

> Can you switch the size to a fixed '1' to see if this hangs again ?
It does not.

Regards,
------------------------------------------------------------------------
Steven J. Magnani "I claim this network for MARS!
http://www.digidescorp.com Earthling, return my space modulator!"

#include <standard.disclaimer>


2010-02-26 19:15:06

by Steven J. Magnani

[permalink] [raw]
Subject: Re: Buggy variable-length array code...or compiler?

On Thu, 25 Feb 2010, Steven J. Magnani wrote:
> >
> > > When I run a memcpy dmatest with a Microblaze 2.6.33 noMMU kernel, the
> > > system crashes after about 400 iterations. After much head scratching, I
> > > believe I've narrowed the problem to this fragment of code in
> > > drivers/dma/dmatest.c:
> > >
> > > static int dmatest_func(void *data)
> > > {
> > > ...
> > > int src_cnt;
> > > int dst_cnt;
> > > ...
> > > if (thread->type == DMA_MEMCPY)
> > > src_cnt = dst_cnt = 1;
> > > ...
> > > while (!kthread_should_stop()
> > > && !(iterations && total_tests >= iterations)) {
> > > ...
> > > dma_addr_t dma_srcs[src_cnt];
> > > dma_addr_t dma_dsts[dst_cnt];
> > > ...

On Thu, Feb 25, 2010 at 03:46:48PM -0800, David Rientjes wrote:
> > This could probably become the first kernel user of the flexible array
> > library (see Documentation/flexible-arrays.txt). Dan?
> > --

On Fri, 2010-02-26 at 13:27 +0300, Dan Carpenter wrote:
> I think the max that src_cnt can be is 3 and the most dst_cnt can be is 2.
> We could just put that there.

src_cnt is dependent on module parameters.

The bug goes away if dma_srcs and dma_dsts are declared outside the
loop, but that requires knocking the loop in another tabstop. At that
point dmatest_func() would be begging for refactoring.

Regards,
------------------------------------------------------------------------
Steven J. Magnani "I claim this network for MARS!
http://www.digidescorp.com Earthling, return my space modulator!"

#include <standard.disclaimer>


2010-02-26 19:43:12

by Dan Williams

[permalink] [raw]
Subject: Re: Buggy variable-length array code...or compiler?

On Fri, Feb 26, 2010 at 12:15 PM, Steven J. Magnani
<[email protected]> wrote:
> On Thu, 25 Feb 2010, Steven J. Magnani wrote:
>> >
>> > > When I run a memcpy dmatest with a Microblaze 2.6.33 noMMU kernel, the
>> > > system crashes after about 400 iterations. After much head scratching, I
>> > > believe I've narrowed the problem to this fragment of code in
>> > > drivers/dma/dmatest.c:
>> > >
>> > > static int dmatest_func(void *data)
>> > > {
>> > > ...
>> > > ? ? int ? ? ? ? ? ? ? ? ? ? src_cnt;
>> > > ? ? int ? ? ? ? ? ? ? ? ? ? dst_cnt;
>> > > ...
>> > > ? ? if (thread->type == DMA_MEMCPY)
>> > > ? ? ? ? ? ? src_cnt = dst_cnt = 1;
>> > > ...
>> > > ? ? while (!kthread_should_stop()
>> > > ? ? ? ? ? ?&& !(iterations && total_tests >= iterations)) {
>> > > ...
>> > > ? ? ? ? ? ? dma_addr_t dma_srcs[src_cnt];
>> > > ? ? ? ? ? ? dma_addr_t dma_dsts[dst_cnt];
>> > > ...
>
> On Thu, Feb 25, 2010 at 03:46:48PM -0800, David Rientjes wrote:
>> > This could probably become the first kernel user of the flexible array
>> > library (see Documentation/flexible-arrays.txt). ?Dan?
>> > --
>
> On Fri, 2010-02-26 at 13:27 +0300, Dan Carpenter wrote:
>> I think the max that src_cnt can be is 3 and the most dst_cnt can be is 2.
>> We could just put that there.
>
> src_cnt is dependent on module parameters.
>
> The bug goes away if dma_srcs and dma_dsts are declared outside the
> loop, but that requires knocking the loop in another tabstop. At that
> point dmatest_func() would be begging for refactoring.
>

...and even if you refactored it there is no guarantee that gcc would
not re-inline the functions and put you back in the same situation.
Especially given the finding that J.A.'s simple test, where the size
of the array should have been constant, still failed.

--
Dan