2005-01-13 00:11:27

by Russell King

[permalink] [raw]
Subject: [BUG] ATA over Ethernet __init calling __exit

static void __exit
aoe_exit(void)
{
...
}

static int __init
aoe_init(void)
{
...
aoe_exit();
...
}

Enough said, please shoot the author of that in the foot. You can't
call functions marked __exit (which may be discarded) from functions
which aren't. If you want to do this, please loose the __exit on
aoe_exit().

In addition, please shoot the author in the other foot for:

config ATA_OVER_ETH
tristate "ATA over Ethernet support"
depends on NET
default m <==== this line.

That's not nice for embedded guys who do a "make xxx_defconfig" and
unsuspectingly end up with ATA over Ethernet built in for their
platform.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core


2005-01-13 08:50:41

by Jens Axboe

[permalink] [raw]
Subject: Re: [BUG] ATA over Ethernet __init calling __exit

On Thu, Jan 13 2005, Russell King wrote:
> In addition, please shoot the author in the other foot for:
>
> config ATA_OVER_ETH
> tristate "ATA over Ethernet support"
> depends on NET
> default m <==== this line.
>
> That's not nice for embedded guys who do a "make xxx_defconfig" and
> unsuspectingly end up with ATA over Ethernet built in for their
> platform.

That annoyed me, too. There's no reason for standard kernel driver
modules to assume they should be selected, especially true for something
as special case as ata over ethernet.

--
Jens Axboe

2005-01-13 20:23:08

by Andi Kleen

[permalink] [raw]
Subject: Re: [BUG] ATA over Ethernet __init calling __exit

Jens Axboe <[email protected]> writes:

> On Thu, Jan 13 2005, Russell King wrote:
>> In addition, please shoot the author in the other foot for:
>>
>> config ATA_OVER_ETH
>> tristate "ATA over Ethernet support"
>> depends on NET
>> default m <==== this line.
>>
>> That's not nice for embedded guys who do a "make xxx_defconfig" and
>> unsuspectingly end up with ATA over Ethernet built in for their
>> platform.
>
> That annoyed me, too. There's no reason for standard kernel driver
> modules to assume they should be selected, especially true for something
> as special case as ata over ethernet.

In general I think it was a bad idea to merge this driver at all.
The protocol is obviously broken by design - they use a 16 bit sequence
number space which has been proven for many years (in ip fragmentation)
to be far too small for modern network performance.

Also the memory allocation without preallocation in the block write
out path looks quite broken too and will most likely will lead to deadlocks
under high load.

(I wrote a detailed review when it was posted but apparently it
disappeared or I never got any answer at least)

IMHO this thing should have never been merged in this form. Can it
still be backed out?

-Andi

2005-01-13 20:58:37

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [BUG] ATA over Ethernet __init calling __exit

Andi Kleen <[email protected]> writes:

...
> In general I think it was a bad idea to merge this driver at all.
> The protocol is obviously broken by design - they use a 16 bit sequence
> number space which has been proven for many years (in ip fragmentation)
> to be far too small for modern network performance.

While that experience may apply well to IP, this is a non-IP protocol
for a single LAN. For any given AoE device, there are only a few
outstanding packets at any given time.

For existing AoE devices that number of outstanding packets is only
three! So, with only three packets on the wire at any time for a
given device, 16 bits is overkill. In fact, the AoE protocol allows
the AoE device to specify how many outstanding packets it supports.
That number is only 16 bits wide.

If it ever did become desirable, we could use a couple more bits for
the sequence number by borrowing from the low bits of jiffies that we
use to estimate the RTT, but it doesn't seem likely to ever be
desirable.

> Also the memory allocation without preallocation in the block write
> out path looks quite broken too and will most likely will lead to deadlocks
> under high load.
>
> (I wrote a detailed review when it was posted but apparently it
> disappeared or I never got any answer at least)

I think you're talking about your suggestion that the skb allocation
could lead to a deadlock. If I'm correct, this issue is similar to
the one that led us to create a mempool for the buf structs we use.

> IMHO this thing should have never been merged in this form. Can it
> still be backed out?

The sequence number is a non-issue, and for the deadlock danger you
originally suggested that we add a warning against using aoe storage
for swap. We could add such a warning to the driver's documentation
and start working on a patch that avoids deadlocking on the skb
allocation. There's no need to panic and force users go get a third
party driver.

--
Ed L Cashin <[email protected]>

2005-01-13 21:59:36

by Andi Kleen

[permalink] [raw]
Subject: Re: [BUG] ATA over Ethernet __init calling __exit

On Thu, Jan 13, 2005 at 03:52:05PM -0500, Ed L Cashin wrote:
> Andi Kleen <[email protected]> writes:
>
> ...
> > In general I think it was a bad idea to merge this driver at all.
> > The protocol is obviously broken by design - they use a 16 bit sequence
> > number space which has been proven for many years (in ip fragmentation)
> > to be far too small for modern network performance.
>
> While that experience may apply well to IP, this is a non-IP protocol
> for a single LAN. For any given AoE device, there are only a few
> outstanding packets at any given time.
>
> For existing AoE devices that number of outstanding packets is only
> three! So, with only three packets on the wire at any time for a
> given device, 16 bits is overkill. In fact, the AoE protocol allows
> the AoE device to specify how many outstanding packets it supports.
> That number is only 16 bits wide.
>
> If it ever did become desirable, we could use a couple more bits for

It likely will if someone ever adds significant write cache to such
devices.

> the sequence number by borrowing from the low bits of jiffies that we
> use to estimate the RTT, but it doesn't seem likely to ever be
> desirable.

Can this be done now?

>
> > Also the memory allocation without preallocation in the block write
> > out path looks quite broken too and will most likely will lead to deadlocks
> > under high load.
> >
> > (I wrote a detailed review when it was posted but apparently it
> > disappeared or I never got any answer at least)
>
> I think you're talking about your suggestion that the skb allocation
> could lead to a deadlock. If I'm correct, this issue is similar to
> the one that led us to create a mempool for the buf structs we use.

For the skbuffs? I don't think it's possible to preallocate them
because the network stack (intentionally) misses hooks to give
them back to you.

BTW iirc your submit patch did too much allocations anyways because
in modern Linux skbs you can just stick a pointer to the page
into the skb when the device announces NETIF_F_SG.

>
> > IMHO this thing should have never been merged in this form. Can it
> > still be backed out?
>
> The sequence number is a non-issue, and for the deadlock danger you
> originally suggested that we add a warning against using aoe storage
> for swap. We could add such a warning to the driver's documentation

It's not only swap, the same problem occurs when any file on such
a device is writeable mmaped.

-Andi

2005-01-13 22:21:36

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [BUG] ATA over Ethernet __init calling __exit

Andi Kleen <[email protected]> writes:

> On Thu, Jan 13, 2005 at 03:52:05PM -0500, Ed L Cashin wrote:
>> Andi Kleen <[email protected]> writes:
>>
>> ...
>> > In general I think it was a bad idea to merge this driver at all.
>> > The protocol is obviously broken by design - they use a 16 bit sequence
>> > number space which has been proven for many years (in ip fragmentation)
>> > to be far too small for modern network performance.
>>
>> While that experience may apply well to IP, this is a non-IP protocol
>> for a single LAN. For any given AoE device, there are only a few
>> outstanding packets at any given time.
>>
>> For existing AoE devices that number of outstanding packets is only
>> three! So, with only three packets on the wire at any time for a
>> given device, 16 bits is overkill. In fact, the AoE protocol allows
>> the AoE device to specify how many outstanding packets it supports.
>> That number is only 16 bits wide.
>>
>> If it ever did become desirable, we could use a couple more bits for
>
> It likely will if someone ever adds significant write cache to such
> devices.
>
>> the sequence number by borrowing from the low bits of jiffies that we
>> use to estimate the RTT, but it doesn't seem likely to ever be
>> desirable.
>
> Can this be done now?

It seems rash to make the change now, because there is no need for it.

>> > Also the memory allocation without preallocation in the block write
>> > out path looks quite broken too and will most likely will lead to deadlocks
>> > under high load.
>> >
>> > (I wrote a detailed review when it was posted but apparently it
>> > disappeared or I never got any answer at least)
>>
>> I think you're talking about your suggestion that the skb allocation
>> could lead to a deadlock. If I'm correct, this issue is similar to
>> the one that led us to create a mempool for the buf structs we use.
>
> For the skbuffs? I don't think it's possible to preallocate them
> because the network stack (intentionally) misses hooks to give
> them back to you.

For the skbuffs, we could use a mempool with GFP_NOIO allocation and
then skb_get them before the network layer ever sees them. We already
keep track of the packets that we've sent out, so we'll just keep a
pointer to the skbuffs.

> BTW iirc your submit patch did too much allocations anyways because
> in modern Linux skbs you can just stick a pointer to the page
> into the skb when the device announces NETIF_F_SG.

I thought there was some shared information at the end of the data,
but that's interesting, thanks. I'll look into it.

--
Ed L Cashin <[email protected]>

2005-01-13 22:53:13

by Andi Kleen

[permalink] [raw]
Subject: Re: [BUG] ATA over Ethernet __init calling __exit

On Thu, Jan 13, 2005 at 05:12:21PM -0500, Ed L Cashin wrote:

> It seems rash to make the change now, because there is no need for it.

It would be better to future proof the driver at least a bit.

>
> For the skbuffs, we could use a mempool with GFP_NOIO allocation and
> then skb_get them before the network layer ever sees them. We already
> keep track of the packets that we've sent out, so we'll just keep a
> pointer to the skbuffs.

That won't work because you don't know when the driver is done with it.

There is a callback, but it's already used by socket buffer management
(if you don't use a socket it may be usable)


-Andi

2005-01-13 23:21:53

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [BUG] ATA over Ethernet __init calling __exit

Andi Kleen <[email protected]> writes:

> On Thu, Jan 13, 2005 at 05:12:21PM -0500, Ed L Cashin wrote:
>
>> It seems rash to make the change now, because there is no need for it.
>
> It would be better to future proof the driver at least a bit.
>
>>
>> For the skbuffs, we could use a mempool with GFP_NOIO allocation and
>> then skb_get them before the network layer ever sees them. We already
>> keep track of the packets that we've sent out, so we'll just keep a
>> pointer to the skbuffs.
>
> That won't work because you don't know when the driver is done with it.
>
> There is a callback, but it's already used by socket buffer management

The destructor member of struct sk_buff, right?

> (if you don't use a socket it may be usable)

You mean a struct sock? No, we don't use that because we're not IP.
Thanks for the lead.

--
Ed L Cashin <[email protected]>

2005-01-13 23:47:03

by Alan

[permalink] [raw]
Subject: Re: [BUG] ATA over Ethernet __init calling __exit

On Iau, 2005-01-13 at 21:53, Andi Kleen wrote:
> > If it ever did become desirable, we could use a couple more bits for
>
> It likely will if someone ever adds significant write cache to such
> devices.

ATA doesn't support tagged queueing. Therefore write cache is
irrelevant.

Alan