Karen Xie wrote:
> Cxgb3i iSCSI driver
>
> Signed-off-by: Karen Xie <[email protected]>
> ---
>
> drivers/scsi/cxgb3i/Kconfig | 6
> drivers/scsi/cxgb3i/Makefile | 5
> drivers/scsi/cxgb3i/cxgb3i.h | 155 +++
> drivers/scsi/cxgb3i/cxgb3i_init.c | 109 ++
> drivers/scsi/cxgb3i/cxgb3i_iscsi.c | 800 ++++++++++++++
> drivers/scsi/cxgb3i/cxgb3i_offload.c | 2001 ++++++++++++++++++++++++++++++++++
> drivers/scsi/cxgb3i/cxgb3i_offload.h | 242 ++++
> drivers/scsi/cxgb3i/cxgb3i_ulp2.c | 692 ++++++++++++
> drivers/scsi/cxgb3i/cxgb3i_ulp2.h | 106 ++
> 9 files changed, 4116 insertions(+), 0 deletions(-)
> create mode 100644 drivers/scsi/cxgb3i/Kconfig
> create mode 100644 drivers/scsi/cxgb3i/Makefile
> create mode 100644 drivers/scsi/cxgb3i/cxgb3i.h
> create mode 100644 drivers/scsi/cxgb3i/cxgb3i_init.c
> create mode 100644 drivers/scsi/cxgb3i/cxgb3i_iscsi.c
> create mode 100644 drivers/scsi/cxgb3i/cxgb3i_offload.c
> create mode 100644 drivers/scsi/cxgb3i/cxgb3i_offload.h
> create mode 100644 drivers/scsi/cxgb3i/cxgb3i_ulp2.c
> create mode 100644 drivers/scsi/cxgb3i/cxgb3i_ulp2.h
Comments:
* SCSI drivers should be submitted via the [email protected]
mailing list.
* The driver is clean and readable, well done
* From a networking standpoint, our main concern becomes how this
interacts with the networking stack. In particular, I'm concerned based
on reading the source that this driver uses "TCP port stealing" rather
than using a totally separate MAC address (and IP).
Stealing a TCP port on an IP/interface already assigned is a common
solution in this space, but also a flawed one. Precisely because the
kernel and applications are unaware of this "special, magic TCP port"
you open the potential for application problems that are very difficult
for an admin to diagnose based on observed behavior.
So, additional information on your TCP port usage would be greatly
appreciated. Also, how does this interact with IPv6? Clearly it
interacts with IPv4...
Jeff
> * From a networking standpoint, our main concern becomes how this
> interacts with the networking stack. In particular, I'm concerned
> based on reading the source that this driver uses "TCP port stealing"
> rather than using a totally separate MAC address (and IP).
>
> Stealing a TCP port on an IP/interface already assigned is a common
> solution in this space, but also a flawed one. Precisely because the
> kernel and applications are unaware of this "special, magic TCP port"
> you open the potential for application problems that are very
> difficult for an admin to diagnose based on observed behavior.
That's true, but using a separate MAC and IP opens up a bunch of other
operational problems. I don't think the right answer for iSCSI offload
is clear yet.
- R.
On Wednesday 30 July 2008 02:35:51 pm Roland Dreier wrote:
> > * From a networking standpoint, our main concern becomes how this
> > interacts with the networking stack. In particular, I'm concerned
> > based on reading the source that this driver uses "TCP port stealing"
> > rather than using a totally separate MAC address (and IP).
> >
> > Stealing a TCP port on an IP/interface already assigned is a common
> > solution in this space, but also a flawed one. Precisely because the
> > kernel and applications are unaware of this "special, magic TCP port"
> > you open the potential for application problems that are very
> > difficult for an admin to diagnose based on observed behavior.
>
> That's true, but using a separate MAC and IP opens up a bunch of other
> operational problems. I don't think the right answer for iSCSI offload
> is clear yet.
>
> - R.
Hi Jeff,
We've considered the approach of having separate IP/MAC addresses to manage
iSCSI connections. In such a context, the stack would have to be unaware of
this iSCSI-specific IP address. The iSCSI driver would then have to implement
at least its own ARP reply mechanism. DHCP too would have to be managed
separately. Most network setting/monitoring tools would also be unavailable.
The open-iscsi initiator is not a huge consumer of TCP connections, so
allocating a TCP port from the stack would be reasonable in terms of resources
in this context. It is, however, unclear whether that is an acceptable approach.
Our current implementation was designed to be the most tolerable one
within the aforementioned constraints, real or expected.
Cheers,
Divy
On Thursday 31 July 2008 05:51:59 pm Divy Le Ray wrote:
> On Wednesday 30 July 2008 02:35:51 pm Roland Dreier wrote:
> > > * From a networking standpoint, our main concern becomes how this
> > > interacts with the networking stack. In particular, I'm concerned
> > > based on reading the source that this driver uses "TCP port stealing"
> > > rather than using a totally separate MAC address (and IP).
> > >
> > > Stealing a TCP port on an IP/interface already assigned is a common
> > > solution in this space, but also a flawed one. Precisely because the
> > > kernel and applications are unaware of this "special, magic TCP port"
> > > you open the potential for application problems that are very
> > > difficult for an admin to diagnose based on observed behavior.
> >
> > That's true, but using a separate MAC and IP opens up a bunch of other
> > operational problems. I don't think the right answer for iSCSI offload
> > is clear yet.
> >
> > - R.
>
> Hi Jeff,
>
> We've considered the approach of having separate IP/MAC addresses to
> manage iSCSI connections. In such a context, the stack would have to be
> unaware of this iSCSI-specific IP address. The iSCSI driver would then have
> to implement at least its own ARP reply mechanism. DHCP too would have to
> be managed separately. Most network setting/monitoring tools would also be
> unavailable.
>
> The open-iscsi initiator is not a huge consumer of TCP connections, so
> allocating a TCP port from the stack would be reasonable in terms of
> resources in this context. It is, however, unclear whether that is an
> acceptable approach.
>
> Our current implementation was designed to be the most tolerable one
> within the aforementioned constraints, real or expected.
>
Hi Jeff,
Mike Christie will not merge this code until he has an explicit
acknowledgement from netdev.
As you mentioned, the port stealing approach we've taken has its issues.
We consequently analyzed your suggestion to use a different IP/MAC address for
iSCSI and it raises other tough issues (separate ARP and DHCP management,
unavailability of common networking tools).
On these grounds, we believe our current approach is the most tolerable.
Should the stack provide a TCP port allocation service, we'd be glad to use it
to solve the current concerns.
The cxgb3i driver is up and running here, its merge is pending our decision.
Cheers,
Divy
Divy Le Ray wrote:
> On Thursday 31 July 2008 05:51:59 pm Divy Le Ray wrote:
>> On Wednesday 30 July 2008 02:35:51 pm Roland Dreier wrote:
>>> > * From a networking standpoint, our main concern becomes how this
>>> > interacts with the networking stack. In particular, I'm concerned
>>> > based on reading the source that this driver uses "TCP port stealing"
>>> > rather than using a totally separate MAC address (and IP).
>>> >
>>> > Stealing a TCP port on an IP/interface already assigned is a common
>>> > solution in this space, but also a flawed one. Precisely because the
>>> > kernel and applications are unaware of this "special, magic TCP port"
>>> > you open the potential for application problems that are very
>>> > difficult for an admin to diagnose based on observed behavior.
>>>
>>> That's true, but using a separate MAC and IP opens up a bunch of other
>>> operational problems. I don't think the right answer for iSCSI offload
>>> is clear yet.
>>>
>>> - R.
>> Hi Jeff,
>>
>> We've considered the approach of having separate IP/MAC addresses to
>> manage iSCSI connections. In such a context, the stack would have to be
>> unaware of this iSCSI-specific IP address. The iSCSI driver would then have
>> to implement at least its own ARP reply mechanism. DHCP too would have to
>> be managed separately. Most network setting/monitoring tools would also be
>> unavailable.
>>
>> The open-iscsi initiator is not a huge consumer of TCP connections, so
>> allocating a TCP port from the stack would be reasonable in terms of
>> resources in this context. It is, however, unclear whether that is an
>> acceptable approach.
>>
>> Our current implementation was designed to be the most tolerable one
>> within the aforementioned constraints, real or expected.
>>
>
> Hi Jeff,
>
> Mike Christie will not merge this code until he has an explicit
> acknowledgement from netdev.
>
> As you mentioned, the port stealing approach we've taken has its issues.
> We consequently analyzed your suggestion to use a different IP/MAC address for
> iSCSI and it raises other tough issues (separate ARP and DHCP management,
> unavailability of common networking tools).
If the iscsi tools did not have to deal with networking issues that
are already handled by other networking tools, it would be great for
iscsi users, since they would not have to learn new tools. Maybe we could
somehow hook into the existing network tools so they support these iscsi
hbas as well as normal NICs. Would it be possible to have the iscsi hbas
export the necessary network interfaces so that existing network tools
can manage them?
If it comes down to it and your port stealing implementation is not
acceptable, as Broadcom's was not, I will be ok with doing some
special iscsi network tools. Or instead of special iscsi tools, is there
something that the RDMA/iWarp guys are using that we can share?
> Hi Jeff,
>
> Mike Christie will not merge this code until he has an explicit
> acknowledgement from netdev.
>
> As you mentioned, the port stealing approach we've taken has its issues.
> We consequently analyzed your suggestion to use a different IP/MAC address for
> iSCSI and it raises other tough issues (separate ARP and DHCP management,
> unavailability of common networking tools).
> On these grounds, we believe our current approach is the most tolerable.
> Should the stack provide a TCP port allocation service, we'd be glad to use it
> to solve the current concerns.
> The cxgb3i driver is up and running here, its merge is pending our decision.
>
> Cheers,
> Divy
>
Hey Dave/Jeff,
I think we need some guidance here on how to proceed. Is the approach
currently being reviewed ACKable? Or is it DOA? If it's DOA, then what
approach do you recommend? I believe Jeff's opinion is a separate
ipaddr. But Dave, what do you think? Let's get some agreement on a high
level design here.
Possible solutions seen to date include:
1) reserving a socket to allocate the port (a rough sketch of this follows
this message). This has been NAK'd in the past and I assume is still a no go.
2) creating a 4-tuple allocation service so the host stack, the rdma
stack, and the iscsi stack can share the same TCP 4-tuple space. This
also has been NAK'd in the past and I assume is still a no go.
3) the iscsi device allocates its own local ephemeral ports (port
stealing) and uses the host's IP address for the iscsi offload device.
This is the current proposal and you can review the thread for the pros
and cons. IMO it is the least objectionable (and I think we really
should be doing #2).
4) the iscsi device will manage its own IP address, thus ensuring 4-tuple
uniqueness.
Unless you all want to re-open considering #1 or #2, then we're left
with 3 or 4. Which one?
Steve.
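For reference, a rough sketch of what option 1 above (reserving a port by
holding a bound kernel socket and handing the chosen port to the offload
device) could look like. This is illustrative only, not something that was
posted or merged; the function signatures are approximately those of
2.6.26-era kernels and error handling is minimal.

    #include <linux/net.h>
    #include <linux/in.h>
    #include <net/sock.h>

    static int reserve_local_port(__be32 local_ip, struct socket **resp,
                                  unsigned short *portp)
    {
            struct sockaddr_in sin = {
                    .sin_family      = AF_INET,
                    .sin_addr.s_addr = local_ip,
                    .sin_port        = 0,   /* let the stack pick an ephemeral port */
            };
            struct socket *sock;
            int addrlen = sizeof(sin);
            int err;

            err = sock_create_kern(PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
            if (err)
                    return err;

            err = kernel_bind(sock, (struct sockaddr *)&sin, sizeof(sin));
            if (!err)
                    err = kernel_getsockname(sock, (struct sockaddr *)&sin, &addrlen);
            if (err) {
                    sock_release(sock);
                    return err;
            }

            /* Hold the socket so the stack will not hand this port out again,
             * and program ntohs(sin.sin_port) into the offload hardware. */
            *resp = sock;
            *portp = ntohs(sin.sin_port);
            return 0;
    }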
Steve Wise wrote:
>
>> Hi Jeff,
>>
>> Mike Christie will not merge this code until he has an explicit
>> acknowledgement from netdev.
>>
>> As you mentioned, the port stealing approach we've taken has its issues.
>> We consequently analyzed your suggestion to use a different IP/MAC
>> address for iSCSI and it raises other tough issues (separate ARP and
>> DHCP management, unavailability of common networking tools).
>> On these grounds, we believe our current approach is the most tolerable.
>> Should the stack provide a TCP port allocation service, we'd be glad to
>> use it to solve the current concerns.
>> The cxgb3i driver is up and running here, its merge is pending our
>> decision.
>>
>> Cheers,
>> Divy
>>
> Hey Dave/Jeff,
>
> I think we need some guidance here on how to proceed. Is the approach
> currently being reviewed ACKable? Or is it DOA? If it's DOA, then what
> approach do you recommend? I believe Jeff's opinion is a separate
> ipaddr. But Dave, what do you think? Let's get some agreement on a high
> level design here.
> Possible solutions seen to date include:
>
> 1) reserving a socket to allocate the port. This has been NAK'd in the
> past and I assume is still a no go.
>
> 2) creating a 4-tuple allocation service so the host stack, the rdma
> stack, and the iscsi stack can share the same TCP 4-tuple space. This
> also has been NAK'd in the past and I assume is still a no go.
>
> 3) the iscsi device allocates its own local ephemeral ports (port
> stealing) and uses the host's IP address for the iscsi offload device.
> This is the current proposal and you can review the thread for the pros
> and cons. IMO it is the least objectionable (and I think we really
> should be doing #2).
>
> 4) the iscsi device will manage its own IP address, thus ensuring 4-tuple
> uniqueness.
Conceptually, it is a nasty business for the OS kernel to be forced to
co-manage an IP address in conjunction with a remote, independent entity.
Hardware designers make the mistake of assuming that firmware management
of a TCP port ("port stealing") successfully provides the illusion to
the OS that that port is simply inactive, and the OS happily continues
internetworking its merry way through life.
This is certainly not true, because of current netfilter and userland
application behavior, which often depends on being able to allocate
(bind) to random TCP ports. Allocating a TCP port successfully within
the OS that then behaves differently from all other TCP ports (because it
is the magic iSCSI port) creates a cascading functional disconnect. On
that magic iSCSI port, strange errors will be returned instead of proper
behavior, which, in turn, cascades through new (and inevitably
under-utilized) error handling paths in the app.
So, of course, one must work around problems like this, which leads to
one of two broad choices:
1) implement co-management (sharing) of IP address/port space, between
the OS kernel and a remote entity.
2) come up with a solution in hardware that does not require the OS to
co-manage the data it has so far been managing exclusively in software.
It should be obvious that we prefer path #2.
For, trudging down path #1 means
* one must give the user the ability to manage shared IP addresses IN A
NON-HARDWARE-SPECIFIC manner. Currently most vendors of "TCP port
stealing" solutions seem to expect each user to learn a vendor-specific
method of identifying and managing the "magic port".
Excuse my language, but, what a fucking security and management
nightmare in a cross-vendor environment. It is already a pain, with
some [unnamed system/chipset vendors'] management stealing TCP ports --
and admins only discover this fact when applications behave strangely on
new hardware.
But... it's tough to notice, because stumbling upon the magic TCP port
won't happen often unless the server is heavily loaded. Thus you have a
security/application problem once in a blue moon, due to this magic TCP
port mentioned in some obscure online documentation nobody has read.
* however, giving the user the ability to co-manage IP addresses means
hacking up the kernel TCP code and userland tools for this new concept,
something that I think DaveM would rightly be a bit reluctant to do?
You are essentially adding a bunch of special case code whenever TCP
ports are used:
if (port in list of "magic" TCP ports with special,
hardware-specific behavior)
...
else
do what we've been doing for decades
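To make that cascade concrete, here is a purely illustrative userspace toy of
the special-casing; none of these names exist in the kernel, and the port
values are arbitrary:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Ports silently claimed by offload firmware -- values are made up. */
    static const unsigned short stolen_ports[] = { 55312 };

    static bool port_is_stolen(unsigned short port)
    {
            for (size_t i = 0; i < sizeof(stolen_ports) / sizeof(stolen_ports[0]); i++)
                    if (stolen_ports[i] == port)
                            return true;
            return false;
    }

    /* Every ephemeral-port chooser grows an extra branch... */
    static int pick_local_port(unsigned short lo, unsigned short hi)
    {
            for (unsigned int p = lo; p <= hi; p++)
                    if (!port_is_stolen((unsigned short)p))
                            return (int)p;
            return -1;  /* ...and so would bind(), netstat, netfilter, ... */
    }

    int main(void)
    {
            printf("picked local port %d\n", pick_local_port(55310, 55320));
            return 0;
    }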
ISTR Roland(?) pointing out code that already does a bit of this in the
IB space... but the point is
Finally, this shared IP address/port co-management thing has several
problems listed on the TOE page: http://www.linuxfoundation.org/en/Net:TOE
such as,
* security updates for TCP problems mean that a single IP address can be
PARTIALLY SECURE, because security updates for kernel TCP stack and
h/w's firmware are inevitably updated separately (even if distributed
and compiled together). Yay, we are introducing a wonderful new
security problem here.
* from a security, network scanner and packet classifier point of view,
a single IP address no longer behaves like Linux. It behaves like
Linux... sometimes. Depending on whether it is a magic TCP port or not.
Talk about security audit hell.
This should be plenty, so I'm stopping now. But looking down the TOE
wiki page I could easily come up with more reasons why "IP address
remote co-management" is more complicated and costly than you think.
Jeff
Jeff Garzik wrote:
> * however, giving the user the ability to co-manage IP addresses means
> hacking up the kernel TCP code and userland tools for this new concept,
> something that I think DaveM would rightly be a bit reluctant to do? You
> are essentially adding a bunch of special case code whenever TCP ports
> are used:
>
> if (port in list of "magic" TCP ports with special,
> hardware-specific behavior)
> ...
> else
> do what we've been doing for decades
>
> ISTR Roland(?) pointing out code that already does a bit of this in the
> IB space... but the point is
grrr. but the point is that the solution is not at all complete, with
feature disconnects and security audit differences still outstanding, and
non-hw-specific management apps still unwritten.
(I'm not calling for their existence, merely trying to strike down the
justification that a current capability to limp along exists)
Jeff
From: Jeff Garzik <[email protected]>
Date: Fri, 08 Aug 2008 18:15:41 -0400
> * security updates for TCP problems mean that a single IP address can be
> PARTIALLY SECURE, because security updates for kernel TCP stack and
> h/w's firmware are inevitably updated separately (even if distributed
> and compiled together). Yay, we are introducing a wonderful new
> security problem here.
>
> * from a security, network scanner and packet classifier point of view,
> a single IP address no longer behaves like Linux. It behaves like
> Linux... sometimes. Depending on whether it is a magic TCP port or not.
I agree with everything Jeff has stated.
Also, I find it ironic that the port abduction is being asked for in
order to be "compatible with existing tools" yet in fact this stuff
breaks everything. You can't netfilter this traffic, you can't apply
qdiscs to it, you can't execut TC actions on them, you can't do
segmentation offload on them, you can't look for the usual TCP MIB
statistics on the connection, etc. etc. etc.
It is broken from every possible angle.
David Miller wrote:
> From: Jeff Garzik <[email protected]>
> Date: Fri, 08 Aug 2008 18:15:41 -0400
>
>
>> * security updates for TCP problems mean that a single IP address can be
>> PARTIALLY SECURE, because security updates for kernel TCP stack and
>> h/w's firmware are inevitably updated separately (even if distributed
>> and compiled together). Yay, we are introducing a wonderful new
>> security problem here.
>>
>> * from a security, network scanner and packet classifier point of view,
>> a single IP address no longer behaves like Linux. It behaves like
>> Linux... sometimes. Depending on whether it is a magic TCP port or not.
>>
>
> I agree with everything Jeff has stated.
>
> Also, I find it ironic that the port abduction is being asked for in
> order to be "compatible with existing tools" yet in fact this stuff
> breaks everything. You can't netfilter this traffic, you can't apply
> qdiscs to it, you can't execute TC actions on them, you can't do
> segmentation offload on them, you can't look for the usual TCP MIB
> statistics on the connection, etc. etc. etc.
>
> It is broken from every possible angle.
>
I think a lot of these _could_ be implemented and integrated with the
standard tools.
> * however, giving the user the ability to co-manage IP addresses means
> hacking up the kernel TCP code and userland tools for this new
> concept, something that I think DaveM would rightly be a bit reluctant
> to do? You are essentially adding a bunch of special case code
> whenever TCP ports are used:
>
> if (port in list of "magic" TCP ports with special,
> hardware-specific behavior)
> ...
> else
> do what we've been doing for decades
I think you're arguing against something that no one is actually
pushing. What I'm sure Chelsio and probably other iSCSI offload vendors
would like is a way to make iSCSI (and other) offloads not steal magic
ports but actually hook into the normal infrastructure so that the
offloaded connections show up in netstat, etc. Having this solution
would be nice not just for TCP offload but also for things like in-band
system management, which currently lead to the same hard-to-diagnose
issues when someone hits the stolen port. And it also would seem to
help "classifier NICs" (Sun Neptune, Solarflare, etc) where some traffic
might be steered to a userspace TCP stack.
I don't think the proposal of just using a separate MAC and IP for the
iSCSI HBA really works, for two reasons:
- It doesn't work in theory, because the suggestion (I guess) is that
the iSCSI HBA has its own MAC and IP and behaves like a separate
system. But this means that to start with the HBA needs its own ARP,
ICMP, routing, etc interface, which means we need some (probably new)
interface to configure all of this. And then it doesn't work in lots
of networks; for example the ethernet jack in my office doesn't work
without 802.1x authentication, and putting all of that in an iSCSI
HBA's firmware clearly is crazy (not to mention creating the
interface to pass 802.1x credentials into the kernel to pass to the
HBA).
- It doesn't work in practice because most of the existing NICs that
are capable of iSCSI offload, eg Chelsio and Broadcom as well as 3 or
4 other vendors, don't handle ARP, ICMP, etc in the device -- they
need the host system to do it. Which means that either we have a
separate ARP/ICMP stack for offload adapters (obviously untenable) or
a separate implementation in each driver (even more untenable), or we
use the normal stack for the adapter, which seems to force us into
creating a normal netdev for the iSCSI offload interface, which in
turn seems to force us to figure out a way for offload adapters to
coexist with the host stack (assuming of course that we care about
iSCSI HBAs and/or stuff like NFS/RDMA).
A long time ago, DaveM pointed me at the paper "TCP offload is a dumb
idea whose time has come" (<http://www.usenix.org/events/hotos03/tech/full_papers/mogul/mogul_html/index.html>)
which is an interesting paper that argues that this time really is
different, and OS developers need to figure out how transport offload
fits in. As a side note, funnily enough back in the thread where DaveM
mentioned that paper, Alan Cox said "Take a look at who holds the
official internet land speed record. Its not a TOE using system" but at
least as of now the current record for IPv4
(http://www.internet2.edu/lsr/) *is* held by a TOE.
I think there are two ways to proceed:
- Start trying to figure out the best way to support the iSCSI offload
hardware that's out there. I don't know the perfect answer but I'm
sure we can figure something out if we make an honest effort.
- Ignore the issue and let users of iSCSI offload hardware (and iWARP
and NFS/RDMA etc) stick to hacky out-of-tree solutions. This pays
off if stuff like the Intel CRC32C instruction plus faster CPUs (or
"multithreaded" NICs that use multicore better) makes offload
irrelevant. However this ignores the fundamental 3X memory bandwidth
cost of not doing direct placement in the NIC, and risks us being in
a "well Solaris has support" situation down the road.
To be honest I think the best thing to do is just to get support for
these iSCSI offload adapters upstream in whatever form we can all agree
on, so that we can see a) whether anyone cares and b) if someone does
care, whether there's some better way to do things.
> ISTR Roland(?) pointing out code that already does a bit of this in
> the IB space... but the point is
Not me... and I don't think that there would be anything like this for
InfiniBand, since IB is a completely different animal that has nothing
to do with TCP/IP. You may be thinking of iWARP (RDMA over TCP/IP), but
actually the current Linux iWARP support completely punts on the issue
of coexisting with the native stack (basically because of a lack of
interest in solving the problems from the netdev side of things), which
leads to nasty issues that show up when things happen to collide. So
far people seem to be coping by using nasty out-of-tree hacks.
- R.
> Also, I find it ironic that the port abduction is being asked for in
> order to be "compatible with existing tools" yet in fact this stuff
> breaks everything. You can't netfilter this traffic, you can't apply
> qdiscs to it, you can't execute TC actions on them, you can't do
> segmentation offload on them, you can't look for the usual TCP MIB
> statistics on the connection, etc. etc. etc.
We already support offloads that break other features, eg large receive
offload breaks forwarding. We deal with it.
I'm sure if we thought about it we could come up with clean ways to fix
some of the issues you raise, and just disable the offload if someone
wanted to use a feature we can't support.
- R.
From: Roland Dreier <[email protected]>
Date: Sat, 09 Aug 2008 22:12:07 -0700
> What I'm sure Chelsio and probably other iSCSI offload vendors
> would like is a way to make iSCSI (and other) offloads not steal magic
> ports but actually hook into the normal infrastructure so that the
> offloaded connections show up in netstat, etc.
Why show these special connections if the user cannot interact with or
shape the stream at all like normal ones?
This whole "make it look normal" argument is entirely bogus because
none of the standard Linux networking facilities can be applied to
these things.
And I even wonder, these days, if you probably get 90% or more of the
gain these "optimized" iSCSI connections obtain from things like LRO.
And since LRO can be done entirely in software (although stateless
HW assistance helps), it is even a NIC-agnostic performance improvement.
From: Roland Dreier <[email protected]>
Date: Sat, 09 Aug 2008 22:14:11 -0700
> > Also, I find it ironic that the port abduction is being asked for in
> > order to be "compatible with existing tools" yet in fact this stuff
> > breaks everything. You can't netfilter this traffic, you can't apply
> > qdiscs to it, you can't execute TC actions on them, you can't do
> > segmentation offload on them, you can't look for the usual TCP MIB
> > statistics on the connection, etc. etc. etc.
>
> We already support offloads that break other features, eg large receive
> offload breaks forwarding. We deal with it.
We turn it off. If I want to shape or filter one of these iSCSI
connections can we turn it off?
It's funny you mention LRO because it probably gives most of whatever
gain these special iSCSI TCP connection offload things get.
Roland Dreier <[email protected]> wrote:
>
> I think there are two ways to proceed:
>
> - Start trying to figure out the best way to support the iSCSI offload
> hardware that's out there. I don't know the perfect answer but I'm
> sure we can figure something out if we make an honest effort.
>
> - Ignore the issue and let users of iSCSI offload hardware (and iWARP
> and NFS/RDMA etc) stick to hacky out-of-tree solutions. This pays
> off if stuff like the Intel CRC32C instruction plus faster CPUs (or
> "multithreaded" NICs that use multicore better) makes offload
> irrelevant. However this ignores the fundamental 3X memory bandwidth
> cost of not doing direct placement in the NIC, and risks us being in
> a "well Solaris has support" situation down the road.
We've been here many times before. This is just the same old TOE
debate all over again. With TOE, history has shown that Dave's
decision has been spot on.
So you're going to have to come up with some really convincing
evidence that shows we are all wrong and these TOE-like hardware
offload solutions are the only way to go. You can start by collecting
solid benchmark numbers that we can all reproduce and look into.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
David Miller <[email protected]> wrote:
>
>> We already support offloads that break other features, eg large receive
>> offload breaks forwarding. We deal with it.
>
> We turn it off. If I want to shape or filter one of these iSCSI
> connections can we turn it off?
Actually one of my TODO items is to restructure software LRO
so that we preserve the original packet headers while aggregating
the packets. That would allow us to easily refragment them on
output for forwarding.
In other words LRO (at least the software variant) is not
fundamentally incompatible with forwarding.
I'd also like to encourage all hardware manufacturers considering
LRO support to provide a way for us to access the original headers
so that it doesn't have to be turned off for forwarding.
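Illustrative only: one possible shape for such header preservation. None of
these structures exist in the kernel today; they just sketch "keep each merged
segment's original headers so the aggregate can be re-segmented on output":

    #include <linux/ip.h>
    #include <linux/skbuff.h>
    #include <linux/types.h>

    #define LRO_MAX_SEGS    64
    #define LRO_HDR_MAX     (sizeof(struct iphdr) + 60) /* basic IPv4 + max TCP header */

    struct lro_saved_seg {
            u8      hdr[LRO_HDR_MAX];   /* verbatim copy of this segment's IP+TCP header */
            u16     hdr_len;
            u16     payload_len;        /* payload bytes contributed to the aggregate */
    };

    struct lro_aggregate {
            struct sk_buff          *skb;   /* the merged super-packet */
            unsigned int            nr_segs;
            struct lro_saved_seg    segs[LRO_MAX_SEGS];
    };

    /*
     * On output/forwarding, walk segs[] and rebuild nr_segs wire packets,
     * each from its saved header plus the matching payload_len slice of
     * the aggregated payload.
     */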
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> - It doesn't work in theory, because the suggestion (I guess) is that
> the iSCSI HBA has its own MAC and IP and behaves like a separate
The iSCSI HBA is its own system - that is the root of the problem.
> system. But this means that to start with the HBA needs its own ARP,
> ICMP, routing, etc interface, which means we need some (probably new)
> interface to configure all of this. And then it doesn't work in lots
It's another system, so surely SNMP ;)
More seriously, I do think iSCSI is actually a subtly special case of TOE.
Most TOE disintegrates under carefully chosen "malicious" workloads
because of the way it is optimised, and the lack of security integration
can be very, very dangerous. A pure iSCSI connection is generally
private, single purpose and really is the classic application of "pigs fly
given enough thrust" - which is the only way to make the pig in question
(iSCSI) work properly.
Alan Cox wrote:
>> - It doesn't work in theory, because the suggestion (I guess) is that
>> the iSCSI HBA has its own MAC and IP and behaves like a separate
>
> The iSCSI HBA is its own system - that is the root of the problem.
Indeed.
Just like with TOE, from the net stack's point of view, an iSCSI HBA is
essentially a wholly asynchronous remote system [with a really fast
communication bus like PCI Express].
As such, the task becomes updating the net stack such that
formerly-private resources are now shared with an independent, external
system... with all the complexity, additional failure modes, and
additional security complications that come along with that.
Jeff
On Sun, 2008-08-10 at 08:49 -0400, Jeff Garzik wrote:
> Alan Cox wrote:
> >> - It doesn't work in theory, because the suggestion (I guess) is that
> >> the iSCSI HBA has its own MAC and IP and behaves like a separate
> >
> > The iSCSI HBA is its own system - that is the root of the problem.
>
> Indeed.
>
> Just like with TOE, from the net stack's point of view, an iSCSI HBA is
> essentially a wholly asynchronous remote system [with a really fast
> communication bus like PCI Express].
>
> As such, the task becomes updating the net stack such that
> formerly-private resources are now shared with an independent, external
> system... with all the complexity, additional failure modes, and
> additional security complications that come along with that.
What's wrong with making it configurable identically to current software
iSCSI? i.e. plumb the thing into the current iscsi transport class so
that we use the standard daemon for creating and binding sessions?
Then, only once the session is bound do you let your iSCSI TOE stack
take over.
That way the connection appears to the network as completely normal,
because it has an open socket associated with it; and, since the
transport class has done the connection login, it even looks like a
normal iSCSI connection to the usual tools. iSCSI would manage
connection and authentication, so your TOE stack can be simply around
the block acceleration piece (i.e. you'd need to get the iscsi daemon to
do relogin and things).
I would assume net will require some indicator that the opened
connection has been subsumed, so it knows not to try to manage it, but
other than that I don't see it will need any alteration. The usual
tools, like netfilter could even use this information to know the limits
of their management.
If this model works, we can use it for TOE acceleration of individual
applications (rather than the entire TCP stack) on an as needed basis.
This is like the port stealing proposal, but since the iSCSI daemon is
responsible for maintaining the session, the port isn't completely
stolen, just switched to accelerator mode when doing the iSCSI offload.
James
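A very rough sketch of the handoff being described, with hypothetical names
throughout: hba_push_tuple() and sk_mark_offloaded() do not exist, the real
plumbing would go through the iscsi transport class bind path, the "subsumed"
marking would need new net-core support, and the inet_sock field names are
those of 2.6.26-era kernels.

    #include <net/inet_sock.h>
    #include <net/sock.h>

    /* Placeholders for interfaces that would have to be created: */
    static int hba_push_tuple(void *hba, __be32 saddr, __be16 sport,
                              __be32 daddr, __be16 dport)
    {
            return 0;   /* program the established 4-tuple into the adapter */
    }

    static void sk_mark_offloaded(struct sock *sk)
    {
            /* hypothetical net-core hook: keep the 4-tuple reserved in the
             * bind/ehash tables, but hand TCP processing over to the HBA */
    }

    static int iscsi_offload_takeover(struct socket *sock, void *hba)
    {
            struct inet_sock *inet = inet_sk(sock->sk);
            int err;

            /* iscsid has already connected this socket and completed the iSCSI
             * login, so the connection is fully visible to netstat and friends. */
            err = hba_push_tuple(hba, inet->saddr, inet->sport,
                                 inet->daddr, inet->dport);
            if (err)
                    return err;

            sk_mark_offloaded(sock->sk);
            return 0;
    }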
David Miller wrote:
> From: Roland Dreier <[email protected]>
> Date: Sat, 09 Aug 2008 22:14:11 -0700
>
>
>> > Also, I find it ironic that the port abduction is being asked for in
>> > order to be "compatible with existing tools" yet in fact this stuff
>> > breaks everything. You can't netfilter this traffic, you can't apply
>> > qdiscs to it, you can't execute TC actions on them, you can't do
>> > segmentation offload on them, you can't look for the usual TCP MIB
>> > statistics on the connection, etc. etc. etc.
>>
>> We already support offloads that break other features, eg large receive
>> offload breaks forwarding. We deal with it.
>>
>
> We turn it off. If I want to shape or filter one of these iSCSI
> connections can we turn it off?
>
>
Sure.
Seems to me we _could_ architect this all so that these devices would
have to support a method for the management/admin tools to tweak, or if
nothing else kill, offload connections when policy rules change and the
existing connections aren't implementing the policy. I.e., if the offload
connection doesn't support whatever security or other facilities the
admin requires, then the admin should have the ability to disable that
device. And of course, some devices will allow doing things like
netfilter, QoS, tweaking VLAN tags, etc. even on active connections, if
the OS infrastructure is there to hook it all up.
BTW: I think all these offload devices provide MIBs and could be pulled
into the normal management tools.
Steve.
> Why show these special connections if the user cannot interact with or
> shape the stream at all like normal ones?
So that an admin can see what connections are open, so that the stack
doesn't try to reuse the same 4-tuple for another connection, etc, etc.
> And I even wonder, these days, if you probably get 90% or more of the
> gain these "optimized" iSCSI connections obtain from things like LRO.
Yes, that's the question -- are stateless offloads (plus CRC32C in the
CPU etc) going to give good enough performance that the whole TCP
offload exercise is pointless? The only issue is that I don't see how
to avoid the fundamental 3X increase in memory bandwidth that is chewed
up if the NIC can't do direct placement.
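As a rough, purely illustrative calculation of that 3X: for a 1 GB/s iSCSI
read stream with no direct placement, the NIC DMAs ~1 GB/s into anonymous skb
buffers, and the subsequent copy to the data's final location costs roughly
another 1 GB/s of reads plus 1 GB/s of writes -- about 3 GB/s of memory
traffic, versus ~1 GB/s if the NIC could place the payload where it ultimately
belongs.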
- R.
> We turn it off. If I want to shape or filter one of these iSCSI
> connections can we turn it off?
That seems like a reasonable idea to me -- the standard thing to do when
a NIC offload conflicts with something else is to turn off the offload
and fall back to software.
- R.
James Bottomley wrote:
> On Sun, 2008-08-10 at 08:49 -0400, Jeff Garzik wrote:
>> Alan Cox wrote:
>>>> - It doesn't work in theory, because the suggestion (I guess) is that
>>>> the iSCSI HBA has its own MAC and IP and behaves like a separate
>>> The iSCSI HBA is its own system - that is the root of the problem.
>> Indeed.
>>
>> Just like with TOE, from the net stack's point of view, an iSCSI HBA is
>> essentially a wholly asynchronous remote system [with a really fast
>> communication bus like PCI Express].
>>
>> As such, the task becomes updating the net stack such that
>> formerly-private resources are now shared with an independent, external
>> system... with all the complexity, additional failure modes, and
>> additional security complications that come along with that.
>
> What's wrong with making it configurable identically to current software
> iSCSI? i.e. plumb the thing into the current iscsi transport class so
> that we use the standard daemon for creating and binding sessions?
> Then, only once the session is bound do you let your iSCSI TOE stack
> take over.
>
> That way the connection appears to the network as completely normal,
> because it has an open socket associated with it; and, since the
> transport class has done the connection login, it even looks like a
> normal iSCSI connection to the usual tools. iSCSI would manage
> connection and authentication, so your TOE stack can be simply around
> the block acceleration piece (i.e. you'd need to get the iscsi daemon to
> do relogin and things).
This is what Chelsio and Broadcom do today, more or less. Chelsio did the
socket trick you are proposing. Broadcom went with a different hack. But
in the end both hook into the iscsi transport class (the current iscsi
transport class works for this today), userspace daemon and tools, so
that the iscsi daemon handles iscsi login, iscsi authentication and all
other iscsi operations, like it does for software iscsi.
>
> I would assume net will require some indicator that the opened
> connection has been subsumed, so it knows not to try to manage it, but
> other than that I don't see it will need any alteration. The usual
> tools, like netfilter could even use this information to know the limits
> of their management.
>
> If this model works, we can use it for TOE acceleration of individual
> applications (rather than the entire TCP stack) on an as needed basis.
>
> This is like the port stealing proposal, but since the iSCSI daemon is
> responsible for maintaining the session, the port isn't completely
> stolen, just switched to accelerator mode when doing the iSCSI offload.
>
David Miller wrote:
> And I even wonder, these days, if you probably get 90% or more of the
> gain these "optimized" iSCSI connections obtain from things like LRO.
> And since LRO can be done entirely in software (although stateless
> HW assistance helps), it is even a NIC-agnostic performance improvement.
Probably depends on whether or not the iSCSI offload solutions are doing
zero-copy receive into the filecache?
rick jones
From: Roland Dreier <[email protected]>
Date: Mon, 11 Aug 2008 09:07:51 -0700
> Yes, that's the question -- are stateless offloads (plus CRC32C in the
> CPU etc) going to give good enough performance that the whole TCP
> offload exercise is pointless?
This is by definition true, over time. And this has steadfastly proven
itself, over and over again.
That's why we call stateful offloads a point in time solution.
They are constantly being obsoleted by time.
From: Roland Dreier <[email protected]>
Date: Mon, 11 Aug 2008 09:09:02 -0700
> > We turn it off. If I want to shape or filter one of these iSCSI
> > connections can we turn it off?
>
> That seems like a reasonable idea to me -- the standard thing to do when
> a NIC offload conflicts with something else is to turn off the offload
> and fall back to software.
But as Herbert says, we can make LRO such that turning it off
isn't necessary.
Can we shape the iSCSI offload traffic without turning it off?
From: Rick Jones <[email protected]>
Date: Mon, 11 Aug 2008 11:13:25 -0700
> David Miller wrote:
> > And I even wonder, these days, if you probably get 90% or more of the
> > gain these "optimized" iSCSI connections obtain from things like LRO.
> > And since LRO can be done entirely in software (although stateless
> > HW assistance helps), it is even a NIC-agnostic performance improvement.
>
> Probably depends on whether or not the iSCSI offload solutions are doing
> zero-copy receive into the filecache?
That's a data placement issue, which also can be solved with
stateless offloading.
> But as Herbert says, we can make LRO such that turning it off
> isn't necessary.
>
> Can we shape the iSCSI offload traffic without turning it off?
Sure... the same way we can ask the HW vendors to keep old headers
around when aggregating for LRO, we can ask HW vendors for hooks for
shaping iSCSI traffic. And the Chelsio TCP speed record seems to show
that they already have pretty sophisticated queueing/shaping in their
current HW.
- R.
> > Yes, that's the question -- are stateless offloads (plus CRC32C in the
> > CPU etc) going to give good enough performance that the whole TCP
> > offload exercise is pointless?
>
> > This is by definition true, over time. And this has steadfastly proven
> itself, over and over again.
By the definition of what?
- R.
> > Probably depends on whether or not the iSCSI offload solutions are doing
> > zero-copy receive into the filecache?
>
> That's a data placement issue, which also can be solved with
> stateless offloading.
How can you place iSCSI data properly with only stateless offloads?
- R.
From: Roland Dreier <[email protected]>
Date: Mon, 11 Aug 2008 14:37:59 -0700
> > But as Herbert says, we can make LRO such that turning it off
> > isn't necessary.
> >
> > Can we shape the iSCSI offload traffic without turning it off?
>
> Sure... the same way we can ask the HW vendors to keep old headers
> around when aggregating for LRO, we can ask HW vendors for hooks for
> shaping iSCSI traffic. And the Chelsio TCP speed record seems to show
> that they already have pretty sophisticated queueing/shaping in their
> current HW.
You don't get it, you can't add the entire netfilter and qdisc
stack into the silly firmware.
And we can't fix bugs there either.
From: Roland Dreier <[email protected]>
Date: Mon, 11 Aug 2008 14:39:47 -0700
> > > Yes, that's the question -- are stateless offloads (plus CRC32C in the
> > > CPU etc) going to give good enough performance that the whole TCP
> > > offload exercise is pointless?
> >
> > This is by definition true, over time. And this has steadfastly proven
> > itself, over and over again.
>
> By the definition of what?
By the definition of time always advancing forward, CPUs always
getting faster, and memory (albeit more slowly) increasing in
speed too.
From: Roland Dreier <[email protected]>
Date: Mon, 11 Aug 2008 14:41:16 -0700
> > > Probably depends on whether or not the iSCSI offload solutions are doing
> > > zero-copy receive into the filecache?
> >
> > That's a data placement issue, which also can be solved with
> > stateless offloading.
>
> How can you place iSCSI data properly with only stateless offloads?
By teaching the stateless offload how to parse the iSCSI headers
on the flow and place the data into pages at the correct offsets
such that you can place the pages hanging off of the SKB directly
into the page cache.
David Miller wrote:
> From: Roland Dreier <[email protected]>
> Date: Mon, 11 Aug 2008 09:09:02 -0700
>
>
>> > We turn it off. If I want to shape or filter one of these iSCSI
>> > connections can we turn it off?
>>
>> That seems like a reasonable idea to me -- the standard thing to do when
>> a NIC offload conflicts with something else is to turn off the offload
>> and fall back to software.
>>
>
> But as Herbert says, we can make LRO such that turning it off
> isn't necessary.
>
> Can we shape the iSCSI offload traffic without turning it off?
>
With Chelsio's product you can do this. Maybe Divy can provide details?
On Monday 11 August 2008 04:20:07 pm Steve Wise wrote:
> David Miller wrote:
> > From: Roland Dreier <[email protected]>
> > Date: Mon, 11 Aug 2008 09:09:02 -0700
> >
> >> > We turn it off. If I want to shape or filter one of these iSCSI
> >> > connections can we turn it off?
> >>
> >> That seems like a reasonable idea to me -- the standard thing to do when
> >> a NIC offload conflicts with something else is to turn off the offload
> >> and fall back to software.
> >
> > But as Herbert says, we can make LRO such that turning it off
> > isn't necessary.
> >
> > Can we shape the iSCSI offload traffic without turning it off?
>
> With Chelsio's product you can do this. Maybe Divy can provide details?
The T3 adapter is capable of performing rate control and pacing based on RTT
on a per-connection basis.
Cheers,
Divy
From: Steve Wise <[email protected]>
Date: Mon, 11 Aug 2008 18:20:07 -0500
> David Miller wrote:
> > From: Roland Dreier <[email protected]>
> > Date: Mon, 11 Aug 2008 09:09:02 -0700
> >
> >
> >> > We turn it off. If I want to shape or filter one of these iSCSI
> >> > connections can we turn it off?
> >>
> >> That seems like a reasonable idea to me -- the standard thing to do when
> >> a NIC offload conflicts with something else is to turn off the offload
> >> and fall back to software.
> >>
> >
> > But as Herbert says, we can make LRO such that turning it off
> > isn't necessary.
> >
> > Can we shape the iSCSI offload traffic without turning it off?
> >
>
> With Chelsio's product you can do this. Maybe Divy can provide details?
When I say shape I mean apply any packet scheduler, any netfilter
module, and any other feature we support.
On Monday 11 August 2008 02:53:13 pm David Miller wrote:
> From: Roland Dreier <[email protected]>
> Date: Mon, 11 Aug 2008 14:41:16 -0700
>
> > > > Probably depends on whether or not the iSCSI offload solutions are
> > > > doing zero-copy receive into the filecache?
> > >
> > > That's a data placement issue, which also can be solved with
> > > stateless offloading.
> >
> > How can you place iSCSI data properly with only stateless offloads?
>
> By teaching the stateless offload how to parse the iSCSI headers
> on the flow and place the data into pages at the correct offsets
> such that you can place the pages hanging off of the SKB directly
> into the page cache.
Hi Dave,
iSCSI PDUs might span multiple TCP segments; it is unclear to me how to
do placement without keeping some state about the transactions.
In any case, such a stateless solution is not yet designed, whereas
accelerated iSCSI is available now, from us and other companies.
The accelerated iSCSI streams benefit from the performance TOE provides,
outlined in the following third party papers:
http://www.chelsio.com/assetlibrary/pdf/redhat-chelsio-toe-final_v2.pdf
http://www.chelsio.com/assetlibrary/pdf/RMDS6BNTChelsioRHEL5.pdf
iSCSI is primarily targeted at the data center, where the SW stack's traffic
shaping features might be redundant with specialized equipment. It should
however be possible to integrate security features on a per-offloaded-connection
basis, and TOEs - at least ours :) - are capable of rate control
and traffic shaping.
While CPU and - to a far lesser extent - memory performance improves, so does
Ethernet's. 40G and 100G are not too far ahead. It is not obvious at all that
TOE is a point-in-time solution, especially for heavy traffic loads as in a
storage environment. It is quite the opposite, actually.
There is room for co-existence of SW-managed traffic and accelerated
traffic. As our submission shows, enabling accelerated iSCSI is not intrusive,
code-wise, to the stack. The port stealing issue is solved if we can grab a
port from the stack.
Cheers,
Divy
From: Divy Le Ray <[email protected]>
Date: Tue, 12 Aug 2008 14:57:09 -0700
> iSCSI PDUs might span multiple TCP segments; it is unclear to me how to
> do placement without keeping some state about the transactions.
You keep a flow table with buffer IDs and offsets.
The S2IO guys did something similar for one of their initial LRO
implementations.
It's still strictly stateless, and best-effort. Entries can fall out
of the flow cache, which makes upcoming data use new buffers and
offsets.
But these are the kinds of tricks you hardware folks should be
more than adequately able to design, rather than me. :-)
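For illustration, a toy of the kind of best-effort flow-cache entry being
described here; all names are invented, and real hardware would implement
this in the NIC, not in C:

    #include <stdint.h>

    #define FLOW_CACHE_SIZE 256

    struct flow_entry {
            /* TCP 4-tuple identifying the flow (network byte order) */
            uint32_t saddr, daddr;
            uint16_t sport, dport;

            uint32_t next_seq;      /* next expected TCP sequence number */
            uint32_t pdu_remaining; /* payload bytes left in the current iSCSI PDU */

            uint32_t buf_id;        /* host buffer (page) currently being filled */
            uint32_t buf_off;       /* offset within that buffer */
            uint8_t  valid;         /* entries may be evicted at any time (best effort) */
    };

    static struct flow_entry flow_cache[FLOW_CACHE_SIZE];

    /*
     * Per received segment (conceptually): hash the 4-tuple into flow_cache;
     * on a hit with a matching next_seq, land the payload at (buf_id, buf_off)
     * and advance next_seq, buf_off and pdu_remaining; on a miss or after an
     * eviction, deliver normally and re-sync at the next PDU header seen at a
     * known offset. The full packet still traverses the normal stack either way.
     */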
From: Divy Le Ray <[email protected]>
Date: Tue, 12 Aug 2008 14:57:09 -0700
> In any case, such a stateless solution is not yet designed, whereas
> accelerated iSCSI is available now, from us and other companies.
So, WHAT?!
There are TOE pieces of crap out there too.
It's strictly not our problem.
Like Herbert said, this is the TOE discussion all over again.
The results will be the same, and as per our decisions wrt.
TOE, history speaks for itself.
On Tuesday 12 August 2008 03:02:46 pm David Miller wrote:
> From: Divy Le Ray <[email protected]>
> Date: Tue, 12 Aug 2008 14:57:09 -0700
>
> > In any case, such a stateless solution is not yet designed, whereas
> > accelerated iSCSI is available now, from us and other companies.
>
> So, WHAT?!
>
> There are TOE pieces of crap out there too.
Well, there is demand for accelerated iSCSI out there, which is the driving
reason for our driver submission.
>
> It's strictly not our problem.
>
> Like Herbert said, this is the TOE discussion all over again.
> The results will be the same, and as per our decisions wrt.
> TOE, history speaks for itself.
Herbert requested some benchmark numbers; I consequently obliged.
Cheers,
Divy
Divy Le Ray <[email protected]> wrote:
>
> Herbert requested some benchmark numbers, I consequently obliged.
Have you posted a hardware-accelerated iSCSI vs. LRO comparison?
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Divy Le Ray wrote:
> On Tuesday 12 August 2008 03:02:46 pm David Miller wrote:
>> From: Divy Le Ray <[email protected]>
>> Date: Tue, 12 Aug 2008 14:57:09 -0700
>>
>>> In any case, such a stateless solution is not yet designed, whereas
>>> accelerated iSCSI is available now, from us and other companies.
>> So, WHAT?!
>>
>> There are TOE pieces of crap out there too.
>
> Well, there is demand for accelerated iSCSI out there, which is the driving
> reason for our driver submission.
As an iSCSI target developer, I'm strongly voting for hardware iSCSI
offload. Having the possibility of direct data placement is a *HUGE*
performance gain.
For example, according to measurements done by one iSCSI-SCST user in a
system with an iSCSI initiator and an iSCSI target (running iSCSI-SCST,
http://scst.sourceforge.net/target_iscsi.html), both with identical modern
high-speed hardware and 10GbE cards, the _INITIATOR_ is the bottleneck for
READs (data transfers from target to initiator). This is because the target
sends data in a zero-copy manner, so its CPU is capable of dealing with the
load, but on the initiator there are additional data copies from skb's to the
page cache and from the page cache to the application.
As a result, in those measurements the initiator reached near 100% CPU load
and only ~500MB/s throughput, while the target had ~30% CPU load. For the
opposite direction (WRITEs), where there is no application data copy on the
target, throughput was ~800MB/s, again with near 100% CPU load, but in
this case on the target. The initiator ran Linux with open-iscsi. The
test was with real backstorage: the target ran BLOCKIO (direct BIOs to/from
backstorage) with a 3ware card. Locally on the target the backstorage was
able to provide 900+MB/s for READs and about 1GB/s for WRITEs. The
command queue in both cases was deep enough to hide the link and
processing latencies (20-30 outstanding commands).
Vlad
Vladislav Bolkhovitin wrote:
> Divy Le Ray wrote:
>> On Tuesday 12 August 2008 03:02:46 pm David Miller wrote:
>>> From: Divy Le Ray <[email protected]>
>>> Date: Tue, 12 Aug 2008 14:57:09 -0700
>>>
>>>> In any case, such a stateless solution is not yet designed, whereas
>>>> accelerated iSCSI is available now, from us and other companies.
>>> So, WHAT?!
>>>
>>> There are TOE pieces of crap out there too.
>>
>> Well, there is demand for accelerated iSCSI out there, which is the
>> driving reason for our driver submission.
>
> I'm, as an iSCSI target developer, strongly voting for hardware iSCSI
> offload. Having possibility of the direct data placement is a *HUGE*
> performance gain.
Well, two responses here:
* no one is arguing against hardware iSCSI offload. Rather, it is a
problem with a specific implementation, one that falsely assumes two
independent TCP stacks can co-exist peacefully on the same IP address
and MAC.
* direct data placement is possible without offloading the entire TCP
stack onto a firmware/chip.
There is plenty of room for hardware iSCSI offload...
Jeff
From: Jeff Garzik <[email protected]>
Date: Wed, 13 Aug 2008 15:29:55 -0400
> * direct data placement is possible without offloading the entire TCP
> stack onto a firmware/chip.
I've even described in this thread how that's possible.
From: Vladislav Bolkhovitin <[email protected]>
Date: Wed, 13 Aug 2008 22:35:34 +0400
> This is because the target sends data in a zero-copy manner, so its
> CPU is capable of dealing with the load, but on the initiator there are
> additional data copies from skb's to the page cache and from the page
> cache to the application.
If you've actually been reading at all what I've been saying in this
thread you'll see that I've described a method to do this copy
avoidance in a completely stateless manner.
You don't need to implement a TCP stack in the card in order to do
data placement optimizations. They can be done completely stateless.
Also, large portions of the cpu overhead are transactional costs,
which are significantly reduced by existing technologies such as
LRO.
> > How can you place iSCSI data properly with only stateless offloads?
> By teaching the stateless offload how to parse the iSCSI headers
> on the flow and place the data into pages at the correct offsets
> such that you can place the pages hanging off of the SKB directly
> into the page cache.
I don't see how this could work. First, it seems that you have to let
the adapter know which connections are iSCSI connections so that it
knows when to try and parse iSCSI headers. So you're already not
totally stateless. Then, since (AFAIK -- I'm not an expert on iSCSI and
especially I'm not an expert on what common practice is for current
implementations) the iSCSI PDUs can start at any offset in the TCP
stream, I don't see how a stateless adapter can even find the PDU
headers to parse -- there's not any way that I know of to recognize
where a PDU boundary is without keeping track of the lengths of all the
PDUs that go by (ie you need per-connection state).
Even if the adapter could find the PDUs, I don't see how it could come
up with the correct offset to place the data -- PDUs with response data
just carry an opaque tag assigned by the iSCSI initiator. Finally, if
there are ways around all of those difficulties, we would still have to
do major surgery to our block layer to cope with read requests that
complete into random pages, rather than using a scatter list passed into
the low-level driver.
But I think all this argument is missing the point anyway. The real
issue is not hand-waving about what someone might build someday, but how
we want to support iSCSI offload with the existing Chelsio, Broadcom,
etc adapters. The answer might be, "we don't," but I disagree with that
choice because:
a. "No upstream support" really ends up being "enterprise distros and
customers end up using hacky out-of-tree drivers and blaming us."
b. It sends a bad message to vendors who put a lot of effort into
writing a clean, mergable driver and responding to review if the
answer is, "Sorry, your hardware is wrong so no driver for you."
Maybe the answer is that we just add the iSCSI HBA drivers with no help
from the networking stack, and ignore the port collision problem. For
iSCSI initiators, it's really not an issue: for a 4-tuple to collide,
someone would have to use both offloaded and non-offloaded connections
to the same target and be unlucky in the source port chosen. It would
be nice to be able to discuss solutions to port collisions, but it may
be that this is too emotional an issue for that to be possible.
- R.
From: Roland Dreier <[email protected]>
Date: Wed, 13 Aug 2008 14:27:50 -0700
> I don't see how this could work. First, it seems that you have to let
> the adapter know which connections are iSCSI connections so that it
> knows when to try and parse iSCSI headers.
It always starts from offset zero for never seen before connections.
> So you're already not totally stateless.
Yes, we are.
> Then, since (AFAIK -- I'm not an expert on iSCSI and
> especially I'm not an expert on what common practice is for current
> implementations) the iSCSI PDUs can start at any offset in the TCP
> stream, I don't see how a stateless adapter can even find the PDU
> headers to parse -- there's not any way that I know of to recognize
> where a PDU boundary is without keeping track of the lengths of all the
> PDUs that go by (ie you need per-connection state).
Like I said, you retain a "flow cache" (say it a million times, "flow
cache") that remembers the current parameters and the buffers
currently assigned to that flow and what offset within those buffers.
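Neither message spells out what a flow cache entry would actually hold.
As a purely hypothetical sketch (the field names are guesses, not any
vendor's interface), the per-flow soft state might be no more than:

#include <stdint.h>

/* Hypothetical NIC "flow cache" entry used only for placement hints.
 * Evicting it costs nothing but the copy avoidance: packets still go
 * up the normal stack unmodified. */
struct flow_cache_entry {
        /* lookup key: the TCP/IP 4-tuple */
        uint32_t saddr, daddr;
        uint16_t sport, dport;

        /* position in the byte stream relative to PDU framing */
        uint32_t next_seq;      /* next expected TCP sequence number */
        uint32_t pdu_remaining; /* payload bytes left in the current PDU */

        /* where the remaining payload should land */
        uint64_t buf_dma_addr;  /* current target buffer */
        uint32_t buf_offset;    /* offset within that buffer */
};

Whether tracking next_seq and pdu_remaining counts as "state" is, of
course, the point under dispute below.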
> Like I said, you retain a "flow cache" (say it a million times, "flow
> cache") that remembers the current parameters and the buffers
> currently assigned to that flow and what offset within those buffers.
OK, I admit you could make something work -- add hooks for the low-level
driver to ask the iSCSI initiator where PDU boundaries are so it can
resync when something is evicted from the flow cache, have the initiator
format its tags in a special way to encode placement data, etc, etc.
The scheme does bring to mind Alan's earlier comment about pigs and
propulsion, though.
In any case, as I said in the part of my email that you snipped, the
real issue is not designing hypothetical hardware, but deciding how to
support the Chelsio, Broadcom, etc hardware that exists today.
- R.
From: Roland Dreier <[email protected]>
Date: Wed, 13 Aug 2008 16:03:15 -0700
> OK, I admit you could make something work -- add hooks for the low-level
> driver to ask the iSCSI initiator where PDU boundaries are so it can
> resync when something is evicted from the flow cache, have the initiator
> format its tags in a special way to encode placement data, etc, etc.
> The scheme does bring to mind Alan's earlier comment about pigs and
> propulsion, though.
There would need to be _NO_ hooks into the iSCSI initiator at all.
The card would land the block I/O data onto the necessary page boundaries
and the iSCSI code would just be able to thus use the pages directly
and as-is.
It would look perfectly like normal TCP receive traffic. No hooks,
no special cases, nothing like that.
> In any case, as I said in the part of my email that you snipped, the
> real issue is not designing hypothetical hardware, but deciding how to
> support the Chelsio, Broadcom, etc hardware that exists today.
The same way we support TOE hardware that exists today. That is, we
don't.
From: Tom Tucker <[email protected]>
Date: Wed, 13 Aug 2008 20:26:51 -0500
> Can you explain how this "information" somehow doesn't qualify as
> "state". Doesn't the next expected sequence number at the very least
> need to be updated? una? etc...?
>
> Could you also include the "non-state-full" information necessary to do
> iSCSI header digest validation, data placement, and marker removal?
It's stateless because the full packet traverses the real networking
stack and thus can be treated like any other packet.
The data placement is a side effect that the networking stack can
completely ignore if it chooses to.
David Miller wrote:
> From: Tom Tucker <[email protected]>
> Date: Wed, 13 Aug 2008 20:26:51 -0500
>
>
>> Can you explain how this "information" somehow doesn't qualify as
>> "state". Doesn't the next expected sequence number at the very least
>> need to be updated? una? etc...?
>>
>> Could you also include the "non-state-full" information necessary to do
>> iSCSI header digest validation, data placement, and marker removal?
>>
>
> It's stateless because the full packet traverses the real networking
> stack and thus can be treated like any other packet.
>
> The data placement is a side effect that the networking stack can
> completely ignore if it chooses to.
>
How do you envision programming such a device? It will need TCP and
iSCSI state to have any chance of doing useful and productive placement
of data. The smarts about the iSCSI stateless offload hw will be in the
device driver, probably the iscsi device driver. How will it gather the
information from the TCP stack to insert the correct state for a flow
into the hw cache?
David Miller wrote:
> From: Tom Tucker <[email protected]>
> Date: Wed, 13 Aug 2008 20:26:51 -0500
>
>
>> Can you explain how this "information" somehow doesn't qualify as
>> "state". Doesn't the next expected sequence number at the very least
>> need to be updated? una? etc...?
>>
>> Could you also include the "non-state-full" information necessary to do
>> iSCSI header digest validation, data placement, and marker removal?
>>
>
> It's stateless because the full packet traverses the real networking
> stack and thus can be treated like any other packet.
>
> The data placement is a side effect that the networking stack can
> completely ignore if it chooses to.
Ok. Maybe we're getting somewhere here ... or at least I am :-)
I'm not trying to be pedantic here but let me try and restate what I
think you said above:
- The "header" traverses the real networking stack
- The "payload" is placed either by the hardware if possible or by
the native stack if on the exception path
- The "header" may aggregate multiple PDUs (RSO)
- Data ready indications are controlled entirely by the software/real
networking stack
Thanks,
Tom
David Miller wrote:
> From: Roland Dreier <[email protected]>
> Date: Wed, 13 Aug 2008 16:03:15 -0700
>
>
>> OK, I admit you could make something work -- add hooks for the low-level
>> driver to ask the iSCSI initiator where PDU boundaries are so it can
>> resync when something is evicted from the flow cache, have the initiator
>> format its tags in a special way to encode placement data, etc, etc.
>> The scheme does bring to mind Alan's earlier comment about pigs and
>> propulsion, though.
>>
>
> There would need to be _NO_ hooks into the iSCSI initiator at all.
>
> The card would land the block I/O data onto the necessary page boundaries
> and the iSCSI code would just be able to thus use the pages directly
> and as-is.
>
> It would look perfectly like normal TCP receive traffic. No hooks,
> no special cases, nothing like that.
>
>
>> In any case, as I said in the part of my email that you snipped, the
>> real issue is not designing hypothetical hardware, but deciding how to
>> support the Chelsio, Broadcom, etc hardware that exists today.
>>
>
> The same way we support TOE hardware that exists today. That is, we
> don't.
>
>
Is there any chance you could discuss exactly how a stateless adapter
can determine whether a network segment is in-order, next expected,
minus productive ack, PAWS compliant, etc... without TCP state?
I get how you can optimize "flows", but a "flow" is a fancy name for a
key (typically the four-tuple) that is looked up in a TCAM to get the
"information" necessary to do header prediction.
Can you explain how this "information" somehow doesn't qualify as
"state". Doesn't the next expected sequence number at the very least
need to be updated? una? etc...?
Could you also include the "non-state-full" information necessary to do
iSCSI header digest validation, data placement, and marker removal?
Thanks,
Tom
From: Steve Wise <[email protected]>
Date: Wed, 13 Aug 2008 20:52:47 -0500
> How do you envision programming such a device?
There should be no special programming.
> It will need TCP and iSCSI state to have any chance of doing useful
> and productive placement of data.
The card can see the entire TCP stream, it doesn't need anything
more than that. It can parse every packet header, see what kind
of data transfer is being requested or responded to, etc.
Look, I'm not going to design this whole friggin' thing for you guys.
I've stated clearly what the base requirement is, which is that the
packet is fully processed by the networking stack and that the card
merely does data placement optimizations that the stack can completely
ignore if it wants to.
You have an entire engine in there that can interpret an iSCSI
transport stream, you have the logic to do these kinds of things,
and it can be done without managing the connection on the card.
From: Tom Tucker <[email protected]>
Date: Wed, 13 Aug 2008 20:57:08 -0500
> I'm not trying to be pedantic here but let me try and restate what I
> think you said above:
>
> - The "header" traverses the real networking stack
> - The "payload" is placed either by the hardware if possible or by
> the native stack if on the exception path
> - The "header" may aggregate multiple PDUs (RSO)
> - Data ready indications are controlled entirely by the software/real
> networking stack
SKB's can be paged, in fact many devices already work by chopping
up lists of pages that the driver gives to the card. NIU is one
of several examples.
The only difference between what a device like NIU is doing now and
what I propose is smart determination of at what offset and into
which buffers to do the demarcation.
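Attaching driver-chosen pages to a receive skb is indeed already
routine; the only new piece in this proposal is how the page and offset
get chosen. A hedged, driver-style sketch of the existing pattern (not
taken from niu, myri10ge, or cxgb3):

#include <linux/skbuff.h>
#include <linux/mm.h>

/*
 * Attach a page the NIC DMA'd payload into as a fragment of the
 * receive skb.  A placement-capable NIC would merely have picked
 * 'page' and 'offset' so the payload already sits where the upper
 * layer wants it; the stack sees an ordinary paged skb either way.
 */
static void rx_add_payload_page(struct sk_buff *skb, int frag_idx,
                                struct page *page, int offset, int len)
{
        skb_fill_page_desc(skb, frag_idx, page, offset, len);
        skb->len      += len;
        skb->data_len += len;
        skb->truesize += PAGE_SIZE;
}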
From: Tom Tucker <[email protected]>
Date: Wed, 13 Aug 2008 20:26:51 -0500
> Is there any chance you could discuss exactly how a stateless adapter
> can determine whether a network segment is in-order, next expected,
> minus productive ack, PAWS compliant, etc... without TCP state?
If you're getting packets out of order, data placement optimizations
are the least of your concerns.
In fact this is exactly where we want all of the advanced loss
handling algorithms of the Linux TCP stack to get engaged.
David Miller wrote:
> I've stated clearly what the base requirement is, which is that the
> packet is fully processed by the networking stack and that the card
> merely does data placement optimizations that the stack can completely
> ignore if it wants to.
>
> You have an entire engine in there that can interpret an iSCSI
> transport stream, you have the logic to do these kinds of things,
> and it can be done without managing the connection on the card.
>
Thanks for finally stating it clearly.
Jeff Garzik wrote:
> Vladislav Bolkhovitin wrote:
>> Divy Le Ray wrote:
>>> On Tuesday 12 August 2008 03:02:46 pm David Miller wrote:
>>>> From: Divy Le Ray <[email protected]>
>>>> Date: Tue, 12 Aug 2008 14:57:09 -0700
>>>>
>>>>> In any case, such a stateless solution is not yet designed, whereas
>>>>> accelerated iSCSI is available now, from us and other companies.
>>>> So, WHAT?!
>>>>
>>>> There are TOE pieces of crap out there too.
>>> Well, there is demand for accelerated iSCSI out there, which is the
>>> driving reason behind our driver submission.
>> I'm, as an iSCSI target developer, strongly voting for hardware iSCSI
>> offload. Having possibility of the direct data placement is a *HUGE*
>> performance gain.
>
> Well, two responses here:
>
> * no one is arguing against hardware iSCSI offload. Rather, it is a
> problem with a specific implementation, one that falsely assumes two
> independent TCP stacks can co-exist peacefully on the same IP address
> and MAC.
>
> * direct data placement is possible without offloading the entire TCP
> stack onto a firmware/chip.
>
> There is plenty of room for hardware iSCSI offload...
Sure, nobody is arguing against that. My points are:
1. None of those things are coming in the near future. I don't think
they can be implemented in less than a year, but there is a huge demand
for high-speed, low-CPU-overhead iSCSI _now_. Nobody is satisfied by the
fact that even with the latest high-end hardware a 10GbE link can only
be driven to less than 50%(!) of capacity. Additionally, as an iSCSI
target developer I find it especially annoying that the hardware
requirements for the _clients_ (initiators) are significantly higher
than for the _server_ (target). That strikes me as nonsense.
2. I believe the iSCSI/TCP pair is a sufficiently heavyweight protocol
that it should be offloaded to hardware completely. No partial offload
will ever make it comparably efficient. It would still consume a lot of
CPU. For example, consider digests. Even if they are computed with the
new CRC32C instruction, the computation still needs a chunk of CPU
power; at least as much, I think, as copying the checksummed block to a
new location. Can we save that? Sure, with hardware offload. The
additional CPU load may be acceptable if data transfer is the only
activity, but in real life that is quite rare. Consider, for instance,
a virtualization server, such as VMware: it always lacks CPU power, and
a 30% CPU load during data transfers makes a huge difference. Another
example is a target doing some processing of the transferred data, such
as encryption or de-duplication.
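For scale, the "new CRC32C instruction" here is the SSE4.2 crc32 op.
A minimal user-space sketch of the digest loop it replaces in software
(assuming an SSE4.2-capable CPU and compilation with -msse4.2):

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>          /* SSE4.2 _mm_crc32_* intrinsics */

/* CRC32C as iSCSI uses it: seed with all-ones, invert at the end. */
uint32_t iscsi_crc32c(const void *buf, size_t len)
{
        const uint8_t *p = buf;
        uint64_t crc = 0xffffffffu;

        while (len >= 8) {
                uint64_t word;

                memcpy(&word, p, 8);    /* tolerate unaligned input */
                crc = _mm_crc32_u64(crc, word);
                p += 8;
                len -= 8;
        }
        while (len--)
                crc = _mm_crc32_u8((uint32_t)crc, *p++);

        return ~(uint32_t)crc;
}

Even at a few bytes per cycle this still has to touch every byte of the
data segment, which is the CPU cost being weighed against full offload.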
Note that I'm not advocating for this particular cxgb3 driver. I have
not examined it closely enough, and I don't know the hardware well
enough, to judge it. But I am advocating the concept of full-offload
HBAs, because they provide a real gain which IMHO can't be matched by
any partial offload.
Actually, in the Fibre Channel world the entire FC protocol has been
implemented in hardware from the very beginning, and everybody has been
happy with that. Now FCoE is coming, which means the Linux kernel is
going to implement a big chunk of the FC protocol in software. Then,
hopefully, nobody will declare all the existing FC cards to be crap and
force the FC vendors to redesign their hardware to use the Linux FC
implementation with partial offloads for it? ;) Instead, several
implementations will live in peace. The situation is the same with
iSCSI. All we need is to find an acceptable way for two TCP
implementations to coexist. Then iSCSI on 10GbE hardware would have a
good chance of outperforming 8Gbps FC in both performance and CPU
efficiency.
Vlad
[attachment: ethtool -S output for the Myri-10G interface used in the test]
NIC statistics:
rx_packets: 471090527
tx_packets: 175404246
rx_bytes: 683684492944
tx_bytes: 636200696592
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
multicast: 0
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 0
rx_missed_errors: 0
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_window_errors: 0
rx_skbs: 0
alloc_order: 0
builtin_fw: 0
napi: 1
tx_boundary: 4096
WC: 2
irq: 1268
MSI: 1
MSIX: 0
read_dma_bw_MBs: 1575
write_dma_bw_MBs: 1375
read_write_dma_bw_MBs: 2406
serial_number: 320283
watchdog_resets: 0
link_changes: 2
link_up: 1
dropped_link_overflow: 0
dropped_link_error_or_filtered: 0
dropped_pause: 0
dropped_bad_phy: 0
dropped_bad_crc32: 0
dropped_unicast_filtered: 0
dropped_multicast_filtered: 0
dropped_runt: 0
dropped_overrun: 0
dropped_no_small_buffer: 0
dropped_no_big_buffer: 479
----------- slice ---------: 0
tx_pkt_start: 176354843
tx_pkt_done: 176354843
tx_req: 474673372
tx_done: 474673372
rx_small_cnt: 19592127
rx_big_cnt: 462319631
wake_queue: 0
stop_queue: 0
tx_linearized: 0
LRO aggregated: 481899984
LRO flushed: 43071334
LRO avg aggr: 11
LRO no_desc: 0
Vladislav Bolkhovitin wrote:
> David Miller wrote:
>> From: Vladislav Bolkhovitin <[email protected]>
>> Date: Wed, 13 Aug 2008 22:35:34 +0400
>>
>>> This is because the target sends data in a zero-copy manner, so its
>>> CPU is capable to deal with the load, but on the initiator there are
>>> additional data copies from skb's to page cache and from page cache
>>> to application.
>> If you've actually been reading at all what I've been saying in this
>> thread you'll see that I've described a method to do this copy
>> avoidance in a completely stateless manner.
>>
>> You don't need to implement a TCP stack in the card in order to do
>> data placement optimizations. They can be done completely stateless.
>
> Sure, I read what you wrote before writing (although, frankly, didn't
> get the idea). But I don't think that overall it would be as efficient
> as full hardware offload. See my reply to Jeff Garzik about that.
>
>> Also, large portions of the cpu overhead are transactional costs,
>> which are significantly reduced by existing technologies such as
>> LRO.
>
> The test used Myricom Myri-10G cards (myri10ge driver), which support
> LRO. And from ethtool -S output I conclude it was enabled. Just in case,
> I attached it, so you can recheck me.
Also, there wasn't a big difference between MTU 1500 and 9000, which is
another indication that LRO was working.
> Thus, apparently, LRO doesn't make a fundamental difference. Maybe this
> particular implementation isn't too efficient, I don't know. I don't
> have enough information for that.
>
> Vlad
>
>
On Thu, 2008-08-14 at 22:24 +0400, Vladislav Bolkhovitin wrote:
> Jeff Garzik wrote:
> > Vladislav Bolkhovitin wrote:
> >> Divy Le Ray wrote:
> >>> On Tuesday 12 August 2008 03:02:46 pm David Miller wrote:
> >>>> From: Divy Le Ray <[email protected]>
> >>>> Date: Tue, 12 Aug 2008 14:57:09 -0700
> >>>>
> >>>>> In any case, such a stateless solution is not yet designed, whereas
> >>>>> accelerated iSCSI is available now, from us and other companies.
> >>>> So, WHAT?!
> >>>>
> >>>> There are TOE pieces of crap out there too.
> >>> Well, there is demand for accelerated iSCSI out there, which is the
> >>> driving reason behind our driver submission.
> >> I'm, as an iSCSI target developer, strongly voting for hardware iSCSI
> >> offload. Having possibility of the direct data placement is a *HUGE*
> >> performance gain.
> >
> > Well, two responses here:
> >
> > * no one is arguing against hardware iSCSI offload. Rather, it is a
> > problem with a specific implementation, one that falsely assumes two
> > independent TCP stacks can co-exist peacefully on the same IP address
> > and MAC.
> >
> > * direct data placement is possible without offloading the entire TCP
> > stack onto a firmware/chip.
> >
> > There is plenty of room for hardware iSCSI offload...
>
> Sure, nobody is arguing against that. My points are:
>
> 1. None of those things are coming in the near future. I don't think
> they can be implemented in less than a year, but there is a huge
> demand for high-speed, low-CPU-overhead iSCSI _now_.
Well, the first step wrt this for us software folks is getting the
slicing-by-8 CRC32C algorithm into the kernel. This would be a great
benefit not just for traditional iSCSI/TCP, but for the Linux/SCTP and
Linux/iWARP software codebases.
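For reference, slicing-by-8 is the classic table-driven CRC widened to
fold eight input bytes per iteration from eight derived tables. A
minimal user-space sketch of the CRC32C variant (reflected polynomial
constant 0x82f63b78):

#include <stdint.h>
#include <stddef.h>

static uint32_t crc32c_tab[8][256];

/* Build the base byte-at-a-time table, then derive seven more so the
 * main loop can consume eight bytes per iteration. */
void crc32c_sb8_init(void)
{
        for (int i = 0; i < 256; i++) {
                uint32_t c = i;

                for (int k = 0; k < 8; k++)
                        c = (c >> 1) ^ ((c & 1) ? 0x82f63b78u : 0);
                crc32c_tab[0][i] = c;
        }
        for (int i = 0; i < 256; i++)
                for (int t = 1; t < 8; t++)
                        crc32c_tab[t][i] = (crc32c_tab[t - 1][i] >> 8) ^
                                crc32c_tab[0][crc32c_tab[t - 1][i] & 0xff];
}

/* iSCSI callers seed 'crc' with ~0 and invert the result at the end. */
uint32_t crc32c_sb8(uint32_t crc, const uint8_t *p, size_t len)
{
        while (len >= 8) {
                uint32_t lo = crc ^ (p[0] | p[1] << 8 | p[2] << 16 |
                                     (uint32_t)p[3] << 24);
                uint32_t hi = p[4] | p[5] << 8 | p[6] << 16 |
                              (uint32_t)p[7] << 24;

                crc = crc32c_tab[7][lo & 0xff]         ^
                      crc32c_tab[6][(lo >> 8)  & 0xff] ^
                      crc32c_tab[5][(lo >> 16) & 0xff] ^
                      crc32c_tab[4][lo >> 24]          ^
                      crc32c_tab[3][hi & 0xff]         ^
                      crc32c_tab[2][(hi >> 8)  & 0xff] ^
                      crc32c_tab[1][(hi >> 16) & 0xff] ^
                      crc32c_tab[0][hi >> 24];
                p += 8;
                len -= 8;
        }
        while (len--)
                crc = (crc >> 8) ^ crc32c_tab[0][(crc ^ *p++) & 0xff];
        return crc;
}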
> Nobody is satisfied by the fact that even with the latest high-end
> hardware a 10GbE link can only be driven to less than 50%(!) of
> capacity. Additionally, as an iSCSI target developer I find it
> especially annoying that the hardware requirements for the _clients_
> (initiators) are significantly higher than for the _server_ (target).
> That strikes me as nonsense.
>
I have always found this to be the historical case wrt iSCSI on x86
hardware. The rough estimate was that given identical hardware and
network configuration, an iSCSI target talking to a SCSI subsystem layer
would be able to handle 2x throughput compared to an iSCSI Initiator,
obviously as long as the actual storage could handle it.
> 2. I believe the iSCSI/TCP pair is a sufficiently heavyweight
> protocol that it should be offloaded to hardware completely.
Heh, I think the period of designing new ASICs for traditional iSCSI
offload is probably slowing. Aside from the actual difficulty of doing
this, such a design has to compete with software iSCSI on commodity x86
4x & 8x core (8x and 16x thread) microprocessors running a highly
efficient software implementation, one that can do BOTH traditional
iSCSI offload (where available) and real-deal OS-independent connection
recovery (ErrorRecoveryLevel=2) between multiple stateless iSER
iWARP/TCP connections across both hardware *AND* software iWARP RNICs.
> No partial offload will ever make it comparably efficient.
With traditional iSCSI, I definitely agree on this.
With iWARP and iSER, however, I believe the end balance of simplicity
is greater for both hardware and software, and it allows both to scale
more effectively: the simple gain of having a framed PDU on top of
legacy TCP, per RFC 504[0-4], makes it possible to determine the
offload of the received packet that will later be mapped to storage
subsystem memory for eventual hardware DMA, on a vast array of
Linux-supported storage hardware and CPU architectures.
> It would still consume a lot of CPU. For example, consider digests.
> Even if they are computed with the new CRC32C instruction, the
> computation still needs a chunk of CPU power; at least as much, I
> think, as copying the checksummed block to a new location. Can we save
> that? Sure, with hardware offload.
So yes, we are talking about quite a few possible cases:
I) Traditional iSCSI:
1) Complete hardware offload for legacy HBAs
2) Hybrid of hardware/software
As mentioned, reducing application-layer checksum overhead for current
software implementations is very important for our quickly increasing
user base. Using slicing-by-8 CRC32C will help the current code, but I
think the only other real optimization for the network ASIC design
folks would be to do for the traditional iSCSI application layer
something along the lines of what, say, the e1000 driver does with
transport- and network-layer checksums today. I believe the complexity
and time-to-market considerations of a complete traditional iSCSI
offload solution, compared to highly optimized software iSCSI on
dedicated commodity cores, still outweigh the benefit IMHO.
Not that I am saying there is no room for improvement over the current
set of iSCSI Initiator TOEs. Again, I could build a children's fortress
from the iSCSI TOEs and retail boxes that I have gotten over the years
and keep in my office. I would definitely like to see them running on
the LIO production fabric and VHACS bare-metal storage clouds at some
point for validation purposes, et al. But as for new designs, this is
still a very difficult proposition; I am glad to see it being discussed
here.
II) iWARP/TCP and iSER
1) Hardware RNIC w/ iWARP/TCP with software iSER
2) Software RNIC w/ iWARP/TCP with software iSER
3) More possible iSER logic in hardware for latency/performance
optimizations (We won't know this until #1 and #2 happen)
Ahh, now this is the interesting case for scaling a vendor-independent
IP storage fabric to multiple-port, full-duplex 10 Gb/sec fabrics. As
this hardware on PCIe gets out (yes, I have some AMSO1100 goodness too
Steve :-), and iSER initiators/targets on iWARP/TCP come online, I
believe the common code between the different flavours of
implementations will be much larger here. For example, I previously
mentioned ERL=2 in the context of traditional iSCSI/iSER. This logic is
independent of what RFC 5045 describes as a network fabric capable of
direct data placement. I will also make this code independent in
lio-target-2.6.git for my upstream work.
> The additional CPU load may be acceptable if data transfer is the
> only activity, but in real life that is quite rare. Consider, for
> instance, a virtualization server, such as VMware: it always lacks CPU
> power, and a 30% CPU load during data transfers makes a huge
> difference. Another example is a target doing some processing of the
> transferred data, such as encryption or de-duplication.
Well, I think a lot of this depends on hardware. For example, there is
the X3100 adapter from Neterion today that can do 10 Gb/sec line rate
with x86_64 virtualization. Obviously, the Linux kernel (and my
project, Linux-iSCSI.org) wants to be able to support this in as
vendor-neutral a way as possible, which is why we make extensive use of
multiple technologies in our production fabrics and in the VHACS
stack. :-)
Also, Nested Page Tables would be a big win for this particular case,
but I am not familiar with the exact numbers.
>
> Actually, in the Fibre Channel world the entire FC protocol has been
> implemented in hardware from the very beginning, and everybody has
> been happy with that. Now FCoE is coming, which means the Linux kernel
> is going to implement a big chunk of the FC protocol in software.
> Then, hopefully, nobody will declare all the existing FC cards to be
> crap and force the FC vendors to redesign their hardware to use the
> Linux FC implementation with partial offloads for it? ;) Instead,
> several implementations will live in peace. The situation is the same
> with iSCSI. All we need is to find an acceptable way for two TCP
> implementations to coexist. Then iSCSI on 10GbE hardware would have a
> good chance of outperforming 8Gbps FC in both performance and CPU
> efficiency.
>
<nod> :-)
--nab
> Vlad
>