2005-03-07 07:04:07

by Alex Aizman

[permalink] [raw]
Subject: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

This is to announce the Open-iSCSI project: a High-Performance iSCSI
Initiator for Linux.

MOTIVATION
==========

Our initial motivations for the project were: (1) implement the right
user/kernel split, and (2) design iSCSI data path for performance. Recently
we added (3): get accepted into the mainline kernel.

As for the user/kernel split: the existing iSCSI initiators bloat the
kernel with ever-growing control plane code, including but not limited to:
iSCSI discovery, Login (Authentication and Operational), session and
connection management, connection-level error processing, iSCSI Text,
Nop-Out/In, Async Message, iSNS, SLP, Radius... Open-iSCSI puts the entire
control plane in user space. This control plane talks to the data plane via
a well-defined interface over the netlink transport.

(Side note: before settling on netlink we considered sysfs, ioctl, and
syscall. Because the entire control plane logic resides in user space,
we needed a real bi-directional transport that could support an asynchronous
API to transfer iSCSI control PDUs: Login, Logout, Nop-In, Nop-Out, Text,
Async Message.)
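
A minimal sketch of how a control-plane event could be framed for a netlink
transport. The 16-byte header layout is the standard struct nlmsghdr from
<linux/netlink.h>; the event number EV_SEND_PDU and the helper names are
hypothetical and are not taken from the patch's iscsi_if.h. The sequence
number is what lets an asynchronous reply be matched to its request.

```python
import struct

# struct nlmsghdr from <linux/netlink.h>: u32 len, u16 type, u16 flags,
# u32 seq, u32 pid, all in host byte order.
NLMSG_HDR = struct.Struct("=IHHII")

# Hypothetical event number, for illustration only; the real values are
# defined by the patch's headers and are not reproduced here.
EV_SEND_PDU = 0x10


def nlmsg_align(n: int) -> int:
    """Round up to the 4-byte boundary netlink messages are padded to."""
    return (n + 3) & ~3


def pack_event(ev_type: int, seq: int, pid: int, payload: bytes) -> bytes:
    """Frame one control-plane event as a netlink message."""
    length = NLMSG_HDR.size + len(payload)  # nlmsg_len excludes padding
    msg = NLMSG_HDR.pack(length, ev_type, 0, seq, pid) + payload
    return msg.ljust(nlmsg_align(length), b"\0")


def unpack_event(msg: bytes):
    """Split a netlink message into (type, seq, pid, payload)."""
    length, ev_type, _flags, seq, pid = NLMSG_HDR.unpack_from(msg)
    return ev_type, seq, pid, msg[NLMSG_HDR.size:length]
```

For example, pack_event(EV_SEND_PDU, 1, 1234, b"Login") yields a 24-byte
message: 16 bytes of header, 5 bytes of payload, and 3 bytes of padding.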

Performance.
This is the major goal and motivation for this project. As it happens, iSCSI
has to compete with Fibre Channel, which is a more entrenched technology in
the storage space. In addition, the "soft" iSCSI implementations have to show
good results in the presence of specialized hardware offloads.

Our current performance numbers are:

- 450MB/sec Read on a single connection (2-way 2.4GHz Opteron, 64KB block
size);

- 320MB/sec Write on a single connection (2-way 2.4GHz Opteron, 64KB block
size);

- 50,000 Read IOPS on a single connection (2-way 2.4GHz Opteron, 4KB block
size).
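
To put the three figures on a common scale, the arithmetic below converts
between throughput and operations per second. It assumes binary units
(1 MB = 1024 KB), which the announcement does not state; the function names
are illustrative only.

```python
# Relate the quoted throughput and IOPS figures to each other.
# Assumption: "MB" and "KB" are binary units (1 MB = 1024 KB).

def iops_from_throughput(mb_per_sec: float, block_kb: float) -> float:
    """I/O operations per second needed to sustain a given throughput."""
    return mb_per_sec * 1024 / block_kb


def throughput_from_iops(iops: float, block_kb: float) -> float:
    """Throughput in MB/sec implied by an IOPS figure."""
    return iops * block_kb / 1024


print(iops_from_throughput(450, 64))    # 64KB reads at 450MB/sec
print(iops_from_throughput(320, 64))    # 64KB writes at 320MB/sec
print(throughput_from_iops(50_000, 4))  # 4KB reads at 50,000 IOPS
```

Under that assumption the 64KB figures correspond to 7,200 reads/sec and
5,120 writes/sec, and the 4KB IOPS figure corresponds to roughly 195MB/sec.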

Before writing the data path code from scratch, we evaluated the sfnet
Initiator and eventually decided against patching it. Instead, we reused
its Discovery, Login, etc. control plane code. Technically, it was the
shortest way to achieve goals (1) and (2) stated above. We believe that
it remains the easiest and most practical approach for the larger goal of
iSCSI for Linux.


STATUS
======

There is 100% working code that interoperates with all (count=5) iSCSI
targets we could get our hands on.

The software was tested on AMD Opteron (TM) and Intel Xeon (TM).

Code is available online, either from the Subversion repository or as
the latest development release (i.e., the tarball containing Open-iSCSI
sources, including user space, that will build and run on kernels starting
with 2.6.10).

http://www.open-iscsi.org

Features:

- highly optimized and small-footprint data path;
- multiple outstanding R2Ts;
- thread-less receive;
- sendpage() based transmit;
- zero-copy header processing on receive;
- no data path memory allocations at runtime;
- persistent configuration database;
- SendTargets discovery;
- CHAP;
- DataSequenceInOrder=No;
- PDU header Digest;
- multiple sessions;
- MC/S (note: disabled in the patch);
- SCSI-level recovery via Abort Task and session re-open.


TODO
====

The near-term plan is: test, test, and test. We need to stabilize the
existing code; after 5 months of development this seems to be the right
thing to do.

Other short-term plans include:

a) process community feedback, implement comments and apply patches;
b) clean up the user side of the iSCSI open interface; use API calls
(instead of directly constructing events);
c) eliminate runtime control path memory allocations (for Nop-In, Nop-Out,
etc.);
d) implement Write path optimizations (delayed because of the self-imposed
submission deadline);
e) oProfile the data path, use the reports for further optimization;
f) complete the readme.

Comments, code reviews, and patches are greatly appreciated!


THANKS
======

Special thanks to our first reviewers: Christoph Hellwig and Mike Christie.

Special thanks to Ming Zhang for help in testing and for insightful questions.


Regards,

Alex Aizman & Dmitry Yusupov

=============================================

The following 6 patches together represent the Open-iSCSI Initiator:

Patch 1:
SCSI LLDD consists of 3 files:
- iscsi_if.c (iSCSI open interface over netlink);
- iscsi_tcp.[ch] (iSCSI transport over TCP/IP).

Patch 2:
Common header files:
- iscsi_if.h (iSCSI open interface over netlink);
- iscsi_proto.h (RFC3720 #defines and types);
- iscsi_ifev.h (user/kernel events).

Patch 3:
drivers/scsi/Kconfig changes.

Patch 4:
drivers/scsi/Makefile changes.

Patch 5:
include/linux/netlink.h changes (added new protocol NETLINK_ISCSI)

Patch 6:
Documentation/scsi/iscsi.txt

2005-03-09 05:04:46

by Matt Mackall

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

On Sun, Mar 06, 2005 at 11:03:14PM -0800, Alex Aizman wrote:
> As far as user/kernel, the existing iSCSI initiators bloat the kernel with
> ever-growing control plane code, including but not limited to: iSCSI
> discovery, Login (Authentication and Operational), session and connection
> management, connection-level error processing, iSCSI Text, Nop-Out/In, Async
> Message, iSNS, SLP, Radius... Open-iSCSI puts the entire control plane in
> the user space. This control plane talks to the data plane via well defined
> interface over the netlink transport.

How big is the userspace client?

How does this perform under memory pressure? If the userspace iSCSI
client is paged out for whatever reason, and flushing _to_ an iSCSI
device is necessary to page the userspace portion back in, and the
connection needs restarting or the like to flush...

> Performance.
> This is the major goal and motivation for this project. As it happens, iSCSI
> has to compete with Fibre Channel, which is a more entrenched technology in
> the storage space. In addition, the "soft" iSCSI implementation have to show
> good results in presence of specialized hardware offloads.
>
> Our today's performance numbers are:
>
> - 450MB/sec Read on a single connection (2-way 2.4Ghz Opteron, 64KB block
> size);

With what network hardware and drives, please?

--
Mathematics is the supreme nostalgia of our time.

2005-03-09 05:52:14

by Alex Aizman

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

Matt Mackall wrote:

>How big is the userspace client?
>
>
Hmm.. x86 executable? source?

Anyway, there's about 12,000 lines of user space code, and growing. In
the kernel we have approx. 3,300 lines.

>>- 450MB/sec Read on a single connection (2-way 2.4Ghz Opteron, 64KB block
>>size);
>>
>>
>
>With what network hardware and drives, please?
>
>
>
Neterion's 10GbE adapters. RAM disk on the target side.

Alex

2005-03-09 06:06:09

by Matt Mackall

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

On Tue, Mar 08, 2005 at 09:51:39PM -0800, Alex Aizman wrote:
> Matt Mackall wrote:
>
> >How big is the userspace client?
> >
> Hmm.. x86 executable? source?
>
> Anyway, there's about 12,000 lines of user space code, and growing. In
> the kernel we have approx. 3,300 lines.
>
> >>- 450MB/sec Read on a single connection (2-way 2.4Ghz Opteron, 64KB block
> >>size);
> >
> >With what network hardware and drives, please?
> >
> Neterion's 10GbE adapters. RAM disk on the target side.

Ahh.

Snipped my question about userspace deadlocks - that was the important
one. It is in fact why the sfnet one is written as it is - it
originally had a userspace component and turned out to be easy to
deadlock under load because of it.

--
Mathematics is the supreme nostalgia of our time.

2005-03-09 06:26:19

by Alex Aizman

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

Matt Mackall wrote:

>On Tue, Mar 08, 2005 at 09:51:39PM -0800, Alex Aizman wrote:
>
>
>>Matt Mackall wrote:
>>
>>
>>
>>>How big is the userspace client?
>>>
>>>
>>>
>>Hmm.. x86 executable? source?
>>
>>Anyway, there's about 12,000 lines of user space code, and growing. In
>>the kernel we have approx. 3,300 lines.
>>
>>
>>
>>>>- 450MB/sec Read on a single connection (2-way 2.4Ghz Opteron, 64KB block
>>>>size);
>>>>
>>>>
>>>With what network hardware and drives, please?
>>>
>>>
>>>
>>Neterion's 10GbE adapters. RAM disk on the target side.
>>
>>
>
>Ahh.
>
>Snipped my question about userspace deadlocks - that was the important
>one. It is in fact why the sfnet one is written as it is - it
>originally had a userspace component and turned out to be easy to
>deadlock under load because of it.
>
>
>
There's (or at least was up until today) an ongoing discussion on our
mailing list at http://groups-beta.google.com/group/open-iscsi. The
long and short of it: the problem can be solved, and it will be. A couple
of simple things we already do: mlockall() to keep the daemon from being
swapped out, and we are also looking into a potential dependency created
by syslog (there's one for the 2.4 kernel; not sure if this is an issue
for 2.6).
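
A minimal sketch of the mlockall() step, calling the libc function through
ctypes. This is an illustration, not the Open-iSCSI daemon's code. The flag
values are the Linux x86 ones from <sys/mman.h> and are an assumption on
other architectures; lock_memory is a hypothetical helper name.

```python
import ctypes
import os

# Flag values from <sys/mman.h> on Linux x86; a few other architectures
# use different values, so treat these as an assumption.
MCL_CURRENT = 1  # lock every page that is mapped right now
MCL_FUTURE = 2   # also lock pages that get mapped later


def lock_memory() -> bool:
    """Try to pin the whole process in RAM; return True on success."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        err = ctypes.get_errno()
        # Typically EPERM or ENOMEM when the process lacks CAP_IPC_LOCK
        # and RLIMIT_MEMLOCK is too small.
        print("mlockall failed:", os.strerror(err))
        return False
    return True
```

A daemon would call lock_memory() once at startup, before it begins handling
sessions, and decide for itself whether a failure is fatal.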

The sfnet is a learning experience; it is by no means a proof that it
cannot be done.

Alex

2005-03-09 06:26:50

by Dmitry Yusupov

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

On Tue, 2005-03-08 at 22:05 -0800, Matt Mackall wrote:
> On Tue, Mar 08, 2005 at 09:51:39PM -0800, Alex Aizman wrote:
> > Matt Mackall wrote:
> >
> > >How big is the userspace client?
> > >
> > Hmm.. x86 executable? source?
> >
> > Anyway, there's about 12,000 lines of user space code, and growing. In
> > the kernel we have approx. 3,300 lines.
> >
> > >>- 450MB/sec Read on a single connection (2-way 2.4Ghz Opteron, 64KB block
> > >>size);
> > >
> > >With what network hardware and drives, please?
> > >
> > Neterion's 10GbE adapters. RAM disk on the target side.
>
> Ahh.
>
> Snipped my question about userspace deadlocks - that was the important
> one. It is in fact why the sfnet one is written as it is - it
> originally had a userspace component and turned out to be easy to
> deadlock under load because of it.

As Scott Ferris pointed out, the main reason for the deadlock in sfnet was
the blocking behavior of the page cache when the daemon tried to do
filesystem I/O, namely syslog(). That was on the 2.4.x kernel. We don't know
whether it is fixed in 2.6.x. If someone knows, please let us know. Meanwhile
we came up with a work-around design in user space. The "paged out" problem
is already fixed in our Subversion repository by using the mlockall() syscall.
We also have, IMHO, a working solution for OOM during ERL=0 TCP re-connect.

Dmitry

2005-03-09 06:51:01

by Matt Mackall

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

On Tue, Mar 08, 2005 at 10:25:58PM -0800, Dmitry Yusupov wrote:
> On Tue, 2005-03-08 at 22:05 -0800, Matt Mackall wrote:
> > On Tue, Mar 08, 2005 at 09:51:39PM -0800, Alex Aizman wrote:
> > > Matt Mackall wrote:
> > >
> > > >How big is the userspace client?
> > > >
> > > Hmm.. x86 executable? source?
> > >
> > > Anyway, there's about 12,000 lines of user space code, and growing. In
> > > the kernel we have approx. 3,300 lines.
> > >
> > > >>- 450MB/sec Read on a single connection (2-way 2.4Ghz Opteron, 64KB block
> > > >>size);
> > > >
> > > >With what network hardware and drives, please?
> > > >
> > > Neterion's 10GbE adapters. RAM disk on the target side.
> >
> > Ahh.
> >
> > Snipped my question about userspace deadlocks - that was the important
> > one. It is in fact why the sfnet one is written as it is - it
> > originally had a userspace component and turned out to be easy to
> > deadlock under load because of it.
>
> As Scott Ferris pointed out, the main reason for deadlock in sfnet was
> blocking behavior of page cache when daemon tried to do filesystem IO,
> namely syslog().

That was just one of several problems. And ISTR deciding that
particular one was quite nasty when we first encountered it though I
no longer remember the details.

> That was 2.4.x kernel. We don't know whether it is
> fixed in 2.6.x. If someone knows, please let us know. Meanwhile we came
> up with work-around design in user-space. "Paged out" problem fixed
> already in our subversion repository by utilizing mlockall()
> syscall.

I presume this is dynamically linked against glibc?

> Also we have IMHO, working solution for OOM during ERL=0 TCP re-connect.

Care to describe it?

--
Mathematics is the supreme nostalgia of our time.

2005-03-09 07:18:43

by Dmitry Yusupov

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

On Tue, 2005-03-08 at 22:50 -0800, Matt Mackall wrote:
> On Tue, Mar 08, 2005 at 10:25:58PM -0800, Dmitry Yusupov wrote:
> > On Tue, 2005-03-08 at 22:05 -0800, Matt Mackall wrote:
> > > On Tue, Mar 08, 2005 at 09:51:39PM -0800, Alex Aizman wrote:
> > > > Matt Mackall wrote:
> > > >
> > > > >How big is the userspace client?
> > > > >
> > > > Hmm.. x86 executable? source?
> > > >
> > > > Anyway, there's about 12,000 lines of user space code, and growing. In
> > > > the kernel we have approx. 3,300 lines.
> > > >
> > > > >>- 450MB/sec Read on a single connection (2-way 2.4Ghz Opteron, 64KB block
> > > > >>size);
> > > > >
> > > > >With what network hardware and drives, please?
> > > > >
> > > > Neterion's 10GbE adapters. RAM disk on the target side.
> > >
> > > Ahh.
> > >
> > > Snipped my question about userspace deadlocks - that was the important
> > > one. It is in fact why the sfnet one is written as it is - it
> > > originally had a userspace component and turned out to be easy to
> > > deadlock under load because of it.
> >
> > As Scott Ferris pointed out, the main reason for deadlock in sfnet was
> > blocking behavior of page cache when daemon tried to do filesystem IO,
> > namely syslog().
>
> That was just one of several problems. And ISTR deciding that
> particular one was quite nasty when we first encountered it though I
> no longer remember the details.

That's too bad, since all those details might help us avoid problems and
save time in the future daemon design. I would really appreciate it if you
could point me to other potential problems once you recall them.

>
> > That was 2.4.x kernel. We don't know whether it is
> > fixed in 2.6.x. If someone knows, please let us know. Meanwhile we came
> > up with work-around design in user-space. "Paged out" problem fixed
> > already in our subversion repository by utilizing mlockall()
> > syscall.
>
> I presume this is dynamically linked against glibc?

Over time it will be linked against klibc, as dm-multipath does. That will
also help to implement iSCSI boot, where the control plane daemon will be
part of the initramfs image.

> > Also we have IMHO, working solution for OOM during ERL=0 TCP re-connect.
>
> Care to describe it?

Sure. The idea is to always keep a second, reserved (redundant) TCP
connection open per session (please note: a TCP connection, not an iSCSI
connection). This way, during the recovery cycle, assuming a sane target,
the initiator will switch to the redundant TCP connection and send the
Login request over it. This could be implemented as a feature and might be
disabled via the configuration utility if needed.
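
A rough sketch of the spare-connection idea, under stated assumptions: the
Session class, its method names, and the injected connect callable are all
hypothetical, and this is not the Open-iSCSI implementation. The point it
illustrates is that recovery uses a connection that already exists, so it
does not depend on a new allocation succeeding at the worst moment.

```python
from typing import Callable, Optional


class Session:
    """Keeps one active and one pre-opened spare transport connection."""

    def __init__(self, connect: Callable[[], object]):
        self._connect = connect
        self.active = connect()
        self.spare: Optional[object] = connect()  # opened before trouble

    def recover(self) -> object:
        """Switch to the spare after the active connection fails."""
        if self.spare is None:
            raise RuntimeError("no spare connection available")
        self.active, self.spare = self.spare, None
        try:
            # Replenishing may fail under memory pressure; the switch
            # itself has already happened without a new connection.
            self.spare = self._connect()
        except OSError:
            self.spare = None
        return self.active
```

A caller might construct it as
Session(lambda: socket.create_connection((host, 3260))), with host being the
target's address, and send the Login request over whatever recover() returns.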

Dmitry

2005-03-10 02:40:28

by Alex Aizman

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

Lars Marowsky-Bree wrote:

>On 2005-03-08T22:25:29, Alex Aizman <[email protected]> wrote:
>
>
>>There's (or at least was up until today) an ongoing discussion on our
>>mailing list at http://groups-beta.google.com/group/open-iscsi. The
>>short and long of it: the problem can be solved, and it will. Couple
>>simple things we already do: mlockall() to keep the daemon un-swapped,
>>and also looking into potential dependency created by syslog (there's
>>one for 2.4 kernel, not sure if this is an issue for 2.6).
>>
>>
>
>BTW, to get around the very same issues, heartbeat does much the same:
>lock itself into memory, reserve a couple of pages more to spare on
>stack & heap, run at soft-realtime priority.
>
>
Heartbeat is good for reliability, etc. WRT "getting paged-out" -
non-deterministic (things depend on time), right?

>syslog(), however, sucks.
>
>
It does.

>We went down the path of using our non-blocking IPC library to have all
>our various components log to ha_logd, which then logs to syslog() or
>writes to disk or wherever.
>
>

Found ha_logd under http://linux-ha.org. The latter is extremely
interesting in the longer term. In the short term, there's quite a bit
of information on this site; I need time to go through it.

>That works well in our current development series, and if you want to
>share code, you can either rip it off (Open Source, we love ya ;) or we
>can spin off these parts into a sub-package for you to depend on...
>
>
>
If it's not a big deal :-) let's do the "sub-package" option.

>>The sfnet is a learning experience; it is by no means a proof that it
>>cannot be done.
>>
>>
>
>I'd also argue that it MUST be done, because the current way of "Oh,
>it's somehow related to block stuff, must be in kernel" leads down to
>hell. We better figure out good ways around it ;-)
>
>
Yes, it MUST be done.

>
>Sincerely,
>    Lars Marowsky-Brée <[email protected]>
>
>
>
Alex

2005-03-09 23:05:04

by Lars Marowsky-Bree

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

On 2005-03-08T22:25:29, Alex Aizman <[email protected]> wrote:

> There's (or at least was up until today) an ongoing discussion on our
> mailing list at http://groups-beta.google.com/group/open-iscsi. The
> short and long of it: the problem can be solved, and it will. Couple
> simple things we already do: mlockall() to keep the daemon un-swapped,
> and also looking into potential dependency created by syslog (there's
> one for 2.4 kernel, not sure if this is an issue for 2.6).

BTW, to get around the very same issues, heartbeat does much the same:
lock itself into memory, reserve a couple of pages more to spare on
stack & heap, run at soft-realtime priority.

syslog(), however, sucks.

We went down the path of using our non-blocking IPC library to have all
our various components log to ha_logd, which then logs to syslog() or
writes to disk or wherever.
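
A minimal sketch of the non-blocking logging pattern described above. It is
not ha_logd's design: ha_logd is a separate daemon reached over IPC, whereas
this stand-in uses a bounded in-process queue and a writer thread. The class
name and the drop-on-full policy are assumptions made for illustration.

```python
import queue
import threading


class NonBlockingLog:
    """Hands messages to a writer thread; callers never wait on I/O."""

    def __init__(self, sink, maxsize: int = 1024):
        self._q = queue.Queue(maxsize)
        self._sink = sink          # e.g. a function that calls syslog()
        self.dropped = 0
        self._t = threading.Thread(target=self._drain, daemon=True)
        self._t.start()

    def log(self, msg: str) -> bool:
        """Queue a message; drop it instead of blocking when full."""
        try:
            self._q.put_nowait(msg)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _drain(self):
        while True:
            msg = self._q.get()
            if msg is None:        # shutdown marker
                return
            self._sink(msg)        # may block; only this thread waits

    def close(self):
        """Flush queued messages and stop the writer thread."""
        self._q.put(None)
        self._t.join()
```

The trade-off is explicit: when the sink stalls, messages are counted and
dropped rather than letting the caller block on filesystem I/O.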

That works well in our current development series, and if you want to
share code, you can either rip it off (Open Source, we love ya ;) or we
can spin off these parts into a sub-package for you to depend on...

> The sfnet is a learning experience; it is by no means a proof that it
> cannot be done.

I'd also argue that it MUST be done, because the current way of "Oh,
it's somehow related to block stuff, must be in kernel" leads down to
hell. We better figure out good ways around it ;-)


Sincerely,
    Lars Marowsky-Brée <[email protected]>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

2005-03-10 10:31:04

by Lars Marowsky-Bree

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

On 2005-03-09T18:36:37, Alex Aizman <[email protected]> wrote:

> Heartbeat is good for reliability, etc. WRT "getting paged-out" -
> non-deterministic (things depend on time), right?

Right, if we didn't get scheduled often enough for us to send our
heartbeat messages to the other peers, they'll evict us from the cluster
and fence us, causing a service disruption.

With all these protections in place though, we can run at roughly 50ms
heartbeat intervals from user-space, reliably, which allows us a node
dead timer of ~200ms. I think that's pretty damn good.

(Of course, realistically, even for subsecond fail-over, 200ms keep
alives are sufficient, and 50ms would be quite extreme. But, it works.)
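
The relation between the two timers can be written down directly. This is
only an illustration of the numbers quoted above, not heartbeat's code; the
constant and function names are hypothetical.

```python
HEARTBEAT_INTERVAL_MS = 50   # time between heartbeats (from the text)
DEAD_TIMER_MS = 200          # silence after which a node is declared dead


def intervals_per_dead_timer() -> int:
    """How many heartbeat intervals fit into one dead timer."""
    return DEAD_TIMER_MS // HEARTBEAT_INTERVAL_MS


def is_dead(last_seen_ms: int, now_ms: int) -> bool:
    """True once a peer has been silent for the whole dead timer."""
    return now_ms - last_seen_ms >= DEAD_TIMER_MS
```

With these values the dead timer spans four heartbeat intervals, so a single
late heartbeat does not by itself get a node evicted.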

> >That works well in our current development series, and if you want to
> >share code, you can either rip it off (Open Source, we love ya ;) or we
> >can spin off these parts into a sub-package for you to depend on...
> If it's not a big deal :-) let's do the "sub-package" option.

I've brought this up on the linux-ha-dev list. When do you need this?


Sincerely,
    Lars Marowsky-Brée <[email protected]>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

2005-03-11 07:00:53

by Dmitry Yusupov

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

On Thu, 2005-03-10 at 11:27 +0100, Lars Marowsky-Bree wrote:
> On 2005-03-09T18:36:37, Alex Aizman <[email protected]> wrote:
> > >That works well in our current development series, and if you want to
> > >share code, you can either rip it off (Open Source, we love ya ;) or we
> > >can spin off these parts into a sub-package for you to depend on...
> > If it's not a big deal :-) let's do the "sub-package" option.
>
> I've brought this up on the linux-ha-dev list. When do you need this?

For open-iscsi, I think it would make sense to link the open-iscsi daemon
code against klibc, the same way dm-multipath does. This will allow us to
build iSCSI remote boot using early user space. I'm not sure it will be
possible to use your package without modifications. Let me know.

Dmitry

2005-03-12 16:55:56

by Dave Wysochanski

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

Alex Aizman wrote:

>
> This is to announce Open-iSCSI project: High-Performance iSCSI
> Initiator for
> Linux.
>
> MOTIVATION
> ==========
>
> Our initial motivations for the project were: (1) implement the right
> user/kernel split, and (2) design iSCSI data path for performance.
> Recently
> we added (3): get accepted into the mainline kernel.
>
> As far as user/kernel, the existing iSCSI initiators bloat the kernel
> with
> ever-growing control plane code, including but not limited to: iSCSI
> discovery, Login (Authentication and Operational), session and connection
> management, connection-level error processing, iSCSI Text, Nop-Out/In,
> Async
> Message, iSNS, SLP, Radius... Open-iSCSI puts the entire control plane in
> the user space. This control plane talks to the data plane via well
> defined
> interface over the netlink transport.
>
> (Side note: prior to closing on the netlink we considered: sysfs,
> ioctl, and
> syscall. Because the entire control plane logic resides in the user
> space,
> we needed a real bi-directional transport that could support asynchronous
> API to transfer iSCSI control PDUs: Login, Logout, Nop-in, Nop-Out, Text,
> Async Message.
>
> Performance.
> This is the major goal and motivation for this project. As it happens,
> iSCSI
> has to compete with Fibre Channel, which is a more entrenched
> technology in
> the storage space. In addition, the "soft" iSCSI implementation have
> to show
> good results in presence of specialized hardware offloads.
>
> Our today's performance numbers are:
>
> - 450MB/sec Read on a single connection (2-way 2.4Ghz Opteron, 64KB block
> size);
>
> - 320MB/sec Write on a single connection (2-way 2.4Ghz Opteron, 64KB
> block
> size);
>
> - 50,000 Read IOPS on a single connection (2-way 2.4Ghz Opteron, 4KB
> block
> size).
>
Has anyone on the list verified these #'s? I'm trying to
get open-iscsi to work but it looks like it's got a problem
in the very initial stages of lun scanning that prevents
my target from working. Open-iscsi guys I have a trace
if you want to look at it. Looks like despite the fact that
report luns is returned successfully and only 1 lun is
returned (lun 0), the initiator is still sending inquiry
commands to luns > 0, and it looks like it gets confused
when it gets a 0x3f inquiry response from the target
(for an inquiry to lun 1), tries to issue a TMF abort
task on the previous inquiry which has already completed,
and the target responds with "task not in task set", which
is understandable since the command has already completed.
I used the latest .169 code.
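
The behavior expected here can be sketched as follows: parse the REPORT LUNS
parameter data and send INQUIRY only to the LUNs it lists. The layout used
(4-byte big-endian list length, 4 reserved bytes, then 8-byte LUN entries)
follows the SCSI SPC definition; the function names are hypothetical, only
two addressing methods are decoded, and this is not open-iscsi or Linux SCSI
scanning code.

```python
import struct


def parse_report_luns(data: bytes) -> list:
    """Return the LUN numbers listed in REPORT LUNS parameter data."""
    (list_len,) = struct.unpack_from(">I", data, 0)  # bytes of LUN entries
    luns = []
    for off in range(8, 8 + list_len, 8):            # skip 8-byte header
        entry = data[off:off + 8]
        if len(entry) < 8:
            break                                    # truncated response
        method = entry[0] >> 6
        if method == 0:        # peripheral device addressing
            luns.append(entry[1])
        elif method == 1:      # flat space addressing, 14-bit LUN
            luns.append(((entry[0] & 0x3F) << 8) | entry[1])
        # other addressing methods are ignored in this sketch
    return luns


def luns_to_inquire(data: bytes) -> list:
    """Only the reported LUNs should be probed with INQUIRY."""
    return sorted(set(parse_report_luns(data)))
```

For a response that lists a single all-zero entry, luns_to_inquire returns
only LUN 0, so no INQUIRY would be sent to LUN 1.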

I don't see this problem with the latest linux-iscsi.sfnet
code and have interoperated with many other initiators,
so I'm fairly confident there's a bug in open-iscsi somewhere.


2005-03-12 17:12:04

by Dmitry Yusupov

[permalink] [raw]
Subject: Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

On Sat, 2005-03-12 at 11:55 -0500, Dave Wysochanski wrote:
> Alex Aizman wrote:
>
> >
> > This is to announce Open-iSCSI project: High-Performance iSCSI
> > Initiator for
> > Linux.
> >
> > MOTIVATION
> > ==========
> >
> > Our initial motivations for the project were: (1) implement the right
> > user/kernel split, and (2) design iSCSI data path for performance.
> > Recently
> > we added (3): get accepted into the mainline kernel.
> >
> > As far as user/kernel, the existing iSCSI initiators bloat the kernel
> > with
> > ever-growing control plane code, including but not limited to: iSCSI
> > discovery, Login (Authentication and Operational), session and connection
> > management, connection-level error processing, iSCSI Text, Nop-Out/In,
> > Async
> > Message, iSNS, SLP, Radius... Open-iSCSI puts the entire control plane in
> > the user space. This control plane talks to the data plane via well
> > defined
> > interface over the netlink transport.
> >
> > (Side note: prior to closing on the netlink we considered: sysfs,
> > ioctl, and
> > syscall. Because the entire control plane logic resides in the user
> > space,
> > we needed a real bi-directional transport that could support asynchronous
> > API to transfer iSCSI control PDUs: Login, Logout, Nop-in, Nop-Out, Text,
> > Async Message.
> >
> > Performance.
> > This is the major goal and motivation for this project. As it happens,
> > iSCSI
> > has to compete with Fibre Channel, which is a more entrenched
> > technology in
> > the storage space. In addition, the "soft" iSCSI implementation have
> > to show
> > good results in presence of specialized hardware offloads.
> >
> > Our today's performance numbers are:
> >
> > - 450MB/sec Read on a single connection (2-way 2.4Ghz Opteron, 64KB block
> > size);
> >
> > - 320MB/sec Write on a single connection (2-way 2.4Ghz Opteron, 64KB
> > block
> > size);
> >
> > - 50,000 Read IOPS on a single connection (2-way 2.4Ghz Opteron, 4KB
> > block
> > size).
> >
> Has anyone on the list verified these #'s?

As far as I know, no one has tried that but me. We used disktest with the
O_DIRECT flag set on a 10Gbps network with jumbo frames enabled and a very
large TCP window and socket buffer. I would really like to see these
numbers reproduced on a setup other than mine.

> I'm trying to
> get open-iscsi to work but it looks like it's got a problem
> in the very initial stages of lun scanning that prevents
> my target from working. Open-iscsi guys I have a trace
> if you want to look at it. Looks like despite the fact that
> report luns is returned successfully and only 1 lun is
> returned (lun 0), the initiator is still sending inquiry
> commands to luns > 0, and it looks like it gets confused
> when it gets a 0x3f inquiry response from the target
> (for an inquiry to lun 1), tries to issue a TMF abort
> task on the previous inquiry which has already completed,
> and the target responds with "task not in task set", which
> is understandable since the command has already completed.
> I used the latest .169 code.

It's too old anyway. Try the Subversion repository, though I doubt it will
help in your case.

> I don't see this problem with the latest linux-iscsi.sfnet
> code and have interoperated with many other initiators,
> so I'm fairly confident there's a bug in open-iscsi somewhere.

I'm pretty sure it is a bug in open-iscsi. Which target are you using?
Can we get remote access?
