2010-08-23 16:07:28

by Chetan Loke

Subject: Re: [Scst-devel] Fwd: Re: linuxcon 2010...

On Mon, Aug 23, 2010 at 11:11 AM, Bart Van Assche <[email protected]> wrote:

>
> There is an important design difference between SCST and LIO: SCST by
> default creates multiple threads to process the I/O operations for a
> storage target, while LIO only creates a single thread per storage target.
> This makes SCST perform measurably faster.
>

Forget that. You could have discussed this if there had been code reviews
or other mainline inclusion emails from James B. From what I have
heard, the decision was taken around 8-9 months ago.
Would anyone like to comment on, validate, or refute this, please? If
not, then I would kindly request these guys to stop taking us for a
test drive. Also, I'm not sure when James B. last benchmarked our SCSI
stack. Even if I ACK in the xmit-path, I can't push more than 100K
IOPS. But other folks have re-engineered our linux-scsi stack and,
from what I've heard, they can push more than 300K IOPS. So I would
just ignore the performance discussion, because I don't think folks
have done even simple, lame experiments in the last year. Or maybe I'm
completely wrong, in which case please enlighten me so that I can
re-run the tests.


> Bart.
>
Chetan Loke


2010-08-23 18:03:30

by Chetan Loke

Subject: Re: [Scst-devel] Fwd: Re: linuxcon 2010...

I actually received 3+ off-list emails asking whether I was talking
about the initiator or the target in the 100K IOPS case below, and what
I meant by the ACKs.
I was referring to the initiator side.
ACK == when scsi-ML (the SCSI mid-layer) down-calls the LLD via
queuecommand, the LLD processes the sgl's (if you like) and then
triggers the scsi_done up-call path.
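
The ACK-in-the-xmit-path experiment described above can be sketched as a toy model. This is plain Python, not kernel code: `ToyMidLayer`, `ToyLLD` and `measure_iops` are hypothetical names that only mirror the call flow (the mid-layer down-calls `queuecommand`, and the LLD completes the command at once through a done callback). The rate it prints says nothing about a real SCSI stack.

```python
import time


class ToyLLD:
    """Hypothetical low-level driver that ACKs every command in the xmit path."""

    def queuecommand(self, cmd, done):
        # "Process the sgl's (if you like)": here we only walk the list.
        for _segment in cmd["sgl"]:
            pass
        # No hardware is involved: complete the command immediately by
        # triggering the up-call.
        done(cmd)
        return 0


class ToyMidLayer:
    """Hypothetical stand-in for the SCSI mid-layer (scsi-ML)."""

    def __init__(self, lld):
        self.lld = lld
        self.completed = 0

    def scsi_done(self, cmd):
        # Up-call path: account for the finished command.
        self.completed += 1

    def submit(self, cmd):
        # Down-call path: hand the command to the LLD.
        return self.lld.queuecommand(cmd, self.scsi_done)


def measure_iops(n_commands=100_000):
    """Submit n_commands and return (completed, commands per second)."""
    ml = ToyMidLayer(ToyLLD())
    cmd = {"opcode": 0x28, "sgl": [bytes(4096)]}  # READ(10), one 4K segment
    start = time.perf_counter()
    for _ in range(n_commands):
        ml.submit(cmd)
    elapsed = time.perf_counter() - start
    return ml.completed, n_commands / elapsed


if __name__ == "__main__":
    completed, iops = measure_iops()
    print(f"completed={completed} toy_iops={iops:.0f}")
```

Because the command never leaves the host, a loop of this shape measures only the cost of the submit and complete paths, which is the point of the experiment.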

Chetan Loke

On Mon, Aug 23, 2010 at 12:07 PM, Chetan Loke <[email protected]> wrote:
> On Mon, Aug 23, 2010 at 11:11 AM, Bart Van Assche <[email protected]> wrote:
>
>>
>> There is an important design difference between SCST and LIO: SCST by
>> default creates multiple threads to process the I/O operations for a
>> storage target, while LIO only creates a single thread per storage target.
>> This makes SCST perform measurably faster.
>>
>
> Forget that. You could have discussed this if there had been code reviews
> or other mainline inclusion emails from James B. From what I have
> heard, the decision was taken around 8-9 months ago.
> Would anyone like to comment on, validate, or refute this, please? If
> not, then I would kindly request these guys to stop taking us for a
> test drive. Also, I'm not sure when James B. last benchmarked our SCSI
> stack. Even if I ACK in the xmit-path, I can't push more than 100K
> IOPS. But other folks have re-engineered our linux-scsi stack and,
> from what I've heard, they can push more than 300K IOPS. So I would
> just ignore the performance discussion, because I don't think folks
> have done even simple, lame experiments in the last year. Or maybe I'm
> completely wrong, in which case please enlighten me so that I can
> re-run the tests.
>
>
>> Bart.
>>
> Chetan Loke
>

2010-08-24 07:36:22

by Pasi Kärkkäinen

Subject: Re: [Scst-devel] Fwd: Re: linuxcon 2010...

On Mon, Aug 23, 2010 at 02:03:26PM -0400, Chetan Loke wrote:
> I actually received 3+ off-list emails asking whether I was talking
> about the initiator or the target in the 100K IOPS case below, and what
> I meant by the ACKs.
> I was referring to the initiator side.
> ACK == when scsi-ML (the SCSI mid-layer) down-calls the LLD via
> queuecommand, the LLD processes the sgl's (if you like) and then
> triggers the scsi_done up-call path.
>

Uhm, Intel and Microsoft demonstrated over 1 million IOPS
using software iSCSI and a single 10 Gbit Ethernet NIC (Intel 82599).

How come there is such a huge difference? What are we lacking in Linux?
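
As a rough sanity check on these numbers, the wire speed of one 10 Gbit/s port bounds the average payload per I/O. The sketch below is only arithmetic: it ignores all protocol overhead, counts a single direction of the link, and the function names are made up for illustration.

```python
def max_payload_bytes_per_io(link_gbit_per_s, iops):
    """Upper bound on the average payload per I/O for one link direction."""
    bytes_per_s = link_gbit_per_s * 1e9 / 8
    return bytes_per_s / iops


def max_iops(link_gbit_per_s, block_bytes):
    """Upper bound on IOPS for a given block size, overhead ignored."""
    return link_gbit_per_s * 1e9 / 8 / block_bytes


if __name__ == "__main__":
    print(max_payload_bytes_per_io(10, 1_000_000))  # 1250.0
    print(max_iops(10, 4096))
```

So 1 million IOPS over a single port implies small requests (at most about 1250 bytes each in one direction), while 4K requests top out near 305K IOPS per direction at this idealized line rate.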

-- Pasi


2010-08-24 14:43:26

by Vladislav Bolkhovitin

Subject: Re: Linux I/O subsystem performance (was: linuxcon 2010...)

Pasi Kärkkäinen, on 08/24/2010 11:25 AM wrote:
> On Mon, Aug 23, 2010 at 02:03:26PM -0400, Chetan Loke wrote:
>> I actually received 3+ off-list emails asking whether I was talking
>> about the initiator or the target in the 100K IOPS case below, and what
>> I meant by the ACKs.
>> I was referring to the initiator side.
>> ACK == when scsi-ML (the SCSI mid-layer) down-calls the LLD via
>> queuecommand, the LLD processes the sgl's (if you like) and then
>> triggers the scsi_done up-call path.
>>
>
> Uhm, Intel and Microsoft demonstrated over 1 million IOPS
> using software iSCSI and a single 10 Gbit Ethernet NIC (Intel 82599).
>
> How come there is such a huge difference? What are we lacking in Linux?

I also have the impression that the Linux I/O subsystem has some
performance problems. For instance, in one recent SCST performance test
it took 8 Linux initiators, with fio as the load generator, to saturate
a single SCST target with dual IB cards (SRP) on 4K AIO direct accesses
over an SSD backend. This roughly means that each initiator took several
times (8?) more processing time per I/O than the target. The hardware
used for the target and the initiators was the same. I can't see why, on
this load, the initiators would need to do more work than the target.
Well, I know we in SCST did excellent work to maximize performance, but
such a difference looks like too much ;)
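
The estimate above can be written out as a small calculation. It is a sketch under the assumptions stated in the text (identical hardware, every initiator fully loaded, the target just saturated); the target rate used below is a made-up placeholder, not a figure from the test.

```python
def per_io_cost_ratio(n_initiators, target_iops):
    """Ratio of initiator to target processing time per I/O."""
    # Each fully loaded initiator sustains an equal share of the target's rate.
    initiator_iops = target_iops / n_initiators
    # On identical hardware, time per I/O is the inverse of the sustained rate.
    target_time = 1.0 / target_iops
    initiator_time = 1.0 / initiator_iops
    return initiator_time / target_time


if __name__ == "__main__":
    # 400_000 is a hypothetical placeholder; the ratio does not depend on it.
    print(per_io_cost_ratio(8, 400_000))
```

The ratio comes out as the number of initiators, which is where the "(8?)" comes from.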

It also looks very suspicious that nobody has even tried to match that
Microsoft/Intel record, not even Intel itself, which works closely with
the Linux community in the storage area and could do it using the same
hardware.

Vlad

2010-08-24 14:55:27

by Matthew Wilcox

Subject: Re: Linux I/O subsystem performance (was: linuxcon 2010...)

On Tue, Aug 24, 2010 at 06:43:29PM +0400, Vladislav Bolkhovitin wrote:
> It also looks very suspicious that nobody has even tried to match that
> Microsoft/Intel record, not even Intel itself, which works closely with
> the Linux community in the storage area and could do it using the same
> hardware.

You seem to be under the impression that "Intel" is some monolithic
entity. Despite working with six different storage & performance
groups within Intel, I have no idea what record you're referring to,
nor what hardware it was accomplished with. Even if I did, I wouldn't
know which group within Intel to contact to see if they still have
the setup. Then I'd have to convince them that it's in their interest
to try to replicate this on Linux. And I'd have to be prepared to sink
a considerable quantity of my time into it ... which I don't have.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2010-08-24 14:55:46

by Chetan Loke

Subject: Re: [Scst-devel] Fwd: Re: linuxcon 2010...

On Tue, Aug 24, 2010 at 3:25 AM, Pasi Kärkkäinen <[email protected]> wrote:
> On Mon, Aug 23, 2010 at 02:03:26PM -0400, Chetan Loke wrote:
>> I actually received 3+ off-list emails asking whether I was talking
>> about the initiator or the target in the 100K IOPS case below, and what
>> I meant by the ACKs.
>> I was referring to the initiator side.
>> ACK == when scsi-ML (the SCSI mid-layer) down-calls the LLD via
>> queuecommand, the LLD processes the sgl's (if you like) and then
>> triggers the scsi_done up-call path.
>>
>
> Uhm, Intel and Microsoft demonstrated over 1 million IOPS
> using software iSCSI and a single 10 Gbit Ethernet NIC (Intel 82599).

Uhm, that's MS (and its closed TCP Chimney protocols and other
offloads?). And I think we have already discussed this in bits and
pieces on the scst list. Also, just because the driver is open source
in Linux doesn't necessarily mean that we know all the ASIC registers
that we can bit-bang to squeeze every clock cycle out of the ASIC (just
a thought).

> How come there is such a huge difference? What are we lacking in Linux?

I'm not an iSCSI guy, so I can't comment on how the data is moved from
network buffers to SCSI buffers, etc.


>
> -- Pasi
Chetan Loke


2010-08-24 17:51:55

by Vladislav Bolkhovitin

Subject: Re: Linux I/O subsystem performance

Matthew Wilcox, on 08/24/2010 06:55 PM wrote:
> On Tue, Aug 24, 2010 at 06:43:29PM +0400, Vladislav Bolkhovitin wrote:
>> It also looks very suspicious that nobody has even tried to match that
>> Microsoft/Intel record, not even Intel itself, which works closely with
>> the Linux community in the storage area and could do it using the same
>> hardware.
>
> You seem to be under the impression that "Intel" is some monolithic
> entity. Despite working with six different storage & performance
> groups within Intel, I have no idea what record you're referring to,
> nor what hardware it was accomplished with.

It is
http://communities.intel.com/community/wired/blog/2010/04/22/1-million-iops-how-about-125-million

> Even if I did, I wouldn't
> know which group within Intel to contact to see if they still have
> the setup. Then I'd have to convince them that it's in their interest
> to try to replicate this on Linux. And I'd have to be prepared to sink
> a considerable quantity of my time into it ... which I don't have.

Sorry if it looked like I was blaming you. I was just wondering why
Intel developed Linux drivers for those network adapters but isn't
interested in similarly demonstrating their performance on Linux.

Vlad

2010-08-24 20:40:13

by Matthew Wilcox

Subject: Re: Linux I/O subsystem performance

On Tue, Aug 24, 2010 at 09:51:38PM +0400, Vladislav Bolkhovitin wrote:
> Matthew Wilcox, on 08/24/2010 06:55 PM wrote:
>> On Tue, Aug 24, 2010 at 06:43:29PM +0400, Vladislav Bolkhovitin wrote:
>>> It also looks very suspicious that nobody has even tried to match that
>>> Microsoft/Intel record, not even Intel itself, which works closely with
>>> the Linux community in the storage area and could do it using the same
>>> hardware.
>>
>> You seem to be under the impression that "Intel" is some monolithic
>> entity. Despite working with six different storage & performance
>> groups within Intel, I have no idea what record you're referring to,
>> nor what hardware it was accomplished with.
>
> It is
> http://communities.intel.com/community/wired/blog/2010/04/22/1-million-iops-how-about-125-million

Ah, iSCSI. I don't work with that group. I'm a bit too busy with other projects to pursue a relationship with them right now :-)

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."