2007-12-03 23:06:42

by Chris Friesen

Subject: solid state drive access and context switching


Over on comp.os.linux.development.system someone asked an interesting
question, and I thought I'd mention it here.

Given a fast low-latency solid state drive, would it ever be beneficial
to simply wait in the kernel for synchronous read/write calls to
complete? The idea is that you could avoid at least two task context
switches, and if the data access can be completed at less cost than
those context switches it could be an overall win.

Has anyone played with this concept?

Chris


2007-12-03 23:10:55

by Alan

Subject: Re: solid state drive access and context switching

> Given a fast low-latency solid state drive, would it ever be beneficial
> to simply wait in the kernel for synchronous read/write calls to
> complete? The idea is that you could avoid at least two task context
> switches, and if the data access can be completed at less cost than
> those context switches it could be an overall win.

In certain situations, theoretically yes: the kernel is better off
continuing to poll than switching to the idle thread. You can do this to
some extent in a driver already today - just poll rather than sleeping,
but respect the reschedule hints and don't do it with irqs masked.
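
A minimal sketch of the idea - everything named ssd_* here is a
hypothetical stand-in for real device accessors, so treat it as a
shape, not a working driver:

#include <linux/kernel.h>	/* WARN_ON() */
#include <linux/sched.h>	/* need_resched() */
#include <linux/irqflags.h>	/* irqs_disabled() */
#include <asm/processor.h>	/* cpu_relax() */

struct ssd_device;				/* hypothetical */
int ssd_io_done(struct ssd_device *dev);	/* hypothetical: nonzero when I/O complete */
int ssd_finish(struct ssd_device *dev);		/* hypothetical: collect status */

/*
 * Busy-wait for a fast device instead of sleeping, but respect the
 * reschedule hints and never spin with interrupts masked.  Returns
 * -EAGAIN if the caller should fall back to the normal sleeping path.
 */
static int ssd_wait_polling(struct ssd_device *dev)
{
	WARN_ON(irqs_disabled());

	while (!ssd_io_done(dev)) {
		if (need_resched())
			return -EAGAIN;	/* someone else wants the CPU */
		cpu_relax();		/* polite spin */
	}
	return ssd_finish(dev);
}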

> Has anyone played with this concept?

For things like SATA-based devices they aren't that fast yet.

Alan

2007-12-04 17:56:20

by Jared Hulbert

Subject: Re: solid state drive access and context switching

> > Has anyone played with this concept?
>
> For things like SATA-based devices they aren't that fast yet.

What is fast enough?

As I understand the basic memory technology, the hard limit is in the
100s of microseconds range for latency. SATA adds something to that.
I'd be surprised to see latencies on SATA SSDs, as measured at the OS
level, get below 1 millisecond.

What happens when we start placing NAND technology on lower latency,
higher bandwidth buses? I'm guessing we'll get down to that 100s of
microseconds level and an order of magnitude higher bandwidth than
SATA. Is that fast enough to warrant this more synchronous IO?

Magnetic drives have latencies of ~10 milliseconds; current SSDs are an
order of magnitude better (~1 millisecond), and new interfaces and
refinements could theoretically get us down one more (~100
microseconds). I'm guessing the current block driver subsystem would
negate a lot of that latency gain. Am I wrong?

BTW - this trend toward faster, lower latency buses is marching
forward. Two examples: the ioDrive from Fusion IO, and Micron's
RAM-module-like SSD concept.

2007-12-04 20:40:21

by Alan

Subject: Re: solid state drive access and context switching

> microseconds level and an order of magnitude higher bandwidth than
> SATA. Is that fast enough to warrant this more synchronous IO?

See the mtd layer.

> BTW - this trend toward faster, lower latency buses is marching
> forward. Two examples: the ioDrive from Fusion IO, and Micron's
> RAM-module-like SSD concept.

Very much so but we can do quite a bit in 10,000 processor cycles ...

Alan

2007-12-04 20:46:45

by Chris Friesen

Subject: Re: solid state drive access and context switching

Jared Hulbert wrote:

> Magnetic drives have latencies of ~10 milliseconds; current SSDs are an
> order of magnitude better (~1 millisecond), and new interfaces and
> refinements could theoretically get us down one more (~100
> microseconds).

They've already done better than that. Here's a solid state
drive with a claimed 20-microsecond access time:

http://www.curtisssd.com/products/drives/hyperxclr

Chris

2007-12-04 20:52:32

by Jeff Garzik

Subject: Re: solid state drive access and context switching

Alan Cox wrote:
> For things like SATA-based devices they aren't that fast yet.

You forget the Gigabyte i-RAM.

For others: the i-RAM is a SATA-based device that plugs into a PCI slot
on your motherboard (for power), providing RAM+battery backup as fast as
your SATA bus and DIMMs will go.

Jeff


2007-12-04 21:07:34

by Alan

Subject: Re: solid state drive access and context switching

On Tue, 04 Dec 2007 15:52:20 -0500
Jeff Garzik <[email protected]> wrote:

> Alan Cox wrote:
> > For things like SATA-based devices they aren't that fast yet.
>
> You forget the Gigabyte i-RAM.
>
> For others: the i-RAM is a SATA-based device that plugs into a PCI slot
> on your motherboard (for power), providing RAM+battery backup as fast as
> your SATA bus and DIMMs will go.

Actually, even allowing for the i-RAM, the SATA stuff is way too slow to
be worth using synchronously. The latency is a killer.

2007-12-04 21:49:18

by Jared Hulbert

Subject: Re: solid state drive access and context switching

> > refinements could theoretically get us down one more (~100
> > microseconds).
>
> They've already done better than that. Here's a solid state
> drive with a claimed 20-microsecond access time:
>
> http://www.curtisssd.com/products/drives/hyperxclr

Right. That looks to be RAM-based, which means $$$$ compared to NAND,
so it's not going to break out of a server niche. I imagine the
latency is the device latency, not the system latency. By the time you
send the request through the Fibre Channel stack and get the block back,
it's gonna be much closer to 100 microseconds. It's that OS-visible
latency that you've got to design to.

2007-12-04 21:54:31

by Jared Hulbert

Subject: Re: solid state drive access and context switching

> > microseconds level and an order of magnitude higher bandwidth than
> > SATA. Is that fast enough to warrant this more synchronous IO?
>
> See the mtd layer.

Right. The trend is to hide the nastiness of NAND technology changes
behind controllers. In general I think this is a good thing.
Basically, the ECC and reliability requirements of this technology
change very rapidly. Having custom controller hardware to handle this
is faster than handling it in software and makes for a nice modular
interface. We don't rewrite our SATA drivers and filesystems every
time the magnetic media switches to a new recording scheme; we just
plug it in. SSDs are going to be like that even if they aren't
SATA. However, the MTD layer is more about managing the chips
themselves, which is what the controllers are for.

Maybe I'm missing something, but I don't see it. We want a block
interface for these devices; we just need a faster, slimmer one.
Maybe a new mtdblock interface that doesn't do erase would be the
place for this?
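
To make that concrete, a rough sketch of the shape I mean, against the
current (2.6.24-ish) block API: a make_request function that skips the
request queue and completes each bio synchronously. ssd_transfer() is a
made-up accessor for whatever the low-latency device turns out to be:

#include <linux/blkdev.h>
#include <linux/bio.h>
#include <linux/highmem.h>

/* hypothetical: move len bytes at sector to/from buf; rw is READ/WRITE */
int ssd_transfer(sector_t sector, void *buf, unsigned int len, int rw);

/*
 * No elevator, no plugging, no completion interrupt: do the transfer
 * here and complete the bio before returning to the submitter.
 */
static int ssdblk_make_request(struct request_queue *q, struct bio *bio)
{
	sector_t sector = bio->bi_sector;
	struct bio_vec *bvec;
	int i, err = 0;

	bio_for_each_segment(bvec, bio, i) {
		void *buf = kmap(bvec->bv_page) + bvec->bv_offset;

		err = ssd_transfer(sector, buf, bvec->bv_len,
				   bio_data_dir(bio));
		kunmap(bvec->bv_page);
		if (err)
			break;
		sector += bvec->bv_len >> 9;
	}
	bio_endio(bio, err);
	return 0;
}

/*
 * Hooked up at init time with:
 *	q = blk_alloc_queue(GFP_KERNEL);
 *	blk_queue_make_request(q, ssdblk_make_request);
 */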

> > BTW - this trend toward faster, lower latency buses is marching
> > forward. Two examples: the ioDrive from Fusion IO, and Micron's
> > RAM-module-like SSD concept.
>
> Very much so but we can do quite a bit in 10,000 processor cycles ...
>
> Alan
>

2007-12-04 22:50:55

by Jörn Engel

Subject: Re: solid state drive access and context switching

On Tue, 4 December 2007 13:54:21 -0800, Jared Hulbert wrote:
>
> Maybe I'm missing something, but I don't see it. We want a block
> interface for these devices; we just need a faster, slimmer one.
> Maybe a new mtdblock interface that doesn't do erase would be the
> place for this?

Doesn't do erase? MTD has to learn almost all tricks from the block
layer, as devices are becoming high-latency and high-bandwidth compared
to what MTD was designed for. In order to get any decent performance, we
need asynchronous operations, request queues and caching.

The only useful advantage MTD does have over block devices is an
_explicit_ erase operation. Did you mean "doesn't do _implicit_ erase"?
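
For reference, that explicit erase is the one operation a block device
has no way to express. In MTD it looks roughly like this - essentially
the same pattern the mtdblock driver uses, minus error handling:

#include <linux/mtd/mtd.h>
#include <linux/sched.h>
#include <linux/wait.h>

static void erase_callback(struct erase_info *done)
{
	wait_queue_head_t *wait_q = (wait_queue_head_t *)done->priv;
	wake_up(wait_q);
}

/* Explicitly erase one eraseblock and wait for it to finish. */
static int erase_block(struct mtd_info *mtd, loff_t pos)
{
	struct erase_info erase;
	DECLARE_WAITQUEUE(wait, current);
	wait_queue_head_t wait_q;
	int ret;

	init_waitqueue_head(&wait_q);
	erase.mtd = mtd;
	erase.callback = erase_callback;
	erase.addr = pos;
	erase.len = mtd->erasesize;
	erase.priv = (u_long)&wait_q;

	set_current_state(TASK_INTERRUPTIBLE);
	add_wait_queue(&wait_q, &wait);

	ret = mtd->erase(mtd, &erase);
	if (ret) {
		set_current_state(TASK_RUNNING);
		remove_wait_queue(&wait_q, &wait);
		return ret;
	}

	schedule();	/* sleep until erase_callback() fires */
	remove_wait_queue(&wait_q, &wait);
	return 0;
}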

Jörn

--
It's just what we asked for, but not what we want!
-- anonymous

2007-12-04 23:29:28

by Alan

Subject: Re: solid state drive access and context switching

> Right. The trend is to hide the nastiness of NAND technology changes
> behind controllers. In general I think this is a good thing.

You miss the point - any controller you hide it behind almost inevitably
adds enough latency you don't want to use it synchronously.

Alan

2007-12-05 00:03:54

by Jared Hulbert

Subject: Re: solid state drive access and context switching

> > Maybe I'm missing something, but I don't see it. We want a block
> > interface for these devices; we just need a faster, slimmer one.
> > Maybe a new mtdblock interface that doesn't do erase would be the
> > place for this?
>
> Doesn't do erase? MTD has to learn almost all tricks from the block
> layer, as devices are becoming high-latency and high-bandwidth compared
> to what MTD was designed for. In order to get any decent performance, we
> need asynchronous operations, request queues and caching.
>
> The only useful advantage MTD does have over block devices is an
> _explicit_ erase operation. Did you mean "doesn't do _implicit_ erase"?


You're right. That's the point I was trying to make, albeit badly:
MTD isn't the place for this. The fact that more and more of what the
MTD layer is being used for looks a lot like the block layer is a
whole different discussion.

2007-12-05 00:08:21

by Jared Hulbert

Subject: Re: solid state drive access and context switching

On Dec 4, 2007 3:24 PM, Alan Cox <[email protected]> wrote:
> > Right. The trend is to hide the nastiness of NAND technology changes
> > behind controllers. In general I think this is a good thing.
>
> You miss the point - any controller you hide it behind almost inevitably
> adds enough latency you don't want to use it synchronously.

I think I get it. We keep saying that the latency is too high, and
I agree that most technologies out there have latencies that are too
high. Again I ask the question: what latencies do we have to hit
before the sync options become worth it?

2007-12-05 00:29:35

by Alan

Subject: Re: solid state drive access and context switching

On Tue, 4 Dec 2007 16:08:07 -0800
"Jared Hulbert" <[email protected]> wrote:

> On Dec 4, 2007 3:24 PM, Alan Cox <[email protected]> wrote:
> > > Right. The trend is to hide the nastiness of NAND technology changes
> > > behind controllers. In general I think this is a good thing.
> >
> > You miss the point - any controller you hide it behind almost inevitably
> > adds enough latency you don't want to use it synchronously.
>
> I think I get it. We keep saying that the latency is too high, and
> I agree that most technologies out there have latencies that are too
> high. Again I ask the question: what latencies do we have to hit
> before the sync options become worth it?

Probably about 1000 clocks, but it's always going to depend upon the
workload and whether any other work can be done usefully.

Alan

2007-12-05 01:12:05

by Robert Hancock

Subject: Re: solid state drive access and context switching

Chris Friesen wrote:
>
> Over on comp.os.linux.development.system someone asked an interesting
> question, and I thought I'd mention it here.
>
> Given a fast low-latency solid state drive, would it ever be beneficial
> to simply wait in the kernel for synchronous read/write calls to
> complete? The idea is that you could avoid at least two task context
> switches, and if the data access can be completed at less cost than
> those context switches it could be an overall win.
>
> Has anyone played with this concept?

I don't think most SSDs are fast enough that it would really be worth
avoiding the context switch for... I could be wrong, though.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-12-05 22:01:44

by Jared Hulbert

Subject: Re: solid state drive access and context switching

> Probably about 1000 clocks, but it's always going to depend upon the
> workload and whether any other work can be done usefully.

Yeah. Sounds right, in the microsecond range. Be interesting to see data.

Anybody have ideas on what kind of experiments could confirm this
estimate is right?
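
One cheap way to get at the scheduler half of the number from userspace:
bounce a byte between two processes through a pair of pipes, which forces
two context switches per round trip, then compare the per-switch cost
against the device's measured read latency. A rough sketch (pin it to
one CPU with taskset so the two processes really do switch):

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

#define ITERS 100000

int main(void)
{
	int ab[2], ba[2];
	char c = 'x';
	struct timeval t0, t1;
	long i;

	if (pipe(ab) || pipe(ba)) {
		perror("pipe");
		return 1;
	}

	if (fork() == 0) {			/* child: echo forever */
		for (;;) {
			if (read(ab[0], &c, 1) != 1)
				_exit(0);
			write(ba[1], &c, 1);
		}
	}

	gettimeofday(&t0, NULL);
	for (i = 0; i < ITERS; i++) {		/* parent: ping-pong */
		write(ab[1], &c, 1);
		read(ba[0], &c, 1);
	}
	gettimeofday(&t1, NULL);

	printf("%.2f us per round trip (two switches)\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e6 +
		(t1.tv_usec - t0.tv_usec)) / ITERS);
	return 0;
}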

2007-12-06 03:51:41

by Kyungmin Park

Subject: Re: solid state drive access and context switching

Hi,

On Dec 6, 2007 7:01 AM, Jared Hulbert <[email protected]> wrote:
> > Probably about 1000 clocks, but it's always going to depend upon the
> > workload and whether any other work can be done usefully.
>
> Yeah. Sounds right, in the microsecond range. Be interesting to see data.
>
> Anybody have ideas on what kind of experiments could confirm this
> estimate is right?

Is this the right place to make the writes synchronous?
For now it only concerns SATA.

Thank you,
Kyungmin Park

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 3b927be..cce0618 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -3221,6 +3221,13 @@ static inline void __generic_make_request(struct bio *bio
 	if (bio_check_eod(bio, nr_sectors))
 		goto end_io;
 
+#if 1
+	/* FIXME simple hack */
+	if (MAJOR(bio->bi_bdev->bd_dev) == 8 && bio_data_dir(bio) == WRITE) {
+		/* WRITE_SYNC */
+		bio->bi_rw |= (1 << BIO_RW_SYNC);
+	}
+#endif
 	/*
 	 * Resolve the mapping until finished. (drivers are
 	 * still free to implement/resolve their own stacking
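
For what it's worth, BIO_RW_SYNC doesn't by itself make the call path
synchronous: it marks the bio as synchronous, so the block layer unplugs
the queue and dispatches the request immediately instead of waiting to
batch it with others. And the hard-coded major 8 is the sd driver, so
the test above only catches SCSI/SATA disks.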