2008-02-19 05:48:33

by Kyungmin Park

Subject: Question about synchronous write on SSD

Hi,

Do you remember the thread "solid state drive access and context
switching" [1]?
I want to measure whether it really gives better performance on an SSD.

To make writes to the SSD synchronous, I hacked
'generic_make_request()' [2] and got the following results.

# echo 3 > /proc/sys/vm/drop_caches
# tiotest -f 100 -R -d /dev/sdd1
Tiotest results for 4 concurrent io threads:
| Write 400 MBs | 4.7 s | 84.306 MB/s | 8.4 % | 77.5 % |
| Read 400 MBs | 4.3 s | 92.945 MB/s | 7.2 % | 53.5 % |
Tiotest latency results:
| Write | 0.126 ms | 706.379 ms | 0.00000 | 0.00000 |
| Read | 0.161 ms | 311.738 ms | 0.00000 | 0.00000 |

# echo 3 > /proc/sys/vm/drop_caches
# tiotest -f 1000 -R -d /dev/sdd1
Tiotest results for 4 concurrent io threads:
| Write 4000 MBs | 47.5 s | 84.124 MB/s | 7.0 % | 83.6 % |
| Read 4000 MBs | 41.9 s | 95.530 MB/s | 7.8 % | 55.6 % |
Tiotest latency results:
| Write | 0.176 ms | 714.677 ms | 0.00000 | 0.00000 |
| Read | 0.161 ms | 311.815 ms | 0.00000 | 0.00000 |

However, the performance is the same as before, which means the patch has no effect.

Could you tell me whether this is the right place to hook in, or is there a better one?

Thank you,
Kyungmin Park

P.S. Of course I got the following warning:
WARNING: at block/blk-core.c:1351 generic_make_request+0x126/0x3d8()

1. http://lkml.org/lkml/2007/12/3/247
2. simple hack
diff --git a/block/blk-core.c b/block/blk-core.c
index e9754dc..7262720 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1345,6 +1345,14 @@ static inline void __generic_make_request(struct bio *bio)
 	if (bio_check_eod(bio, nr_sectors))
 		goto end_io;
 
+	/* FIXME simple hack by kmpark */
+	if (MINOR(bio->bi_bdev->bd_dev) == 49 &&
+	    MAJOR(bio->bi_bdev->bd_dev) == 8 && bio_data_dir(bio) == WRITE) {
+		WARN_ON_ONCE(1);
+		/* Write synchronous */
+		bio->bi_rw |= (1 << BIO_RW_SYNC);
+	}
+
 	/*
 	 * Resolve the mapping until finished. (drivers are
 	 * still free to implement/resolve their own stacking


2008-02-19 08:16:48

by Thomas Petazzoni

Subject: Re: Question about synchronous write on SSD

Hi,

On Tue, 19 Feb 2008 14:48:18 +0900,
"Kyungmin Park" <[email protected]> wrote:

> + /* Write synchronous */
> + bio->bi_rw |= (1 << BIO_RW_SYNC);

Adding BIO_RW_SYNC doesn't make generic_make_request() synchronous as
in "generic_make_request() returns only after write completion".
BIO_RW_SYNC only asks the I/O layer to unplug immediately. But
generic_make_request() still returns before the completion of the I/O,
and the completion is notified asynchronously.

See:
http://lxr.free-electrons.com/source/include/linux/bio.h#139

Sincerely,

Thomas
--
Thomas Petazzoni, Free Electrons
Free Embedded Linux Training Materials
on http://free-electrons.com/training
(More than 1500 pages!)



2008-02-19 09:40:18

by Kyungmin Park

Subject: RE: Question about synchronous write on SSD

> On Tue, 19 Feb 2008 14:48:18 +0900,
> "Kyungmin Park" <[email protected]> wrote:
>
> > + /* Write synchronous */
> > + bio->bi_rw |= (1 << BIO_RW_SYNC);
>
> Adding BIO_RW_SYNC doesn't make generic_make_request() synchronous as
> in "generic_make_request() returns only after write completion".
> BIO_RW_SYNC only asks the I/O layer to unplug immediately. But
> generic_make_request() still returns before the completion of the I/O,
> and the completion is notified asynchronously.
>

Agreed, but look at the following sequence.

__generic_make_request() calls q->make_request_fn(q, bio), which
blk_init_queue_node() sets to __make_request(). There are two paths
through __make_request():
Case 1: get_rq
Case 2: out or merged (otherwise, the unplug case you mention)

In case 1, if BIO_RW_SYNC is set, the request gets REQ_RW_SYNC, and
REQ_RW_SYNC is documented as
"include/linux/blkdev.h":112: __REQ_RW_SYNC, /* request is sync (O_DIRECT) */
so it acts like the O_DIRECT flag. Is that right?
Case 2 ends the same way: the device is unplugged, so there is no
later chance to merge.

Is that right?

BR,
Kyungmin Park

2008-02-19 09:43:20

by Jens Axboe

Subject: Re: Question about synchronous write on SSD

On Tue, Feb 19 2008, Kyungmin Park wrote:
> > On Tue, 19 Feb 2008 14:48:18 +0900,
> > "Kyungmin Park" <[email protected]> wrote:
> >
> > > + /* Write synchronous */
> > > + bio->bi_rw |= (1 << BIO_RW_SYNC);
> >
> > Adding BIO_RW_SYNC doesn't make generic_make_request() synchronous as
> > in "generic_make_request() returns only after write completion".
> > BIO_RW_SYNC only asks the I/O layer to unplug immediately. But
> > generic_make_request() still returns before the completion of the I/O,
> > and the completion is notified asynchronously.
> >
>
> Agreed, but look at the following sequence.
>
> __generic_make_request() calls q->make_request_fn(q, bio), which
> blk_init_queue_node() sets to __make_request(). There are two paths
> through __make_request():
> Case 1: get_rq
> Case 2: out or merged (otherwise, the unplug case you mention)
>
> In case 1, if BIO_RW_SYNC is set, the request gets REQ_RW_SYNC, and
> REQ_RW_SYNC is documented as
> "include/linux/blkdev.h":112: __REQ_RW_SYNC, /* request is sync (O_DIRECT) */
> so it acts like the O_DIRECT flag. Is that right?
> Case 2 ends the same way: the device is unplugged, so there is no
> later chance to merge.

But that still doesn't make it sync; I think you are approaching this
the wrong way. For SSDs we still want merging, and plugging also makes
sense to some degree, though it should probably be minimized. It only
causes an initial latency; for busy IO workloads you won't be
plugging/unplugging much anyway.

In fact your patch makes things WORSE, since the IO schedulers will now
treat the IO as sync and introduce idling for the disk head. And you
definitely don't want that.

--
Jens Axboe

2008-02-19 09:55:47

by Kyungmin Park

Subject: RE: Question about synchronous write on SSD

> >
> > Agreed, but look at the following sequence.
> >
> > __generic_make_request() calls q->make_request_fn(q, bio), which
> > blk_init_queue_node() sets to __make_request(). There are two paths
> > through __make_request():
> > Case 1: get_rq
> > Case 2: out or merged (otherwise, the unplug case you mention)
> >
> > In case 1, if BIO_RW_SYNC is set, the request gets REQ_RW_SYNC, and
> > REQ_RW_SYNC is documented as
> > "include/linux/blkdev.h":112: __REQ_RW_SYNC, /* request is sync (O_DIRECT) */
> > so it acts like the O_DIRECT flag. Is that right?
> > Case 2 ends the same way: the device is unplugged, so there is no
> > later chance to merge.
>
> But that still doesn't make it sync; I think you are approaching this
> the wrong way. For SSDs we still want merging, and plugging also makes
> sense to some degree, though it should probably be minimized. It only
> causes an initial latency; for busy IO workloads you won't be
> plugging/unplugging much anyway.
>
> In fact your patch makes things WORSE, since the IO schedulers will now
> treat the IO as sync and introduce idling for the disk head. And you
> definitely don't want that.

Yes, you're right; it's just for testing.
I want to know the worst (corner) case, when all writes are synchronous.
Of course I can measure it using the tiotest "do write synchronous" option.
Do you think that is the worst case?

Kyungmin Park

2008-02-19 10:00:48

by Jens Axboe

Subject: Re: Question about synchronous write on SSD

On Tue, Feb 19 2008, Kyungmin Park wrote:
> > >
> > > Agreed, but look at the following sequence.
> > >
> > > __generic_make_request() calls q->make_request_fn(q, bio), which
> > > blk_init_queue_node() sets to __make_request(). There are two paths
> > > through __make_request():
> > > Case 1: get_rq
> > > Case 2: out or merged (otherwise, the unplug case you mention)
> > >
> > > In case 1, if BIO_RW_SYNC is set, the request gets REQ_RW_SYNC, and
> > > REQ_RW_SYNC is documented as
> > > "include/linux/blkdev.h":112: __REQ_RW_SYNC, /* request is sync (O_DIRECT) */
> > > so it acts like the O_DIRECT flag. Is that right?
> > > Case 2 ends the same way: the device is unplugged, so there is no
> > > later chance to merge.
> >
> > But that still doesn't make it sync; I think you are approaching this
> > the wrong way. For SSDs we still want merging, and plugging also makes
> > sense to some degree, though it should probably be minimized. It only
> > causes an initial latency; for busy IO workloads you won't be
> > plugging/unplugging much anyway.
> >
> > In fact your patch makes things WORSE, since the IO schedulers will now
> > treat the IO as sync and introduce idling for the disk head. And you
> > definitely don't want that.
>
> Yes, you're right; it's just for testing.
> I want to know the worst (corner) case, when all writes are synchronous.
> Of course I can measure it using the tiotest "do write synchronous" option.
> Do you think that is the worst case?

If you want to test when all writes are sync, then either mount with -o
sync, open with O_SYNC or use O_DIRECT writes. You can't force that
behaviour by changing the block layer code. Perhaps you could force
O_SYNC when a file is opened, if you want to experiment with worst case
generally. Not sure that makes a lot of sense, though.

--
Jens Axboe