2011-05-03 02:44:07

by Gao, Yunpeng

Subject: Is it possible for the ext4/btrfs file system to pass some context related info to low level block driver?

Currently, some new storage devices have the ability to do performance optimizations according to the type of data payload - say, file system metadata, time-stamps, sequential write in some granularity, random write and so on.

For example, the latest eMMC 4.5 device can support the so-called 'Context Management' and 'Data Tag Mechanism' features. By receiving the information of payload data type, the eMMC 4.5 device can improve the access rate during the following read and update operations and offer a more reliable and robust storage.

And obviously, enabling these kinds of advanced storage device features requires support not only from the low-level block device driver but also from the file system, since the file system knows more about the data type of the payload. But currently, it seems no file system has implemented this kind of support yet.

So my question is: is there any plan or discussion about supporting this feature (passing data type info to the low-level block device driver) in file system development? Especially for ext4/btrfs, since they are very popular in Linux now? Thanks.


Regards,
Yunpeng Gao


2011-05-03 04:10:26

by Kyungmin Park

Subject: Re: Is it possible for the ext4/btrfs file system to pass some context related info to low level block driver?

Hi,

It seems similar to TRIM. So how about considering the TRIM
implementation, or extending it?
I'm not familiar with the NCQ implementation, but it may also be a helpful reference.

Thank you,
Kyungmin Park

On Tue, May 3, 2011 at 11:44 AM, Gao, Yunpeng <[email protected]> wrote:
> Currently, some new storage devices have the ability to do performance optimizations according to the type of data payload - say, file system metadata, time-stamps, sequential write in some granularity, random write and so on.
>
> For example, the latest eMMC 4.5 device can support the so-called 'Context Management' and 'Data Tag Mechanism' features. By receiving the information of payload data type, the eMMC 4.5 device can improve the access rate during the following read and update operations and offer a more reliable and robust storage.
>
> And obviously, to enable these kind of advanced features of storage device, it needs not only the low level block device driver supports, but also the file system supports since files system knows more info about the data type of the payload. But currently, seems there are no file systems implemented these kind of supports yet.
>
> So, my question is, is there any plan or discussion on supporting this feature (passing data type info to low level block device driver) on file system developments? Especially for ext4/btrfs, since now they are very hot in Linux? Thanks.
>
>
> Regards,
> Yunpeng Gao

2011-05-03 13:43:42

by Martin K. Petersen

Subject: Re: Is it possible for the ext4/btrfs file system to pass some context related info to low level block driver?

>>>>> "Yunpeng" == Gao, Yunpeng <[email protected]> writes:

Yunpeng> So, my question is, is there any plan or discussion on
Yunpeng> supporting this feature (passing data type info to low level
Yunpeng> block device driver) on file system developments? Especially
Yunpeng> for ext4/btrfs, since now they are very hot in Linux? Thanks.

Yes, I have been working on some changes that allow us to tag bios and
pass the information out to storage. These patches have been on the back
burner for a while due to other commitments. But I'll dig them out and
post them later. We just discussed them a couple of weeks ago at the
Linux Storage Workshop.

In the meantime: Can you point me to the relevant eMMC stuff so I can
see how many tiers or classes we have to work with there?

--
Martin K. Petersen Oracle Linux Engineering

2011-05-04 10:34:08

by Gao, Yunpeng

Subject: RE: Is it possible for the ext4/btrfs file system to pass some context related info to low level block driver?

Hi Park,

Thanks a lot for the response.

>It seems similar with TRIM. So how about to consider TRIM
>implementation or extend it?

Yes, the file system sets the REQ_DISCARD flag to tell the block device driver to execute TRIM.
And I noticed there's already a REQ_META flag used for file system metadata.
But it seems only the gfs2 file system uses REQ_META; other file systems (including ext4 and btrfs) do not use it.
Is the flag useless, or is there some other reason?

>I'm not familiar with NCQ implementation but it's also helpful to implement it.

I don't know much about NCQ either. I just have the impression that NCQ is only for SATA hard disks. Can it also be used for other block devices, such as the mmc block device?

Thanks.

Regards,
Yunpeng


2011-05-04 11:45:16

by Gao, Yunpeng

Subject: RE: Is it possible for the ext4/btrfs file system to pass some context related info to low level block driver?

>Yes, I have been working on some changes that allow us to tag bios and
>pass the information out to storage. These patches have been on the back
>burner for a while due to other commitments. But I'll dig them out and
>post them later. We just discussed them a couple of weeks ago at the
>Linux Storage Workshop.

That's great! Thanks for the update and look forward to your patches.

>In the meantime: Can you point me to the relevant eMMC stuff so I can
>see how many tiers or classes we have to work with there?
>
I'm investigating adding support for some eMMC 4.5 features to the current Linux mmc driver (drivers/mmc).
The Linux mmc driver registers the eMMC device as a normal Linux block device, so it can see all the Linux block layer bio flags.

The relevant new eMMC 4.5 features are:

Data Tag:
'The mechanism permits the device to receive from the host information about specific data types (for instance file system metadata, time-stamps, configuration parameters, etc.). The information is conveyed before a write multiple blocks operation at well defined addresses. By receiving this information the device can improve the access rate during the following read and update operations and offer a more reliable and robust storage.'

I guess the existing block layer flag REQ_META could be used to tell the low-level block driver/device to apply the Data Tag feature. But I don't know why most Linux file systems currently don't use REQ_META at all (it seems only gfs2 uses it now).
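
The REQ_META to Data Tag mapping suggested above could look roughly like the following. This is only a userspace sketch, not driver code: the `SKETCH_*` flag values, `struct sketch_request`, and `sketch_wants_data_tag()` are hypothetical stand-ins invented for illustration and do not correspond to the kernel's real definitions or to any existing mmc driver function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for block layer request flags (not the kernel's values). */
#define SKETCH_REQ_WRITE (1u << 0)
#define SKETCH_REQ_META  (1u << 1)

/* Hypothetical, stripped-down view of a request as seen by a block driver. */
struct sketch_request {
    uint32_t flags;     /* flags passed down by the file system */
    uint32_t nr_blocks; /* number of blocks in the transfer */
};

/*
 * Decide whether a request should be sent with the Data Tag set.
 * Only writes that the file system marked as metadata qualify.
 */
static bool sketch_wants_data_tag(const struct sketch_request *rq)
{
    assert(rq != NULL);
    return (rq->flags & SKETCH_REQ_WRITE) && (rq->flags & SKETCH_REQ_META);
}
```

With this shape, a metadata write would be tagged, while a plain data write or a metadata read would not. How the tag is actually conveyed to the card is left out, since that depends on the eMMC command set and the driver.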

Context Management:
'To better differentiate between large sequential operations and small random operations, and to improve multitasking support, contexts can be associated with groups of read or write commands ... A context can be seen as an active session, configured for a specific read/write pattern (e.g. sequential in some granularity). Multiple read or write commands are associated with this context to create some logical association between them, to allow device to optimize performance.'

To my understanding, supporting this feature requires the file system (or the application?) to notify the low-level driver that the following data accesses will be large sequential read/write operations.
And I'm not sure whether it is possible for the file system to pass this kind of information to the low-level driver. Any ideas?

Thanks.

Regards,
Yunpeng





2011-05-04 14:51:39

by Andreas Dilger

Subject: Re: Is it possible for the ext4/btrfs file system to pass some context related info to low level block driver?

On 2011-05-04, at 5:45 AM, "Gao, Yunpeng" <[email protected]> wrote:
>> Yes, I have been working on some changes that allow us to tag bios and
>> pass the information out to storage. These patches have been on the back
>> burner for a while due to other commitments. But I'll dig them out and
>> post them later. We just discussed them a couple of weeks ago at the
>> Linux Storage Workshop.
>
> That's great! Thanks for the update and look forward to your patches.
>
>> In the meantime: Can you point me to the relevant eMMC stuff so I can
>> see how many tiers or classes we have to work with there?
>>
> I'm investigating on add some eMMC 4.5 features support in current Linux mmc driver (drivers/mmc).
> The Linux mmc driver register the eMMC device as a normal Linux block device. So, it can get all the Linux block layer bio flags.
>
> To below eMMC 4.5 new features:
>
> Data Tag:
> 'The mechanism permits the device to receive from the host information about specific data types (for instance file system metadata, time-stamps, configuration parameters, etc.). The information is conveyed before a write multiple blocks operation at well defined addresses. By receiving this information the device can improve the access rate during the following read and update operations and offer a more reliable and robust storage.'
>
> I guess the exist block layer flag REQ_META can be used to notify the low level block driver/device to execute the Data Tag feature. But don't know why currently most of Linux file systems don't use the REQ_META flag at all (seems only gfs2 uses it now).

I was aware of REQ_META, but I didn't know there was any benefit to using it. I think it would be easy to set REQ_META on all ext4 metadata if there was a reason to do so.

> Context Management:
> 'To better differentiate between large sequential operations and small random operations, and to improve multitasking support, contexts can be associated with groups of read or write commands ... A context can be seen as an active session, configured for a specific read/write pattern (e.g. sequential in some granularity). Multiple read or write commands are associated with this context to create some logical association between them, to allow device to optimize performance.'
>
> To my understanding, to support this feature, it needs file system (or application?) to notify the low level driver that the following data access will be large sequential read/write operations.
> And I'm not sure is it possible for file system to pass this kind of information to low level driver? Any idea?

A simple way to do this would be to use the inode number as the context, so writes to the same inode are grouped together. Another possibility is the PID, though that is less clearly correct (e.g. if tar is extracting a bunch of files, are they all related or not?).
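
A minimal sketch of the inode-number idea above, assuming the device offers a small fixed pool of context IDs. The pool size of 15, the convention that ID 0 means "no context", and the names `SKETCH_NR_CONTEXTS` and `sketch_context_for_inode()` are assumptions made for illustration, not taken from this thread or from any kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: usable context IDs are 1..15, with 0 meaning "no context". */
#define SKETCH_NR_CONTEXTS 15u

/*
 * Map an inode number onto one of the available context IDs.
 * All I/O for the same inode gets the same ID, so the device can
 * treat those commands as one logical stream.
 */
static unsigned int sketch_context_for_inode(uint64_t ino)
{
    unsigned int id = (unsigned int)(ino % SKETCH_NR_CONTEXTS) + 1u;

    assert(id >= 1u && id <= SKETCH_NR_CONTEXTS);
    return id;
}
```

A plain modulo means unrelated inodes can share a context once more than 15 files are active, so a real implementation would probably need some allocation and eviction policy instead.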

Cheers, Andreas

2011-05-05 18:10:34

by Matthew Wilcox

Subject: Re: Is it possible for the ext4/btrfs file system to pass some context related info to low level block driver?

On Wed, May 04, 2011 at 08:51:39AM -0600, Andreas Dilger wrote:
> I was aware of REQ_META, but I didn't know there was any benefit to
> using it. I think it would be easy to set REQ_META on all ext4 metadata
> if there was a reason to do so.

The CFQ I/O scheduler pays attention to it (prioritising metadata accesses
over data accesses), and blktrace will print an 'M' for metadata
requests if it's set, so I think that's two excellent reasons to set
REQ_META today.

However, ext3, ext4, and XFS already use it:

fs/ext3/inode.c:1105: ll_rw_block(READ_META, 1, &bh);
fs/ext3/inode.c:2754: submit_bh(READ_META, bh);
fs/ext3/namei.c:924: ll_rw_block(READ_META, 1, &bh);
fs/ext4/inode.c:1500: ll_rw_block(READ_META, 1, &bh);
fs/ext4/inode.c:4775: submit_bh(READ_META, bh);
fs/ext4/namei.c:924: ll_rw_block(READ_META, 1, &bh);
fs/gfs2/log.c:597: submit_bh(WRITE_SYNC | REQ_META, bh);
fs/gfs2/log.c:599: submit_bh(WRITE_FLUSH_FUA | REQ_META, bh);
fs/gfs2/meta_io.c:39: int write_op = REQ_META |
fs/gfs2/meta_io.c:228: submit_bh(READ_SYNC | REQ_META, bh);
fs/gfs2/meta_io.c:435: ll_rw_block(READ_SYNC | REQ_META, 1, &first_bh);
fs/gfs2/ops_fstype.c:221: submit_bio(READ_SYNC | REQ_META, bio);
fs/gfs2/quota.c:710: ll_rw_block(READ_META, 1, &bh);
fs/xfs/linux-2.6/xfs_buf.c:1321: rw = (bp->b_flags & XBF_WRITE) ? WRITE_META : READ_META;
include/linux/fs.h:164:#define READ_META (READ | REQ_META)
include/linux/fs.h:168:#define WRITE_META (WRITE | REQ_META)
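
The two macros at the end of the listing show that READ_META and WRITE_META are just the direction OR'ed with REQ_META. A userspace sketch of that composition, and of a scheduler preferring metadata over data, might look like this. The `SKETCH_*` bit values and `sketch_meta_first()` are hypothetical; the real flag values live in the kernel headers, and CFQ's actual policy is more involved than this comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in bit values, mirroring the shape of the fs.h macros. */
#define SKETCH_READ       0u
#define SKETCH_WRITE      (1u << 0)
#define SKETCH_REQ_META   (1u << 1)
#define SKETCH_READ_META  (SKETCH_READ | SKETCH_REQ_META)
#define SKETCH_WRITE_META (SKETCH_WRITE | SKETCH_REQ_META)

/*
 * Return nonzero if a request with flags 'a' should be served before one
 * with flags 'b' purely because 'a' is metadata and 'b' is not.
 */
static int sketch_meta_first(uint32_t a, uint32_t b)
{
    return (a & SKETCH_REQ_META) && !(b & SKETCH_REQ_META);
}
```

Two metadata requests, or two data requests, are left in their existing order by this check; only a metadata/data pair is reordered.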

btrfs seems to not use REQ_META yet. *poke* *poke* :-)

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2011-05-05 20:11:51

by Andreas Dilger

Subject: Re: Is it possible for the ext4/btrfs file system to pass some context related info to low level block driver?

On May 5, 2011, at 12:10, Matthew Wilcox wrote:
> On Wed, May 04, 2011 at 08:51:39AM -0600, Andreas Dilger wrote:
>> I was aware of REQ_META, but I didn't know there was any benefit to
>> using it. I think it would be easy to set REQ_META on all ext4 metadata
>> if there was a reason to do so.
>
> The CFQ ioscheduler pays attention to it (prioritising metadata accesses
> over data accesses), and blocktrace will print an 'M' for metadata
> requests if it's set, so I think that's two excellent reasons to set
> REQ_META today.
>
> However, ext3, ext4, and XFS already use it:
>
> fs/ext4/inode.c:1500: ll_rw_block(READ_META, 1, &bh);
- ext4_bread()

> fs/ext4/inode.c:4775: submit_bh(READ_META, bh);
- __ext4_get_inode_loc()
> fs/ext4/namei.c:924: ll_rw_block(READ_META, 1, &bh);
- ext4_find_entry()

Looking more closely at the code, it seems these call sites cover only a subset of the metadata reads. There are many places in ext4 that use sb_bread() instead of ext4_bread(), in particular the extents and migration code, which was developed more recently, but also the xattr code, which has been around a long time.

Cheers, Andreas






2011-05-09 05:50:46

by Gao, Yunpeng

Subject: RE: Is it possible for the ext4/btrfs file system to pass some context related info to low level block driver?

>-----Original Message-----
>From: Andreas Dilger [mailto:[email protected]]
>Sent: Wednesday, May 04, 2011 10:52 PM
>To: Gao, Yunpeng
>Cc: Martin K. Petersen; [email protected];
>[email protected]; [email protected];
>[email protected]
>Subject: Re: Is it possible for the ext4/btrfs file system to pass some context
>related info to low level block driver?
>
>On 2011-05-04, at 5:45 AM, "Gao, Yunpeng" <[email protected]>
>wrote:
>>> Yes, I have been working on some changes that allow us to tag bios and
>>> pass the information out to storage. These patches have been on the back
>>> burner for a while due to other commitments. But I'll dig them out and
>>> post them later. We just discussed them a couple of weeks ago at the
>>> Linux Storage Workshop.
>>
>> That's great! Thanks for the update and look forward to your patches.
>>
>>> In the meantime: Can you point me to the relevant eMMC stuff so I can
>>> see how many tiers or classes we have to work with there?
>>>
>> I'm investigating on add some eMMC 4.5 features support in current Linux
>mmc driver (drivers/mmc).
>> The Linux mmc driver register the eMMC device as a normal Linux block
>device. So, it can get all the Linux block layer bio flags.
>>
>> To below eMMC 4.5 new features:
>>
>> Data Tag:
>> 'The mechanism permits the device to receive from the host information
>about specific data types (for instance file system metadata, time-stamps,
>configuration parameters, etc.). The information is conveyed before a write
>multiple blocks operation at well defined addresses. By receiving this
>information the device can improve the access rate during the following read
>and update operations and offer a more reliable and robust storage.'
>>
>> I guess the exist block layer flag REQ_META can be used to notify the low
>level block driver/device to execute the Data Tag feature. But don't know why
>currently most of Linux file systems don't use the REQ_META flag at all (seems
>only gfs2 uses it now).
>
>I was aware of REQ_META, but I didn't know there was any benefit to using it.
>I think it would be easy to set REQ_META on all ext4 metadata if there was a
>reason to do so.

For NAND-flash based storage devices such as eMMC cards, the REQ_META tag could be used to tell the low-level eMMC card: 'Hi, the following data is important metadata; it should be stored in a more reliable area of the NAND flash media.' For example, the metadata could be stored in an SLC NAND partition/area inside the eMMC card. Of course, it depends on the card vendor's implementation.

>> Context Management:
>> 'To better differentiate between large sequential operations and small
>random operations, and to improve multitasking support, contexts can be
>associated with groups of read or write commands ... A context can be seen
>as an active session, configured for a specific read/write pattern (e.g.
>sequential in some granularity). Multiple read or write commands are
>associated with this context to create some logical association between them,
>to allow device to optimize performance.'
>>
>> To my understanding, to support this feature, it needs file system (or
>application?) to notify the low level driver that the following data access will
>be large sequential read/write operations.
>> And I'm not sure is it possible for file system to pass this kind of information
>to low level driver? Any idea?
>
>A simple way to do this would be to use the inode number as the context, so
>writes to the same inode are grouped together. Another possibility is the PID,
>though that is less clearly correct (e.g. if tar is extracting a bunch of files, are
>they all related or not).

Thanks a lot for the suggestions. I'm not sure whether they're feasible, but I will look into them and investigate further.

Thanks.

Regards,
Yunpeng