2019-05-10 17:03:16

by Christoph Hellwig

Subject: Re: [PATCH v5 0/7] Extend write-hint framework, and add write-hint for Ext4 journal

I think this fundamentally goes in the wrong direction. We explicitly
designed the block layer infrastructure around life time hints and
not the not fish not flesh streams interface, which causes all kinds
of problems.

Including the one this model causes on at least some SSDs where you
now statically allocate resources to a stream that is now not globally
available. All for the little log with very short data lifetime that
any half decent hot/cold partitioning algorithm in the SSD should be
able to detect.


2019-05-17 05:46:30

by Kanchan Joshi

Subject: RE: [PATCH v5 0/7] Extend write-hint framework, and add write-hint for Ext4 journal

Hi Christoph,

> Including the one this model causes on at least some SSDs where you now
> statically allocate resources to a stream that is now not globally
> available.

Sorry but can you please elaborate the issue? I do not get what is being
statically allocated which was globally available earlier.
If you are referring to nvme driver, available streams at subsystem level
are being reflected for all namespaces. This is same as earlier.
There is no attempt to explicitly allocate (using dir-receive) or reserve
streams for any namespace.
Streams will continue to get allocated/released implicitly as and when
writes (with stream id) arrive.

> All for the little log with very short data lifetime that any half decent
> hot/cold partitioning algorithm in the SSD should be able to detect.

With streams, hot/cold segregation happens at the time of placement
itself, without a detection algorithm; that is a clear win over algorithms
which need time and computation to do the same.
And an infrastructure update (write-hint-to-stream-id conversion in the
block layer, in-kernel hints, etc.) seems to be required anyway for streams
to extend their reach beyond nvme and user-space hints.

Thanks,


2019-05-20 18:10:38

by Christoph Hellwig

Subject: Re: [PATCH v5 0/7] Extend write-hint framework, and add write-hint for Ext4 journal

On Fri, May 17, 2019 at 11:01:55AM +0530, kanchan wrote:
> Sorry but can you please elaborate the issue? I do not get what is being
> statically allocated which was globally available earlier.
> If you are referring to nvme driver, available streams at subsystem level
> are being reflected for all namespaces. This is same as earlier.
> There is no attempt to explicitly allocate (using dir-receive) or reserve
> streams for any namespace.
> Streams will continue to get allocated/released implicitly as and when
> writes (with stream id) arrive.

We have made a conscious decision that we do not want to expose streams
as an awkward not fish not flesh interface, but instead life time hints.

I see no reason to change from that and burden other in-kernel callers
with the whole streams complexity.

2019-05-21 08:25:52

by Jan Kara

Subject: Re: [PATCH v5 0/7] Extend write-hint framework, and add write-hint for Ext4 journal

On Mon 20-05-19 07:27:19, 'Christoph Hellwig' wrote:
> On Fri, May 17, 2019 at 11:01:55AM +0530, kanchan wrote:
> > Sorry but can you please elaborate the issue? I do not get what is being
> > statically allocated which was globally available earlier.
> > If you are referring to nvme driver, available streams at subsystem level
> > are being reflected for all namespaces. This is same as earlier.
> > There is no attempt to explicitly allocate (using dir-receive) or reserve
> > streams for any namespace.
> > Streams will continue to get allocated/released implicitly as and when
> > writes (with stream id) arrive.
>
> We have made a conscious decision that we do not want to expose streams
> as an awkward not fish not flesh interface, but instead life time hints.
>
> I see no reason to change from that and burden other in-kernel callers
> with the whole streams complexity.

I'm not following the "streams complexity" you talk about. At least the
use case Kanchan speaks about here is pretty simple for the filesystem -
tagging journal writes with a special stream id. I agree that something like
dynamically allocating available stream ids to different purposes is
complex and has uncertain value but this "static stream id for particular
purpose" looks simple and sensible to me and Kanchan has shown significant
performance benefits for some drives. After all you can just think about it
like RWH_WRITE_LIFE_JOURNAL type of hint available for the kernel...

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2019-05-21 08:29:06

by Christoph Hellwig

Subject: Re: [PATCH v5 0/7] Extend write-hint framework, and add write-hint for Ext4 journal

On Tue, May 21, 2019 at 10:25:28AM +0200, Jan Kara wrote:
> performance benefits for some drives. After all you can just think about it
> like RWH_WRITE_LIFE_JOURNAL type of hint available for the kernel...

Except that it actually adds a parallel infrastructure. A
RWH_WRITE_LIFE_JOURNAL would be much more palatable, but someone needs
to explain how that is:

a) different from RWH_WRITE_LIFE_SHORT
b) would not apply to a log/journal maintained in userspace that works
exactly the same
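
For context on point b), here is a minimal sketch of how a userspace
log/journal could already mark its file as short-lived through the existing
life time hint interface, fcntl(F_SET_RW_HINT). The helper names
journal_set_short_lifetime() and journal_get_lifetime() are hypothetical and
only for illustration; the fallback constants are assumed to match
linux/fcntl.h and are only used when the libc headers lack them.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>

/* Fallbacks for older libc headers (assumed to match linux/fcntl.h). */
#ifndef F_GET_RW_HINT
#define F_GET_RW_HINT 1035
#endif
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT 1036
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif

/*
 * Hypothetical helper: set the short life time hint on the inode behind fd.
 * Returns 0 on success, -1 on failure (errno is set by fcntl).
 */
int journal_set_short_lifetime(int fd)
{
	uint64_t hint = RWH_WRITE_LIFE_SHORT;

	return fcntl(fd, F_SET_RW_HINT, &hint) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: read back the hint currently set on the inode.
 * Returns 0 and fills *hint on success, -1 on failure.
 */
int journal_get_lifetime(int fd, uint64_t *hint)
{
	return fcntl(fd, F_GET_RW_HINT, hint) < 0 ? -1 : 0;
}
```

A userspace journal would call journal_set_short_lifetime() once after
open()ing its log file. Whether the kernel or device acts on the hint depends
on the kernel version, filesystem and drive, so this should be read as an
illustration of the interface rather than a statement about its effect.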