Message-ID: <5A123A10.1050801@gmail.com>
Date: Mon, 20 Nov 2017 11:12:32 +0900
From: Hyunchul Lee
To: Jaegeuk Kim
CC: Chao Yu, linux-f2fs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org, kernel-team@lge.com, Hyunchul Lee, Chao Yu, linux-block@vger.kernel.org, axboe@kernel.dk, hch@infradead.org
Subject: Re: [RFC PATCH 0/2] apply write hints to select the type of segments
References: <5A08F6CA.6040507@gmail.com>
 <5bd3945c-16f8-a718-a140-44589ceb490a@huawei.com>
 <5A090283.60206@gmail.com>
 <20171114042024.GA13008@jaegeuk-macbookpro.roam.corp.google.com>
 <3dd3f540-f5e5-2d58-99ef-6abf18bad923@huawei.com>
 <20171115162730.GC33528@jaegeuk-macbookpro.roam.corp.google.com>
 <5A0CE25A.9090506@gmail.com>
 <533fb91e-21af-513e-f587-619498b1f848@huawei.com>
 <20171116035858.GA73172@jaegeuk-macbookpro.roam.corp.google.com>
 <5A0D15A9.3090706@gmail.com>
 <20171117185338.GB77642@jaegeuk-macbookpro.roam.corp.google.com>
In-Reply-To: <20171117185338.GB77642@jaegeuk-macbookpro.roam.corp.google.com>

On 11/18/2017 03:53 AM, Jaegeuk Kim wrote:
> ...
>>>>>>>>>>>>>>>>> From: Hyunchul Lee
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Using write hints[1], applications can inform the life time of the data
>>>>>>>>>>>>>>>>> written to devices. and this[2] reported that the write hints patch
>>>>>>>>>>>>>>>>> decreased writes in NAND by 25%.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This hints help F2FS to determine the followings.
>>>>>>>>>>>>>>>>> 1) the segment types where the data will be written.
>>>>>>>>>>>>>>>>> 2) the hints that will be passed down to devices with the data of segments.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This patch set implements the first mapping from write hints to segment types
>>>>>>>>>>>>>>>>> as shown below.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> hints               segment type
>>>>>>>>>>>>>>>>> -----               ------------
>>>>>>>>>>>>>>>>> WRITE_LIFE_SHORT    CURSEG_COLD_DATA
>>>>>>>>>>>>>>>>> WRITE_LIFE_EXTREME  CURSEG_HOT_DATA
>>>>>>>>>>>>>>>>> others              CURSEG_WARM_DATA
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The F2FS policy for hot/cold separation has precedence over this hints, And
>>>>>>>>>>>>>>>>> hints are not applied in in-place update.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Could we change to disable IPU if file/inode write hint is existing?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am afraid that this makes side effects. for example, this could cause
>>>>>>>>>>>>>>> out-of-place updates even when there are not enough free segments.
>>>>>>>>>>>>>>> I can write the patch that handles these situations. But I wonder
>>>>>>>>>>>>>>> that this is required, and I am not sure which IPU policies can be disabled.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Oh, As I replied in another thread, I think IPU just affects filesystem
>>>>>>>>>>>>>> hot/cold separating, rather than this feature. So I think it will be okay
>>>>>>>>>>>>>> to not consider it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Before the second mapping is implemented, write hints are not passed down
>>>>>>>>>>>>>>>>> to devices. Because it is better that the data of a segment have the same
>>>>>>>>>>>>>>>>> hint.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
>>>>>>>>>>>>>>>>> [2]: https://lwn.net/Articles/726477/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Could you write a patch to support passing write hint to block layer for
>>>>>>>>>>>>>>>> buffered writes as below commit:
>>>>>>>>>>>>>>>> 0127251c45ae ("ext4: add support for passing in write hints for buffered writes")
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sure I will. I wrote it already ;)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cool, ;)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think that datas from the same segment should be passed down with the same
>>>>>>>>>>>>>>> hint, and the following mapping is reasonable. I wonder what is your opinion
>>>>>>>>>>>>>>> about it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> segment type        hints
>>>>>>>>>>>>>>> ------------        -----
>>>>>>>>>>>>>>> CURSEG_COLD_DATA    WRITE_LIFE_EXTREME
>>>>>>>>>>>>>>> CURSEG_HOT_DATA     WRITE_LIFE_SHORT
>>>>>>>>>>>>>>> CURSEG_COLD_NODE    WRITE_LIFE_NORMAL
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have WRITE_LIFE_LONG defined rather than WRITE_LIFE_NORMAL in fs.h?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> CURSEG_HOT_NODE     WRITE_LIFE_MEDIUM
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As I know, in scenario of cell phone, data of meta_inode is hottest, then hot
>>>>>>>>>>>>>> data, warm node, and cold node should be coldest. So I suggested we can define
>>>>>>>>>>>>>> as below:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> META_DATA               WRITE_LIFE_SHORT
>>>>>>>>>>>>>> HOT_DATA & WARM_NODE    WRITE_LIFE_MEDIUM
>>>>>>>>>>>>>> HOT_NODE & WARM_DATA    WRITE_LIFE_LONG
>>>>>>>>>>>>>> COLD_NODE & COLD_DATA   WRITE_LIFE_EXTREME
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree, But I am not sure that assigning the same hint to a node and data
>>>>>>>>>>>>> segment is good. Because NVMe is likely to write them in the same erase
>>>>>>>>>>>>> block if they have the same hint.
>>>>>>>>>>>>
>>>>>>>>>>>> If we do not give the hint, they can still be written to the same erase block,
>>>>>>>>>>
>>>>>>>>>> I mean it's possible to write them to the same erase block. :)
>>>>>>>>>>
>>>>>>>>>>>> right? it will not be worse?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If the hint is not given, I think that they could be written to
>>>>>>>>>>> the same erase block, or not. But if we give the same hint, they are written
>>>>>>>>>>> to the same block.
>>>>>>>>>>
>>>>>>>>>> IMO, Only if underlying device can support more hint type or opened channels,
>>>>>>>>>> and actual temperature of data segment and node segment is quite different, we
>>>>>>>>>> can separate them.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Okay, If Jaegeuk Kim agrees with this, I will submit the patch that
>>>>>>>>> implements your proposed mapping.
>>>>>>>>
>>>>>>>> How about this? We'd better to split data and node blocks as much as possible.
>>>>>>>>
>>>>>>>> segment type            hints
>>>>>>>> ------------            -----
>>>>>>>> COLD_NODE & COLD_DATA   WRITE_LIFE_NONE
>>>>>>>
>>>>>>> WRITE_LIFE_NONE means there is no hints about write life time.
>>>>>>>
>>>>>>> Shouldn't we define COLD_NODE & COLD_DATA as WRITE_LIFE_EXTREME?
>>>>>>
>>>>>> The assumption would be to split different types of blocks by flash firmware,
>>>>>> so I think we can use WRITE_LIFE_NONE as a type as well.
>>>>>>
>>>>>
>>>>> WRITE_LIFE_NONE means that no stream id is specified. It equals WRITE_LIFE_NOT_SET.
>>>>
>>>> Right, I just saw nvme implementation:
>>>>
>>>> nvme_assign_write_stream
>>>>
>>>> 	enum rw_hint streamid = req->write_hint;
>>>>
>>>> 	if (streamid == WRITE_LIFE_NOT_SET || streamid == WRITE_LIFE_NONE)
>>>> 		streamid = 0;
>>>> 	else {
>>>> 		streamid--;
>>>> 		...
>>>>
>>>>> So I think that we can define WARM_DATA as WRITE_LIFE_NONE, and
>>>>> COLD_NODE & COLD_DATA as WRITE_LIFE_EXTREME.
>>>
>>> What's the point?
>>>
>>> segment type            hints               streamid
>>> -------------           -----               -------
>>> COLD_NODE & COLD_DATA   WRITE_LIFE_NONE     0
>>> WARM_DATA               WRITE_LIFE_EXTREME  4
>>> HOT_NODE & WARM_NODE    WRITE_LIFE_LONG     3
>>> HOT_DATA                WRITE_LIFE_MEDIUM   2
>>> META_DATA               WRITE_LIFE_SHORT    1
>>>
>>> So, I don't think something is wrong. Again, I don't care about its hotness
>>> given to the naming, but do care how to split different types of blocks with
>>> different stream ids. Exceptions would be giving _SHORT or _MEDIUM which are
>>> likely to be latency-critical, since I guess firmware may be able to store them
>>> into SLC buffer.
>>>
>>> Am I missing that _NONE has another meaning?
>>>
>>
>> What I am worried about is that datas with no hint have WRITE_LIFE_NOT_SET(id 0).
>> If block devices have swap partitions and another file systems, cold datas could
>> be mixed with datas from that. Does this seems way too much?
>
> That seems like how to distinguish write_hints across multiple partitions?
>

What I mean is that, because there could be other partitions and the default
stream ID is 0, WRITE_LIFE_EXTREME could be better than WRITE_LIFE_NONE for
cold data.

Thanks.

>> And I think that stream id 0 means disabling stream directives.
>> Because NVME_RW_DTYPE_STREAMS is clear.
>
> Then, I guess SSD FW will just handle 5 stream IDs including disabled 0.
>
> Thanks,
>
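
For reference, the stream-id arithmetic discussed in this thread can be sketched in userspace C. This is only an illustration under stated assumptions, not code from the patch set: the numeric values of enum rw_hint are assumed to match what include/linux/fs.h defined at the time, and enum seg_class, hint_for_segment() and stream_id_for_hint() are hypothetical names introduced here. The cold_as_extreme flag switches between the table that gives cold segments WRITE_LIFE_NONE and the alternative that gives them WRITE_LIFE_EXTREME.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace mirror of the kernel's enum rw_hint. The numeric values are
 * an assumption, taken from what include/linux/fs.h used at the time.
 */
enum rw_hint {
	WRITE_LIFE_NOT_SET = 0,
	WRITE_LIFE_NONE    = 1,
	WRITE_LIFE_SHORT   = 2,
	WRITE_LIFE_MEDIUM  = 3,
	WRITE_LIFE_LONG    = 4,
	WRITE_LIFE_EXTREME = 5,
};

/* Hypothetical names for the segment classes used in the tables above. */
enum seg_class {
	SEG_META_DATA,
	SEG_HOT_DATA, SEG_WARM_DATA, SEG_COLD_DATA,
	SEG_HOT_NODE, SEG_WARM_NODE, SEG_COLD_NODE,
};

/*
 * Segment class -> hint. With cold_as_extreme == false, cold segments get
 * WRITE_LIFE_NONE and WARM_DATA gets WRITE_LIFE_EXTREME. With true, the
 * two entries are swapped, as suggested in the reply.
 */
static enum rw_hint hint_for_segment(enum seg_class seg, bool cold_as_extreme)
{
	switch (seg) {
	case SEG_META_DATA:
		return WRITE_LIFE_SHORT;
	case SEG_HOT_DATA:
		return WRITE_LIFE_MEDIUM;
	case SEG_HOT_NODE:
	case SEG_WARM_NODE:
		return WRITE_LIFE_LONG;
	case SEG_WARM_DATA:
		return cold_as_extreme ? WRITE_LIFE_NONE : WRITE_LIFE_EXTREME;
	case SEG_COLD_DATA:
	case SEG_COLD_NODE:
		return cold_as_extreme ? WRITE_LIFE_EXTREME : WRITE_LIFE_NONE;
	}
	return WRITE_LIFE_NOT_SET;
}

/* Hint -> stream id, same arithmetic as the quoted nvme_assign_write_stream. */
static unsigned int stream_id_for_hint(enum rw_hint hint)
{
	int h = (int)hint;

	assert(h >= WRITE_LIFE_NOT_SET && h <= WRITE_LIFE_EXTREME);
	if (hint == WRITE_LIFE_NOT_SET || hint == WRITE_LIFE_NONE)
		return 0;	/* no stream: shared with every unhinted write */
	return (unsigned int)(h - 1);
}
```

With cold_as_extreme set to false, stream_id_for_hint(hint_for_segment(SEG_COLD_DATA, false)) evaluates to 0, the same id an unhinted write from another partition would get. With true, cold segments land on id 4 and WARM_DATA takes id 0 instead.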