Date: Tue, 2 Apr 2019 11:07:06 +0200
From: Jan Kara
To: Andreas Dilger
Cc: Kanchan Joshi, open list, linux-block, linux-nvme@lists.infradead.org,
	linux-fsdevel, linux-ext4@vger.kernel.org, axboe@fb.com,
	prakash.v@samsung.com, anshul@samsung.com, joshiiitr@gmail.com
Subject: Re: [PATCH v3 7/7] fs/ext4,jbd2: add support for passing write-hint with journal
Message-ID: <20190402090706.GD12133@quack2.suse.cz>
In-Reply-To: <4B0B5E8B-3B30-4CF6-AA9D-D1294542D65D@dilger.ca>

On Sat 30-03-19 11:49:54, Andreas Dilger wrote:
> On Mar 29, 2019, at 1:53 AM, Kanchan Joshi wrote:
> >
> > For NAND based SSDs, mixing of data with different life-time reduces
> > efficiency of internal garbage-collection. During FS operations, series
> > of journal updates will follow/precede series of data/meta updates, causing
> > intermixing inside SSD. By passing a write-hint with journal, its write
> > can be isolated from other data/meta writes, leading to endurance/performance
> > benefit on SSD.
> >
> > This patch introduces "j_writehint" member in JBD2 journal, using which
> > Ext4 specifies write-hint (as SHORT) for journal
>
> The comment here says the "WRITE_LIFE_SHORT" hint is used for the journal,
> but the code uses WRITE_LIFE_KERN_MIN. However, it seems that "MIN" will
> be mapped to "NONE" if it exceeds the number of streams available in the
> underlying device. It would be better to use "SHORT" if there are not
> enough streams available.
>
> It should call blk_queue_stream_limits() to see if there are extra stream
> IDs available, and fall back to WRITE_LIFE_SHORT if not.

I disagree. I'd first keep the behavior implemented in this patch to keep
things simple. Later if we decide more smarts are needed when SSDs don't
have enough hints available, we can always add them. But this patch either
keeps the current behavior (i.e., no hint) or improves the situation by
providing a special hint. So it is a clear win. I'm not so convinced using
WRITE_LIFE_SHORT is always a win when userspace's idea of "short" is
different from the kernel's idea of "short"...
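
To make the trade-off concrete, the two fallback policies discussed above
can be modelled in plain user-space C. This is only a sketch, not kernel
code: the enum values, the nr_streams parameter, and the functions
hint_or_none() and hint_or_short() are hypothetical names invented for
illustration, and the rule "a hint is usable only if its value does not
exceed the number of streams" is an assumed simplification of the mapping
described in the review.

```c
/*
 * Hypothetical stand-ins for the write-lifetime hints. The numeric values
 * are illustrative only and are not taken from the kernel headers.
 */
enum model_hint {
	MODEL_LIFE_NONE = 0,
	MODEL_LIFE_SHORT = 1,
	MODEL_LIFE_MEDIUM = 2,
	MODEL_LIFE_LONG = 3,
	MODEL_LIFE_EXTREME = 4,
	MODEL_LIFE_KERN_MIN = 5,	/* first kernel-only hint in this model */
};

/*
 * Policy kept by the patch: a hint that the device has no stream for
 * degrades to "no hint", i.e. the behavior before the patch.
 */
enum model_hint hint_or_none(enum model_hint hint, unsigned int nr_streams)
{
	return (unsigned int)hint <= nr_streams ? hint : MODEL_LIFE_NONE;
}

/*
 * Alternative raised in review: when the kernel-only hint does not fit,
 * retry with SHORT. The journal then shares a class with whatever
 * userspace has tagged as SHORT, which is the concern voiced in the reply.
 */
enum model_hint hint_or_short(enum model_hint hint, unsigned int nr_streams)
{
	if ((unsigned int)hint <= nr_streams)
		return hint;
	return hint_or_none(MODEL_LIFE_SHORT, nr_streams);
}
```

With four streams in this model, hint_or_none(MODEL_LIFE_KERN_MIN, 4)
yields MODEL_LIFE_NONE, while hint_or_short(MODEL_LIFE_KERN_MIN, 4) yields
MODEL_LIFE_SHORT and so mixes journal writes with any userspace writes
tagged SHORT.
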

								Honza

>
> Cheers, Andreas
>
> > Signed-off-by: Kanchan Joshi
> > ---
> >  fs/ext4/ext4_jbd2.h  |  1 +
> >  fs/ext4/super.c      |  2 ++
> >  fs/jbd2/commit.c     | 11 +++++++----
> >  fs/jbd2/journal.c    |  3 ++-
> >  fs/jbd2/revoke.c     |  3 ++-
> >  include/linux/jbd2.h |  8 ++++++++
> >  6 files changed, 22 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> > index 15b6dd7..b589ca4 100644
> > --- a/fs/ext4/ext4_jbd2.h
> > +++ b/fs/ext4/ext4_jbd2.h
> > @@ -16,6 +16,7 @@
> >  #include
> >  #include "ext4.h"
> >
> > +#define EXT4_JOURNAL_WRITE_HINT	(WRITE_LIFE_KERN_MIN)
> >  #define EXT4_JOURNAL(inode)	(EXT4_SB((inode)->i_sb)->s_journal)
> >
> >  /* Define the number of blocks we need to account to a transaction to
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index fb12d3c..9c2c73e 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -4289,6 +4289,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> >
> >  	set_task_ioprio(sbi->s_journal->j_task, journal_ioprio);
> >
> > +	sbi->s_journal->j_writehint = EXT4_JOURNAL_WRITE_HINT;
> > +
> >  	sbi->s_journal->j_commit_callback = ext4_journal_commit_callback;
> >
> >  no_journal:
> > diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> > index 2eb55c3..6da4c28 100644
> > --- a/fs/jbd2/commit.c
> > +++ b/fs/jbd2/commit.c
> > @@ -153,10 +153,12 @@ static int journal_submit_commit_record(journal_t *journal,
> >
> >  	if (journal->j_flags & JBD2_BARRIER &&
> >  	    !jbd2_has_feature_async_commit(journal))
> > -		ret = submit_bh(REQ_OP_WRITE,
> > -			REQ_SYNC | REQ_PREFLUSH | REQ_FUA, bh);
> > +		ret = submit_bh_write_hint(REQ_OP_WRITE,
> > +			REQ_SYNC | REQ_PREFLUSH | REQ_FUA, bh,
> > +			journal->j_writehint);
> >  	else
> > -		ret = submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
> > +		ret = submit_bh_write_hint(REQ_OP_WRITE, REQ_SYNC, bh,
> > +			journal->j_writehint);
> >
> >  	*cbh = bh;
> >  	return ret;
> > @@ -711,7 +713,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> >  			clear_buffer_dirty(bh);
> >  			set_buffer_uptodate(bh);
> >  			bh->b_end_io = journal_end_buffer_io_sync;
> > -			submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
> > +			submit_bh_write_hint(REQ_OP_WRITE, REQ_SYNC,
> > +					bh, journal->j_writehint);
> >  		}
> >  		cond_resched();
> >  		stats.run.rs_blocks_logged += bufs;
> > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > index 8ef6b6d..804dc2c 100644
> > --- a/fs/jbd2/journal.c
> > +++ b/fs/jbd2/journal.c
> > @@ -1384,7 +1384,8 @@ static int jbd2_write_superblock(journal_t *journal, int write_flags)
> >  	jbd2_superblock_csum_set(journal, sb);
> >  	get_bh(bh);
> >  	bh->b_end_io = end_buffer_write_sync;
> > -	ret = submit_bh(REQ_OP_WRITE, write_flags, bh);
> > +	ret = submit_bh_write_hint(REQ_OP_WRITE, write_flags, bh,
> > +			journal->j_writehint);
> >  	wait_on_buffer(bh);
> >  	if (buffer_write_io_error(bh)) {
> >  		clear_buffer_write_io_error(bh);
> > diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
> > index a1143e5..376b1d8 100644
> > --- a/fs/jbd2/revoke.c
> > +++ b/fs/jbd2/revoke.c
> > @@ -642,7 +642,8 @@ static void flush_descriptor(journal_t *journal,
> >  	set_buffer_jwrite(descriptor);
> >  	BUFFER_TRACE(descriptor, "write");
> >  	set_buffer_dirty(descriptor);
> > -	write_dirty_buffer(descriptor, REQ_SYNC);
> > +	write_dirty_buffer_with_hint(descriptor, REQ_SYNC,
> > +			journal->j_writehint);
> >  }
> >  #endif
> >
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index 0f919d5..918f21e 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -1139,6 +1139,14 @@ struct journal_s
> >  	 */
> >  	__u32 j_csum_seed;
> >
> > +	/**
> > +	 * @j_writehint:
> > +	 *
> > +	 * write-hint for journal (set by FS).
> > +	 */
> > +	enum rw_hint	j_writehint;
> > +
> > +
> >  #ifdef CONFIG_DEBUG_LOCK_ALLOC
> >  	/**
> >  	 * @j_trans_commit_map:
> > --
> > 2.7.4
> >
>
>
> Cheers, Andreas
>
>
-- 
Jan Kara
SUSE Labs, CR