Date: Mon, 18 Feb 2019 15:49:08 +0800
From: Ming Lei
To: Jens Axboe
Cc: Bart Van Assche, Mike Snitzer, linux-mm@kvack.org, dm-devel@redhat.com,
	Christoph Hellwig, Sagi Grimberg, "Darrick J. Wong", Omar Sandoval,
	cluster-devel@redhat.com, linux-ext4@vger.kernel.org, Kent Overstreet,
	Boaz Harrosh, Gao Xiang, Coly Li, linux-raid@vger.kernel.org,
	Bob Peterson, linux-bcache@vger.kernel.org, Alexander Viro,
	Dave Chinner, David Sterba, linux-block@vger.kernel.org,
	Theodore Ts'o, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [dm-devel] [PATCH V15 00/18] block: support multi-page bvec
Message-ID: <20190218074907.GA806@ming.t460p>
References: <20190215111324.30129-1-ming.lei@redhat.com>
	<1550250855.31902.102.camel@acm.org>
	<18c711a9-ca13-885d-43cd-4d48e683a6a2@kernel.dk>
	<20190217131332.GC7296@ming.t460p>
In-Reply-To: <20190217131332.GC7296@ming.t460p>
User-Agent: Mutt/1.9.1 (2017-09-22)

On Sun, Feb 17, 2019 at 09:13:32PM +0800, Ming Lei wrote:
> On Fri, Feb 15, 2019 at 10:59:47AM -0700, Jens Axboe wrote:
> > On 2/15/19 10:14 AM, Bart Van Assche wrote:
> > > On Fri, 2019-02-15 at 08:49 -0700, Jens Axboe wrote:
> > >> On 2/15/19 4:13 AM, Ming Lei wrote:
> > >>> This patchset brings multi-page bvec into block layer:
> > >>
> > >> Applied, thanks Ming. Let's hope it sticks!
> > >
> > > Hi Jens and Ming,
> > >
> > > Test nvmeof-mp/002 fails with Jens' for-next branch from this morning.
> > > I have not yet tried to figure out which patch introduced the failure.
> > > Anyway, this is what I see in the kernel log for test nvmeof-mp/002:
> > >
> > > [  475.611363] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
> > > [  475.621188] #PF error: [normal kernel read fault]
> > > [  475.623148] PGD 0 P4D 0
> > > [  475.624737] Oops: 0000 [#1] PREEMPT SMP KASAN
> > > [  475.626628] CPU: 1 PID: 277 Comm: kworker/1:1H Tainted: G    B    5.0.0-rc6-dbg+ #1
> > > [  475.630232] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > > [  475.633855] Workqueue: kblockd blk_mq_requeue_work
> > > [  475.635777] RIP: 0010:__blk_recalc_rq_segments+0xbe/0x590
> > > [  475.670948] Call Trace:
> > > [  475.693515]  blk_recalc_rq_segments+0x2f/0x50
> > > [  475.695081]  blk_insert_cloned_request+0xbb/0x1c0
> > > [  475.701142]  dm_mq_queue_rq+0x3d1/0x770
> > > [  475.707225]  blk_mq_dispatch_rq_list+0x5fc/0xb10
> > > [  475.717137]  blk_mq_sched_dispatch_requests+0x256/0x300
> > > [  475.721767]  __blk_mq_run_hw_queue+0xd6/0x180
> > > [  475.725920]  __blk_mq_delay_run_hw_queue+0x25c/0x290
> > > [  475.727480]  blk_mq_run_hw_queue+0x119/0x1b0
> > > [  475.732019]  blk_mq_run_hw_queues+0x7b/0xa0
> > > [  475.733468]  blk_mq_requeue_work+0x2cb/0x300
> > > [  475.736473]  process_one_work+0x4f1/0xa40
> > > [  475.739424]  worker_thread+0x67/0x5b0
> > > [  475.741751]  kthread+0x1cf/0x1f0
> > > [  475.746034]  ret_from_fork+0x24/0x30
> > >
> > > (gdb) list *(__blk_recalc_rq_segments+0xbe)
> > > 0xffffffff816a152e is in __blk_recalc_rq_segments (block/blk-merge.c:366).
> > > 361			struct bio *bio)
> > > 362	{
> > > 363		struct bio_vec bv, bvprv = { NULL };
> > > 364		int prev = 0;
> > > 365		unsigned int seg_size, nr_phys_segs;
> > > 366		unsigned front_seg_size = bio->bi_seg_front_size;
> > > 367		struct bio *fbio, *bbio;
> > > 368		struct bvec_iter iter;
> > > 369
> > > 370		if (!bio)
> >
> > Just ran a few tests, and it also seems to cause about a 5% regression
> > in per-core IOPS throughput. Prior to this work, I could get 1620K 4k
> > rand read IOPS out of a core, now I'm at ~1535K. The cycle stealer
> > seems to be blk_queue_split() and blk_rq_map_sg().
>
> Could you share your test setting with us?
>
> I will run null_blk first and see if it can be reproduced.

This performance drop isn't reproduced for me on null_blk with the
following setup:

- modprobe null_blk nr_devices=4 submit_queues=48
- test machine: dual socket, two NUMA nodes, 24 cores/socket
- fio script:

  fio --direct=1 --size=128G --bsrange=4k-4k --runtime=40 --numjobs=48 \
      --ioengine=libaio --iodepth=64 --group_reporting=1 \
      --filename=/dev/nullb0 --name=randread --rw=randread

Result: 10.7M IOPS (base kernel), 10.6M IOPS (patched kernel).

And if 'bs' is increased to 256k, 512k or 1024k, IOPS improves by ~8%
with the multi-page bvec patches in the above test.

BTW, no cost is added to bio_for_each_bvec(), so blk_queue_split() and
blk_rq_map_sg() should be fine. However, bio_for_each_segment_all() may
not be as quick as before.

Thanks,
Ming