From: Ming Lei
To: Jens Axboe, linux-kernel@vger.kernel.org
Cc: Rusty Russell, linux-api@vger.kernel.org, virtualization@lists.linux-foundation.org, "Michael S. Tsirkin", Stefan Hajnoczi, Paolo Bonzini
Subject: [PATCH v3 0/2] block: virtio-blk: support multi vq per virtio-blk
Date: Thu, 26 Jun 2014 17:41:46 +0800
Message-Id: <1403775708-22244-1-git-send-email-ming.lei@canonical.com>

Hi,

These patches add support for multiple virtual queues (multi-vq) in one virtio-blk device and map each virtual queue (vq) to a blk-mq hardware queue. With this approach, both the scalability and the performance of the virtio-blk device can be improved.

To verify the improvement, I implemented virtio-blk multi-vq on top of qemu's dataplane feature. Both the handling of host notifications from each vq and the processing of host I/O are still done in the per-device iothread context. The change is based on the qemu v2.0.0 release and can be fetched from the tree below:

  git://kernel.ubuntu.com/ming/qemu.git #v2.0.0-virtblk-mq.1

To enable the multi-vq feature, add 'num_queues=N' to '-device virtio-blk-pci ...' on the qemu command line. Passing 'vectors=N+1' is also suggested, so that each vq keeps its own MSI irq vector (one vector per vq, plus one for configuration-change interrupts). The feature depends on x-data-plane.

Fio (libaio, randread, iodepth=64, bs=4K, jobs=N) is run inside the VM to verify the improvement. I only created a small quad-core VM and set num_queues of the virtio-blk device to 2, but the improvement is still obvious.
The host is a 2-socket server with 8 cores (16 threads). Results with num_queues=2 compared with a single vq:

1) Scalability:
  - jobs = 2, throughput: +33%
  - jobs = 4, throughput: +100%
2) Peak throughput: +39%

So in my test, even for a quad-core VM, increasing the number of virtqueues from 1 to 2 improves both scalability and performance substantially.

In the qemu implementation of the virtio-blk-mq device above, only one IOthread handles requests from all vqs, and the throughput above is already very close to that of the same fio test run on the host with a single job. So further improvement should be observed once more IOthreads are used to handle requests from multiple vqs.

TODO:
  - adjust each vq's irq smp_affinity according to the cpumask of its blk-mq hw queue

V3:
  - fix use-after-free on vq->name reported by Michael

V2: (suggestions from Michael and Dave Chinner)
  - allocate virtqueues' pointers dynamically
  - make sure the per-queue spinlocks aren't kept in the same cache line
  - make each queue's name different

V1:
  - remove RFC since no one objected
  - add '__u8 unused' for pending as suggested by Rusty
  - use virtio_cread_feature() directly, suggested by Rusty

Thanks,
-- 
Ming Lei