From: Jens Axboe
Subject: [PATCH 0/5] Initial support for polled IO
Date: Fri, 6 Nov 2015 10:20:18 -0700
Message-ID: <1446830423-25027-1-git-send-email-axboe@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

This is a basic framework for supporting polled IO in the block layer
stack, with support wired up for sync O_DIRECT IO and the NVMe driver.
A few things are still missing before this can truly be productized,
but it's already very useful for testing.

For now, it's a per-device opt-in feature. To enable it, echo 1 to
/sys/block/<dev>/queue/io_poll.

Some basic test results:

# dd if=/dev/nvme2n1 of=/dev/null bs=4096 iflag=direct count=200k
[...]
838860800 bytes (839 MB) copied, 3.98791 s, 210 MB/s
# echo 1 > /sys/block/nvme2n1/queue/io_poll
# dd if=/dev/nvme2n1 of=/dev/null bs=4096 iflag=direct count=200k
[...]
838860800 bytes (839 MB) copied, 2.15479 s, 389 MB/s

This is a DRAM-backed NVMe device; per-IO latencies drop from
~19.5usec to ~10.5usec.

# dd if=/dev/nvme0n1 of=/dev/null bs=4096 iflag=direct count=200k
[...]
838860800 bytes (839 MB) copied, 5.90349 s, 142 MB/s
# echo 1 > /sys/block/nvme0n1/queue/io_poll
# dd if=/dev/nvme0n1 of=/dev/null bs=4096 iflag=direct count=200k
838860800 bytes (839 MB) copied, 3.15852 s, 266 MB/s

Samsung NVMe device, ~28.8usec -> ~15.4usec.

# dd if=/dev/nvme1n1 of=/dev/null bs=4096 iflag=direct count=200k
[...]
838860800 bytes (839 MB) copied, 1.78069 s, 471 MB/s
# echo 1 > /sys/block/nvme1n1/queue/io_poll
# dd if=/dev/nvme1n1 of=/dev/null bs=4096 iflag=direct count=200k
[...]
838860800 bytes (839 MB) copied, 1.31546 s, 638 MB/s

Intel NVMe device, ~8.7usec -> ~6.4usec.

Three different devices, with different but notable wins on all of
them. Contrary to intuition, the slower devices sometimes benefit
more: the longer completion time lets the processor drop into a deeper
C-state while waiting, and the exit latency of that C-state is then
added to the interrupt-driven completion. Polling keeps the CPU busy
and avoids that penalty entirely.

I'd like to get this framework in so we can experiment with polling
more easily. I've got another branch, mq-stats, that wires up scalable
collection of per-device IO completion statistics; we could
potentially use that to make smart decisions about when to poll and
for how long. We'll also work on enabling libaio support for this.

Thanks,

Jens
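
P.S. Any sync O_DIRECT read exercises the polled path once io_poll is
enabled, so a self-contained tester outside of dd is easy to write. A
minimal sketch in C (illustrative only, not part of this series; the
device path is just an example, pass your own as argv[1]):

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "/dev/nvme0n1";
        void *buf;
        int fd;

        /* O_DIRECT requires an aligned buffer; 4096 bytes matches
         * the block size used in the dd runs above. */
        if (posix_memalign(&buf, 4096, 4096)) {
                perror("posix_memalign");
                return 1;
        }

        fd = open(dev, O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* With io_poll enabled on the queue, this read completes by
         * spinning on the completion side rather than sleeping until
         * the device interrupt fires. */
        if (read(fd, buf, 4096) != 4096) {
                perror("read");
                close(fd);
                return 1;
        }

        close(fd);
        free(buf);
        return 0;
}

Timing a loop of these reads with io_poll set to 0 and then 1 should
reproduce the per-IO latency deltas above on a smaller scale.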