From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Hrvoje Zeba, Jens Axboe
Subject: [PATCH 5.2 215/215] io_uring: dont use iov_iter_advance() for fixed buffers
Date: Mon, 29 Jul 2019 21:23:31 +0200
Message-Id: <20190729190817.178599978@linuxfoundation.org>
X-Mailer: git-send-email 2.22.0
In-Reply-To: <20190729190739.971253303@linuxfoundation.org>
References: <20190729190739.971253303@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Jens Axboe

commit bd11b3a391e3df6fa958facbe4b3f9f4cca9bd49 upstream.

Hrvoje reports that when a large fixed buffer is registered and IO is
being done to the latter pages of said buffer, the IO submission time
is much worse:

reading to the start of the buffer: 11238 ns
reading to the end of the buffer:   1039879 ns

In fact, it's worse by two orders of magnitude. The reason for that is
how io_uring figures out how to set up the iov_iter. We point the iter
at the first bvec, and then use iov_iter_advance() to fast-forward to
the offset within that buffer we need.

However, that is abysmally slow, as it entails iterating the bvecs
that we set up as part of buffer registration. There's really no need
to use this generic helper, as we know it's a BVEC type iterator, and
we also know that each bvec is PAGE_SIZE in size, apart from possibly
the first and last. Hence we can just use a shift on the offset to
find the right index, and then adjust the iov_iter appropriately.
After this fix, the timings are:

reading to the start of the buffer: 10135 ns
reading to the end of the buffer:   1377 ns

Or about a 755x improvement for the tail page.

Reported-by: Hrvoje Zeba
Tested-by: Hrvoje Zeba
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

---
 fs/io_uring.c |   39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1001,8 +1001,43 @@ static int io_import_fixed(struct io_rin
 	 */
 	offset = buf_addr - imu->ubuf;
 	iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len);
-	if (offset)
-		iov_iter_advance(iter, offset);
+
+	if (offset) {
+		/*
+		 * Don't use iov_iter_advance() here, as it's really slow for
+		 * using the latter parts of a big fixed buffer - it iterates
+		 * over each segment manually. We can cheat a bit here, because
+		 * we know that:
+		 *
+		 * 1) it's a BVEC iter, we set it up
+		 * 2) all bvecs are PAGE_SIZE in size, except potentially the
+		 *    first and last bvec
+		 *
+		 * So just find our index, and adjust the iterator afterwards.
+		 * If the offset is within the first bvec (or the whole first
+		 * bvec, just use iov_iter_advance(). This makes it easier
+		 * since we can just skip the first segment, which may not
+		 * be PAGE_SIZE aligned.
+		 */
+		const struct bio_vec *bvec = imu->bvec;
+
+		if (offset <= bvec->bv_len) {
+			iov_iter_advance(iter, offset);
+		} else {
+			unsigned long seg_skip;
+
+			/* skip first vec */
+			offset -= bvec->bv_len;
+			seg_skip = 1 + (offset >> PAGE_SHIFT);
+
+			iter->bvec = bvec + seg_skip;
+			iter->nr_segs -= seg_skip;
+			iter->count -= (seg_skip << PAGE_SHIFT);
+			iter->iov_offset = offset & ~PAGE_MASK;
+			if (iter->iov_offset)
+				iter->count -= iter->iov_offset;
+		}
+	}
 
 	/* don't drop a reference to these pages */
 	iter->type |= ITER_BVEC_FLAG_NO_REF;
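
For readers who want to try the index arithmetic outside the kernel, here is a
minimal user-space sketch of the idea in the patch. It is not kernel code:
struct seg, struct pos, locate_linear(), locate_shift(),
shortcut_matches_walk() and the SKETCH_* macros are hypothetical names made up
for illustration, and a 4 KiB page (shift of 12) is assumed. It models only
where a byte offset lands (segment index plus offset within that segment), not
the iterator's count bookkeeping, and it treats an offset exactly equal to the
first segment's length as the start of the second segment.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. The kernel uses PAGE_SHIFT / PAGE_SIZE instead. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for a bvec: only the segment length matters here. */
struct seg {
	size_t len;
};

/* Where a byte offset lands: which segment, and how far into it. */
struct pos {
	size_t index;
	size_t offset;
};

/* Generic walk: steps over every segment before the target, O(nr). */
static struct pos locate_linear(const struct seg *segs, size_t nr, size_t offset)
{
	struct pos p = { 0, offset };

	while (p.index < nr && p.offset >= segs[p.index].len) {
		p.offset -= segs[p.index].len;
		p.index++;
	}
	return p;
}

/*
 * Shortcut: O(1), but only valid when every segment except the first and
 * the last is exactly SKETCH_PAGE_SIZE bytes long.
 */
static struct pos locate_shift(const struct seg *segs, size_t offset)
{
	struct pos p;

	if (offset < segs[0].len) {
		/* Still inside the (possibly short) first segment. */
		p.index = 0;
		p.offset = offset;
		return p;
	}

	/* Skip the first segment, then the rest is pure page arithmetic. */
	offset -= segs[0].len;
	p.index = 1 + (offset >> SKETCH_PAGE_SHIFT);
	p.offset = offset & (SKETCH_PAGE_SIZE - 1);
	return p;
}

/* Returns 1 if both methods agree for every offset below total, else 0. */
static int shortcut_matches_walk(const struct seg *segs, size_t nr, size_t total)
{
	size_t off;

	for (off = 0; off < total; off++) {
		struct pos a = locate_linear(segs, nr, off);
		struct pos b = locate_shift(segs, off);

		if (a.index != b.index || a.offset != b.offset)
			return 0;
	}
	return 1;
}
```

With a layout such as a 100-byte first segment, two full pages and a 50-byte
tail, calling shortcut_matches_walk() over the whole buffer compares the two
approaches at every offset. The linear walk costs one step per skipped
segment, which is the behaviour the commit message blames for the slow tail
reads, while the shift costs the same regardless of how far into the buffer
the offset points.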