Subject: Re: [PATCH] io_uring: avoid page allocation warnings
To: Mark Rutland
Cc: Matthew Wilcox, linux-kernel@vger.kernel.org, Alexander Viro,
    linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
References: <20190430132405.8268-1-mark.rutland@arm.com>
    <20190430141810.GF13796@bombadil.infradead.org>
    <20190430145938.GA8314@lakrids.cambridge.arm.com>
    <20190430170302.GD8314@lakrids.cambridge.arm.com>
    <0bd395a0-e0d3-16a5-e29f-557e97782a48@kernel.dk>
    <20190501103026.GA11740@lakrids.cambridge.arm.com>
From: Jens Axboe
Message-ID: <710a3048-ccab-260d-d8b7-1d51ff6d589d@kernel.dk>
Date: Wed, 1 May 2019 06:41:43 -0600
In-Reply-To: <20190501103026.GA11740@lakrids.cambridge.arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/1/19 4:30 AM, Mark Rutland wrote:
> On Tue, Apr 30, 2019 at 12:11:59PM -0600, Jens Axboe wrote:
>> On 4/30/19 11:03 AM, Mark Rutland wrote:
>>> On Tue, Apr 30, 2019 at 10:21:03AM -0600, Jens Axboe wrote:
>>>> On 4/30/19 8:59 AM, Mark Rutland wrote:
>>>>> On Tue, Apr 30, 2019 at 07:18:10AM -0700, Matthew Wilcox wrote:
>>>>>> On Tue, Apr 30, 2019 at 02:24:05PM +0100, Mark Rutland wrote:
>>>>>>> In io_sqe_buffer_register() we allocate a number of arrays based on the
>>>>>>> iov_len from the user-provided iov. While we limit iov_len to SZ_1G,
>>>>>>> we can still attempt to allocate arrays exceeding MAX_ORDER.
>>>>>>>
>>>>>>> On a 64-bit system with 4KiB pages, for an iov where iov_base = 0x10 and
>>>>>>> iov_len = SZ_1G, we'll calculate that nr_pages = 262145. When we try to
>>>>>>> allocate a corresponding array of (16-byte) bio_vecs, requiring 4194320
>>>>>>> bytes, which is greater than 4MiB. This results in SLUB warning that
>>>>>>> we're trying to allocate greater than MAX_ORDER, and failing the
>>>>>>> allocation.
>>>>>>>
>>>>>>> Avoid this by passing __GFP_NOWARN when allocating arrays for the
>>>>>>> user-provided iov_len. We'll gracefully handle the failed allocation,
>>>>>>> returning -ENOMEM to userspace.
>>>>>>>
>>>>>>> We should probably consider lowering the limit below SZ_1G, or reworking
>>>>>>> the array allocations.
>>>>>>
>>>>>> I'd suggest that kvmalloc is probably our friend here ... we don't really
>>>>>> want to return -ENOMEM to userspace for this case, I don't think.
>>>>>
>>>>> Sure. I'll go verify that the uring code doesn't assume this memory is
>>>>> physically contiguous.
>>>>>
>>>>> I also guess we should be passing GFP_KERNEL_ACCOUNT rather than a plain
>>>>> GFP_KERNEL.
>>>>
>>>> kvmalloc() is fine, the io_uring code doesn't care about the layout of
>>>> the memory, it just uses it as an index.
>>>
>>> I've just had a go at that, but when using kvmalloc() with or without
>>> GFP_KERNEL_ACCOUNT I hit OOM and my system hangs within a few seconds with the
>>> syzkaller prog below:
>>>
>>> ----
>>> Syzkaller reproducer:
>>> # {Threaded:false Collide:false Repeat:false RepeatTimes:0 Procs:1 Sandbox: Fault:false FaultCall:-1 FaultNth:0 EnableTun:false EnableNetDev:false EnableNetReset:false EnableCgroups:false EnableBinfmtMisc:false EnableCloseFds:false UseTmpDir:false HandleSegv:false Repro:false Trace:false}
>>> r0 = io_uring_setup(0x378, &(0x7f00000000c0))
>>> sendmsg$SEG6_CMD_SET_TUNSRC(0xffffffffffffffff, &(0x7f0000000240)={&(0x7f0000000000)={0x10, 0x0, 0x0, 0x40000000}, 0xc, 0x0, 0x1, 0x0, 0x0, 0x10}, 0x800)
>>> io_uring_register$IORING_REGISTER_BUFFERS(r0, 0x0, &(0x7f0000000000), 0x1)
>>> ----
>>>
>>> ... I'm a bit worried that opens up a trivial DoS.
>>>
>>> Thoughts?
>>
>> Can you post the patch you used?
>
> Diff below.

And the reproducer, that was never posted. Patch looks fine to me. Note
that buffer registration is under the protection of RLIMIT_MEMLOCK.
That's usually very limited for non-root; as root you can of course
consume as much as you want and OOM the system.

-- 
Jens Axboe