From: Jeff Moyer
To: Kirill Tkhai
Cc: Tejun Heo, axboe@kernel.dk, bcrl@kvack.org, viro@zeniv.linux.org.uk, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-aio@kvack.org, oleg@redhat.com
Subject: Re: [PATCH 0/5] blkcg: Limit maximum number of aio requests available for cgroup
Date: Mon, 04 Dec 2017 17:59:20 -0500
In-Reply-To: <6eaa11a6-a087-42ab-df65-9142b59bf726@virtuozzo.com> (Kirill Tkhai's message of "Tue, 5 Dec 2017 01:49:42 +0300")

Kirill Tkhai writes:

> On 05.12.2017 00:52, Tejun Heo wrote:
>> Hello, Kirill.
>>
>> On Tue, Dec 05, 2017 at 12:44:00AM +0300, Kirill Tkhai wrote:
>>>> Can you please explain how this is a fundamental resource which can't
>>>> be controlled otherwise?
>>>
>>> Currently, aio_nr and aio_max_nr are global. In case of containers this
>>> means that a single container may occupy all aio requests, which are
>>> available in the system, and to deprive others possibility to use aio
>>> at all. This may happen because of evil intentions of the container's
>>> user or because of the program error, when the user makes this occasionally.
>>
>> Hmm... I see. It feels really wrong to me to make this a first class
>> resource because there is a system wide limit. The only reason I can
>> think of for the system wide limit is to prevent too much kernel
>> memory consumed by creating a lot of aios but that squarely falls
>> inside cgroup memory controller protection. If there are other
>> reasons why the number of aios should be limited system-wide, please
>> bring them up.
>>
>> If the only reason is kernel memory consumption protection, the only
>> thing we need to do is making sure that memory used for aio commands
>> are accounted against cgroup kernel memory consumption and
>> relaxing/removing system wide limit.
>
> So, we just use GFP_KERNEL_ACCOUNT flag for allocation of internal aio
> structures and pages, and all the memory will be accounted in kmem and
> limited by memcg. Looks very good.
>
> One detail about memory consumption. io_submit() calls primitives
> file_operations::write_iter and read_iter. It's not clear for me whether
> they consume the same memory as if writev() or readv() system calls
> would be used instead. writev() may delay the actual write till dirty
> pages limit will be reached, so it seems logic of the accounting should
> be the same. So aio mustn't use more not accounted system memory in file
> system internals, then simple writev().
>
> Could you please to say if you have thoughts about this?

I think you just need to account the completion ring.

Cheers,
Jeff
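
The global counters discussed in the quoted text can be inspected from userspace. This is a minimal Python sketch, assuming a Linux system that exposes the fs.aio-nr and fs.aio-max-nr sysctls under /proc/sys/fs. The helper names (parse_counter, headroom, read_counter) are illustrative only and are not part of any kernel or library API.

```python
from pathlib import Path

# System-wide aio counters; these are global, not per-cgroup.
AIO_NR = Path("/proc/sys/fs/aio-nr")
AIO_MAX_NR = Path("/proc/sys/fs/aio-max-nr")


def parse_counter(text):
    """Parse the single decimal integer that a sysctl file contains."""
    return int(text.strip())


def headroom(aio_nr, aio_max_nr):
    """Events not yet reserved under the system-wide limit (never negative)."""
    return max(aio_max_nr - aio_nr, 0)


def read_counter(path):
    """Read one counter from procfs; raises OSError if the file is missing."""
    return parse_counter(path.read_text())


if __name__ == "__main__":
    try:
        nr = read_counter(AIO_NR)
        max_nr = read_counter(AIO_MAX_NR)
    except OSError as exc:
        # Non-Linux hosts or restricted containers may not expose these files.
        print(f"cannot read aio sysctls: {exc}")
    else:
        print(f"aio-nr={nr} aio-max-nr={max_nr} headroom={headroom(nr, max_nr)}")
```

Because both values are system-wide, any one container that sets up enough aio contexts drives the headroom to zero for everyone, which is the scenario Kirill describes.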