Subject: Re: [RFC PATCH] Add proc interface to set PF_MEMALLOC flags
To: Tetsuo Handa, Hillf Danton
References: <20190911031348.9648-1-hdanton@sina.com>
Cc: "Kirill A. Shutemov", axboe@kernel.dk, James.Bottomley@HansenPartnership.com, martin.petersen@oracle.com, linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org, linux-block@vger.kernel.org, Linux-MM
From: Mike Christie
Message-ID: <5D791666.7080302@redhat.com>
Date: Wed, 11 Sep 2019 10:44:38 -0500
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/11/2019 05:07 AM, Tetsuo Handa wrote:
> On 2019/09/11 12:13, Hillf Danton wrote:
>>
>> On Tue, 10 Sep 2019 11:06:03 -0500 From: Mike Christie
>>>
>>>> Really? Without any privilege check? So any random user can tap into
>>>> __GFP_NOIO allocations?
>>>
>>> That was a mistake on my part. I will add it in.
>>>
>> You may alternatively madvise a nutcracker as long as you would have
>> added a sledgehammer under /proc instead of a gavel.
>>
>> --- a/include/uapi/asm-generic/mman-common.h
>> +++ b/include/uapi/asm-generic/mman-common.h
>> @@ -45,6 +45,7 @@
>>  #define MADV_SEQUENTIAL	2	/* expect sequential page references */
>>  #define MADV_WILLNEED	3	/* will need these pages */
>>  #define MADV_DONTNEED	4	/* don't need these pages */
>> +#define MADV_NOIO	5	/* set PF_MEMALLOC_NOIO */
>>
>>  /* common parameters: try to keep these consistent across architectures */
>>  #define MADV_FREE	8	/* free pages only if memory pressure */
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -716,6 +716,7 @@ madvise_behavior_valid(int behavior)
>>  	case MADV_WILLNEED:
>>  	case MADV_DONTNEED:
>>  	case MADV_FREE:
>> +	case MADV_NOIO:
>>  #ifdef CONFIG_KSM
>>  	case MADV_MERGEABLE:
>>  	case MADV_UNMERGEABLE:
>> @@ -813,6 +814,11 @@ SYSCALL_DEFINE3(madvise, unsigned long,
>>  	if (!madvise_behavior_valid(behavior))
>>  		return error;
>>
>> +	if (behavior == MADV_NOIO) {
>> +		current->flags |= PF_MEMALLOC_NOIO;
>
> Yes, for "modifying p->flags when p != current" is not permitted.
>
> But I guess that there is a problem. Setting PF_MEMALLOC_NOIO causes
> current_gfp_context() to mask __GFP_IO | __GFP_FS, but the OOM killer cannot
> be invoked when __GFP_FS is masked. As a result, any userspace thread which
> has PF_MEMALLOC_NOIO cannot invoke the OOM killer. If the userspace thread
> which uses PF_MEMALLOC_NOIO is involved in memory reclaiming activities,
> the memory reclaiming activities won't be able to make forward progress when
> the userspace thread triggered e.g. a page fault. Can the "userspace components
> that can run in the IO path" survive without any memory allocation?
>

Yes and no. When they can, they will have preallocated the resources
they need to make forward progress, similar to how kernel storage
drivers do. However, for some resources, such as those in the network
layer, neither userspace nor kernel drivers are able to preallocate,
and they may fail.

>> +		return 0;
>> +	}
>> +
>>  	if (start & ~PAGE_MASK)
>>  		return error;
>>  	len = (len_in + ~PAGE_MASK) & PAGE_MASK;
>
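
For reference, the masking concern raised in the quoted text can be sketched in plain userspace C. This is only a model, not kernel code: the SKETCH_* bit values and the sketch_* function names are hypothetical, chosen for illustration, and the two functions merely restate the behavior the thread attributes to current_gfp_context() and to the OOM killer.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real definitions differ. */
#define SKETCH_GFP_IO           0x1u
#define SKETCH_GFP_FS           0x2u
#define SKETCH_PF_MEMALLOC_NOIO 0x1u

/*
 * Models the masking described in the thread: a task with
 * PF_MEMALLOC_NOIO set has __GFP_IO | __GFP_FS cleared from
 * every allocation request it makes.
 */
static unsigned int sketch_gfp_context(unsigned int task_flags,
                                       unsigned int gfp)
{
    if (task_flags & SKETCH_PF_MEMALLOC_NOIO)
        gfp &= ~(SKETCH_GFP_IO | SKETCH_GFP_FS);
    return gfp;
}

/*
 * Models the claim that the OOM killer cannot be invoked for an
 * allocation whose mask lacks __GFP_FS.
 */
static bool sketch_oom_killer_allowed(unsigned int gfp)
{
    return (gfp & SKETCH_GFP_FS) != 0;
}

/* Small self-check tying the two pieces together. */
static void sketch_self_check(void)
{
    unsigned int gfp = SKETCH_GFP_IO | SKETCH_GFP_FS;

    /* Ordinary task: mask unchanged, OOM killer may run. */
    assert(sketch_oom_killer_allowed(sketch_gfp_context(0, gfp)));

    /* NOIO task: both bits cleared, OOM killer is ruled out. */
    assert(!sketch_oom_killer_allowed(
        sketch_gfp_context(SKETCH_PF_MEMALLOC_NOIO, gfp)));
}
```

Calling sketch_self_check() from a main() exercises both cases. Under these assumptions, a thread that sets the flag on itself loses the OOM killer as a fallback for all of its own allocations, including those triggered by a page fault, which is the forward-progress concern raised above.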