From: Suren Baghdasaryan
Date: Mon, 7 Dec 2020 23:23:45 -0800
Subject: Re: [PATCH 1/2] mm/madvise: allow process_madvise operations on entire memory range
To: Minchan Kim
Cc: Andrew Morton, Michal Hocko, David Rientjes, Matthew Wilcox, Johannes Weiner, Roman Gushchin, Rik van Riel, Christian Brauner, Oleg Nesterov, Tim Murray, linux-api@vger.kernel.org, linux-mm, LKML, kernel-team
References: <20201124053943.1684874-1-surenb@google.com> <20201124053943.1684874-2-surenb@google.com> <20201125231322.GF1484898@google.com> <20201125234322.GG1484898@google.com>

On Mon, Nov 30, 2020 at 11:01 AM Suren Baghdasaryan wrote:
>
> On Wed, Nov 25, 2020 at 3:43 PM Minchan Kim wrote:
> >
> > On Wed, Nov 25, 2020 at 03:23:40PM -0800, Suren Baghdasaryan wrote:
> > > On Wed, Nov 25, 2020 at 3:13 PM Minchan Kim wrote:
> > > >
> > > > On Mon, Nov 23, 2020 at 09:39:42PM -0800, Suren Baghdasaryan wrote:
> > > > > process_madvise requires a vector of address ranges to be provided for
> > > > > its operations.
> > > > > When an advice should be applied to the entire process,
> > > > > the caller process has to obtain the list of VMAs of the target process
> > > > > by reading the /proc/pid/maps or some other way. The cost of this
> > > > > operation grows linearly with increasing number of VMAs in the target
> > > > > process. Even constructing the input vector can be non-trivial when
> > > > > target process has several thousands of VMAs and the syscall is being
> > > > > issued during high memory pressure period when new allocations for such
> > > > > a vector would only worsen the situation.
> > > > > In the case when advice is being applied to the entire memory space of
> > > > > the target process, this creates an extra overhead.
> > > > > Add PMADV_FLAG_RANGE flag for process_madvise enabling the caller to
> > > > > advise a memory range of the target process. For now, to keep it simple,
> > > > > only the entire process memory range is supported, vec and vlen inputs
> > > > > in this mode are ignored and can be NULL and 0.
> > > > > Instead of returning the number of bytes that advice was successfully
> > > > > applied to, the syscall in this mode returns 0 on success. This is due
> > > > > to the fact that the number of bytes would not be useful for the caller
> > > > > that does not know the amount of memory the call is supposed to affect.
> > > > > Besides, the ssize_t return type can be too small to hold the number of
> > > > > bytes affected when the operation is applied to a large memory range.
> > > >
> > > > Can we just use one element in iovec to indicate entire address rather
> > > > than using up the reserved flags?
> > > >
> > > > struct iovec {
> > > >         .iov_base = NULL,
> > > >         .iov_len = (~(size_t)0),
> > > > };
> > > >
> > > > Furthermore, it could be applied to other syscalls that support
> > > > iovec if we agree on it.
> > > >
> > >
> > > The flag also changes the return value semantics.
> > > If we follow your suggestion we should also agree that in this mode
> > > the return value will be 0 on success and negative otherwise instead
> > > of the number of bytes madvise was applied to.
> >
> > Well, return value will depend on each API. If the operation is
> > disruptive, it should return the right size affected by the API but
> > would be okay with 0 or error, otherwise.
>
> I'm fine with dropping the flag, I just thought with the flag it would
> be more explicit that this is a special mode operating on ranges. This
> way the patch also becomes simpler.
> Andrew, Michal, Christian, what do you think about such API? Should I
> change the API this way / keep the flag / change it in some other way?

Friendly ping to get some feedback on the proposed API please.