Message-ID: <74852e90-003b-84b8-9836-72258e3c5057@quicinc.com>
Date: Fri, 18 Mar 2022 19:35:41 +0530
Subject: Re: [PATCH V2,2/2] mm: madvise: skip unmapped vma holes passed to process_madvise
From: Charan Teja Kalla
To: Nadav Amit, Suren Baghdasaryan
CC: Minchan Kim, Andrew Morton, Vlastimil Babka, David Rientjes, Stephen Rothwell, Edgar Arriaga García, Michal Hocko, linux-mm, LKML, "# 5 . 10+"
X-Mailing-List: linux-kernel@vger.kernel.org

Thank you for valuable inputs.

On 3/18/2022 2:08 AM, Nadav Amit wrote:
>>>>>> IMO, it's worth to note in man page.
>>>>>>
>>>>> Or the current patch for just ENOMEM is sufficient here and we just have
>>>>> to update the man page?
>>>> I think the "On success, process_madvise() returns the number of bytes
>>>> advised" behaviour sounds useful. But madvise() doesn't do that.
>>>>
>>>> RETURN VALUE
>>>>        On success, madvise() returns zero. On error, it returns -1 and errno
>>>>        is set to indicate the error.
>>>>
>>>> So why is it desirable in the case of process_madvise()?
>>> Since process_madvise deals with multiple ranges and could fail at one of
>>> them in the middle of processing, people could decide where the call
>>> failed and then make a strategy whether they will abort at the point or
>>> continue to hint next addresses. Here, problem of the strategy is API
>>> doesn't return any error value if it has processed any bytes so they
>>> would have limitation to decide a policy. That's the limitation for
>>> every vector IO syscalls, unfortunately.
>>>
>>>>
>>>>
>>>> And why was process_madvise() designed this way? Or was it
>>>> always simply an error in the manpage?
>> Taking a closer look, indeed manpage seems to be wrong.
>> https://elixir.bootlin.com/linux/v5.17-rc8/source/mm/madvise.c#L1154
>> indicates that in the presence of unmapped holes madvise will skip
>> them but will return ENOMEM and that's what process_madvise is
>> ultimately returning in this case. So, the manpage claim of "This
>> return value may be less than the total number of requested bytes, if
>> an error occurred after some iovec elements were already processed."
>> does not reflect the reality in our case because the return value will
>> be -ENOMEM. After the desired behavior is finalized I'll modify the
>> manpage accordingly.
> Since process_madvise() might be used in sort of non-cooperative mode,
> I think that the caller cannot guarantee that it knows exactly the
> memory layout of the process whose memory it madvise’s. I know that
> MADV_DONTNEED for instance is not supported (at least today) by
> process_madvise(), but if it were, the caller may want to know which exact
> memory was madvise'd even if the target process ran some other
> memory layout changing syscalls (e.g., munmap()).
>
> IOW, skipping holes and just returning the total number of madvise’d
> bytes might not be enough.

Then would it be a correct design for the advised byte range to include
holes by default? Say the [start, len) range passed in the iovec by the
user contains a layout like:

    vma1 -- hole -- vma2 -- hole -- vma3

In the ideal case, where all VMAs are eligible for the advice, the total
bytes processed that is returned should be vma3->end - vma1->start. This
is the success case.

Now, say that vma1 succeeds but the advice fails on vma2 (say it is
VM_LOCKED). In that case the processed bytes will be
vma2->start - vma1->start (the hole is still counted as bytes
processed), so that the user may restart or skip at vma2 and then
continue. This return value is the partially processed byte count.

If the system doesn't find any VMA in the range passed by the user, it
returns ENOMEM, as not a single advisable VMA is found in the range.

>
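
To make the three cases above concrete, here is a small user-space model
of the proposed return value. It is only a sketch of the semantics as
described in this mail, not kernel code: struct model_vma and
model_advised_bytes() are hypothetical names, the "advisable" flag stands
in for conditions such as VM_LOCKED, and returning 0 when the very first
VMA fails is an assumption the mail does not cover.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: the span [start, end). */
struct model_vma {
    unsigned long start;
    unsigned long end;
    bool advisable;   /* false models e.g. a VM_LOCKED mapping */
};

/*
 * Model of the proposed result for one [start, start + len) range.
 * vmas[] must be sorted by address and non-overlapping.
 * Returns the bytes counted as processed (holes between VMAs included),
 * or -ENOMEM when no VMA overlaps the range at all.
 */
static long model_advised_bytes(const struct model_vma *vmas, size_t n,
                                unsigned long start, unsigned long len)
{
    unsigned long end = start + len;
    unsigned long first = 0, last = 0;
    bool found = false;

    assert(vmas != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Skip VMAs that do not overlap [start, end). */
        if (vmas[i].end <= start || vmas[i].start >= end)
            continue;

        /* Clamp the VMA to the requested range. */
        unsigned long s = vmas[i].start > start ? vmas[i].start : start;
        unsigned long e = vmas[i].end < end ? vmas[i].end : end;

        if (!found) {
            first = s;
            found = true;
        }
        /* Partial result: everything up to the failing VMA's start. */
        if (!vmas[i].advisable)
            return (long)(s - first);
        last = e;
    }

    if (!found)
        return -ENOMEM;
    return (long)(last - first);
}
```

With three VMAs at [0x1000, 0x2000), [0x3000, 0x4000) and
[0x5000, 0x6000), a request covering 0x1000..0x6000 yields 0x5000 when
all are advisable and 0x2000 when the middle one is not, so the caller
can resume or skip at vma2->start.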