From: SeongJae Park
To: Michal Hocko
CC: SeongJae Park, Minchan Kim
Subject: Re: Re: Re: [PATCH v2 2/5] mm: introduce external memory hinting API
Date: Wed, 22 Jan 2020 14:28:20 +0100
Message-ID: <20200122132820.29180-1-sjpark@amazon.com>
In-Reply-To: <20200122100233.GT29276@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 22 Jan 2020 11:02:33 +0100 Michal Hocko wrote:

> On Wed 22-01-20 10:36:24, SeongJae Park wrote:
> > On Wed, 22 Jan 2020 09:28:53 +0100 Michal Hocko wrote:
> > 
> > > On Tue 21-01-20 10:32:12, Minchan Kim wrote:
> > > > On Mon, Jan 20, 2020 at 08:58:25AM +0100, Michal Hocko wrote:
> > > [...]
> > > > > The interface really has to be robust to future potential usecases.
> > > > 
> > > > I do understand your concern but for me, it's chicken and egg problem.
> > > > We usually do best effort to make something perfect as far as possible
> > > > but we also don't do over-engineering without real usecase from the
> > > > beginning.
> > > > 
> > > > I already told you how we could synchronize among processes and potential
> > > > way to be extended Daniel suggested (That's why current API has extra field
> > > > for the cookie) even though we don't need it right now.
> > > 
> > > If you can synchronize with the target task then you do not need a
> > > remote interface. Just use ptrace and you are done with it.
> > > 
> > > > If you want to suggest the other way, please explain why your idea is
> > > > better and why we need it at this moment.
> > > 
> > > I believe I have explained my concerns and why they matter. All you are
> > > saying is that you do not care because your particular usecase doesn't
> > > care. And that is a first signal of a future disaster when we end up
> > > with a broken and unfixable interface we have to maintain for ever.
> > > 
> > > I will not go as far as to nack this but you should seriously think
> > > about other potential usecases and how they would work and what we are
> > > going to do when a first non-cooperative userspace memory management
> > > usecase materializes.
> > 
> > Besides the specific environment of Android, I think there are many ways to
> > know the address space layout and access patterns of other processes.
> > idle_page_tracking might be one widely available example.
> > 
> > Of course, the information might not be strictly correct due to the timing
> > issue, but it could still be worth using in some extreme situations, such as
> > memory pressure or fragmentation. For the same reason, ptrace() would not be
> > sufficient, as we have no perfect control, but only some level of control that
> > would be useful under specific situations.
> 
> I am not sure I see your point. I am talking about races where a remote
> task is operating on a completely different object because the one it
> checked for has been unmapped and new one mapped over it. Memory
> pressure or a fragmentation will not change the object itself. Sure the
> memory might be reclaimed but that should be completely OK unless I am
> missing something.

Thank you for pointing out your concerns in more detail.
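As background for the idle_page_tracking example above, here is a minimal C sketch of how a privileged monitor might translate a virtual address of another process into a PFN through /proc/<pid>/pagemap, then mark and test that page in /sys/kernel/mm/page_idle/bitmap. It assumes a kernel built with CONFIG_IDLE_PAGE_TRACKING, root privileges, and the bit layouts described in the kernel's pagemap and idle page tracking documentation; all helper names are hypothetical and this is not code from the patch set.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <unistd.h>

/* /proc/<pid>/pagemap: one 64-bit entry per virtual page. */
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1) /* bits 0-54: page frame number */
#define PM_PRESENT  (UINT64_C(1) << 63)       /* bit 63: page present in RAM */

/* Byte offset of the pagemap entry that describes vaddr. */
static off_t pagemap_offset(uintptr_t vaddr, long page_size)
{
    return (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/* PFN from a pagemap entry, or -1 if not present (swapped out or never touched). */
static int64_t pagemap_pfn(uint64_t entry)
{
    return (entry & PM_PRESENT) ? (int64_t)(entry & PM_PFN_MASK) : -1;
}

/* page_idle/bitmap: PFN i is bit (i % 64) of the 8-byte word at index i / 64. */
static off_t idle_bitmap_offset(uint64_t pfn)
{
    return (off_t)(pfn / 64) * (off_t)sizeof(uint64_t);
}

static uint64_t idle_bitmap_bit(uint64_t pfn)
{
    return UINT64_C(1) << (pfn % 64);
}

/* Both files reject reads/writes that are not whole, 8-byte aligned words. */
static int read_u64(int fd, off_t off, uint64_t *val)
{
    return pread(fd, val, sizeof(*val), off) == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* Translate vaddr in the process whose pagemap is open on pagemap_fd. */
static int64_t lookup_pfn(int pagemap_fd, uintptr_t vaddr)
{
    uint64_t entry;

    if (read_u64(pagemap_fd, pagemap_offset(vaddr, sysconf(_SC_PAGESIZE)), &entry) < 0)
        return -1;
    return pagemap_pfn(entry); /* PFN field reads as 0 without CAP_SYS_ADMIN */
}

/* Mark one page idle; zero bits in the written word are ignored. */
static int mark_idle(int bitmap_fd, uint64_t pfn)
{
    uint64_t word = idle_bitmap_bit(pfn);

    return pwrite(bitmap_fd, &word, sizeof(word), idle_bitmap_offset(pfn)) ==
           (ssize_t)sizeof(word) ? 0 : -1;
}

/* 1 if the page was not accessed since mark_idle(), 0 if it was, -1 on error. */
static int is_idle(int bitmap_fd, uint64_t pfn)
{
    uint64_t word;

    if (read_u64(bitmap_fd, idle_bitmap_offset(pfn), &word) < 0)
        return -1;
    return (word & idle_bitmap_bit(pfn)) != 0;
}
```

A monitor would open the pagemap read-only and the bitmap read-write, call mark_idle(), wait for a sampling interval, and then call is_idle(). Note that the result is only a snapshot: by the time a hint is issued, the PFN or even the object mapped at vaddr may have changed, which is exactly the check-then-act gap discussed in this thread.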
I was assuming a case of using MADV_PAGEOUT- or MADV_HUGEPAGE-like hints
driven by access frequency monitoring, to improve performance under memory
pressure or fragmentation, respectively. Under the race, such hints might
cause some performance degradation, but no critical problem such as a SEGV.
I previously implemented such an optimization for research purposes and it
was worthwhile, although it was only a research hack.

MADV_FREE-like hints, in contrast, may discard the data of whatever object
happens to be mapped in the range at that moment, so under the race they
could cause data loss or a SEGV and should of course be avoided. But from my
perspective, the four hints madvise_process() currently supports (COLD,
PAGEOUT, MERGEABLE, UNMERGEABLE) are not too risky even under the race.
That's why I said the possibly incorrect information could still be worth
using in some extreme situations.

> 
> > I assume the users of this system call would understand the tradeoff and
> > make decisions.
> 
> I disagree. My experience tells me that users tend to squeeze the
> maximum and beyond and hope they get what they want.
> 
> > Also, as the users already have the right to do the tradeoff, I
> > think it's fair. In other words, I think the caller has both the power and the
> > responsibility to deal with the time-to-check-time-to-react problem.
> > 
> > Nonetheless, I also agree this is an important concern and the patch would be
> > better if it adds more detailed documentation regarding this issue.
> 
> If there is _really_ a strong consensus that the racy interface is
> reasonable then it absolutely has to be described with a clear statement
> that those races might result in hard to predict behavior unless all
> tasks sharing the address space are blocked between the check and the
> madvise call.

So simply trusting users to get this right on their own responsibility is
still too risky, but a strong consensus on real needs together with a clear
description might justify it. I agree.

Thanks,
SeongJae Park

> -- 
> Michal Hocko
> SUSE Labs
> 
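To make the race Michal describes concrete, here is a toy, single-process C model. It is neither kernel code nor the patch's API: toy_mm, toy_check, toy_remap, toy_hint and toy_demo are hypothetical names. A monitor snapshots a range, the target may unmap it and map a different object at the same addresses, and the hint is applied only if a generation counter recorded at check time still matches. The generation counter is an assumed stand-in for a validation cookie; how the cookie field mentioned earlier in the thread would actually be used is not specified here.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one address range in a target process. */
struct toy_mm {
    pthread_mutex_t lock;  /* stands in for the target's mmap lock */
    uintptr_t start, end;  /* range of the object currently mapped */
    uint64_t generation;   /* bumped whenever the range is unmapped/remapped */
    unsigned int hinted;   /* number of hints actually applied */
};

/* What the monitor observed at check time. */
struct toy_snapshot {
    uintptr_t start, end;  /* range a real hint would target */
    uint64_t generation;   /* assumed validation cookie */
};

/* Check: record the current mapping and its generation. */
static void toy_check(struct toy_mm *mm, struct toy_snapshot *snap)
{
    pthread_mutex_lock(&mm->lock);
    snap->start = mm->start;
    snap->end = mm->end;
    snap->generation = mm->generation;
    pthread_mutex_unlock(&mm->lock);
}

/* Target side: unmap the old object and map a new one over the range. */
static void toy_remap(struct toy_mm *mm, uintptr_t start, uintptr_t end)
{
    pthread_mutex_lock(&mm->lock);
    mm->start = start;
    mm->end = end;
    mm->generation++;
    pthread_mutex_unlock(&mm->lock);
}

/*
 * Act: apply the hint only if the range still holds the object that was
 * checked. Validation and application share one critical section, so no
 * remap can slip in between them.
 */
static bool toy_hint(struct toy_mm *mm, const struct toy_snapshot *snap)
{
    bool still_valid;

    pthread_mutex_lock(&mm->lock);
    still_valid = mm->generation == snap->generation;
    if (still_valid)
        mm->hinted++;
    pthread_mutex_unlock(&mm->lock);
    return still_valid;
}

/* Run check, an optional remap, then the hint; report how many hints landed. */
static unsigned int toy_demo(bool remap_in_between)
{
    struct toy_mm mm = { .lock = PTHREAD_MUTEX_INITIALIZER,
                         .start = 0x10000, .end = 0x20000 };
    struct toy_snapshot snap;

    toy_check(&mm, &snap);
    if (remap_in_between)
        /* Same addresses, different object: comparing start/end alone
         * would not notice; the generation does. */
        toy_remap(&mm, 0x10000, 0x20000);
    (void)toy_hint(&mm, &snap);
    return mm.hinted;
}
```

The key assumption is that validation and application happen atomically under the same lock that serializes remaps. A user-space monitor cannot hold the target's mmap lock, so such a check would have to live in the kernel; otherwise, as Michal puts it, all tasks sharing the address space would have to be blocked between the check and the madvise call.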