Subject: Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged
To: Dominique Martinet
Cc: Linus Torvalds, Matthew Wilcox, Jann Horn, Jiri Kosina, Andrew Morton, Greg KH, Peter Zijlstra, Michal Hocko, Linux-MM, kernel list, Linux API, daniel@gruss.cc
References: <20190107043227.GA3325@nautica> <151b4ac8-5cfc-ed30-db30-e4d67a324c4b@suse.cz> <20190107110827.GA15249@nautica>
From: Vlastimil Babka
Message-ID: <43f734de-2b1c-aa9f-d373-b95663e913dd@suse.cz>
Date: Mon, 7 Jan 2019 12:59:19 +0100
In-Reply-To: <20190107110827.GA15249@nautica>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/7/19 12:08 PM, Dominique
Martinet wrote:
>
> With the current mincore change, it will think everything was "in core"
> and not flush anything unless my option to just fadvise dontneed
> everything is passed though ; so even if we can make it work it is a
> change of behaviour that is breaking an existing application, and it has
> no way of telling it didn't work.

IIUC the current change is commit 574823bfab82 ("Change mincore() to
count "mapped" pages rather than "cached" pages"), which will not pretend
everything is "in core", but only pages that the calling process has
populated page table mappings for (which implies in core, but the
opposite doesn't hold). "nocache" most certainly doesn't populate the
mappings before calling mincore(), as that would bring pages into the page
cache and defeat the purpose of determining if they were already there
prior to the nocache execution. Instead it will think that nothing was "in
core", and thus later call fadvise dontneed on everything, but as I've
said earlier that shouldn't matter much.

> Honestly though, as I said, mincore() is much more useful for debugging
> for me ; the application can be changed if required. I just pointed it
> out as it'll need changing, and it has no obvious way of testing at
> runtime if the syscall works (except dumb kernel version check, but that
> won't work with stable backports); so it's not that obvious.

Agree.

>>> FWIW I personally don't care much about "only for owner" or depending on
>>> mmap options; I don't understand much of the security implications
>>> honestly so I'm not sure how these limitations actually help.
>>> On the other hand, a simple CAP_SYS_ADMIN check making the call take
>>> either behaviour should be safe and would cover what I described above.
>>
>> So without CAP_SYS_ADMIN, mincore() would return mapping status, and
>> with CAP_SYS_ADMIN, it would return cache residency status? Very clumsy
>> :( Maybe if we introduced mincore2() with flags similar to BSD mentioned
>> earlier in the thread, and the cache residency flag would require
>> CAP_SYS_ADMIN or something similar.
>
> I agree, that's rather clumsy... Or rather might lead to some unexpected
> behaviours. I'm open to other ideas.
> I'm not sure how the BSD flags help though?

Definitely it would be a long-term solution: introducing a new API,
waiting for userspace to use it... and meanwhile we would have to keep
the status quo or some kind of clumsy/subtle approach.

>>> (by the way, while we are discussing permissions, a regular user can use
>>> fadvise dontneed on files it doesn't own as well as long as it can open
>>> them for reading; I'm not sure if that would need restricting as well in
>>> the context of the security issue.
>>
>> Probably not, as I've mentioned it won't evict what's mapped by somebody
>> else. And eviction is also possible via controlling LRU, which is what
>> the paper [1] does anyway (and also mentions that DONTNEED doesn't
>> work). Being able to evict somebody's page is AFAIU not sufficient for
>> attack, the side channel is about knowing that somebody brought that
>> page back to RAM by touching it.
>
> Thanks for the link to the paper, I hadn't taken the time to extract it
> from the news article but it's much more interesting indeed.

It went public only this morning, the article was older :)
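
To make the "mapped" vs. "cached" distinction above concrete, here is a
toy model in Python. It is only a sketch: it does not call the kernel, all
function names are made up for illustration, and the nocache-style decision
("drop whatever was not reported in core beforehand") is a simplification
of the behaviour described in this thread, not nocache's actual code.

```python
# Toy model only: pages are small integers, the page cache and the caller's
# page tables are sets of page indices. All names here are hypothetical.


def mincore_cached(page_cache, page_table, npages):
    """Older semantics: report page cache residency for each page."""
    return [i in page_cache for i in range(npages)]


def mincore_mapped(page_cache, page_table, npages):
    """Semantics attributed above to commit 574823bfab82: report only pages
    the caller has populated page table mappings for. A mapped page is also
    in the page cache, but a cached page need not be mapped by the caller."""
    return [i in page_table and i in page_cache for i in range(npages)]


def pages_to_drop(reported_in_core):
    """Simplified nocache-style decision: pages not reported in core before
    the run are the ones to fadvise(DONTNEED) afterwards."""
    return [i for i, resident in enumerate(reported_in_core) if not resident]


# Pages 0 and 2 were cached by someone else; the caller has a fresh mapping
# and has touched nothing yet.
page_cache = {0, 2}
page_table = set()

print(pages_to_drop(mincore_cached(page_cache, page_table, 4)))  # [1, 3]
print(pages_to_drop(mincore_mapped(page_cache, page_table, 4)))  # [0, 1, 2, 3]
```

Under the "mapped" semantics the fresh mapping reports nothing as in core,
so the tool ends up treating every page as a candidate for DONTNEED, which
is the behaviour change discussed above.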