Subject: Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged
From: Vlastimil Babka
To: Dominique Martinet, Linus Torvalds
Cc: Matthew Wilcox, Jann Horn, Jiri Kosina, Andrew Morton, Greg KH,
    Peter Zijlstra, Michal Hocko, Linux-MM, kernel list, Linux API,
    daniel@gruss.cc
Date: Mon, 7 Jan 2019 11:33:08 +0100
Message-ID: <151b4ac8-5cfc-ed30-db30-e4d67a324c4b@suse.cz>
In-Reply-To: <20190107043227.GA3325@nautica>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/7/19 5:32 AM, Dominique Martinet wrote:
> Linus Torvalds wrote on Sat, Jan 05, 2019:
>> But I think my patch to
>> just rip out all that page lookup, and just
>> base it on the page table state has the fundamental advantage that it
>> gets rid of code. Maybe I should just commit it, and see if anything
>> breaks? We do have options in case things break, and then we'd at
>> least know who cares (and perhaps a lot more information of _why_ they
>> care).
>
> There actually are many tools like fincore which depend on mincore to
> try to tell whether a file is "loaded in cache" or not (I personally use
> vmtouch[1], but I know of at least nocache[2] uses it as well to only
> try to evict used pages)

nocache could probably do fine without mincore. IIUC the point is to not
evict anything that was already resident prior to running some command
wrapped in nocache. Without the mincore checks,
posix_fadvise(POSIX_FADV_DONTNEED) will still not drop anything that
others have mapped. That means without mincore() it will drop data that's
in cache but not currently in use by anybody, which shouldn't cause large
performance regressions?

> [1] https://hoytech.com/vmtouch/
> [2] https://github.com/Feh/nocache
>
>
> I mostly use these to either fadvise(POSIX_FADV_DONTNEED) or
> prefetch/lock whole files so my "production" use-cases don't actually
> rely on the mincore part of them;

Ah so you seem to confirm my above point.

...

> FWIW I personally don't care much about "only for owner" or depending on
> mmap options; I don't understand much of the security implications
> honestly so I'm not sure how these limitations actually help.
> On the other hand, a simple CAP_SYS_ADMIN check making the call take
> either behaviour should be safe and would cover what I described above.

So without CAP_SYS_ADMIN, mincore() would return mapping status, and with
CAP_SYS_ADMIN, it would return cache residency status? Very clumsy :(
Maybe if we introduced mincore2() with flags similar to BSD mentioned
earlier in the thread, and the cache residency flag would require
CAP_SYS_ADMIN or something similar.
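For anyone not familiar with what the tools above do, a fincore/vmtouch
style residency check boils down to roughly the following. This is only a
minimal sketch, not the actual code of either tool: count_resident_pages()
is a made-up helper name, and it assumes Linux with the current mincore()
semantics, where pages are reported as resident based on the page cache
even though the fresh mapping has never been touched. If mincore() were
based purely on page table state, this would presumably report 0 for such
a mapping.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the pages of the file at `path` that
 * mincore() reports as resident. Returns -1 on error. On success,
 * *total_pages receives the number of pages the file spans.
 */
long count_resident_pages(const char *path, long *total_pages)
{
    long resident = -1;
    struct stat st;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* An empty file cannot be mapped, treat it as an error here. */
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    long page = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)page - 1) / (size_t)page;

    /* Map the file but never touch it; only mincore() inspects it. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* mincore() fills one byte per page; bit 0 set means resident. */
    unsigned char *vec = malloc(npages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;
        *total_pages = (long)npages;
    }

    free(vec);
    munmap(map, len);
    close(fd);
    return resident;
}
```

A caller would pass a file name and compare the return value against
*total_pages to get the "how much of this file is in cache" number that
these tools print.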
> (by the way, while we are discussing permissions, a regular user can use
> fadvise dontneed on files it doesn't own as well as long as it can open
> them for reading; I'm not sure if that would need restricting as well in
> the context of the security issue.

Probably not, as I've mentioned it won't evict what's mapped by somebody
else. And eviction is also possible via controlling LRU, which is what
the paper [1] does anyway (and also mentions that DONTNEED doesn't work).
Being able to evict somebody's page is AFAIU not sufficient for an attack;
the side channel is about knowing that somebody brought that page back to
RAM by touching it.

> Frankly even with mincore someone
> could likely tell the difference through timing, if they just do it a
> few times. Do magic, probe, flush out, repeat until satisfied.)

That's my bigger concern here. [1] describes a remote attack (on a
web server) using the page fault timing differences for present/not
present page cache pages. Noisy but works, and I expect it to be much
less noisy locally. Yet the countermeasures section only mentions
restricting mincore() as if it were sufficient (and also how to make
evictions harder, but that's secondary IMHO).

[1] https://arxiv.org/abs/1901.01161

>
> Thanks,
>