Date: Thu, 16 May 2019 16:20:13 +0200
From: Oleksandr Natalenko
To: Jann Horn
Cc: kernel list, Kirill Tkhai, Hugh Dickins, Alexey Dobriyan,
	Vlastimil Babka, Michal Hocko, Matthew Wilcox, Pavel Tatashin,
	Greg KH, Suren Baghdasaryan, Minchan Kim, Timofey Titovets,
	Aaron Tomlin, Grzegorz Halat, Linux-MM, Linux API
Subject: Re: [PATCH RFC 4/5] mm/ksm, proc: introduce remote merge
Message-ID: <20190516142013.sf2vitmksvbkb33f@butterfly.localdomain>
References: <20190516094234.9116-1-oleksandr@redhat.com>
	<20190516094234.9116-5-oleksandr@redhat.com>
User-Agent: NeoMutt/20180716
X-Mailing-List: linux-kernel@vger.kernel.org

Hi.

On Thu, May 16, 2019 at 12:00:24PM +0200, Jann Horn wrote:
> On Thu, May 16, 2019 at 11:43 AM Oleksandr Natalenko wrote:
> > Use previously introduced remote madvise knob to mark task's
> > anonymous memory as mergeable.
> >
> > To force merging task's VMAs, "merge" hint is used:
> >
> >    # echo merge > /proc//madvise
> >
> > Force unmerging is done similarly:
> >
> >    # echo unmerge > /proc//madvise
> >
> > To achieve this, previously introduced ksm_madvise_*() helpers
> > are used.
>
> Why does this not require PTRACE_MODE_ATTACH_FSCREDS to the target
> process? Enabling KSM on another process is hazardous because it
> significantly increases the attack surface for side channels.
>
> (Note that if you change this to require PTRACE_MODE_ATTACH_FSCREDS,
> you'll want to use mm_access() in the ->open handler and drop the mm
> in ->release. mm_access() from a ->write handler is not permitted.)

Sounds reasonable.
So, something similar to what mem_open() & friends do now:

static int madvise_open(...)
...
	struct task_struct *task = get_proc_task(inode);
	...
	if (task) {
		mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
		put_task_struct(task);
		if (!IS_ERR_OR_NULL(mm)) {
			mmgrab(mm);
			mmput(mm);
...

Then:

static ssize_t madvise_write(...)
...
	if (!mmget_not_zero(mm))
		goto out;

	down_write(&mm->mmap_sem);
	if (!mmget_still_valid(mm))
		goto skip_mm;
	...
skip_mm:
	up_write(&mm->mmap_sem);

	mmput(mm);
out:
	return ...;

And, finally:

static int madvise_release(...)
...
	mmdrop(mm);
...

Right?

> [...]
> > @@ -2960,15 +2962,63 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
> >  static ssize_t madvise_write(struct file *file, const char __user *buf,
> >  		size_t count, loff_t *ppos)
> >  {
> > +	/* For now, only KSM hints are implemented */
> > +#ifdef CONFIG_KSM
> > +	char buffer[PROC_NUMBUF];
> > +	int behaviour;
> >  	struct task_struct *task;
> > +	struct mm_struct *mm;
> > +	int err = 0;
> > +	struct vm_area_struct *vma;
> > +
> > +	memset(buffer, 0, sizeof(buffer));
> > +	if (count > sizeof(buffer) - 1)
> > +		count = sizeof(buffer) - 1;
> > +	if (copy_from_user(buffer, buf, count))
> > +		return -EFAULT;
> > +
> > +	if (!memcmp("merge", buffer, min(sizeof("merge")-1, count)))
>
> This means that you also match on something like "mergeblah". Just use strcmp().

I agree. Just to make it more interesting I must say that

/sys/kernel/mm/transparent_hugepage/enabled

uses memcmp in the very same way, and thus echoing "alwaysssss" or
"madviseeee" works perfectly there, and it was like that from the very
beginning, it seems. Should we fix it, or it became (zomg) a public API?
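
To make the prefix-matching problem concrete, here is a minimal userspace
sketch (not kernel code). prefix_match() mirrors the memcmp()/min()
comparison from the quoted hunk; exact_match(), parse_hint() and enum hint
are hypothetical names made up for illustration. One thing to keep in mind:
"echo merge" writes "merge\n", so a plain strcmp() against "merge" would
likely reject it unless the trailing newline is stripped first. In-kernel,
sysfs_streq() is the usual helper for that, as far as I know.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type, for illustration only. */
enum hint { HINT_INVALID = 0, HINT_MERGE, HINT_UNMERGE };

/*
 * Comparison as done in the quoted patch: only the first
 * min(strlen(word), count) bytes are compared, so any input that
 * starts with word (or is a prefix of it, or is empty) matches.
 */
static int prefix_match(const char *word, const char *buf, size_t count)
{
	size_t n = strlen(word);

	if (count < n)
		n = count;
	return memcmp(word, buf, n) == 0;
}

/*
 * Exact comparison that tolerates a single trailing newline,
 * which is what "echo" appends.
 */
static int exact_match(const char *word, const char *buf, size_t count)
{
	size_t n = strlen(word);

	if (count > 0 && buf[count - 1] == '\n')
		count--;
	return count == n && memcmp(word, buf, n) == 0;
}

/* Map the written bytes to a hint, rejecting anything inexact. */
static enum hint parse_hint(const char *buf, size_t count)
{
	if (exact_match("merge", buf, count))
		return HINT_MERGE;
	if (exact_match("unmerge", buf, count))
		return HINT_UNMERGE;
	return HINT_INVALID;
}
```

With this, prefix_match("merge", "mergeblah", 9) and even
prefix_match("merge", "m", 1) succeed, while parse_hint() accepts only
"merge" and "unmerge", with or without the trailing newline.
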
> > +		behaviour = MADV_MERGEABLE;
> > +	else if (!memcmp("unmerge", buffer, min(sizeof("unmerge")-1, count)))
> > +		behaviour = MADV_UNMERGEABLE;
> > +	else
> > +		return -EINVAL;
> >
> >  	task = get_proc_task(file_inode(file));
> >  	if (!task)
> >  		return -ESRCH;
> >
> > +	mm = get_task_mm(task);
> > +	if (!mm) {
> > +		err = -EINVAL;
> > +		goto out_put_task_struct;
> > +	}
> > +
> > +	down_write(&mm->mmap_sem);
>
> Should a check for mmget_still_valid(mm) be inserted here? See commit
> 04f5866e41fb70690e28397487d8bd8eea7d712a.

Yeah, it seems so :/. Thanks for the pointer. I've put it into the
madvise_write snippet above.

> > +	switch (behaviour) {
> > +	case MADV_MERGEABLE:
> > +	case MADV_UNMERGEABLE:
>
> This switch isn't actually necessary at this point, right?

Yup, but it is there to highlight a possibility of adding other, non-KSM
options. So, let it be, and I'll just re-arrange CONFIG_KSM ifdef
instead.

Thank you.

> > +		vma = mm->mmap;
> > +		while (vma) {
> > +			if (behaviour == MADV_MERGEABLE)
> > +				ksm_madvise_merge(vma->vm_mm, vma, &vma->vm_flags);
> > +			else
> > +				ksm_madvise_unmerge(vma, vma->vm_start, vma->vm_end, &vma->vm_flags);
> > +			vma = vma->vm_next;
> > +		}
> > +		break;
> > +	}
> > +	up_write(&mm->mmap_sem);
> > +
> > +	mmput(mm);
> > +
> > +out_put_task_struct:
> >  	put_task_struct(task);
> >
> > -	return count;
> > +	return err ? err : count;
> > +#else
> > +	return -EINVAL;
> > +#endif /* CONFIG_KSM */
> >  }

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer
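
For reference, the open/write/release split sketched earlier in this reply
relies on the two reference counts of an mm: mm_users pins the address
space, mm_count pins the structure itself. Below is a toy, single-threaded
userspace model of that pattern, only to show which count each handler
touches. Everything named toy_* is hypothetical; the real counters are
atomic, and mmap_sem and mmget_still_valid() are not modelled at all.

```c
#include <stdbool.h>

/* Toy stand-in for struct mm_struct: just the two counters. */
struct toy_mm {
	int users;      /* models mm_users: pins the address space */
	int count;      /* models mm_count: pins the structure itself */
	bool torn_down; /* address space has been released */
	bool freed;     /* structure would have been freed */
};

static void toy_mmgrab(struct toy_mm *mm)
{
	mm->count++;
}

static void toy_mmdrop(struct toy_mm *mm)
{
	if (--mm->count == 0)
		mm->freed = true;
}

/* Take a user reference only if the address space is still alive. */
static bool toy_mmget_not_zero(struct toy_mm *mm)
{
	if (mm->users == 0)
		return false;
	mm->users++;
	return true;
}

/* All users together hold one structure reference; the last one drops it. */
static void toy_mmput(struct toy_mm *mm)
{
	if (--mm->users == 0) {
		mm->torn_down = true;
		toy_mmdrop(mm);
	}
}

/* ->open: get the mm (a successful mm_access() is assumed here),
 * keep only a structure reference across the lifetime of the file. */
static void toy_open(struct toy_mm *mm)
{
	mm->users++;     /* models mm_access() returning a pinned mm */
	toy_mmgrab(mm);
	toy_mmput(mm);
}

/* ->write: pin the address space just for the duration of the work. */
static bool toy_write(struct toy_mm *mm)
{
	if (!toy_mmget_not_zero(mm))
		return false; /* target already exited */
	/* ... the VMA walk would happen here ... */
	toy_mmput(mm);
	return true;
}

/* ->release: drop the structure reference taken in ->open. */
static void toy_release(struct toy_mm *mm)
{
	toy_mmdrop(mm);
}
```

In this model an open file never keeps the target's address space alive:
once the target exits, toy_write() fails instead of touching a dead mm, and
the structure itself goes away only after toy_release().
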