Date: Wed, 31 Jul 2019 23:09:20 -0700
User-Agent: K-9 Mail for Android
References: <20190729215758.28405-1-dima@arista.com>
 <20190729215758.28405-26-dima@arista.com>
Subject: Re: [PATCHv5 25/37] x86/vdso: Switch image on setns()/clone()
To: Andy Lutomirski, Dmitry Safonov
CC: LKML, Dmitry Safonov <0x7f454c46@gmail.com>, Adrian Reber,
 Andrei Vagin, Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
 "Eric W. Biederman", Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
 Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
 Linux Containers, criu@openvz.org, Linux API, X86 ML, Andrei Vagin
From: hpa@zytor.com
Message-ID: <4D0E6734-066D-4A72-A119-2FD6482F857D@zytor.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On July 31, 2019 10:34:26 PM PDT, Andy Lutomirski wrote:
>On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov wrote:
>>
>> As it has been discussed on timens RFC, adding a new conditional branch
>> `if (inside_time_ns)` on VDSO for all processes is undesirable.
>> It will add a penalty for everybody as branch predictor may mispredict
>> the jump. Also there are instruction cache lines wasted on cmp/jmp.
>
>
>>
>> +#ifdef CONFIG_TIME_NS
>> +int vdso_join_timens(struct task_struct *task)
>> +{
>> +	struct mm_struct *mm = task->mm;
>> +	struct vm_area_struct *vma;
>> +
>> +	if (down_write_killable(&mm->mmap_sem))
>> +		return -EINTR;
>> +
>> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
>> +		unsigned long size = vma->vm_end - vma->vm_start;
>> +
>> +		if (vma_is_special_mapping(vma, &vvar_mapping) ||
>> +		    vma_is_special_mapping(vma, &vdso_mapping))
>> +			zap_page_range(vma, vma->vm_start, size);
>> +	}
>
>This is, unfortunately, fundamentally buggy. If any thread is in the
>vDSO or has the vDSO on the stack (due to a signal, for example), this
>will crash it. I can think of three solutions:
>
>1. Say that you can't setns() if you have other mms and ignore the
>signal issue. Anything with green threads will disapprove. It's also
>rather gross.
>
>2. Make it so that you can flip the static branch safely. As in my
>other email, you'll need to deal with CoW somehow,
>
>3. Make it so that you can't change timens, or at least that you can't
>turn timens on or off, without execve() or fork().
>
>BTW, that static branch probably needs to be aligned to a cache line
>or something similar to avoid all the nastiness with trying to poke
>text that might be concurrently executing. This will be a mess.

Since we are talking about different physical addresses I believe we
should be okay as long as they don't cross page boundaries, and even if
they do it can be managed with proper page invalidation sequencing - it's
not like the problems of having to deal with XMC on live pages like in
the kernel.

Still, you really need each instruction sequence to be present, with the
only difference being specific patch sites.

Any fundamental reason this can't be strictly data driven? Seems odd to
me if it couldn't, but I might be missing something obvious.
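
For illustration, the "strictly data driven" idea can be sketched in
user-space C. This is only a sketch under stated assumptions, not the code
from the patch and not the kernel's vvar layout: struct timens_data,
monotonic_offset_ns and timens_apply() are hypothetical names. The point it
shows is that every process runs the same instruction sequence and only the
data it reads differs, with the root namespace seeing an offset of zero.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a per-namespace data page. In this sketch the
 * root namespace would map a copy whose offset is zero, and a task inside
 * a time namespace would map a copy holding that namespace's offset.
 */
struct timens_data {
	int64_t monotonic_offset_ns;
};

/*
 * Apply the namespace offset unconditionally. There is no
 * "am I inside a time namespace?" test: the add always runs, so the code
 * is identical for every process and nothing in the text needs patching.
 */
static int64_t timens_apply(const struct timens_data *data, int64_t host_ns)
{
	return host_ns + data->monotonic_offset_ns;
}
```

With such a layout, joining a namespace would mean pointing the task at a
different data page rather than swapping or patching code, so a thread that
is executing the function at that moment keeps running valid instructions.
The trade-off is one extra load and add on every call, including in the
root namespace.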