References: <20190729215758.28405-1-dima@arista.com> <20190729215758.28405-26-dima@arista.com> <4D0E6734-066D-4A72-A119-2FD6482F857D@zytor.com>
In-Reply-To: <4D0E6734-066D-4A72-A119-2FD6482F857D@zytor.com>
From: Andy Lutomirski
Date: Thu, 1 Aug 2019 14:39:51 -0700
Subject: Re: [PATCHv5 25/37] x86/vdso: Switch image on setns()/clone()
To: "H. Peter Anvin"
Cc: Andy Lutomirski, Dmitry Safonov, LKML, Dmitry Safonov <0x7f454c46@gmail.com>, Adrian Reber, Andrei Vagin, Arnd Bergmann, Christian Brauner, Cyrill Gorcunov, "Eric W. Biederman", Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino, Linux Containers, criu@openvz.org, Linux API, X86 ML, Andrei Vagin
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 31, 2019 at 11:09 PM wrote:
>
> On July 31, 2019 10:34:26 PM PDT, Andy Lutomirski wrote:
> >On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov wrote:
> >>
> >> As it has been discussed on timens RFC, adding a new conditional branch
> >> `if (inside_time_ns)` on VDSO for all processes is undesirable.
> >> It will add a penalty for everybody as branch predictor may mispredict
> >> the jump. Also there are instruction cache lines wasted on cmp/jmp.
> >
> >
> >>
> >> +#ifdef CONFIG_TIME_NS
> >> +int vdso_join_timens(struct task_struct *task)
> >> +{
> >> +       struct mm_struct *mm = task->mm;
> >> +       struct vm_area_struct *vma;
> >> +
> >> +       if (down_write_killable(&mm->mmap_sem))
> >> +               return -EINTR;
> >> +
> >> +       for (vma = mm->mmap; vma; vma = vma->vm_next) {
> >> +               unsigned long size = vma->vm_end - vma->vm_start;
> >> +
> >> +               if (vma_is_special_mapping(vma, &vvar_mapping) ||
> >> +                   vma_is_special_mapping(vma, &vdso_mapping))
> >> +                       zap_page_range(vma, vma->vm_start, size);
> >> +       }
> >
> >This is, unfortunately, fundamentally buggy. If any thread is in the
> >vDSO or has the vDSO on the stack (due to a signal, for example), this
> >will crash it. I can think of three solutions:
> >
> >1. Say that you can't setns() if you have other mms and ignore the
> >signal issue. Anything with green threads will disapprove. It's also
> >rather gross.
> >
> >2. Make it so that you can flip the static branch safely. As in my
> >other email, you'll need to deal with CoW somehow,
> >
> >3. Make it so that you can't change timens, or at least that you can't
> >turn timens on or off, without execve() or fork().
> >
> >BTW, that static branch probably needs to be aligned to a cache line
> >or something similar to avoid all the nastiness with trying to poke
> >text that might be concurrently executing. This will be a mess.
>
> Since we are talking about different physical addresses I believe we should be okay as long as they don't cross page boundaries, and even if they do it can be managed with proper page invalidation sequencing - it's not like the problems of having to deal with XMC on live pages like in the kernel.
>
> Still, you really need each instruction sequence to be present, with the only difference being specific patch sites.
>
> Any fundamental reason this can't be strictly data driven? Seems odd to me if it couldn't, but I might be missing something obvious.

I think it can be.
There are at least two places where vDSO slow paths could hook without affecting fast paths: vclock_mode and the low bit of the sequence number.
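
To make the sequence-number idea concrete, here is a rough user-space sketch, not kernel code. It assumes a seqlock-style reader that already has to test the low bit of the sequence count, so a timens task could be handed a data page whose count is permanently odd and be diverted to the slow path without any new branch on the fast path. The names time_page, is_timens, offset_ns, host and slow_path_hits are all made up for illustration, and the memory barriers and READ_ONCE() a real reader needs are left out because this version is single-threaded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the vDSO data page; not the kernel's layout. */
struct time_page {
        uint32_t seq;                  /* even: stable; odd: take the slow path */
        int is_timens;                 /* read only on the slow path */
        int64_t offset_ns;             /* namespace offset, slow path only */
        int64_t now_ns;                /* host clock value */
        const struct time_page *host;  /* host page, used by timens pages */
};

static int slow_path_hits;             /* instrumentation for this example */

static int64_t read_time(const struct time_page *p)
{
        assert(p);

        for (;;) {
                uint32_t seq = p->seq;

                /* The fast path already performs this check for writers. */
                if (seq & 1) {
                        slow_path_hits++;
                        /* Permanently odd count on a timens page: apply offset. */
                        if (p->is_timens)
                                return read_time(p->host) + p->offset_ns;
                        continue;      /* a writer is mid-update: retry */
                }

                int64_t t = p->now_ns;

                /* Count unchanged means the snapshot is consistent. */
                if (p->seq == seq)
                        return t;
        }
}
```

With a host page at seq = 2 and a timens page at seq = 1 that points to it, a read through the host page never touches is_timens or offset_ns, while a read through the timens page falls into the odd-count branch and adds the offset. A vclock_mode hook would presumably look similar, with an unrecognised mode value taking the place of the odd count.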