From: Andy Lutomirski
Date: Wed, 31 Jul 2019 22:34:26 -0700
Subject: Re: [PATCHv5 25/37] x86/vdso: Switch image on setns()/clone()
To: Dmitry Safonov
Cc: LKML, Dmitry Safonov <0x7f454c46@gmail.com>, Adrian Reber, Andrei Vagin, Andy Lutomirski, Arnd Bergmann, Christian Brauner, Cyrill Gorcunov, "Eric W. Biederman", "H. Peter Anvin", Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino, Linux Containers, criu@openvz.org, Linux API, X86 ML, Andrei Vagin
In-Reply-To: <20190729215758.28405-26-dima@arista.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov wrote:
>
> As it has been discussed on timens RFC, adding a new conditional branch
> `if (inside_time_ns)` on VDSO for all processes is undesirable.
> It will add a penalty for everybody as branch predictor may mispredict
> the jump. Also there are instruction cache lines wasted on cmp/jmp.
>
> +#ifdef CONFIG_TIME_NS
> +int vdso_join_timens(struct task_struct *task)
> +{
> +	struct mm_struct *mm = task->mm;
> +	struct vm_area_struct *vma;
> +
> +	if (down_write_killable(&mm->mmap_sem))
> +		return -EINTR;
> +
> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> +		unsigned long size = vma->vm_end - vma->vm_start;
> +
> +		if (vma_is_special_mapping(vma, &vvar_mapping) ||
> +		    vma_is_special_mapping(vma, &vdso_mapping))
> +			zap_page_range(vma, vma->vm_start, size);
> +	}

This is, unfortunately, fundamentally buggy. If any thread is in the vDSO or has the vDSO on the stack (due to a signal, for example), this will crash it. I can think of three solutions:

1. Say that you can't setns() if you have other mms and ignore the signal issue. Anything with green threads will disapprove. It's also rather gross.

2. Make it so that you can flip the static branch safely. As in my other email, you'll need to deal with CoW somehow.

3. Make it so that you can't change timens, or at least that you can't turn timens on or off, without execve() or fork().

BTW, that static branch probably needs to be aligned to a cache line or something similar to avoid all the nastiness with trying to poke text that might be concurrently executing. This will be a mess.
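One way to see why zapping the vDSO pages crashes a thread that is inside the vDSO: after zap_page_range() the pages refault, presumably from the newly selected image, but the thread's instruction pointer (or a vDSO return address saved on its stack) is still an offset computed against the old image. The C sketch below is a toy model, not kernel code. `struct toy_image`, the instruction-start tables, and `resume_is_insn_boundary()` are hypothetical, and the offsets are invented; the only assumption is that the timens image's code layout differs from the host image's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Toy model (hypothetical, not kernel code): a vDSO "image" is described
 * only by the offsets from the vDSO base at which its instructions start.
 * The values are made up; the timens image is assumed to contain extra
 * code, so its layout differs from the host image.
 */
struct toy_image {
	const char *name;
	const unsigned long *insn_starts; /* sorted instruction start offsets */
	size_t n;
};

static const unsigned long host_starts[]   = { 0x0, 0x4, 0x9, 0xc, 0x10, 0x15 };
static const unsigned long timens_starts[] = { 0x0, 0x4, 0x8, 0xd, 0x11, 0x14, 0x19 };

static const struct toy_image host_image = {
	"host", host_starts, ARRAY_LEN(host_starts)
};
static const struct toy_image timens_image = {
	"timens", timens_starts, ARRAY_LEN(timens_starts)
};

/* Is @off the first byte of an instruction in @img? */
static bool is_insn_start(const struct toy_image *img, unsigned long off)
{
	for (size_t i = 0; i < img->n; i++)
		if (img->insn_starts[i] == off)
			return true;
	return false;
}

/*
 * A thread stopped at @saved_off in @old_img resumes at the same offset
 * after the pages are refaulted from @new_img. Returns whether that offset
 * is still an instruction boundary. Even a "true" result only means the CPU
 * decodes a whole instruction; it is still the wrong code for that thread.
 */
static bool resume_is_insn_boundary(const struct toy_image *old_img,
				    const struct toy_image *new_img,
				    unsigned long saved_off)
{
	assert(is_insn_start(old_img, saved_off)); /* valid before the swap */
	return is_insn_start(new_img, saved_off);
}
```

With these made-up tables, a thread stopped at offset 0x9 in the host image resumes at 0x9 in the timens image, which is not an instruction start there, so it executes from the middle of an instruction.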
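The "deal with CoW" remark in option 2 presumably refers to the fact that the vDSO text pages are shared by every process that maps them, so flipping a branch for one process would first require giving that process a private copy of the page. A minimal copy-on-write sketch in C under that assumption; `struct toy_page`, `struct toy_mapping`, `toy_cow_write()`, and the 4096-byte page size are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096 /* assumption for the sketch */

/* A physical page that several mappings may point at (hypothetical). */
struct toy_page {
	int refcount;
	unsigned char bytes[TOY_PAGE_SIZE];
};

/* One process's view of the page (hypothetical). */
struct toy_mapping {
	struct toy_page *page;
};

/*
 * Write one byte through @m. If the page is shared, first move @m to a
 * private copy so every other mapping keeps seeing the original bytes.
 * Returns 0 on success, -1 if the private copy could not be allocated.
 */
static int toy_cow_write(struct toy_mapping *m, size_t off, unsigned char val)
{
	assert(off < TOY_PAGE_SIZE);

	if (m->page->refcount > 1) {
		struct toy_page *copy = malloc(sizeof(*copy));

		if (!copy)
			return -1;
		memcpy(copy->bytes, m->page->bytes, TOY_PAGE_SIZE);
		copy->refcount = 1;
		m->page->refcount--; /* drop our reference to the shared page */
		m->page = copy;
	}
	m->page->bytes[off] = val;
	return 0;
}
```

Only the writer ends up with modified text; everyone else still maps the original page. The toy ignores the harder part raised in the email, namely other threads of the same process executing the page while it is being patched.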
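Option 3 can be read as "setns() only records the request; the vDSO image is chosen when a new process image or mm is created (execve() or fork())". The C sketch below is a hypothetical illustration of that rule, not the kernel's implementation. `struct toy_task`, `toy_setns_timens()`, `toy_fork()`, and `toy_execve()` are invented names, and the model tracks only whether a task is in a time namespace and which image is mapped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified task state; none of these names are kernel API. */
enum vdso_image_kind { VDSO_HOST, VDSO_TIMENS };

struct toy_task {
	bool in_timens;           /* namespace matching the mapped vDSO */
	bool timens_for_children; /* namespace requested via setns() */
	enum vdso_image_kind mapped_image;
};

static enum vdso_image_kind image_for(bool in_timens)
{
	return in_timens ? VDSO_TIMENS : VDSO_HOST;
}

/* setns(): record the request only; the running mm is left untouched. */
static void toy_setns_timens(struct toy_task *t, bool enter)
{
	t->timens_for_children = enter;
}

/* fork(): the child gets a new mm, so its image can be chosen freely. */
static struct toy_task toy_fork(const struct toy_task *parent)
{
	struct toy_task child = *parent;

	child.in_timens = parent->timens_for_children;
	child.mapped_image = image_for(child.in_timens);
	return child;
}

/* execve(): the old mm is discarded, so the image can be chosen again. */
static void toy_execve(struct toy_task *t)
{
	t->in_timens = t->timens_for_children;
	t->mapped_image = image_for(t->in_timens);
}
```

Because the image is only ever picked for a fresh mm, no thread can be running in, or returning into, a vDSO that changes underneath it.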
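On the alignment remark: whether a patched instruction straddles a cache-line boundary is a simple address calculation, and keeping it inside one line avoids one class of problems with rewriting text another CPU may be executing. A sketch in C, assuming a power-of-two line size (64 bytes is typical on current x86, but it is passed in rather than hard-coded); `fits_in_one_cache_line()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if the @len bytes starting at @addr lie within a single
 * cache line of @line bytes, i.e. the first and last byte share a line.
 */
static bool fits_in_one_cache_line(uintptr_t addr, size_t len, size_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0); /* power of two */
	assert(len > 0 && len <= line);

	uintptr_t mask = ~(uintptr_t)(line - 1);

	return (addr & mask) == ((addr + len - 1) & mask);
}
```

For a 5-byte `jmp rel32`, a start at offset 0x3b within a 64-byte line still fits (its last byte is at 0x3f), while a start at offset 0x3c spills into the next line.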