From: Jann Horn
Date: Fri, 22 Feb 2019 22:43:54 +0100
Subject: Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault
To: Andy Lutomirski
Cc: Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu,
    Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du,
    Kees Cook, Andy Lutomirski, Daniel Borkmann, Network Development,
    bpf@vger.kernel.org, Nadav Amit, Rick Edgecombe, Dave Hansen,
    "Peter Zijlstra (Intel)"

(adding some people from the text_poke series to the thread, removing stable@)

On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski wrote:
> > On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov wrote:
> >> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote:
> >> On Fri, 22 Feb 2019 11:27:05 -0800
> >> Alexei Starovoitov wrote:
> >>
> >>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote:
> >>>>
> >>>> Then we should still probably fix up "__probe_kernel_read()" to not
> >>>> allow user accesses. The easiest way to do that is actually likely to
> >>>> use the "unsafe_get_user()" functions *without* doing a
> >>>> uaccess_begin(), which will mean that modern CPU's will simply fault
> >>>> on a kernel access to user space.
> >>>
> >>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read()
> >>> and users pass both user and kernel addresses into it and expect
> >>> that the helper will actually try to read from that address.
> >>>
> >>> If __probe_kernel_read will suddenly start failing on all user addresses
> >>> it will break the expectations.
> >>> How do we solve it in bpf_probe_read?
> >>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte
> >>> in the loop?
> >>> That's doable, but people already complain that bpf_probe_read() is slow
> >>> and shows up in their perf report.
> >>
> >> We're changing kprobes to add a specific flag to say that we want to
> >> differentiate between kernel or user reads. Can this be done with
> >> bpf_probe_read()? If it's showing up in perf report, I doubt a single
> >
> > so you're saying you will break existing kprobe scripts?
> > I don't think it's a good idea.
> > It's not acceptable to break bpf_probe_read uapi.
> >
>
> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x.
>
> What to do about existing scripts is a different question.

This lack of logical separation between user and kernel addresses
might interact interestingly with the text_poke series, specifically
"[PATCH v3 05/20] x86/alternative: Initialize temporary mm for
patching"
(https://lore.kernel.org/lkml/20190221234451.17632-6-rick.p.edgecombe@intel.com/)
and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text
poking"
(https://lore.kernel.org/lkml/20190221234451.17632-7-rick.p.edgecombe@intel.com/),
right? If someone manages to get a tracing BPF program to trigger in a
task that has switched to the patching mm, could they use
bpf_probe_write_user() - which uses probe_kernel_write() after
checking that KERNEL_DS isn't active and that access_ok() passes - to
overwrite kernel text that is mapped writable in the patching mm?
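
To make the concern concrete, here is a minimal userspace model of the
check, not kernel code. Every name and constant in it is hypothetical.
It assumes the access_ok()-style test is purely a numeric range check,
and that the patching mm maps kernel text writable at an address inside
the user range; neither assumption is taken from the patches themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names and constants below are hypothetical; this is a userspace
 * model of the reasoning, not kernel code. */
#define MODEL_TASK_SIZE 0x0000800000000000ULL /* assumed user/kernel split */
#define MODEL_PAGE_SIZE 0x1000ULL
#define MODEL_POKE_ADDR 0x00007f0000001000ULL /* assumed alias in patching mm */

enum model_mm { MODEL_MM_NORMAL, MODEL_MM_PATCHING };

/* Range-only check in the spirit of access_ok(): it sees the numeric
 * address, not what the active mm maps there. */
static bool model_access_ok(uint64_t addr, uint64_t len)
{
	return len <= MODEL_TASK_SIZE && addr <= MODEL_TASK_SIZE - len;
}

/* True if the active mm maps kernel text writable at addr. */
static bool model_is_kernel_text_alias(enum model_mm mm, uint64_t addr)
{
	return mm == MODEL_MM_PATCHING &&
	       addr >= MODEL_POKE_ADDR &&
	       addr < MODEL_POKE_ADDR + MODEL_PAGE_SIZE;
}

/* Gate as described above: refuse when KERNEL_DS is active, otherwise
 * require the range check to pass. */
static bool model_write_allowed(bool kernel_ds, uint64_t addr, uint64_t len)
{
	return !kernel_ds && model_access_ok(addr, len);
}

/* True when the gate passes and the write would land on kernel text. */
static bool model_write_hits_kernel_text(enum model_mm mm, bool kernel_ds,
					 uint64_t addr, uint64_t len)
{
	return model_write_allowed(kernel_ds, addr, len) &&
	       model_is_kernel_text_alias(mm, addr);
}

/* Walks through the four cases the model distinguishes. */
static void model_self_check(void)
{
	/* Normal mm: the same address is ordinary user memory. */
	assert(!model_write_hits_kernel_text(MODEL_MM_NORMAL, false,
					     MODEL_POKE_ADDR, 8));
	/* Patching mm: the gate passes, yet the target is kernel text. */
	assert(model_write_hits_kernel_text(MODEL_MM_PATCHING, false,
					    MODEL_POKE_ADDR, 8));
	/* KERNEL_DS active: the gate refuses regardless of address. */
	assert(!model_write_allowed(true, MODEL_POKE_ADDR, 8));
	/* A kernel-half address fails the range check. */
	assert(!model_write_allowed(false, 0xffffffff81000000ULL, 8));
}
```

In this model the gate gives the same answer in both mms, because it
never looks at which mm is active; only the mapping behind the address
changes.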