From: Alexei Starovoitov
Date: Tue, 25 Jan 2022 14:48:30 -0800
Subject: Re: [PATCH v6 bpf-next 6/7] bpf: introduce bpf_prog_pack allocator
To: Song Liu
Cc: Song Liu, Ilya Leoshkevich, bpf, Network Development, LKML,
    Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
    Peter Zijlstra, X86 ML
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 25, 2022 at 2:25 PM Song Liu wrote:
>
> On Tue, Jan 25, 2022 at 12:00 PM Alexei Starovoitov
> wrote:
> >
> > On Mon, Jan 24, 2022 at 11:21 PM Song Liu wrote:
> > >
> > > On Mon, Jan 24, 2022 at 9:21 PM Alexei Starovoitov
> > > wrote:
> > > >
> > > > On Mon, Jan 24, 2022 at 10:27 AM Song Liu wrote:
> > > > > >
> > > > > > Are arches expected to allocate rw buffers in different ways? If not,
> > > > > > I would consider putting this into the common code as well.
> > > > > > Then arch-specific code would do something like
> > > > > >
> > > > > > header = bpf_jit_binary_alloc_pack(size, &prg_buf, &prg_addr, ...);
> > > > > > ...
> > > > > > /*
> > > > > >  * Generate code into prg_buf, the code should assume that its first
> > > > > >  * byte is located at prg_addr.
> > > > > >  */
> > > > > > ...
> > > > > > bpf_jit_binary_finalize_pack(header, prg_buf);
> > > > > >
> > > > > > where bpf_jit_binary_finalize_pack() would copy prg_buf to header and
> > > > > > free it.
> > > >
> > > > It feels right, but bpf_jit_binary_finalize_pack() sounds 100% arch
> > > > dependent. The only thing it will do is perform a copy via text_poke.
> > > > What else?
> > > >
> > > > > I think this should work.
> > > > >
> > > > > We will need an API like: bpf_arch_text_copy, which uses text_poke_copy()
> > > > > for x86_64 and s390_kernel_write() for x390. We will use bpf_arch_text_copy
> > > > > to
> > > > > 1) write header->size;
> > > > > 2) do finally copy in bpf_jit_binary_finalize_pack().
> > > >
> > > > we can combine all text_poke operations into one.
> > > >
> > > > Can we add an 'image' pointer into struct bpf_binary_header ?
> > >
> > > There is a 4-byte hole in bpf_binary_header. How about we put
> > > image_offset there? Actually we only need 2 bytes for offset.
> > >
> > > > Then do:
> > > > int bpf_jit_binary_alloc_pack(size, &ro_hdr, &rw_hdr);
> > > >
> > > > ro_hdr->image would be the address used to compute offsets by JIT.
> > >
> > > If we only do one text_poke(), we cannot write ro_hdr->image yet. We
> > > can use ro_hdr + rw_hdr->image_offset instead.
> >
> > Good points.
> > Maybe let's go back to Ilya's suggestion and return 4 pointers
> > from bpf_jit_binary_alloc_pack ?
>
> How about we use image_offset, like:
>
> struct bpf_binary_header {
>         u32 size;
>         u32 image_offset;
>         u8 image[] __aligned(BPF_IMAGE_ALIGNMENT);
> };
>
> Then we can use
>
> image = (void *)header + header->image_offset;

I'm not excited about it, since it leaks header details into the JITs.
It looks like we don't need the JIT to be aware of it at all.

How about we do
random() % roundup(sizeof(struct bpf_binary_header), 64)
to pick the image start and populate the
image - sizeof(struct bpf_binary_header) range with 'int 3'?
This way we can completely hide binary_header inside generic code.
bpf_jit_binary_alloc_pack() would return ro_image and rw_image only,
and the JIT would pass them back into bpf_jit_binary_finalize_pack().
From the image pointer it would be trivial to get to binary_header
with &63.

The 128-byte offset that we use today was chosen arbitrarily.
We were burning a whole page for a single program, so a 128-byte zone
at the front was ok.
Now we will be packing progs rounded up to 64 bytes, so it's better
to avoid wasting those 128 bytes regardless.
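
To make the random-offset idea concrete, here is a minimal userspace
sketch, not kernel code and not what the patch does. The names
struct hdr, pick_image() and image_to_hdr() are hypothetical. It makes
two assumptions that the mail does not state: the offset is kept at or
above sizeof(struct hdr) so the image never overlaps the header, and
"&63" is read as clearing the low six bits of the image pointer
(& ~63) to get back to the 64-byte-aligned header.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PACK_ALIGN 64u   /* packing granularity mentioned in the mail */
#define INT3       0xcc  /* x86 'int 3' opcode, used as trap padding */

/* Hypothetical stand-in for struct bpf_binary_header (size only). */
struct hdr {
	uint32_t size;
};

/*
 * Pick a random image start inside the first 64-byte slot.
 * Assumption: the offset stays in [sizeof(struct hdr), 64) so the
 * image cannot overlap the header. 'slot' must be 64-byte aligned.
 */
static uint8_t *pick_image(uint8_t *slot, uint32_t total)
{
	struct hdr *h = (struct hdr *)slot;
	uint32_t room = PACK_ALIGN - (uint32_t)sizeof(*h);
	uint32_t off = (uint32_t)sizeof(*h) + (uint32_t)rand() % room;

	h->size = total;
	/* Fill the gap between the header and the image with trap bytes. */
	memset(slot + sizeof(*h), INT3, off - sizeof(*h));
	return slot + off;
}

/*
 * Recover the header from an image pointer by clearing the low six
 * bits. This only works because the image starts less than 64 bytes
 * past a 64-byte-aligned header.
 */
static struct hdr *image_to_hdr(const uint8_t *image)
{
	uintptr_t mask = ~(uintptr_t)(PACK_ALIGN - 1);

	return (struct hdr *)((uintptr_t)image & mask);
}
```

A caller would obtain a 64-byte-aligned slot, for example with
aligned_alloc(64, 128), call pick_image() once, and later pass only the
image pointer around. image_to_hdr() then finds the header again
without the caller ever seeing the header layout, which is the point of
hiding binary_header inside generic code.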