References: <87bm56vqg4.fsf@mid.deneb.enyo.de> <9C6A7D45-CF53-4C61-B5DD-12CA0D419972@dilger.ca>
In-Reply-To: <9C6A7D45-CF53-4C61-B5DD-12CA0D419972@dilger.ca>
From: Peter Maydell
Date: Fri, 28 Dec 2018 11:18:18 +0000
Subject: Re: [Qemu-devel] d_off field in struct dirent and 32-on-64 emulation
To: Andreas Dilger
Cc: Florian Weimer, linux-fsdevel, Linux API, Ext4 Developers List,
    lucho@ionkov.net, libc-alpha@sourceware.org, Arnd Bergmann,
    ericvh@gmail.com, hpa@zytor.com, lkml - Kernel Mailing List,
    QEMU Developers, rminnich@sandia.gov,
    v9fs-developer@lists.sourceforge.net
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 28 Dec 2018 at 00:23, Andreas Dilger wrote:
> On Dec 27, 2018, at 10:41 AM, Peter Maydell wrote:
> > As you note, this causes breakage for userspace programs which
> > need to implement an API/ABI with 32-bit offset but which only
> > have access to the kernel's 64-bit offset API/ABI.
>
> This is (IMHO) a bit of an oxymoron, isn't it?  Applications using
> the 64-bit API, but storing the value in a 32-bit field?

I didn't say "which choose to store the value in a 32-bit field",
I said "which have to implement an API/ABI which has 32-bit fields".

In QEMU's case, we use the host kernel's ABI, which has 64-bit
offset fields. We implement a syscall ABI for the guest binary
we are running under emulation, which may have 32-bit offset fields
(for instance if we are running a 32-bit Arm binary.)
Both of these ABIs are fixed -- QEMU doesn't have a choice here,
it just has to make the best effort it can with what the host
kernel provides it, to provide the semantics the guest binary
needs. My suggestion in this thread is that the host kernel
provides a wider range of facilities so that QEMU can do the
job it's trying to do.

> The same
> problem would exist for filesystems with 64-bit inodes or 64-bit
> file offsets trying to store these values in 32-bit variables.
> It might work most of the time, but it can also break randomly.

In general inodes and offsets start from 0 and work up -- so
almost all of the time they don't actually overflow. The problem
with ext4 directory hash "offsets" is that they overflow all the
time and immediately, so instead of "works unless you have a weird
edge case" like all the other filesystems, it's "never works".

> > I think the best fix for this would be for the kernel to either
> > (a) consistently use a 32-bit hash or (b) to provide an API
> > so that userspace can use the FMODE_32BITHASH flag the way
> > that kernel-internal users already can.
>
> It would be relatively straightforward to add a "32bitapi" mount
> option to return a 32-bit directory hash to userspace for operations
> on that mountpoint (ext4 doesn't have 64-bit inode numbers yet).
> However, I can't think of an easy way to do this on a per-process
> basis without just having it call the 32-bit API directly.

The problem is that there is no 32-bit API in some cases
(unless I have misunderstood the kernel code) -- not all host
architectures implement compat syscalls or allow them to be
called from 64-bit processes or implement all the older syscall
variants that had smaller offsets. If there was a guaranteed
"this syscall always exists and always gives me 32-bit offsets"
we could use it.

> For ext4 at least, you could just shift the high 32-bit part of
> the 64-bit hash down into a 32-bit value in telldir(), and
> shift it back up when seekdir() is called.
Yes, that has been suggested, but it seemed a bit dubious
to bake in knowledge of ext4's internal implementation details.
Can we rely on this as an ABI promise that will always work
for all versions of all file systems going forwards?

thanks
-- PMM
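
For illustration only, the shift workaround discussed above could be
sketched roughly as below. The function names and the sample value are
made up for this sketch; none of this is QEMU or kernel code. It also
assumes that the upper 32 bits of the host's 64-bit d_off are enough to
resume iteration, which is exactly the filesystem-internal detail being
questioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: plain truncation, which keeps only the low
 * 32 bits of the host offset and discards the upper half. */
uint32_t doff_truncate(uint64_t host_off)
{
    return (uint32_t)host_off;
}

/* Hypothetical helper: the suggested workaround for the telldir()
 * direction, moving the high 32 bits down into a 32-bit guest value. */
uint32_t doff_to_guest(uint64_t host_off)
{
    return (uint32_t)(host_off >> 32);
}

/* Hypothetical helper: the seekdir() direction, shifting the guest
 * value back up. The low 32 bits come back as zero. */
uint64_t doff_to_host(uint32_t guest_off)
{
    return (uint64_t)guest_off << 32;
}

/* Small self-check on a made-up 64-bit offset. */
void doff_selfcheck(void)
{
    const uint64_t host_off = 0x5fa3c2d100000001ULL; /* made-up value */

    /* Truncation keeps only the low word. */
    assert(doff_truncate(host_off) == 0x00000001u);

    /* The shift keeps the high word instead. */
    assert(doff_to_guest(host_off) == 0x5fa3c2d1u);

    /* The round trip is lossy: the low word is gone. */
    assert(doff_to_host(doff_to_guest(host_off)) == 0x5fa3c2d100000000ULL);
}
```

The round trip only reproduces the original offset when its low
32 bits were already zero, so whether the shifted-back value is still
a usable seek position depends on how the filesystem lays out the
64-bit value.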