From: Andreas Dilger
Subject: Re: [Qemu-devel] d_off field in struct dirent and 32-on-64 emulation
Date: Fri, 28 Dec 2018 16:16:21 -0700
To: Peter Maydell
Cc: Florian Weimer, linux-fsdevel, Linux API, Ext4 Developers List, lucho@ionkov.net, libc-alpha@sourceware.org, Arnd Bergmann, ericvh@gmail.com, hpa@zytor.com, lkml - Kernel Mailing List, QEMU Developers, rminnich@sandia.gov, v9fs-developer@lists.sourceforge.net
Message-Id: <1EF1B31A-83D8-4642-BEBF-F56E45485223@dilger.ca>
References: <87bm56vqg4.fsf@mid.deneb.enyo.de> <9C6A7D45-CF53-4C61-B5DD-12CA0D419972@dilger.ca>

On Dec 28, 2018, at 4:18 AM, Peter Maydell wrote:
>
> On Fri, 28 Dec 2018 at 00:23, Andreas Dilger wrote:
>> On Dec 27, 2018, at 10:41 AM, Peter Maydell wrote:
>>> As you note, this causes breakage for userspace programs which
>>> need to implement an API/ABI with 32-bit offset but which only
>>> have access to the kernel's 64-bit offset API/ABI.
>>
>> This is (IMHO) a bit of an oxymoron, isn't it?  Applications using
>> the 64-bit API, but storing the value in a 32-bit field?
>
> I didn't say "which choose to store the value in a 32-bit field",
> I said "which have to implement an API/ABI which has 32-bit fields".
> In QEMU's case, we use the host kernel's ABI, which has 64-bit
> offset fields. We implement a syscall ABI for the guest binary
> we are running under emulation, which may have 32-bit offset fields
> (for instance if we are running a 32-bit Arm binary.) Both of
> these ABIs are fixed -- QEMU doesn't have a choice here, it
> just has to make the best effort it can with what the host kernel
> provides it, to provide the semantics the guest binary needs.
> My suggestion in this thread is that the host kernel provides
> a wider range of facilities so that QEMU can do the job it's
> trying to do.
>
>> The same
>> problem would exist for filesystems with 64-bit inodes or 64-bit
>> file offsets trying to store these values in 32-bit variables.
>> It might work most of the time, but it can also break randomly.
>
> In general inodes and offsets start from 0 and work up --
> so almost all of the time they don't actually overflow.
> The problem with ext4 directory hash "offsets" is that they
> overflow all the time and immediately, so instead of "works
> unless you have a weird edge case" like all the other filesystems,
> it's "never works".
>
>>> I think the best fix for this would be for the kernel to either
>>> (a) consistently use a 32-bit hash or (b) to provide an API
>>> so that userspace can use the FMODE_32BITHASH flag the way
>>> that kernel-internal users already can.
>>
>> It would be relatively straightforward to add a "32bitapi" mount
>> option to return a 32-bit directory hash to userspace for operations
>> on that mountpoint (ext4 doesn't have 64-bit inode numbers yet).
>> However, I can't think of an easy way to do this on a per-process
>> basis without just having it call the 32-bit API directly.
>
> The problem is that there is no 32-bit API in some cases
> (unless I have misunderstood the kernel code) -- not all
> host architectures implement compat syscalls or allow them
> to be called from 64-bit processes or implement all the older
> syscall variants that had smaller offsets. If there was a guaranteed
> "this syscall always exists and always gives me 32-bit offsets"
> we could use it.

The "32bitapi" mount option would use a 32-bit hash for seekdir
and telldir, regardless of which kernel API was used.  It would
just set the FMODE_32BITHASH flag in file->f_mode for all files.

Using 32-bit directory hash values is not necessarily harmful, but
it reintroduces the risk of hash collisions that existed before the
move to 64-bit hash values.  This becomes more of a problem as
directory sizes increase.

>> For ext4 at least, you could just shift the high 32-bit part of
>> the 64-bit hash down into a 32-bit value in telldir(), and
>> shift it back up when seekdir() is called.
>
> Yes, that has been suggested, but it seemed a bit dubious
> to bake in knowledge of ext4's internal implementation details.
> Can we rely on this as an ABI promise that will always work
> for all versions of all file systems going forwards?

Well, the directory cookies need to be relatively stable over time
because they are exported to applications and possibly remote nodes
via NFS, so they can't be changed very much.
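
For reference, here is a rough sketch of the shift approach quoted above, modelled in Python purely to show the arithmetic. It assumes the significant part of the 64-bit cookie sits in the upper 32 bits, which is what the quoted suggestion relies on for ext4. The function names and sample values are made up for illustration; they are not kernel or libc interfaces.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def cookie64_to_32(cookie64: int) -> int:
    """Hypothetical telldir() side: keep only the high 32 bits of the cookie."""
    if not 0 <= cookie64 <= MASK64:
        raise ValueError("cookie must fit in 64 bits")
    return (cookie64 >> 32) & MASK32


def cookie32_to_64(cookie32: int) -> int:
    """Hypothetical seekdir() side: move the value back into the high 32 bits."""
    if not 0 <= cookie32 <= MASK32:
        raise ValueError("cookie must fit in 32 bits")
    return cookie32 << 32


# Two made-up cookies that differ only in their low 32 bits.
a = 0x12345678_0000ABCD
b = 0x12345678_0000EF01

print(hex(cookie64_to_32(a)))                  # high half of a
print(cookie64_to_32(a) == cookie64_to_32(b))  # do a and b share a 32-bit cookie?
print(hex(cookie32_to_64(cookie64_to_32(a))))  # value after the round trip
```

The round trip drops the low 32 bits, so entries whose cookies share the same upper half become indistinguishable. That is the same collision trade-off as with any 32-bit cookie, and whether it holds for filesystems other than ext4 is exactly the open question above.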

Cheers, Andreas