From: Andreas Dilger
Message-Id: <9C6A7D45-CF53-4C61-B5DD-12CA0D419972@dilger.ca>
Subject: Re: [Qemu-devel] d_off field in struct dirent and 32-on-64 emulation
Date: Thu, 27 Dec 2018 17:23:28 -0700
To: Peter Maydell
Cc: Florian Weimer, linux-fsdevel, Linux API, Ext4 Developers List,
    lucho@ionkov.net, libc-alpha@sourceware.org, Arnd Bergmann,
    ericvh@gmail.com, hpa@zytor.com, lkml - Kernel Mailing List,
    QEMU Developers, rminnich@sandia.gov,
    v9fs-developer@lists.sourceforge.net
References: <87bm56vqg4.fsf@mid.deneb.enyo.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Dec 27, 2018, at 10:41 AM, Peter Maydell wrote:
>
> On Thu, 27 Dec 2018 at 17:19, Florian Weimer wrote:
>> We have a bit of an interesting problem with respect to the d_off
>> field in struct dirent.
>>
>> When running a 64-bit kernel on certain file systems, notably ext4,
>> this field uses the full 63 bits even for small directories (strace -v
>> output, wrapped here for readability):
>>
>> getdents(3, [
>>   {d_ino=1494304, d_off=3901177228673045825, d_reclen=40,
>>    d_name="authorized_keys", d_type=DT_REG},
>>   {d_ino=1494277, d_off=7491915799041650922, d_reclen=24,
>>    d_name=".", d_type=DT_DIR},
>>   {d_ino=1314655, d_off=9223372036854775807, d_reclen=24,
>>    d_name="..", d_type=DT_DIR}
>> ], 32768) = 88
>>
>> When running in 32-bit compat mode, this value is somehow truncated to
>> 31 bits, for both the getdents and the getdents64 (!) system call (at
>> least on i386).
>
> Yes -- look for hash2pos() and friends in fs/ext4/dir.c.
> The ext4 code in the kernel uses a 32 bit hash if (a) the kernel
> is 32 bit (b) this is a compat syscall (c) some other bit of
> the kernel asked it to via the FMODE_32BITHASH flag (currently only
> NFS does that I think).
>
> As you note, this causes breakage for userspace programs which
> need to implement an API/ABI with 32-bit offset but which only
> have access to the kernel's 64-bit offset API/ABI.

This is (IMHO) a bit of an oxymoron, isn't it? Applications using
the 64-bit API, but storing the value in a 32-bit field? The same
problem would exist for filesystems with 64-bit inodes or 64-bit
file offsets trying to store these values in 32-bit variables.
It might work most of the time, but it can also break randomly.

> I think the best fix for this would be for the kernel to either
> (a) consistently use a 32-bit hash or (b) to provide an API
> so that userspace can use the FMODE_32BITHASH flag the way
> that kernel-internal users already can.

It would be relatively straightforward to add a "32bitapi" mount
option to return a 32-bit directory hash to userspace for operations
on that mountpoint (ext4 doesn't have 64-bit inode numbers yet).
However, I can't think of an easy way to do this on a per-process
basis without just having it call the 32-bit API directly.

> I couldn't think of or find any existing way for userspace
> to get the right results here, which is why
> 32-bit-guest-on-64-bit-host QEMU doesn't work on these filesystems
> (depending on what exactly the guest's libc etc do).
>
>> the 32-bit getdents system call emulation in a 64-bit qemu-user
>> process would just silently truncate the d_off field as part of
>> the translation, not reporting an error.
>> [...]
>> This truncation has always been a bug; it breaks telldir/seekdir
>> at least in some cases.
>
> Yes; you can't fit a quart into a pint pot, so if the guest
> only handles 32-bit offsets then truncation is about all we
> can do. This works fine if offsets are offsets, assuming the
> directory isn't so enormous it would have broken the guest
> anyway. I'm not aware of any issues with this other than the
> oddball ext4 offsets-are-hashes situation -- could you expand
> on the telldir/seekdir issue? (I suppose we should probably
> make QEMU's syscall emulation layer return "no more entries"
> rather than entries with truncated hashes.)

For ext4 at least, you could just shift the high 32-bit part of
the 64-bit hash down into a 32-bit value in telldir(), and shift
it back up when seekdir() is called.
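
A minimal sketch of that shift, to make the idea concrete. The helper
names are made up for illustration and are not existing libc, QEMU, or
kernel functions. It also assumes that the upper half of d_off is
enough to resume iteration, which would need checking against the
filesystem in question:

```c
#include <stdint.h>

/* Hypothetical helper for a telldir()-style path: keep only the
 * high 32 bits of the 64-bit d_off so it fits a 32-bit cookie. */
static inline uint32_t d_off_to_cookie32(uint64_t d_off)
{
    return (uint32_t)(d_off >> 32);
}

/* Hypothetical helper for a seekdir()-style path: move the 32-bit
 * cookie back into the high half. The low 32 bits come back as zero. */
static inline uint64_t cookie32_to_d_off(uint32_t cookie)
{
    return (uint64_t)cookie << 32;
}
```

The round trip is lossy: the low 32 bits of the original d_off are
dropped, so entries whose offsets differ only in the low half would
map to the same cookie.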

Cheers, Andreas