It was brought to my attention that this bug from 2018 was
still unresolved: 32 bit emulators like QEMU were given
64 bit hashes when running 32 bit emulation on 64 bit systems.
The personality(2) system call supports to let processes
indicate that they are 32 bit Linux to the kernel. This
was suggested by Teo in the original thread, so I just wired
it up and it solves the problem.
Programs that need the 32 bit hash only need to issue the
personality(PER_LINUX32) call and things start working.
I made a test program like this:
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/personality.h>
int main(int argc, char** argv) {
DIR* dir;
personality(PER_LINUX32);
dir = opendir("/boot");
printf("dir=%p\n", dir);
printf("readdir(dir)=%p\n", readdir(dir));
printf("errno=%d: %s\n", errno, strerror(errno));
return 0;
}
This was compiled with an ARM32 toolchain from Bootlin using
glibc 2.28 and thus suffering from the bug.
Before the patch:
$ ./readdir-bug
dir=0x86000
readdir(dir)=(nil)
errno=75: Value too large for defined data type
After the patch:
$ ./readdir-bug
dir=0x86000
readdir(dir)=0x86020
errno=0: Success
Problem solved.
Cc: Florian Weimer <[email protected]>
Cc: Peter Maydell <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: [email protected]
Suggested-by: Theodore Ts'o <[email protected]>
Link: https://bugs.launchpad.net/qemu/+bug/1805913
Link: https://lore.kernel.org/lkml/[email protected]/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205957
Signed-off-by: Linus Walleij <[email protected]>
---
fs/ext4/dir.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 9aa1f75409b0..3faf9edf3e92 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -27,6 +27,7 @@
#include <linux/slab.h>
#include <linux/iversion.h>
#include <linux/unicode.h>
+#include <linux/personality.h>
#include "ext4.h"
#include "xattr.h"
@@ -618,6 +619,14 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
static int ext4_dir_open(struct inode * inode, struct file * filp)
{
+ /*
+ * If we are currently running e.g. a 32 bit emulator on
+ * a 64 bit machine, the emulator will indicate that it needs
+ * a 32 bit personality and thus 32 bit hashes from the file
+ * system.
+ */
+ if (personality(current->personality) == PER_LINUX32)
+ filp->f_mode |= FMODE_32BITHASH;
if (IS_ENCRYPTED(inode))
return fscrypt_get_encryption_info(inode) ? -EACCES : 0;
return 0;
--
2.24.1
On Tue, 17 Mar 2020 at 11:31, Linus Walleij <[email protected]> wrote:
>
> It was brought to my attention that this bug from 2018 was
> still unresolved: 32 bit emulators like QEMU were given
> 64 bit hashes when running 32 bit emulation on 64 bit systems.
>
> The personality(2) system call supports to let processes
> indicate that they are 32 bit Linux to the kernel. This
> was suggested by Teo in the original thread, so I just wired
> it up and it solves the problem.
Thanks for having a look at this. I'm not sure this is what
QEMU needs, though. When QEMU runs, it is not a 32-bit
process, it's a 64-bit process. Some of the syscalls
it makes are on behalf of the guest and would need 32-bit
semantics (including this one of wanting 32-bit hash sizes
in directory reads). But some syscalls it makes for itself
(either directly, or via libraries it's linked against
including glibc and glib) -- those would still want the
usual 64-bit semantics, I would have thought.
> Programs that need the 32 bit hash only need to issue the
> personality(PER_LINUX32) call and things start working.
What in particular does this personality setting affect?
My copy of the personality(2) manpage just says:
PER_LINUX32 (since Linux 2.2)
[To be documented.]
which isn't very informative.
thanks
-- PMM
* Linus Walleij:
> It was brought to my attention that this bug from 2018 was
> still unresolved: 32 bit emulators like QEMU were given
> 64 bit hashes when running 32 bit emulation on 64 bit systems.
>
> The personality(2) system call supports to let processes
> indicate that they are 32 bit Linux to the kernel. This
> was suggested by Teo in the original thread, so I just wired
> it up and it solves the problem.
>
> Programs that need the 32 bit hash only need to issue the
> personality(PER_LINUX32) call and things start working.
>
> I made a test program like this:
>
> #include <dirent.h>
> #include <errno.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/types.h>
> #include <sys/personality.h>
>
> int main(int argc, char** argv) {
> DIR* dir;
> personality(PER_LINUX32);
> dir = opendir("/boot");
> printf("dir=%p\n", dir);
> printf("readdir(dir)=%p\n", readdir(dir));
> printf("errno=%d: %s\n", errno, strerror(errno));
> return 0;
> }
>
> This was compiled with an ARM32 toolchain from Bootlin using
> glibc 2.28 and thus suffering from the bug.
Just be sure: Is it possible to move the PER_LINUX32 setting into QEMU?
(I see why not.)
However, this does not solve the issue with network file systems and
other scenarios. I still think need to add a workaround to the glibc
implementation.
On Tue, Mar 17, 2020 at 12:53 PM Florian Weimer <[email protected]> wrote:
> Just be sure: Is it possible to move the PER_LINUX32 setting into QEMU?
> (I see why not.)
I set it in the program explicitly, but what actually happens when
I run it is that the binfmt handler invokes qemu-user so certainly
that program can set the flag, any process can.
Yours,
Linus Walleij
On Mar 17, 2020, at 5:31 AM, Linus Walleij <[email protected]> wrote:
>
> It was brought to my attention that this bug from 2018 was
> still unresolved: 32 bit emulators like QEMU were given
> 64 bit hashes when running 32 bit emulation on 64 bit systems.
>
> The personality(2) system call supports to let processes
> indicate that they are 32 bit Linux to the kernel. This
> was suggested by Teo in the original thread, so I just wired
> it up and it solves the problem.
>
> Programs that need the 32 bit hash only need to issue the
> personality(PER_LINUX32) call and things start working.
I'm generally with with this from the ext4 point of view.
That said, I'd think it would be preferable for ease of use and
compatibility that applications didn't have to be modified
(e.g. have QEMU or glibc internally set PER_LINUX32 for this
process before the 32-bit syscall is called, given that it knows
whether it is emulating a 32-bit runtime or not).
The other way to handle this would be for ARM32 to check the
PER_LINUX32 flag via is_compat_task() so that there wouldn't
need to be any changes to the ext4 code at all?
Cheers, Andreas
> I made a test program like this:
>
> #include <dirent.h>
> #include <errno.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/types.h>
> #include <sys/personality.h>
>
> int main(int argc, char** argv) {
> DIR* dir;
> personality(PER_LINUX32);
> dir = opendir("/boot");
> printf("dir=%p\n", dir);
> printf("readdir(dir)=%p\n", readdir(dir));
> printf("errno=%d: %s\n", errno, strerror(errno));
> return 0;
> }
>
> This was compiled with an ARM32 toolchain from Bootlin using
> glibc 2.28 and thus suffering from the bug.
>
> Before the patch:
>
> $ ./readdir-bug
> dir=0x86000
> readdir(dir)=(nil)
> errno=75: Value too large for defined data type
>
> After the patch:
>
> $ ./readdir-bug
> dir=0x86000
> readdir(dir)=0x86020
> errno=0: Success
>
> Problem solved.
>
> Cc: Florian Weimer <[email protected]>
> Cc: Peter Maydell <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: [email protected]
> Suggested-by: Theodore Ts'o <[email protected]>
> Link: https://bugs.launchpad.net/qemu/+bug/1805913
> Link: https://lore.kernel.org/lkml/[email protected]/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205957
> Signed-off-by: Linus Walleij <[email protected]>
> ---
> fs/ext4/dir.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> index 9aa1f75409b0..3faf9edf3e92 100644
> --- a/fs/ext4/dir.c
> +++ b/fs/ext4/dir.c
> @@ -27,6 +27,7 @@
> #include <linux/slab.h>
> #include <linux/iversion.h>
> #include <linux/unicode.h>
> +#include <linux/personality.h>
> #include "ext4.h"
> #include "xattr.h"
>
> @@ -618,6 +619,14 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
>
> static int ext4_dir_open(struct inode * inode, struct file * filp)
> {
> + /*
> + * If we are currently running e.g. a 32 bit emulator on
> + * a 64 bit machine, the emulator will indicate that it needs
> + * a 32 bit personality and thus 32 bit hashes from the file
> + * system.
> + */
> + if (personality(current->personality) == PER_LINUX32)
> + filp->f_mode |= FMODE_32BITHASH;
> if (IS_ENCRYPTED(inode))
> return fscrypt_get_encryption_info(inode) ? -EACCES : 0;
> return 0;
> --
> 2.24.1
>
Cheers, Andreas
Hi
[This is an automated email]
This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all
The bot has tested the following trees: v5.5.9, v5.4.25, v4.19.109, v4.14.173, v4.9.216, v4.4.216.
v5.5.9: Build OK!
v5.4.25: Build OK!
v4.19.109: Failed to apply! Possible dependencies:
592ddec7578a ("ext4: use IS_ENCRYPTED() to check encryption status")
b886ee3e778e ("ext4: Support case-insensitive file name lookups")
v4.14.173: Failed to apply! Possible dependencies:
2ee6a576be56 ("fs, fscrypt: add an S_ENCRYPTED inode flag")
69fe2d75dd91 ("btrfs: make the delalloc block rsv per inode")
79f015f21653 ("btrfs: cleanup extent locking sequence")
8b62f87bad9c ("Btrfs: rework outstanding_extents")
ae5e165d855d ("fs: new API for handling inode->i_version")
b886ee3e778e ("ext4: Support case-insensitive file name lookups")
ee73f9a52a34 ("ext4: convert to new i_version API")
v4.9.216: Failed to apply! Possible dependencies:
39bc88e5e38e ("arm64: Disable TTBR0_EL1 during normal kernel execution")
5e9d0e3d9ea6 ("powerpc/lib: Fix randconfig build failure in sstep.c")
783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl")
7c0f6ba682b9 ("Replace <asm/uaccess.h> with <linux/uaccess.h> globally")
9cf09d68b89a ("arm64: xen: Enable user access before a privcmd hvc call")
b886ee3e778e ("ext4: Support case-insensitive file name lookups")
bd38967d406f ("arm64: Factor out PAN enabling/disabling into separate uaccess_* macros")
ee73f9a52a34 ("ext4: convert to new i_version API")
v4.4.216: Failed to apply! Possible dependencies:
39bc88e5e38e ("arm64: Disable TTBR0_EL1 during normal kernel execution")
4dffbfc48d65 ("arm64/efi: mark UEFI reserved regions as MEMBLOCK_NOMAP")
57f4959bad0a ("arm64: kernel: Add support for User Access Override")
6c94f27ac847 ("arm64: switch to relative exception tables")
783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl")
7c0f6ba682b9 ("Replace <asm/uaccess.h> with <linux/uaccess.h> globally")
7dd01aef0557 ("arm64: trap userspace "dc cvau" cache operation on errata-affected core")
9b7365fc1c82 ("ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support")
9e8e865bbe29 ("arm64: unify idmap removal")
b886ee3e778e ("ext4: Support case-insensitive file name lookups")
d5370f754875 ("arm64: prefetch: add alternative pattern for CPUs without a prefetcher")
e5bc22a42e4d ("arm64/efi: split off EFI init and runtime code for reuse by 32-bit ARM")
e7227d0e528f ("arm64: Cleanup SCTLR flags")
ee73f9a52a34 ("ext4: convert to new i_version API")
f7d924894265 ("arm64/efi: refactor EFI init and runtime code for reuse by 32-bit ARM")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks
Sasha
On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell <[email protected]> wrote:
> On Tue, 17 Mar 2020 at 11:31, Linus Walleij <[email protected]> wrote:
> >
> > It was brought to my attention that this bug from 2018 was
> > still unresolved: 32 bit emulators like QEMU were given
> > 64 bit hashes when running 32 bit emulation on 64 bit systems.
> >
> > The personality(2) system call supports to let processes
> > indicate that they are 32 bit Linux to the kernel. This
> > was suggested by Teo in the original thread, so I just wired
> > it up and it solves the problem.
>
> Thanks for having a look at this. I'm not sure this is what
> QEMU needs, though. When QEMU runs, it is not a 32-bit
> process, it's a 64-bit process. Some of the syscalls
> it makes are on behalf of the guest and would need 32-bit
> semantics (including this one of wanting 32-bit hash sizes
> in directory reads). But some syscalls it makes for itself
> (either directly, or via libraries it's linked against
> including glibc and glib) -- those would still want the
> usual 64-bit semantics, I would have thought.
The "personality" thing is a largely unused facility that
a process can use to say it has this generic behaviour.
In this case we say we have the PER_LINUX32 personality
so it will make the process evoke 32bit behaviours inside
the kernel when dealing with this process.
Which I (loosely) appreciate that we want to achieve.
> > Programs that need the 32 bit hash only need to issue the
> > personality(PER_LINUX32) call and things start working.
>
> What in particular does this personality setting affect?
> My copy of the personality(2) manpage just says:
>
> PER_LINUX32 (since Linux 2.2)
> [To be documented.]
>
> which isn't very informative.
It's not a POSIX thing (not part of the Single Unix Specification)
so as with most Linux things it has some fuzzy semantics
defined by the community...
I usually just go to the source.
If you grep the kernel what turns up is a bunch of
architecture-specific behaviors on arm64, ia64, mips, parisc,
powerpc, s390, sparc. They are very minor.
On x86_64 the semantic effect is
none so this would work for any kind of 32bit userspace
on x86_64. It would be the first time this flag has any
effect at all on x86_64, but arguably a useful one.
I would not be surprised if running say i586 on x86_64
has the same problem, just that noone has reported
it yet. But what do I know. If they have the same problem
they can use the same solution. Hm QEMU supports
emulating i586 or even i386 ... maybe you could test?
Or tell me how to test? We might be solving a bigger
issue here.
Yours,
Linus Walleij
On Tue, Mar 17, 2020 at 11:29 PM Andreas Dilger <[email protected]> wrote:
> That said, I'd think it would be preferable for ease of use and
> compatibility that applications didn't have to be modified
> (e.g. have QEMU or glibc internally set PER_LINUX32 for this
> process before the 32-bit syscall is called, given that it knows
> whether it is emulating a 32-bit runtime or not).
I think the plan is that QEMU set this, either directly when
you run qemu-user or through the binfmt handler.
https://www.kernel.org/doc/html/latest/admin-guide/binfmt-misc.html
IIUC the binfmt handler is invoked when you try to
execute ELF so-and-so-for-arch-so-and-so when you
are not that arch yourself. So that can set up this
personality.
Not that I know how the binfmt handler works, I am just
assuming that thing is some program that can issue
syscalls. It JustWorks for me after installing the QEMU
packages...
The problem still stands that userspace need to somehow
inform kernelspace that 32bit is going on, and this was the
best I could think of.
Yours,
Linus Walleij
On Thu, 19 Mar 2020 at 15:13, Linus Walleij <[email protected]> wrote:
> On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell <[email protected]> wrote:
> > What in particular does this personality setting affect?
> > My copy of the personality(2) manpage just says:
> >
> > PER_LINUX32 (since Linux 2.2)
> > [To be documented.]
> >
> > which isn't very informative.
>
> It's not a POSIX thing (not part of the Single Unix Specification)
> so as with most Linux things it has some fuzzy semantics
> defined by the community...
>
> I usually just go to the source.
If we're going to decide that this is the way to say
"give me 32-bit semantics" we need to actually document
that and define in at least broad terms what we mean
by it, so that when new things are added that might or
might not check against the setting there is a reference
defining whether they should or not, and so that
userspace knows what it's opting into by setting the flag.
The kernel loves undocumented APIs but userspace
consumers of them are not so enamoured :-)
As a concrete example, should "give me 32-bit semantics
via PER_LINUX32" mean "mmap should always return addresses
within 4GB" ? That would seem like it would make sense --
but on the other hand it would make it absolutely unusable
for QEMU's purposes, because we want to be able to
do full 64-bit mmap() for our own internal allocations.
(This is a specific example of the general case that
I'm dubious about having this be a process-wide switch,
because QEMU is a single process which sometimes
makes syscalls on its own behalf and sometimes makes
syscalls on behalf of the guest program it is emulating.
We want 32-bit semantics for the latter but if we
also get them for the former then there might be
unintended side effects.)
> I would not be surprised if running say i586 on x86_64
> has the same problem, just that noone has reported
> it yet. But what do I know. If they have the same problem
> they can use the same solution. Hm QEMU supports
> emulating i586 or even i386 ... maybe you could test?
Native i586 code on x86-64 should be fine, because it
will be using the compat syscalls, which ext4 already
ensures get the 32-bit sized hash (see hash2pos() and
friends).
thanks
-- PMM
On Thu, Mar 19, 2020 at 4:25 PM Peter Maydell <[email protected]> wrote:
> On Thu, 19 Mar 2020 at 15:13, Linus Walleij <[email protected]> wrote:
> > On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell <[email protected]> wrote:
> > > What in particular does this personality setting affect?
> > > My copy of the personality(2) manpage just says:
> > >
> > > PER_LINUX32 (since Linux 2.2)
> > > [To be documented.]
> > >
> > > which isn't very informative.
> >
> > It's not a POSIX thing (not part of the Single Unix Specification)
> > so as with most Linux things it has some fuzzy semantics
> > defined by the community...
> >
> > I usually just go to the source.
>
> If we're going to decide that this is the way to say
> "give me 32-bit semantics" we need to actually document
> that and define in at least broad terms what we mean
> by it, so that when new things are added that might or
> might not check against the setting there is a reference
> defining whether they should or not, and so that
> userspace knows what it's opting into by setting the flag.
> The kernel loves undocumented APIs but userspace
> consumers of them are not so enamoured :-)
OK I guess we can at least take this opportunity to add
some kerneldoc to the include file.
> As a concrete example, should "give me 32-bit semantics
> via PER_LINUX32" mean "mmap should always return addresses
> within 4GB" ? That would seem like it would make sense --
Incidentally that thing in particular has its own personality
flag (personalities are additive, it's a bit schizophrenic)
so PER_LINUX_32BIT is defined as:
PER_LINUX_32BIT = 0x0000 | ADDR_LIMIT_32BIT,
and that is specifically for limiting the address space to
32bit.
There is also PER_LINUX32_3GB for a 3GB lowmem
limit.
Since the personality is kind of additive, if
we want a flag *specifically* for indicating that we want
32bit hashes from the file system, there are bits left so we
can provide that.
Is this what we want to do? I just think we shouldn't
decide on that lightly as we will be using up personality
bug bits, but sometimes you have to use them.
PER_LINUX32 as it stands means 32bit personality
but very specifically does not include memory range
limitations since that has its own flags.
Yours,
Linus Walleij
On Thu, Mar 19, 2020 at 11:23:33PM +0100, Linus Walleij wrote:
> OK I guess we can at least take this opportunity to add
> some kerneldoc to the include file.
>
> > As a concrete example, should "give me 32-bit semantics
> > via PER_LINUX32" mean "mmap should always return addresses
> > within 4GB" ? That would seem like it would make sense --
>
> Incidentally that thing in particular has its own personality
> flag (personalities are additive, it's a bit schizophrenic)
> so PER_LINUX_32BIT is defined as:
> PER_LINUX_32BIT = 0x0000 | ADDR_LIMIT_32BIT,
> and that is specifically for limiting the address space to
> 32bit.
>
> There is also PER_LINUX32_3GB for a 3GB lowmem
> limit.
>
> Since the personality is kind of additive, if
> we want a flag *specifically* for indicating that we want
> 32bit hashes from the file system, there are bits left so we
> can provide that.
>
> Is this what we want to do? I just think we shouldn't
> decide on that lightly as we will be using up personality
> bug bits, but sometimes you have to use them.
I've been looking at the personality bug bits more detailed, and it
looks... messy. Do we pick a new personality, or do we grab another
unique flag?
Another possibility, which would be messier for qemu, would be use a
flag set via fcntl. That would require qemu from noticing when the
guest is calling open, openat, or openat2, and then inserting a fcntl
system call to set the 32-bit readdir mode. That's cleaner from the
kernel interface complexity perspective, but it's messier for qemu.
- Ted
On Tue, 24 Mar 2020 at 02:34, Theodore Y. Ts'o <[email protected]> wrote:
> Another possibility, which would be messier for qemu, would be use a
> flag set via fcntl. That would require qemu from noticing when the
> guest is calling open, openat, or openat2, and then inserting a fcntl
> system call to set the 32-bit readdir mode. That's cleaner from the
> kernel interface complexity perspective, but it's messier for qemu.
On the contrary, that would be a much better interface for QEMU.
We always know when we're doing an open-syscall on behalf
of the guest, and it would be trivial to make the fcntl() call then.
That would ensure that we don't accidentally get the
'32-bit semantics' on file descriptors QEMU opens for its own
purposes, and wouldn't leave us open to the risk in future that
setting the PER_LINUX32 flag for all of QEMU causes
unexpected extra behaviour in future kernels that would be correct
for the guest binary but wrong/broken for QEMU's own internals.
thanks
-- PMM
On Tue, Mar 24, 2020 at 09:29:58AM +0000, Peter Maydell wrote:
>
> On the contrary, that would be a much better interface for QEMU.
> We always know when we're doing an open-syscall on behalf
> of the guest, and it would be trivial to make the fcntl() call then.
> That would ensure that we don't accidentally get the
> '32-bit semantics' on file descriptors QEMU opens for its own
> purposes, and wouldn't leave us open to the risk in future that
> setting the PER_LINUX32 flag for all of QEMU causes
> unexpected extra behaviour in future kernels that would be correct
> for the guest binary but wrong/broken for QEMU's own internals.
If using a flag set by fcntl is better for qemu, then by all means
let's go with that instead of using a personality flag/number.
Linus, do you have what you need to do a respin of the patch?
- Ted
On Tue, Mar 24, 2020 at 7:48 PM Theodore Y. Ts'o <[email protected]> wrote:
> On Tue, Mar 24, 2020 at 09:29:58AM +0000, Peter Maydell wrote:
> >
> > On the contrary, that would be a much better interface for QEMU.
> > We always know when we're doing an open-syscall on behalf
> > of the guest, and it would be trivial to make the fcntl() call then.
> > That would ensure that we don't accidentally get the
> > '32-bit semantics' on file descriptors QEMU opens for its own
> > purposes, and wouldn't leave us open to the risk in future that
> > setting the PER_LINUX32 flag for all of QEMU causes
> > unexpected extra behaviour in future kernels that would be correct
> > for the guest binary but wrong/broken for QEMU's own internals.
>
> If using a flag set by fcntl is better for qemu, then by all means
> let's go with that instead of using a personality flag/number.
>
> Linus, do you have what you need to do a respin of the patch?
Absolutely, I'm a bit occupied this week but I will try to get to it
early next week!
Thanks a lot for the directions here, it's highly valuable.
Yours,
Linus Walleij