2010-12-14 16:59:39

by Josh Hunt

[permalink] [raw]
Subject: Re: cpumask: fix compat getaffinity

I realize this thread is fairly old, but I have a question about this
statement in the changelog:

> On NUMA systems with less than 32 possibly CPUs, the
> current compat_sys_sched_getaffinity now returns '4'
> instead of the actual NR_CPUS/8, which makes libnuma
> bail out when parsing the cpumap.

This does not seem accurate, at least not on my system with NR_CPUS=32
and nr_cpu_ids=8. The smallest value retlen will ever be set to is
sizeof(long) which is 8 on most modern systems. This breaks the
statement that this function will now return NR_CPUS/8.

Thanks
Josh


On Sun, May 16, 2010 at 11:04 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> On Wed, 12 May 2010 06:00:45 pm Milton Miller wrote:
>> >
>> > At least for parsing, we need to allocate and parse NR_CPUS until
>> > all places like arch/powerpc/platforms/pseries/xics.c that compare a
>> > user-supplied mask to CPUMASK_ALL are eliminated.
>>
>> Good point. ?Anton will want to fix those anyway for CONFIG_CPUMASK_OFFSTACK,
>> too, but that's the reason the parsing uses nr_cpumask_bits.
>>
>> > > Would it make sense to use my initial patch for -stable, which reverts
>> > > the ABI back to before the change that caused the problem, but apply
>> > > the correct fix (changing the ABI throughout) for future releases?
>> >
>> > This would definitly be the conservative fix.
>>
>> Instead of changing back to NR_CPUS which will break libnuma for
>> CPUMASK_OFFSTACK, how about changing it to nr_cpumask_bits and having an
>> explicit comment above it:
>
> Yes and No.
>
> 1) sched_getaffinity syscall is used from glibc and libnuma.
> 2) glibc doesn't use the return value almostly. glibc emulate it as NR_CPUS=1024.
> 3) Now, both sched_getaffinity() and compat_sys_sched_getaffinity() have nr_cpu_ids thing.
> 4) But only compat_sys_sched_getaffinity() hit libnuma problem.
>
> I think It mean compat_sys_sched_getaffinity() should behave as sched_getaffinity().
> IOW, libnuma assume compat_sys_sched_getaffinity() return len args or NR_CPUS.
> then, following patch do it. I confirmed the patch works with or without CPUMASK_OFFSTACK.
>
> So, My proposal is
> ? ? ? ?1) merge both mine and yours to linus tree
> ? ? ? ?2) but backport only mine
>
>
> How do you think?
>
>
> ==========================================================
> From: KOSAKI Motohiro <[email protected]>
> Subject: [PATCH] cpumask: fix compat getaffinity
>
> Commit a45185d2d "cpumask: convert kernel/compat.c" broke
> libnuma, which abuses sched_getaffinity to find out NR_CPUS
> in order to parse /sys/devices/system/node/node*/cpumap.
>
> On NUMA systems with less than 32 possibly CPUs, the
> current compat_sys_sched_getaffinity now returns '4'
> instead of the actual NR_CPUS/8, which makes libnuma
> bail out when parsing the cpumap.
>
> The libnuma call sched_getaffinity(0, bitmap, 4096)
> at first. It mean the libnuma expect the return value of
> sched_getaffinity() is either len argument or NR_CPUS.
> But it doesn't expect to return nr_cpu_ids.
>
> Strictly speaking, userland requirement are
>
> 1) Glibc assume the return value mean the lengh of initialized
> ? of mask argument. E.g. if sched_getaffinity(1024) return 128,
> ? glibc make zero fill rest 896 byte.
> 2) Libnuma assume the return value can be used to guess NR_CPUS
> ? in kernel. It assume len-arg<NR_CPUS makes -EINVAL. But
> ? it try len=4096 at first and 4096 is always bigger than
> ? NR_CPUS. Then, if we remove strange min_length normalization,
> ? we never hit -EINVAL case.
>
> sched_getaffinity() already solved this issue. This patch
> adapt compat_sys_sched_getaffinity() as it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> ?kernel/compat.c | ? 25 +++++++++++--------------
> ?1 files changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/kernel/compat.c b/kernel/compat.c
> index 7f40e92..5adab05 100644
> --- a/kernel/compat.c
> +++ b/kernel/compat.c
> @@ -495,29 +495,26 @@ asmlinkage long compat_sys_sched_getaffinity(compat_pid_t pid, unsigned int len,
> ?{
> ? ? ? ?int ret;
> ? ? ? ?cpumask_var_t mask;
> - ? ? ? unsigned long *k;
> - ? ? ? unsigned int min_length = cpumask_size();
> -
> - ? ? ? if (nr_cpu_ids <= BITS_PER_COMPAT_LONG)
> - ? ? ? ? ? ? ? min_length = sizeof(compat_ulong_t);
>
> - ? ? ? if (len < min_length)
> + ? ? ? if ((len * BITS_PER_BYTE) < nr_cpu_ids)
> + ? ? ? ? ? ? ? return -EINVAL;
> + ? ? ? if (len & (sizeof(compat_ulong_t)-1))
> ? ? ? ? ? ? ? ?return -EINVAL;
>
> ? ? ? ?if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> ? ? ? ? ? ? ? ?return -ENOMEM;
>
> ? ? ? ?ret = sched_getaffinity(pid, mask);
> - ? ? ? if (ret < 0)
> - ? ? ? ? ? ? ? goto out;
> + ? ? ? if (ret == 0) {
> + ? ? ? ? ? ? ? size_t retlen = min_t(size_t, len, cpumask_size());
>
> - ? ? ? k = cpumask_bits(mask);
> - ? ? ? ret = compat_put_bitmap(user_mask_ptr, k, min_length * 8);
> - ? ? ? if (ret == 0)
> - ? ? ? ? ? ? ? ret = min_length;
> -
> -out:
> + ? ? ? ? ? ? ? if (compat_put_bitmap(user_mask_ptr, cpumask_bits(mask), retlen * 8))
> + ? ? ? ? ? ? ? ? ? ? ? ret = -EFAULT;
> + ? ? ? ? ? ? ? else
> + ? ? ? ? ? ? ? ? ? ? ? ret = retlen;
> + ? ? ? }
> ? ? ? ?free_cpumask_var(mask);
> +
> ? ? ? ?return ret;
> ?}
>
> --
> 1.6.5.2
>
>
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at ?http://www.tux.org/lkml/
>


2010-12-15 14:41:11

by Arnd Bergmann

[permalink] [raw]
Subject: Re: cpumask: fix compat getaffinity

On Tuesday 14 December 2010, Josh Hunt wrote:
>
> I realize this thread is fairly old, but I have a question about this
> statement in the changelog:
>
> > On NUMA systems with less than 32 possibly CPUs, the
> > current compat_sys_sched_getaffinity now returns '4'
> > instead of the actual NR_CPUS/8, which makes libnuma
> > bail out when parsing the cpumap.
>
> This does not seem accurate, at least not on my system with NR_CPUS=32
> and nr_cpu_ids=8. The smallest value retlen will ever be set to is
> sizeof(long) which is 8 on most modern systems. This breaks the
> statement that this function will now return NR_CPUS/8.

Yes, I think the description is a bit misleading there.
The point was that it should return the number that the
sysfs files are based on, instead of the smallest multiple
of sizeof(compat_long) that can hold the installed CPUs.

Arnd