2010-02-11 20:00:58

by Suresh Siddha

Subject: [patch v3 2/2] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET

Generic support for the PTRACE_GETREGSET/PTRACE_SETREGSET commands, which
export the regsets supported by each architecture using the corresponding
NT_* types. These NT_* types are already part of the userland ABI, used
in representing the architecture specific register sets as different NOTES
in an ELF core file.

The 'addr' parameter of the ptrace system call encodes the REGSET type (using
the corresponding NT_* type) and the 'data' parameter points to the
struct iovec having the user buffer and the length of that buffer.

struct iovec iov = { buf, len};
ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov);

On successful completion, iov.iov_len will be updated by the kernel, specifying
how much the kernel has written/read to/from the user's buffer (iov.iov_base).

x86 extended state registers are primarily exported using this interface.
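
For illustration, a user-space caller of this interface might look like the
sketch below. It assumes a Linux C library whose <sys/ptrace.h> defines
PTRACE_GETREGSET and a tracee that is already attached and stopped. The helper
name read_regset() is hypothetical and is not part of this patch.

```c
#include <elf.h>        /* NT_PRSTATUS and the other NT_* note types */
#include <stddef.h>
#include <sys/ptrace.h> /* ptrace(), PTRACE_GETREGSET */
#include <sys/types.h>  /* pid_t, ssize_t */
#include <sys/uio.h>    /* struct iovec */

/*
 * Hypothetical helper: fetch the regset identified by note_type from a
 * stopped tracee into buf. Returns the number of bytes the kernel reports
 * in iov_len, or -1 with errno set if ptrace() fails.
 */
ssize_t read_regset(pid_t pid, unsigned int note_type, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	/* 'addr' carries the NT_* type, 'data' points at the iovec. */
	if (ptrace(PTRACE_GETREGSET, pid,
		   (void *)(unsigned long)note_type, &iov) == -1)
		return -1;

	/* The kernel has updated iov_len to the amount actually copied. */
	return (ssize_t)iov.iov_len;
}
```

A debugger would typically call something like
read_regset(pid, NT_PRSTATUS, &regs, sizeof(regs)) after waitpid() reports
that the tracee has stopped, and compare the return value with the size it
expected.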

Signed-off-by: Suresh Siddha <[email protected]>
Acked-by: Hongjiu Lu <[email protected]>
---
include/linux/elf.h | 6 ++-
include/linux/ptrace.h | 15 ++++++++
kernel/ptrace.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 108 insertions(+), 1 deletion(-)

Index: tip/include/linux/ptrace.h
===================================================================
--- tip.orig/include/linux/ptrace.h
+++ tip/include/linux/ptrace.h
@@ -27,6 +27,21 @@
#define PTRACE_GETSIGINFO 0x4202
#define PTRACE_SETSIGINFO 0x4203

+/*
+ * Generic ptrace interface that exports the architecture specific regsets
+ * using the corresponding NT_* types (which are also used in the core dump).
+ *
+ * This interface usage is as follows:
+ * struct iovec iov = { buf, len};
+ *
+ * ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov);
+ *
+ * On the successful completion, iov.len will be updated by the kernel,
+ * specifying how much the kernel has written/read to/from the user's iov.buf.
+ */
+#define PTRACE_GETREGSET 0x4204
+#define PTRACE_SETREGSET 0x4205
+
/* options set using PTRACE_SETOPTIONS */
#define PTRACE_O_TRACESYSGOOD 0x00000001
#define PTRACE_O_TRACEFORK 0x00000002
Index: tip/kernel/ptrace.c
===================================================================
--- tip.orig/kernel/ptrace.c
+++ tip/kernel/ptrace.c
@@ -22,6 +22,7 @@
#include <linux/pid_namespace.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>
+#include <linux/regset.h>


/*
@@ -511,6 +512,47 @@ static int ptrace_resume(struct task_str
return 0;
}

+#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
+
+static const struct user_regset *
+find_regset(const struct user_regset_view *view, unsigned int type)
+{
+ const struct user_regset *regset;
+ int n;
+
+ for (n = 0; n < view->n; ++n) {
+ regset = view->regsets + n;
+ if (regset->core_note_type == type)
+ return regset;
+ }
+
+ return NULL;
+}
+
+static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
+ struct iovec *kiov)
+{
+ const struct user_regset_view *view = task_user_regset_view(task);
+ const struct user_regset *regset = find_regset(view, type);
+ int regset_no;
+
+ if (!regset || (kiov->iov_len % regset->size) != 0)
+ return -EIO;
+
+ regset_no = regset - view->regsets;
+ kiov->iov_len = min(kiov->iov_len,
+ (__kernel_size_t) (regset->n * regset->size));
+
+ if (req == PTRACE_GETREGSET)
+ return copy_regset_to_user(task, view, regset_no, 0,
+ kiov->iov_len, kiov->iov_base);
+ else
+ return copy_regset_from_user(task, view, regset_no, 0,
+ kiov->iov_len, kiov->iov_base);
+}
+
+#endif
+
int ptrace_request(struct task_struct *child, long request,
long addr, long data)
{
@@ -573,6 +615,26 @@ int ptrace_request(struct task_struct *c
return 0;
return ptrace_resume(child, request, SIGKILL);

+#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
+ case PTRACE_GETREGSET:
+ case PTRACE_SETREGSET:
+ {
+ struct iovec kiov;
+ struct iovec __user *uiov = (struct iovec __user *) data;
+
+ if (!access_ok(VERIFY_WRITE, uiov, sizeof(*uiov)))
+ return -EFAULT;
+
+ if (__get_user(kiov.iov_base, &uiov->iov_base) ||
+ __get_user(kiov.iov_len, &uiov->iov_len))
+ return -EFAULT;
+
+ ret = ptrace_regset(child, request, addr, &kiov);
+ if (!ret)
+ ret = __put_user(kiov.iov_len, &uiov->iov_len);
+ break;
+ }
+#endif
default:
break;
}
@@ -711,6 +773,32 @@ int compat_ptrace_request(struct task_st
else
ret = ptrace_setsiginfo(child, &siginfo);
break;
+#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
+ case PTRACE_GETREGSET:
+ case PTRACE_SETREGSET:
+ {
+ struct iovec kiov;
+ struct compat_iovec __user *uiov =
+ (struct compat_iovec __user *) datap;
+ compat_uptr_t ptr;
+ compat_size_t len;
+
+ if (!access_ok(VERIFY_WRITE, uiov, sizeof(*uiov)))
+ return -EFAULT;
+
+ if (__get_user(ptr, &uiov->iov_base) ||
+ __get_user(len, &uiov->iov_len))
+ return -EFAULT;
+
+ kiov.iov_base = compat_ptr(ptr);
+ kiov.iov_len = len;
+
+ ret = ptrace_regset(child, request, addr, &kiov);
+ if (!ret)
+ ret = __put_user(kiov.iov_len, &uiov->iov_len);
+ break;
+ }
+#endif

default:
ret = ptrace_request(child, request, addr, data);
Index: tip/include/linux/elf.h
===================================================================
--- tip.orig/include/linux/elf.h
+++ tip/include/linux/elf.h
@@ -349,7 +349,11 @@ typedef struct elf64_shdr {
#define ELF_OSABI ELFOSABI_NONE
#endif

-/* Notes used in ET_CORE */
+/*
+ * Notes used in ET_CORE. Architectures export some of the arch register sets
+ * using the corresponding note types via the PTRACE_GETREGSET and
+ * PTRACE_SETREGSET requests.
+ */
#define NT_PRSTATUS 1
#define NT_PRFPREG 2
#define NT_PRPSINFO 3


2010-02-11 23:19:37

by Suresh Siddha

Subject: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET

Commit-ID: 2225a122ae26d542bdce523d9d87a4a7ba10e07b
Gitweb: http://git.kernel.org/tip/2225a122ae26d542bdce523d9d87a4a7ba10e07b
Author: Suresh Siddha <[email protected]>
AuthorDate: Thu, 11 Feb 2010 11:51:00 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Thu, 11 Feb 2010 15:08:33 -0800

ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET

Generic support for the PTRACE_GETREGSET/PTRACE_SETREGSET commands, which
export the regsets supported by each architecture using the corresponding
NT_* types. These NT_* types are already part of the userland ABI, used
in representing the architecture specific register sets as different NOTES
in an ELF core file.

The 'addr' parameter of the ptrace system call encodes the REGSET type (using
the corresponding NT_* type) and the 'data' parameter points to the
struct iovec having the user buffer and the length of that buffer.

struct iovec iov = { buf, len};
ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov);

On successful completion, iov.iov_len will be updated by the kernel, specifying
how much the kernel has written/read to/from the user's buffer (iov.iov_base).

x86 extended state registers are primarily exported using this interface.

Signed-off-by: Suresh Siddha <[email protected]>
LKML-Reference: <[email protected]>
Acked-by: Hongjiu Lu <[email protected]>
Cc: Roland McGrath <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
include/linux/elf.h | 6 +++-
include/linux/ptrace.h | 15 ++++++++
kernel/ptrace.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 108 insertions(+), 1 deletions(-)

diff --git a/include/linux/elf.h b/include/linux/elf.h
index a8c4af0..d8e6e61 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -349,7 +349,11 @@ typedef struct elf64_shdr {
#define ELF_OSABI ELFOSABI_NONE
#endif

-/* Notes used in ET_CORE */
+/*
+ * Notes used in ET_CORE. Architectures export some of the arch register sets
+ * using the corresponding note types via the PTRACE_GETREGSET and
+ * PTRACE_SETREGSET requests.
+ */
#define NT_PRSTATUS 1
#define NT_PRFPREG 2
#define NT_PRPSINFO 3
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 56f2d63..dbfa821 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -27,6 +27,21 @@
#define PTRACE_GETSIGINFO 0x4202
#define PTRACE_SETSIGINFO 0x4203

+/*
+ * Generic ptrace interface that exports the architecture specific regsets
+ * using the corresponding NT_* types (which are also used in the core dump).
+ *
+ * This interface usage is as follows:
+ * struct iovec iov = { buf, len};
+ *
+ * ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov);
+ *
+ * On the successful completion, iov.len will be updated by the kernel,
+ * specifying how much the kernel has written/read to/from the user's iov.buf.
+ */
+#define PTRACE_GETREGSET 0x4204
+#define PTRACE_SETREGSET 0x4205
+
/* options set using PTRACE_SETOPTIONS */
#define PTRACE_O_TRACESYSGOOD 0x00000001
#define PTRACE_O_TRACEFORK 0x00000002
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 23bd09c..13b4554 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -22,6 +22,7 @@
#include <linux/pid_namespace.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>
+#include <linux/regset.h>


/*
@@ -511,6 +512,47 @@ static int ptrace_resume(struct task_struct *child, long request, long data)
return 0;
}

+#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
+
+static const struct user_regset *
+find_regset(const struct user_regset_view *view, unsigned int type)
+{
+ const struct user_regset *regset;
+ int n;
+
+ for (n = 0; n < view->n; ++n) {
+ regset = view->regsets + n;
+ if (regset->core_note_type == type)
+ return regset;
+ }
+
+ return NULL;
+}
+
+static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
+ struct iovec *kiov)
+{
+ const struct user_regset_view *view = task_user_regset_view(task);
+ const struct user_regset *regset = find_regset(view, type);
+ int regset_no;
+
+ if (!regset || (kiov->iov_len % regset->size) != 0)
+ return -EIO;
+
+ regset_no = regset - view->regsets;
+ kiov->iov_len = min(kiov->iov_len,
+ (__kernel_size_t) (regset->n * regset->size));
+
+ if (req == PTRACE_GETREGSET)
+ return copy_regset_to_user(task, view, regset_no, 0,
+ kiov->iov_len, kiov->iov_base);
+ else
+ return copy_regset_from_user(task, view, regset_no, 0,
+ kiov->iov_len, kiov->iov_base);
+}
+
+#endif
+
int ptrace_request(struct task_struct *child, long request,
long addr, long data)
{
@@ -573,6 +615,26 @@ int ptrace_request(struct task_struct *child, long request,
return 0;
return ptrace_resume(child, request, SIGKILL);

+#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
+ case PTRACE_GETREGSET:
+ case PTRACE_SETREGSET:
+ {
+ struct iovec kiov;
+ struct iovec __user *uiov = (struct iovec __user *) data;
+
+ if (!access_ok(VERIFY_WRITE, uiov, sizeof(*uiov)))
+ return -EFAULT;
+
+ if (__get_user(kiov.iov_base, &uiov->iov_base) ||
+ __get_user(kiov.iov_len, &uiov->iov_len))
+ return -EFAULT;
+
+ ret = ptrace_regset(child, request, addr, &kiov);
+ if (!ret)
+ ret = __put_user(kiov.iov_len, &uiov->iov_len);
+ break;
+ }
+#endif
default:
break;
}
@@ -711,6 +773,32 @@ int compat_ptrace_request(struct task_struct *child, compat_long_t request,
else
ret = ptrace_setsiginfo(child, &siginfo);
break;
+#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
+ case PTRACE_GETREGSET:
+ case PTRACE_SETREGSET:
+ {
+ struct iovec kiov;
+ struct compat_iovec __user *uiov =
+ (struct compat_iovec __user *) datap;
+ compat_uptr_t ptr;
+ compat_size_t len;
+
+ if (!access_ok(VERIFY_WRITE, uiov, sizeof(*uiov)))
+ return -EFAULT;
+
+ if (__get_user(ptr, &uiov->iov_base) ||
+ __get_user(len, &uiov->iov_len))
+ return -EFAULT;
+
+ kiov.iov_base = compat_ptr(ptr);
+ kiov.iov_len = len;
+
+ ret = ptrace_regset(child, request, addr, &kiov);
+ if (!ret)
+ ret = __put_user(kiov.iov_len, &uiov->iov_len);
+ break;
+ }
+#endif

default:
ret = ptrace_request(child, request, addr, data);

2010-02-12 03:57:21

by Roland McGrath

Subject: Re: [patch v3 2/2] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET

Note that this patch and the xstate user_regset patch are entirely independent.
They can be merged in any order or one without the other.

> +/*
> + * Generic ptrace interface that exports the architecture specific regsets
> + * using the corresponding NT_* types (which are also used in the core dump).

There is a special case here, which I think already works as we intend it
to, but which should be clarified in the comment about this API. The
NT_PRSTATUS note type in a core dump contains a full 'struct elf_prstatus'.
But the user_regset for NT_PRSTATUS contains just the elf_gregset_t that
is the pr_reg field of 'struct elf_prstatus'.

For all the other user_regset flavors, the user_regset layout and the ELF
core dump note payload are exactly the same layout, as your comment implies.
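
The size difference can be checked from user space. A minimal sketch, assuming
the C library's <sys/procfs.h> provides 'struct elf_prstatus' and
elf_gregset_t (as glibc does). The helper names are made up for illustration:

```c
#include <stddef.h>
#include <sys/procfs.h> /* struct elf_prstatus, elf_gregset_t */

/* Size of the NT_PRSTATUS payload as the user_regset exposes it. */
size_t prstatus_regset_size(void)
{
	return sizeof(elf_gregset_t);
}

/* Size of the NT_PRSTATUS payload as it appears in an ELF core note. */
size_t prstatus_core_note_size(void)
{
	return sizeof(struct elf_prstatus);
}

/* Offset of the general registers inside the core-note structure. */
size_t prstatus_regs_offset(void)
{
	return offsetof(struct elf_prstatus, pr_reg);
}
```

So a tool that shares code between core-file parsing and this ptrace
interface would pass a buffer of prstatus_regset_size() bytes for
NT_PRSTATUS, not the full core-note size.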

> +static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
> + struct iovec *kiov)
> +{
> + const struct user_regset_view *view = task_user_regset_view(task);
> + const struct user_regset *regset = find_regset(view, type);
> + int regset_no;
> +
> + if (!regset || (kiov->iov_len % regset->size) != 0)
> + return -EIO;

My inclination would be to diagnose these more specifically. For a bad
size, give -EINVAL. For an unknown regset type, give maybe -EINVAL or
maybe -ENODEV. (-ENODEV is what you get for a known NT_* type that has a
user_regset implemented in the kernel, but that the particular hardware
we're running on doesn't support. So perhaps you don't want to overload
that for a wholly unrecognized NT_* type.)
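
To make the suggestion concrete, here is a stand-alone user-space model of the
check with the two failure cases split apart. It is only a sketch of one
possible reading: 'struct model_regset' and both function names are
hypothetical stand-ins, not kernel code, and returning -EINVAL for an unknown
type is just one of the two options mentioned above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct user_regset used here. */
struct model_regset {
	unsigned int core_note_type; /* NT_* type this regset is exported as */
	unsigned int n;              /* number of slots (registers) */
	unsigned int size;           /* size in bytes of one slot */
};

/* Linear search by note type, mirroring find_regset() in the patch. */
static const struct model_regset *
model_find_regset(const struct model_regset *sets, size_t count,
		  unsigned int type)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (sets[i].core_note_type == type)
			return &sets[i];
	return NULL;
}

/*
 * Validate a request and clamp *len to the regset size.
 * Returns 0 on success or a negative errno value.
 */
long model_check_regset(const struct model_regset *sets, size_t count,
			unsigned int type, size_t *len)
{
	const struct model_regset *rs = model_find_regset(sets, count, type);
	size_t max;

	if (!rs)
		return -EINVAL;	/* unknown NT_* type (assumption: not -ENODEV) */
	if (*len % rs->size != 0)
		return -EINVAL;	/* length is not a multiple of the slot size */

	max = (size_t)rs->n * rs->size;
	if (*len > max)
		*len = max;	/* same clamping as the min() in the patch */
	return 0;
}
```

Keeping the two tests separate also leaves room to return a different code for
the unknown-type case later without touching the size check.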

Otherwise, looks good to me. ACK contingent on Oleg's ACK.


Thanks,
Roland

2010-02-12 16:00:47

by Oleg Nesterov

Subject: Re: [patch v3 2/2] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET

On 02/11, Roland McGrath wrote:
>
> > +static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
> > + struct iovec *kiov)
> > +{
> > + const struct user_regset_view *view = task_user_regset_view(task);
> > + const struct user_regset *regset = find_regset(view, type);
> > + int regset_no;
> > +
> > + if (!regset || (kiov->iov_len % regset->size) != 0)
> > + return -EIO;
>
> My inclination would be to diagnose these more specifically.

Agreed.

Otherwise I think the patch is fine.

Oleg.

2010-02-22 09:07:27

by Ingo Molnar

Subject: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET


* tip-bot for Suresh Siddha <[email protected]> wrote:

> Commit-ID: 2225a122ae26d542bdce523d9d87a4a7ba10e07b
> Gitweb: http://git.kernel.org/tip/2225a122ae26d542bdce523d9d87a4a7ba10e07b
> Author: Suresh Siddha <[email protected]>
> AuthorDate: Thu, 11 Feb 2010 11:51:00 -0800
> Committer: H. Peter Anvin <[email protected]>
> CommitDate: Thu, 11 Feb 2010 15:08:33 -0800
>
> ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET

FYI, this commit broke tip:master on PARISC (other architectures are fine):

kernel/built-in.o: In function `ptrace_request':
(.text.ptrace_request+0x2cc): undefined reference to `task_user_regset_view'

I'll keep them in tip:master to get them tested, but note that i cannot push
any of these patches into linux-next until this is fixed, as linux-next
requires all architectures to build, with no regard to which architectures are
tested by kernel testers in practice.

Ingo

2010-02-22 09:33:31

by Stephen Rothwell

Subject: linux-next requirements (Was: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET)

Hi Ingo,

On Mon, 22 Feb 2010 10:07:10 +0100 Ingo Molnar <[email protected]> wrote:
>
> I'll keep them in tip:master to get them tested, but note that i cannot push
> any of these patches into linux-next until this is fixed, as linux-next
> requires all architectures to build, with no regard to which architectures are
> tested by kernel testers in practice.

I merely expect people not to push known broken code into linux-next.
linux-next breaks on all sorts of architectures all the time (check
http://kisskb.ellerman.id.au/kisskb/branch/9/). One of the reasons
linux-next exists is so that all kernel developers don't have to build
test on all architectures all the time. (And, yes, I know that linux-next
doesn't build all architectures or all configs.)

That being said, I will complain about breakages on some architectures
more than others since they are in my personal builds (as opposed to the
over night builds).

That being said (:-)), I am sure that the PARISC guys would appreciate
the original problem being fixed.
--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2010-02-22 10:28:06

by Ingo Molnar

Subject: Re: linux-next requirements (Was: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET)


* Stephen Rothwell <[email protected]> wrote:

> Hi Ingo,
>
> On Mon, 22 Feb 2010 10:07:10 +0100 Ingo Molnar <[email protected]> wrote:
> >
> > I'll keep them in tip:master to get them tested, but note that i cannot
> > push any of these patches into linux-next until this is fixed, as
> > linux-next requires all architectures to build, with no regard to which
> > architectures are tested by kernel testers in practice.
>
> I merely expect people not to push known broken code into linux-next.

FYI, this 'mere' kind of indiscriminate definition of 'breakage' is what i am
talking about.

The occasional driver build breakage can be tested relatively easily: one
allyesconfig build and it's done. Build testing 22 architectures is
exponentially harder: it requires the setup (and constant maintenance) of
zillions of tool-chains, plus the build time is significant as well.

So this kind of linux-next requirement causes the over-testing of code that
doesn't get all that much active usage, plus it increases build testing
overhead 10-fold. That, by definition, causes the under-testing of code that
_does_ matter a whole lot more to active testers of the Linux kernel.

Which is a problem, obviously.

Ingo

2010-02-22 11:48:06

by Stephen Rothwell

Subject: Re: linux-next requirements (Was: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET)

Hi Ingo,

On Mon, 22 Feb 2010 11:27:45 +0100 Ingo Molnar <[email protected]> wrote:
>
> * Stephen Rothwell <[email protected]> wrote:
>
> > On Mon, 22 Feb 2010 10:07:10 +0100 Ingo Molnar <[email protected]> wrote:
> > >
> > > I'll keep them in tip:master to get them tested, but note that i cannot
> > > push any of these patches into linux-next until this is fixed, as
> > > linux-next requires all architectures to build, with no regard to which
> > > architectures are tested by kernel testers in practice.
> >
> > I merely expect people not to push known broken code into linux-next.
>
> FYI, this 'mere' kind of indiscriminate definition of 'breakage' is what i am
> talking about.

OK, let me remove "merely" from this.

I expect people not to push known broken code into linux-next. Code in
linux-next is meant to be as ready as possible to be sent to Linus. If
you know that it breaks some architecture then it should obviously be
fixed somehow (unless the architecture maintainer really doesn't care,
of course).

This is different from not knowing that it breaks some architecture even
though you have done reasonable testing.

> The occasional driver build breakage can be tested relatively easily: one
> allyesconfig build and it's done. Build testing 22 architectures is
> exponentially harder: it requires the setup (and constant maintenance) of
> zillions of tool-chains, plus the build time is significant as well.
>
> So this kind of linux-next requirement causes the over-testing of code that
> doesn't get all that much active usage, plus it increases build testing
> overhead 10-fold. That, by definition, causes the under-testing of code that
> _does_ matter a whole lot more to active testers of the Linux kernel.

Which is why linux-next does *not* require that. (Did you read the part
of my email that you removed?) I do point out when build failures occur
(that is part of the point of linux-next after all) but they only upset
me when it is clear that the code that has been changed was not built at
all (which doesn't happen too often).

> Which is a problem, obviously.

It certainly would be.

Maybe I don't understand what you are trying to say.
--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2010-02-22 18:37:50

by Roland McGrath

Subject: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET

> FYI, this commit broke tip:master on PARISC (other architectures are fine):
>
> kernel/built-in.o: In function `ptrace_request':
> (.text.ptrace_request+0x2cc): undefined reference to `task_user_regset_view'

This means that parisc failed to meet the documented requirements for
setting CONFIG_HAVE_ARCH_TRACEHOOK, but set it anyway. If arch folks don't
follow the specs, it defeats the whole purpose of having clear statements
of requirements for arch code.

> I'll keep them in tip:master to get them tested, but note that i cannot push
> any of these patches into linux-next until this is fixed, as linux-next
> requires all architectures to build, with no regard to which architectures are
> tested by kernel testers in practice.

IMHO if parisc does not finish up its requirements RSN, it should instead:

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 524d935..f388dc6 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -18,7 +18,6 @@ config PARISC
select BUG
select HAVE_PERF_EVENTS
select GENERIC_ATOMIC64 if !64BIT
- select HAVE_ARCH_TRACEHOOK
help
The PA-RISC microprocessor is designed by Hewlett-Packard and used
in many of their workstations & servers (HP9000 700 and 800 series,


Thanks,
Roland

2010-02-22 22:58:08

by H. Peter Anvin

Subject: Re: linux-next requirements (Was: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET)

On 02/22/2010 03:47 AM, Stephen Rothwell wrote:
>>
>> So this kind of linux-next requirement causes the over-testing of code that
>> doesn't get all that much active usage, plus it increases build testing
>> overhead 10-fold. That, by definition, causes the under-testing of code that
>> _does_ matter a whole lot more to active testers of the Linux kernel.
>
> Which is why linux-next does *not* require that. (Did you read the part
> of my email that you removed?) I do point out when build failures occur
> (that is part of the point of linux-next after all) but they only upset
> me when it is clear that the code that has been changed was not built at
> all (which doesn't happen too often).
>
>> Which is a problem, obviously.
>
> It certainly would be.
>
> Maybe I don't understand what you are trying to say.

Sounds like a big source of confusion to me.

Either which way, Roland has a mitigation patch -- which basically
disables the broken bits of PARISC until the PARISC maintainers fix it.
What is the best way to handle that kind of stuff?

-hpa

2010-02-23 00:00:09

by Stephen Rothwell

Subject: Re: linux-next requirements (Was: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET)

Hi Peter,

On Mon, 22 Feb 2010 14:57:28 -0800 "H. Peter Anvin" <[email protected]> wrote:
>
> Either which way, Roland has a mitigation patch -- which basically
> disables the broken bits of PARISC until the PARISC maintainers fix it.
> What is the best way to handle that kind of stuff?

Well, now that Roland has made at least one of the PARISC maintainers
aware of the problem, we could wait a little while to see if a solution
is forthcoming from them. If not, then maybe Roland's patch could be
applied to the appropriate tip tree or we could just leave PARISC broken
in linux-next until they decide to fix it. Note that some of the PARISC
builds are already broken in linux-next for other reasons.

Are there any downsides to Roland's patch as far as PARISC is concerned
(apart from the loss of some functionality, of course)?
--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2010-02-23 08:46:31

by Ingo Molnar

Subject: Re: linux-next requirements (Was: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET)


* Stephen Rothwell <[email protected]> wrote:

> Hi Ingo,
>
> On Mon, 22 Feb 2010 11:27:45 +0100 Ingo Molnar <[email protected]> wrote:
> >
> > * Stephen Rothwell <[email protected]> wrote:
> >
> > > On Mon, 22 Feb 2010 10:07:10 +0100 Ingo Molnar <[email protected]> wrote:
> > > >
> > > > I'll keep them in tip:master to get them tested, but note that i cannot
> > > > push any of these patches into linux-next until this is fixed, as
> > > > linux-next requires all architectures to build, with no regard to which
> > > > architectures are tested by kernel testers in practice.
> > >
> > > I merely expect people not to push known broken code into linux-next.
> >
> > FYI, this 'mere' kind of indiscriminate definition of 'breakage' is what i am
> > talking about.
>
> OK, let me remove "merely" from this.
>
> I expect people not to push known broken code into linux-next. Code in
> linux-next is meant to be as ready as possible to be sent to Linus. If
> you know that it breaks some architecture then it should obviously be
> fixed somehow (unless the architecture maintainer really doesn't care,
> of course).
>
> This is different from not knowing that it breaks some architecture even
> though you have done reasonable testing.
>
> > The occasional driver build breakage can be tested relatively easily: one
> > allyesconfig build and it's done. Build testing 22 architectures is
> > exponentially harder: it requires the setup (and constant maintenance) of
> > zillions of tool-chains, plus the build time is significant as well.
> >
> > So this kind of linux-next requirement causes the over-testing of code that
> > doesn't get all that much active usage, plus it increases build testing
> > overhead 10-fold. That, by definition, causes the under-testing of code that
> > _does_ matter a whole lot more to active testers of the Linux kernel.
>
> Which is why linux-next does *not* require that. [...]

How can you reconcile that with:

> I expect people not to push known broken code into linux-next. Code in

?

So is it your point that technically you 'expect' but don't 'require'?

That's mostly word games really IMO, as in the end there's not much of a
difference in practice: because you report build failures of non-mainstream
architectures pretty much the same way as the main architectures.

I.e. indirectly you push up their importance, while taking away resources from
the main architectures. Testing and maintainer attention is a finite resource,
it's all a zero-sum game.

So in the end maintainers either cross-build to all architectures and avoid
all the squeaky-wheel overhead of linux-next, or avoid pushing to it all that
frequently. Which causes the collateral damage i mentioned:

> The occasional driver build breakage can be tested relatively easily: one
> allyesconfig build and it's done. Build testing 22 architectures is
> exponentially harder: it requires the setup (and constant maintenance) of
> zillions of tool-chains, plus the build time is significant as well.
>
> So this kind of linux-next requirement causes the over-testing of code that
> doesn't get all that much active usage, plus it increases build testing
> overhead 10-fold. That, by definition, causes the under-testing of code that
> _does_ matter a whole lot more to active testers of the Linux kernel.
>
> Which is a problem, obviously.

The solution? Stop this anti-real-world-usage bias already. Stop pretending
that those cross-build results are as important as say the thousands of real
bugzilla entries we have. They are fine info, but the kind of priority you are
giving them is causing a waste of resources.

The thing is, testing whether the kernel still builds with gcc33 has more
practical relevance to Linux users than testing about half of the
architectures. The ancient NE2000 driver probably still has ten times more
users than half of the architectures we have. Do you boot-test NE2000 with
linux-next?

Developers simply cannot be expected to build for 22 architectures, and they
shouldn't be. It's somewhat of a waste of time even on the subsystem maintainer
level. (although it's certainly more contained there, plus subsystem
maintainers generally have more hw resources as well.)

The thing is, last i checked you didn't even _test_ x86 as the first step in
your linux-next build tests. Most of your generic build bug reports are
against PowerPC. They create the appearance that x86 is a second class citizen
in linux-next.

IMHO a generic tree like linux-next should be fundamentally neutral and its
testing should be weighted _towards_ real Linux usage. You should try hard to
avoid even the _appearance_ of pro-PowerPC (and anti-x86) bias - but AFAICS you
don't even try.

Which i see as a problem.

Thanks,

Ingo

2010-02-23 18:37:30

by Roland McGrath

Subject: [tip:x86/ptrace] parisc: Disable CONFIG_HAVE_ARCH_TRACEHOOK

Commit-ID: 5e6dbc260704ce4d22fc9664f517f0bb6748feaa
Gitweb: http://git.kernel.org/tip/5e6dbc260704ce4d22fc9664f517f0bb6748feaa
Author: Roland McGrath <[email protected]>
AuthorDate: Mon, 22 Feb 2010 10:37:07 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Tue, 23 Feb 2010 10:34:41 -0800

parisc: Disable CONFIG_HAVE_ARCH_TRACEHOOK

> FYI, this commit broke tip:master on PARISC (other architectures are fine):
>
> kernel/built-in.o: In function `ptrace_request':
> (.text.ptrace_request+0x2cc): undefined reference to `task_user_regset_view'

This means that parisc failed to meet the documented requirements for
setting CONFIG_HAVE_ARCH_TRACEHOOK, but set it anyway. If arch folks don't
follow the specs, it defeats the whole purpose of having clear statements
of requirements for arch code.

Until parisc finishes up its requirements, disable CONFIG_HAVE_ARCH_TRACEHOOK.

Signed-off-by: H. Peter Anvin <[email protected]>
LKML-Reference: <[email protected]>
Cc: <[email protected]>
Cc: Kyle McMartin <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
---
arch/parisc/Kconfig | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 524d935..f388dc6 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -18,7 +18,6 @@ config PARISC
select BUG
select HAVE_PERF_EVENTS
select GENERIC_ATOMIC64 if !64BIT
- select HAVE_ARCH_TRACEHOOK
help
The PA-RISC microprocessor is designed by Hewlett-Packard and used
in many of their workstations & servers (HP9000 700 and 800 series,

2010-02-23 19:52:45

by Al Viro

Subject: Re: linux-next requirements (Was: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET)

On Tue, Feb 23, 2010 at 09:45:52AM +0100, Ingo Molnar wrote:

> The solution? Stop this anti-real-world-usage bias already. Stop pretending
> that those cross-build results are as important as say the thousands of real
> bugzilla entries we have. They are fine info, but the kind of priority you are
> giving them is causing a waste of resources.

Ho-hum... "Kernel won't build for several thousand boxen"... "Five lines
contain trailing whitespace"... Yup, the latter is far higher priority,
all right.

2010-02-23 19:58:06

by Al Viro

Subject: Re: linux-next requirements (Was: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET)

On Tue, Feb 23, 2010 at 07:52:14PM +0000, Al Viro wrote:
> On Tue, Feb 23, 2010 at 09:45:52AM +0100, Ingo Molnar wrote:
>
> > The solution? Stop this anti-real-world-usage bias already. Stop pretending
> > that those cross-build results are as important as say the thousands of real
> > bugzilla entries we have. They are fine info, but the kind of priority you are
> > giving them is causing a waste of resources.
>
> Ho-hum... "Kernel won't build for several thousand boxen"... "Five lines
> contain trailing whitespace"... Yup, the latter is far higher priority,
> all right.

IOW, I'd buy that argument from somebody who didn't protect checkpatch.pl
wankers against exactly that kind of criticism.

2010-02-23 20:21:05

by Roland McGrath

Subject: Re: linux-next requirements (Was: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET)

> Are there any downsides to Roland's patch as far as PARISC is concerned
> (apart from the loss of some functionality, of course)?

Kyle ACK'd my parisc patch, and my impression is he wants it to go in
and does not plan to work on the necessary arch support imminently.

I think the only visible effect of turning off HAVE_ARCH_TRACEHOOK
on parisc will be that /proc/pid/syscall is not available.
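
As an illustration of that visible effect only (this is not part of the patch), a user-space program could probe whether the file exists on a running kernel. A minimal sketch, assuming nothing beyond the C standard library; the helper name `proc_syscall_readable` is made up for this example:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if one line can be read from 'path'
 * (for example "/proc/self/syscall"), 0 if the file is missing or
 * cannot be read.
 */
int proc_syscall_readable(const char *path)
{
	char line[256];
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return 0;	/* e.g. the kernel does not provide the file */

	ok = fgets(line, sizeof(line), f) != NULL;
	fclose(f);
	return ok;
}
```

On a kernel built without HAVE_ARCH_TRACEHOOK, `proc_syscall_readable("/proc/self/syscall")` would be expected to return 0. Sandboxes and permission settings can produce the same result, so a 0 is a hint rather than proof.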


Thanks,
Roland

2010-02-23 20:53:51

by H. Peter Anvin

Subject: Re: linux-next requirements (Was: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET)

On 02/23/2010 12:20 PM, Roland McGrath wrote:
>> Are there any downsides to Roland's patch as far as PARISC is concerned
>> (apart from the loss of some functionality, of course)?
>
> Kyle ACK'd my parisc patch, and my impression is he wants it to go in
> and does not plan to work on the necessary arch support imminently.
>
> I think the only visible effect of turning off HAVE_ARCH_TRACEHOOK
> on parisc will be that /proc/pid/syscall is not available.

Added to tip:x86/ptrace.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-02-23 22:54:52

by Stephen Rothwell

Subject: Re: linux-next requirements (Was: Re: [tip:x86/ptrace] ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET)

On Tue, 23 Feb 2010 12:49:42 -0800 "H. Peter Anvin" <[email protected]> wrote:
>
> On 02/23/2010 12:20 PM, Roland McGrath wrote:
> >> Are there any downsides to Roland's patch as far as PARISC is concerned
> >> (apart from the loss of some functionality, of course)?
> >
> > Kyle ACK'd my parisc patch, and my impression is he wants it to go in
> > and does not plan to work on the necessary arch support imminently.
> >
> > I think the only visible effect of turning off HAVE_ARCH_TRACEHOOK
> > on parisc will be that /proc/pid/syscall is not available.
>
> Added to tip:x86/ptrace.

And Linus actually put it in his tree yesterday ... commit
15cbf627abcd93c3c668d5a92d58d9fec8f953dd "Revert "parisc:
HAVE_ARCH_TRACEHOOK""

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2010-02-24 07:25:52

by Stephen Rothwell

Subject: Re: linux-next requirements

[I have removed linux-tip-commits from the cc list]

Hi Ingo,

On Tue, 23 Feb 2010 09:45:52 +0100 Ingo Molnar <[email protected]> wrote:
>
> Developers simply cannot be expected to build for 22 architectures, and they
> shouldnt be.

I have agreed with this point of yours several times. Why do you keep
stating it?

> The thing is, last i checked you didnt even _test_ x86 as the first step in
> your linux-next build tests. Most of your generic build bug reports are
> against PowerPC. They create the appearance that x86 is a second class citizen
> in linux-next.

Let's see. Over the last 60 days, I have reported 37 build errors. Of
these, 16 were reported against x86, 14 against ppc, 7 against other
archs. Of the ppc reports, 10 would not affect x86 builds (due to being
ppc specific problems or dependencies on implicit includes that do happen on
x86). None of the reports against other arches would affect x86 builds.

I also reported 31 warnings. 15 against x86, 15 against ppc and 1 against
both. Of those only reported against ppc, 13 did not affect x86.

So of my "generic" reports, 4 errors and 2 warnings were reported against
ppc, 16 errors and 15 warnings against x86.
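
The arithmetic behind the build-error figures above can be checked mechanically. The following is only a sketch: the struct and function names are invented for the example, and the numbers are the ones quoted in this message.

```c
#include <assert.h>

/* Build-error counts quoted above; the struct and names are illustrative. */
struct error_tally {
	int x86;	/* reported against x86 */
	int ppc;	/* reported against ppc */
	int other;	/* reported against other archs */
	int ppc_only;	/* ppc reports that would not affect x86 builds */
};

/* Total number of reports across all three columns. */
int tally_total(const struct error_tally *t)
{
	return t->x86 + t->ppc + t->other;
}

/* ppc-reported errors that were generic, i.e. would also hit x86. */
int tally_generic_on_ppc(const struct error_tally *t)
{
	return t->ppc - t->ppc_only;
}

/* Integer percentage, rounded to the nearest whole number. */
int percent_of(int part, int whole)
{
	assert(whole > 0);
	return (100 * part + whole / 2) / whole;
}

const struct error_tally reported_errors = { 16, 14, 7, 10 };
```

With these inputs the total is 37, the generic errors first seen on ppc come to 4, and the x86 share of all error reports is about 43%.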

Also, I am not sure how reports of 37 build errors and 32 warnings over
60 days can tax the resources of our developer base. Most of these are
fairly trivial to fix (as is shown by how quickly they are fixed). Usually
the developer has just forgotten to test the !CONFIG_SOMETHING case or
used some function without explicitly including the file that declares it.

As to my perceived pro-PowerPC and anti-x86 bias, you are the only one who
has even mentioned it to me.

Anyway, I am sick of these discussions. If people see the way I do
linux-next as a problem, then they can find someone else. That is not
the impression I gained at the Kernel Summit and (apart from these
occasional "discussions") I am quite happy to continue.
--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2010-02-27 01:54:16

by Grant Likely

Subject: Re: linux-next requirements

On Wed, Feb 24, 2010 at 12:25 AM, Stephen Rothwell <[email protected]> wrote:
> Anyway, I sick of these discussions.  If people see the way I do
> linux-next as a problem, then they can find someone else.  That is not
> the impression I gained at the Kernel Summit and (apart from these
> occasional "discussions") I am quite happy to continue.

Please don't stop. I'd be screwed.

g.

2010-02-27 08:53:22

by Geert Uytterhoeven

Subject: Re: linux-next requirements

On Sat, Feb 27, 2010 at 02:53, Grant Likely <[email protected]> wrote:
> On Wed, Feb 24, 2010 at 12:25 AM, Stephen Rothwell <[email protected]> wrote:
>> Anyway, I sick of these discussions.  If people see the way I do
>> linux-next as a problem, then they can find someone else.  That is not
>> the impression I gained at the Kernel Summit and (apart from these
>> occasional "discussions") I am quite happy to continue.
>
> Please don't stop.  I'd be screwed.

Yes, I like linux-next, so
Acked-by: Geert Uytterhoeven <[email protected]>

(This ack doesn't necessarily apply to the rest of the discussion that
happened on linux-tip-commits before. I'm gonna pretend I didn't read it ;-).

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2010-02-27 09:09:25

by Jaswinder Singh

Subject: Re: linux-next requirements

Hello Stephen,

On Wed, Feb 24, 2010 at 12:55 PM, Stephen Rothwell <[email protected]> wrote:
>
> Anyway, I sick of these discussions.  If people see the way I do
> linux-next as a problem, then they can find someone else.  That is not
> the impression I gained at the Kernel Summit and (apart from these
> occasional "discussions") I am quite happy to continue.

You are doing a great job. linux-next is also very useful before
submitting the patch as we can see all the changes under one roof.
Please do not get upset with useless discussions and continue with
your great work.

Thanks and big Cheers :-)
--
Jaswinder.

2010-02-27 09:40:26

by Ingo Molnar

Subject: Re: linux-next requirements


* Stephen Rothwell <[email protected]> wrote:

> [I have removed linux-tip-commits from the cc list]
>
> Hi Ingo,
>
> On Tue, 23 Feb 2010 09:45:52 +0100 Ingo Molnar <[email protected]> wrote:
> >
> > Developers simply cannot be expected to build for 22 architectures, and
> > they shouldnt be.
>
> I have agreed with this point of yours several times. Why do you keep
> stating it?

If you agree with me then why do you put so much focus on cross-arch build
failures, versus other, more relevant forms of testing?

> > The thing is, last i checked you didnt even _test_ x86 as the first step
> > in your linux-next build tests. Most of your generic build bug reports are
> > against PowerPC. They create the appearance that x86 is a second class
> > citizen in linux-next.
>
> Lets see. Over the last 60 days, I have reported 37 build errors. Of
> these, 16 were reported against x86, 14 against ppc, 7 against other archs.

So only 43% of them were even relevant on the platform that 95+% of the Linux
testers use? Seems to support the points i made.

> Of the ppc reports, 10 would not affect x86 builds (due to being ppc
> specific problems or dependencies on implicit includes that do happen on
> x86). None of the reports against other arches would affect x86 builds.
>
> I also reported 31 warnings. 15 against x86, 15 against ppc and 1 against
> both. Of those only reported against ppc, 13 did not affect x86.
>
> So of my "generic" reports, 4 errors and 2 warnings were reported against
> ppc, 16 errors and 15 warnings again x86.
>
> Also, I am not sure how reports of 37 build errors and 32 warnings over 60
> days can tax the resources of our developer base. [...]

Note that out of those 37 build errors only a small minority were caused by
any tree i co-maintain. (i dont have the precise numbers but it's below 5)

Why? Because i cross-build before pushing to linux-next. I bug people about
cross-arch build failures, and about the patch flow delays and hiccups this
causes. Without that you'd see twice that many cross-build failures.

Which in itself is not bad of course (any fix is a good fix) - except the
forced prioritization and its place in the workflow: it sends the wrong
testing message.

It sends the message that building on N architectures is more important than
for the code to work for real people. I've had good developers waste their
time trying to set up cross-build testing environments and complain to me how
this complicates their testing.

> [...] Most of these are fairly trivial to fix (as is shown by how quick
> they are fixed. Usually the developer has just forgotten to test the
> !CONFIG_SOMETHING case or used some function without explicitly including
> the file that declares it.
>
> As to my perceived pro-PowerPC and anti-x86 bias, you are the only one who
> has even mentioned it to me.

Have you asked me recently for example?

> Anyway, I sick of these discussions. If people see the way I do linux-next
> as a problem, then they can find someone else. That is not the impression I
> gained at the Kernel Summit and (apart from these occasional "discussions")
> I am quite happy to continue.

Not sure how you jump from my observations to "I will quit if you do this". I
am simply pointing out problems as i see them - as i do that with every piece
of the workflow we use. I have expressed my views numerous times about where i
find linux-next useful and positive - and it's sure a net positive.

Ingo

2010-02-27 12:22:34

by Rafael J. Wysocki

Subject: Re: linux-next requirements

On Saturday 27 February 2010, Ingo Molnar wrote:
>
> * Stephen Rothwell <[email protected]> wrote:
>
> > [I have removed linux-tip-commits from the cc list]
> >
> > Hi Ingo,
> >
> > On Tue, 23 Feb 2010 09:45:52 +0100 Ingo Molnar <[email protected]> wrote:
> > >
> > > Developers simply cannot be expected to build for 22 architectures, and
> > > they shouldnt be.
> >
> > I have agreed with this point of yours several times. Why do you keep
> > stating it?
>
> If you agree with me then why do you put so much focus on cross-arch build
> failures, versus other, more relevant forms of testing?

I don't really know what this is all about. Stephen does what he can and
that's generally appreciated very much. It helps to make sure the code builds
correctly on the architectures it's supposed to build on and there's nothing
wrong with that IMO.

> > > The thing is, last i checked you didnt even _test_ x86 as the first step
> > > in your linux-next build tests. Most of your generic build bug reports are
> > > against PowerPC. They create the appearance that x86 is a second class
> > > citizen in linux-next.
> >
> > Lets see. Over the last 60 days, I have reported 37 build errors. Of
> > these, 16 were reported against x86, 14 against ppc, 7 against other archs.
>
> So only 43% of them were even relevant on the platform that 95+% of the Linux
> testers use? Seems to support the points i made.

Well, I hope you don't mean that because the majority of bug reporters (vs
testers, the number of whom is unknown to me at least) use x86, we are free
to break the other architectures. ;-)

> > Of the ppc reports, 10 would not affect x86 builds (due to being ppc
> > specific problems or dependencies on implicit includes that do happen on
> > x86). None of the reports against other arches would affect x86 builds.
> >
> > I also reported 31 warnings. 15 against x86, 15 against ppc and 1 against
> > both. Of those only reported against ppc, 13 did not affect x86.
> >
> > So of my "generic" reports, 4 errors and 2 warnings were reported against
> > ppc, 16 errors and 15 warnings again x86.
> >
> > Also, I am not sure how reports of 37 build errors and 32 warnings over 60
> > days can tax the resources of our developer base. [...]
>
> Note that out of those 37 build errors only a small minority were caused by
> any tree i co-maintain. (i dont have the precise numbers but it's below 5)
>
> Why? Because i cross-build before pushing to linux-next. I bug people about
> cross-arch build failures, and about the patch flow delays and hickups this
> causes. Without that you'd see twice that many cross-build failures.
>
> Which in itself is not bad of course (any fix is a good fix) - except the
> forced prioritization and its place in the workflow: it sends the wrong
> testing message.
>
> It sends the message that building on N architectures is more important than
> for the code to work for real people. I've had good developers waste their
> time trying to set up cross-build testing environments and complain to me how
> this complicates their testing.

That's the kind of task linux-next is really good at AFAICT. Before linux-next
I used to have a cross-build testing environment like this, but I don't need it
any more, because I know linux-next will catch the cross-build problems for
me and I appreciate that very much, because it saves a lot of my time.

Rafael

2010-02-27 12:47:47

by Ingo Molnar

Subject: Re: linux-next requirements


* Rafael J. Wysocki <[email protected]> wrote:

> > > Lets see. Over the last 60 days, I have reported 37 build errors. Of
> > > these, 16 were reported against x86, 14 against ppc, 7 against other
> > > archs.
> >
> > So only 43% of them were even relevant on the platform that 95+% of the
> > Linux testers use? Seems to support the points i made.
>
> Well, I hope you don't mean that because the majority of bug reporters (vs
> testers, the number of whom is unknown to me at least) use x86, we are free
> to break the other architectures. ;-)

It means exactly that: just like we 'can' break compilation with gcc296,
ancient versions of binutils, odd bootloaders, can break the boot via odd
hardware, etc. When someone uses those architectures then the 'easy' bugfixes
will actually flow in very quickly and without much fuss - and without
burdening developers to consider cases they have no good ways to test. Why
should rare architectures be more important than those other rare forms of
Linux usage?

In fact those rare ways of building and booting the kernel i mentioned are
probably used _more_ than half of the architectures that linux-next
build-tests ...

So yes, of course _all_ bugs need fixing if there's enough capacity, but the
process in general should be healthy, low-overhead and shouldnt concentrate on
an irrelevant portion of Linux usage in such a prominent way.

Or, if it does, it should _first_ cover the other, much more burning areas of
testing interest. All the while our _real_ bugreports are often rotting on
bugzilla.kernel.org ...

Thanks,

Ingo

2010-02-27 19:07:05

by Rafael J. Wysocki

Subject: Re: linux-next requirements

On Saturday 27 February 2010, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > > > Lets see. Over the last 60 days, I have reported 37 build errors. Of
> > > > these, 16 were reported against x86, 14 against ppc, 7 against other
> > > > archs.
> > >
> > > So only 43% of them were even relevant on the platform that 95+% of the
> > > Linux testers use? Seems to support the points i made.
> >
> > Well, I hope you don't mean that because the majority of bug reporters (vs
> > testers, the number of whom is unknown to me at least) use x86, we are free
> > to break the other architectures. ;-)
>
> It means exactly that: just like we 'can' break compilation with gcc296,
> ancient versions of binutils, odd bootloaders, can break the boot via odd
> hardware, etc. When someone uses that architectures then the 'easy' bugfixes
> will actually flow in very quickly and without much fuss

Then I don't understand what the problem with getting them in at the linux-next
stage is. They are necessary anyway, so we'll need to add them sooner or
later and IMO the sooner the better.

Apart from this, cross-build issues aren't always "easy" and sometimes
they take quite some time and engineering effort to resolve. IMO that's better
done at the linux-next stage than during a merge window.

> - and without burdening developers to consider cases they have no good ways
> to test. Why should rare architectures be more important than those other
> rare forms of Linux usage?

Because Linus' tree is supposed to build on those architectures. As long
as that's the case, linux-next should build on them too.

> In fact those rare ways of building and booting the kernel i mentioned are
> probably used _more_ than half of the architectures that linux-next
> build-tests ...

I don't know and you don't know either. That's just pure speculation and
therefore meaningless.

> So yes, of course _all_ bugs need fixing if there's enough capacity, but the
> process in general should be healthy, low-overhead and shouldnt concentrate on
> an irrelevant portion of Linux usage in such a prominent way.
>
> Or, if it does, it should _first_ cover the other, much more burning areas of
> testing interest. All the while our _real_ bugreports are often rotting on
> bugzilla.kernel.org ...

All right. There are two _separate_ questions to ask IMO:

(1) Do we need the kind of community service that Stephen has been doing?

(2) Do we need more testing of linux-next and if so, whose task should that be?

I think you agree that the answer to (1) is "yes, we do". So _someone_ has to
do it and I'm very grateful to Stephen for taking care of it.

[Thanks, Stephen!]

Now, part of this service is to check that the resulting tree will actually
build in all conditions it's supposed to build in, if possible, or the whole
merging exercise wouldn't have much practical meaning. Stephen has been
doing just that and IMO to a good result.

To some extent, though, that's a matter of defining the conditions the
kernel is supposed to build in, but I think for linux-next these conditions
should be the same as for Linus' tree, for the simple reason that
linux-next is supposed to be a "future snapshot" of it. So linux-next should
build on all architectures that the future Linus' tree is supposed to build on.
Even on "exotic" ones.

[IMO that's actually important, because such corner cases tend to reveal
runtime bugs we wouldn't have been aware of otherwise. Now, in the majority
of cases a casual tester will be discouraged by the kernel not compiling for
him while he might have found a "real" bug otherwise.]

Now, as far as (2) is concerned, I think the answer here is also "yes, we
do", but that's not part of Stephen's job.

Thanks,
Rafael

2010-02-27 21:50:20

by Geert Uytterhoeven

Subject: Re: linux-next requirements

On Sat, Feb 27, 2010 at 20:07, Rafael J. Wysocki <[email protected]> wrote:
> On Saturday 27 February 2010, Ingo Molnar wrote:
>> * Rafael J. Wysocki <[email protected]> wrote:
>>
>> > > > Lets see.  Over the last 60 days, I have reported 37 build errors.  Of
>> > > > these, 16 were reported against x86, 14 against ppc, 7 against other
>> > > > archs.
>> > >
>> > > So only 43% of them were even relevant on the platform that 95+% of the
>> > > Linux testers use? Seems to support the points i made.
>> >
>> > Well, I hope you don't mean that because the majority of bug reporters (vs
>> > testers, the number of whom is unknown to me at least) use x86, we are free
>> > to break the other architectures. ;-)
>>
>> It means exactly that: just like we 'can' break compilation with gcc296,
>> ancient versions of binutils, odd bootloaders, can break the boot via odd
>> hardware, etc. When someone uses that architectures then the 'easy' bugfixes
>> will actually flow in very quickly and without much fuss
>
> Then I don't understand what the problem with getting them in at the linux-next
> stage is.  They are necessary anyway, so we'll need to add them sooner or
> later and IMO the sooner the better.
>
> Apart from this, that cross-build issues aren't always "easy" and sometimes
> they take quite some time and engineering effort to resolve.  IMO that's better
> done at the linux-next stage than during a merge window.
>
>> - and without burdening developers to consider cases they have no good ways
>> to test.  Why should rare architectures be more important than those other
>> rare forms of Linux usage?
>
> Because the Linus' tree is supposed to build on those architectures.  As long
> as that's the case, linux-next should build on them too.
>
>> In fact those rare ways of building and booting the kernel i mentioned are
>> probably used _more_ than half of the architectures that linux-next
>> build-tests ...
>
> I don't know and you don't know either.  That's just pure speculation and
> therefore meaningless.

If only the CE Linux Forum member companies would publish figures about the
number of Linux devices they push onto the world population...

Yes I know, this still excludes `obsolete' architectures like parisc and
alpha, but it would change the balance towards x86 (and powerpc?) drastically.

>> So yes, of course _all_ bugs need fixing if there's enough capacity, but the
>> process in general should be healthy, low-overhead and shouldnt concentrate on
>> an irrelevant portion of Linux usage in such a prominent way.
>>
>> Or, if it does, it should _first_ cover the other, much more burning areas of
>> testing interest. All the while our _real_ bugreports are often rotting on
>> bugzilla.kernel.org ...
>
> All right.  There are two _separate_ questions to ask IMO:
>
> (1) Do we need the kind of community service that Stephen has been doing?
>
> (2) Do we need more testing of linux-next and if so, who's task should that be?
>
> I think you agree that the aswer to (1) is "yes, we do".  So _someone_ has to
> do it and I'm very grateful to Stephen for taking care of it.
>
> [Thanks, Stephen!]
>
> Now, the part of this service is to check that the resulting tree will actually
> build in all conditions it's supposed to build in, if possible, or the whole
> merging exercise wouldn't have much practical meaning.  Stephen has been
> doing just that and IMO to a good result.
>
> To some extent, though, that's a matter of defining in what conditions the
> kernel is supposed to build in, but I think for linux-next these conditions
> should be the same as for the Linus' tree, for the simple reason that
> linux-next is supposed to be a "future snapshot" of it.   So linux-next should
> build on all architectures that the future Linus' tree is supposed to build on.
> Even on "exotic" ones.
>
> [IMO that's actually important, because such corner cases tend to reveal
> runtime bugs we wouldn't have been aware of otherwise.  Now, in the majority
> of cases a casual tester will be discouraged by the kernel not compiling for
> him while he might have found a "real" bug otherwise.]
>
> Now, as far as (2) and is concerned, I think the answer here is also "yes, we
> do", but that's not a part of the Stephen's job.

While wearing my m68k hat, I can say that I suffer an order of magnitude
more from build failures than from boot failures. So I'm inclined to agree
with Linus when he says `if it compiles, it's great; if it boots, it's
perfect' :-)

Or perhaps this says more about our review process: we're quite good at catching
logical errors in our code, and worse at catching syntax and dependency errors.
Fortunately we have tools (and linux-next) to catch those...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2010-02-27 22:29:13

by Rafael J. Wysocki

Subject: Re: linux-next requirements

On Saturday 27 February 2010, Geert Uytterhoeven wrote:
> On Sat, Feb 27, 2010 at 20:07, Rafael J. Wysocki <[email protected]> wrote:
> > On Saturday 27 February 2010, Ingo Molnar wrote:
> >> * Rafael J. Wysocki <[email protected]> wrote:
> >>
> >> > > > Lets see. Over the last 60 days, I have reported 37 build errors. Of
> >> > > > these, 16 were reported against x86, 14 against ppc, 7 against other
> >> > > > archs.
> >> > >
> >> > > So only 43% of them were even relevant on the platform that 95+% of the
> >> > > Linux testers use? Seems to support the points i made.
> >> >
> >> > Well, I hope you don't mean that because the majority of bug reporters (vs
> >> > testers, the number of whom is unknown to me at least) use x86, we are free
> >> > to break the other architectures. ;-)
> >>
> >> It means exactly that: just like we 'can' break compilation with gcc296,
> >> ancient versions of binutils, odd bootloaders, can break the boot via odd
> >> hardware, etc. When someone uses that architectures then the 'easy' bugfixes
> >> will actually flow in very quickly and without much fuss
> >
> > Then I don't understand what the problem with getting them in at the linux-next
> > stage is. They are necessary anyway, so we'll need to add them sooner or
> > later and IMO the sooner the better.
> >
> > Apart from this, that cross-build issues aren't always "easy" and sometimes
> > they take quite some time and engineering effort to resolve. IMO that's better
> > done at the linux-next stage than during a merge window.
> >
> >> - and without burdening developers to consider cases they have no good ways
> >> to test. Why should rare architectures be more important than those other
> >> rare forms of Linux usage?
> >
> > Because the Linus' tree is supposed to build on those architectures. As long
> > as that's the case, linux-next should build on them too.
> >
> >> In fact those rare ways of building and booting the kernel i mentioned are
> >> probably used _more_ than half of the architectures that linux-next
> >> build-tests ...
> >
> > I don't know and you don't know either. That's just pure speculation and
> > therefore meaningless.
>
> If only the CE Linux Forum member companies would publish figures about the
> number of Linux devices they push onto the world population...
>
> Yes I know, this still excludes `obsolete' architectures like parisc
> and alpha, but it would
> change the balance towards x86 (and powerpc?) drastically.

You apparently forgot about ARM.

Rafael

2010-02-28 07:07:07

by Ingo Molnar

Subject: Re: linux-next requirements


* Rafael J. Wysocki <[email protected]> wrote:

> On Saturday 27 February 2010, Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki <[email protected]> wrote:
> >
> > > > > Lets see. Over the last 60 days, I have reported 37 build errors. Of
> > > > > these, 16 were reported against x86, 14 against ppc, 7 against other
> > > > > archs.
> > > >
> > > > So only 43% of them were even relevant on the platform that 95+% of the
> > > > Linux testers use? Seems to support the points i made.
> > >
> > > Well, I hope you don't mean that because the majority of bug reporters (vs
> > > testers, the number of whom is unknown to me at least) use x86, we are free
> > > to break the other architectures. ;-)
> >
> > It means exactly that: just like we 'can' break compilation with gcc296,
> > ancient versions of binutils, odd bootloaders, can break the boot via odd
> > hardware, etc. When someone uses that architectures then the 'easy'
> > bugfixes will actually flow in very quickly and without much fuss
>
> Then I don't understand what the problem with getting them in at the
> linux-next stage is. They are necessary anyway, so we'll need to add them
> sooner or later and IMO the sooner the better.

The problem is the dynamics and resulting (non-)cleanliness of code. We have
architectures that have been conceptually broken for 5 years or more, but
still those problems get blamed on the last change that 'causes' the breakage:
the core kernel and the developers who try to make a difference.

I think your perspective and your opinion is correct, while my perspective is
real and correct as well - there's no contradiction really. Let me try to
explain how i see it:

You are working in a relatively well-designed piece of code which interfaces
to the kernel in sane ways - kernel/power/* et al. You might break the
cross-builds sometimes, but it's not very common, and in those cases it's
usually your own fault and you are grateful for linux-next to have caught that
stupidity. (i hope this is a fair summary!)

I am not criticising that aspect of linux-next _at all_ - it's useful and
beneficial - and i'd like to thank Stephen for all his hard work. Other
aspects of linux-next are useful as well, such as the patch conflict
mediation role.

But as it happens so often, people tend to talk more about the things that are
not so rosy, not about the things that work well.

The area i am worried about is new core kernel facilities: their
development, and the extension of existing facilities. _Those_ facilities are
affected by 'many architectures' in a different way from how you experience
it: often we can do very correct changes to them, which still 'break' on some
architecture due to _that architecture's conceptual fault_.

Let me give you an example that happened just yesterday. My cross-testing
found that a change in the tracing infrastructure code broke m32r and parisc.

The breakage:

/home/mingo/tip/kernel/trace/trace_clock.c:86: error: implicit declaration of function 'raw_local_irq_save'
/home/mingo/tip/kernel/trace/trace_clock.c:112: error: implicit declaration of function 'raw_local_irq_restore'
make[3]: *** [kernel/trace/trace_clock.o] Error 1
make[3]: *** Waiting for unfinished jobs....

It was 'caused by':

18b4a4d: oprofile: remove tracing build dependency

In linux-next this would be pinned to commit 18b4a4d, which would have to be
reverted/fixed.

Where does the _real_ blame lie? Clearly in the M32R and HP/PARISC code: why
do they, four years after it was introduced as a core kernel facility
in 2006, _still_ not support raw_local_irq_save()?
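
The failure mode can be imitated in plain user-space C. This is only a sketch with made-up `demo_` names, not kernel code: a stand-in "arch header" provides save/restore primitives, and a generic function depends on them. Removing the `DEMO_ARCH_HAS_RAW_IRQ_FLAGS` define imitates an architecture that never implemented the primitives, and the generic function then no longer builds, with the same kind of implicit-declaration diagnostic as in the log above.

```c
#include <assert.h>

/*
 * Hypothetical "arch header". Remove this define to imitate an
 * architecture that does not provide the primitives.
 */
#define DEMO_ARCH_HAS_RAW_IRQ_FLAGS 1

static unsigned long demo_irq_state = 1;	/* 1 = "interrupts enabled" */

#ifdef DEMO_ARCH_HAS_RAW_IRQ_FLAGS
#define demo_raw_local_irq_save(flags) \
	do { (flags) = demo_irq_state; demo_irq_state = 0; } while (0)
#define demo_raw_local_irq_restore(flags) \
	do { demo_irq_state = (flags); } while (0)
#endif

static unsigned long long demo_ticks;

/* "Generic" code: correct on its own, but it relies on the arch hooks. */
unsigned long long demo_trace_clock(void)
{
	unsigned long flags;
	unsigned long long now;

	demo_raw_local_irq_save(flags);
	assert(demo_irq_state == 0);	/* critical section runs "irqs off" */
	now = ++demo_ticks;
	demo_raw_local_irq_restore(flags);

	return now;
}

/* Reports whether the simulated interrupt state is "enabled". */
int demo_irqs_enabled(void)
{
	return demo_irq_state != 0;
}
```

The generic function is unchanged in both configurations, which is the point being argued: the breakage surfaces in the new caller even though the missing piece belongs to the architecture.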

( A similar situation occurred in this very thread as well - before the subject
of the thread - so it's a real and present problem. We didnt even get _any_
reaction about that particular breakage from the affected architecture ... )

These situations are magnified by how certain linux-next bugs are reported:
the 'blame' is put on the new commit that exposes the laggy nature of certain
architectures. Often the developers even believe this false notion and feel
guilty for 'having broken' an architecture - often an architecture that has
not contributed a single core kernel facility _in its whole existence_.

The usual end result is that the path of least resistance is taken: the commit
is reverted or worked around, while the 'laggy' architecture can continue
business as usual and cause more similar bugs and hiccups in the future ...

I.e. there is extra overhead put on clearly 'good' efforts, while 'bad'
behavior (parasitic hanging-on, passivity, indifference) is rewarded.
Rewarding bad behavior is very clearly harmful to Linux in many regards, and i
speak up when i see it.

So i wish linux-next balanced these things more fairly towards those areas of
code that are actually useful: if it ignored build breakages that are due to
architectures being lazy - in fact if it required architectures to _help out_
with the development of the kernel.

The majority of build-bugs i see trigger in cross-builds (90% of which i catch
before they get into linux-next) are of this nature, that's why i raised it in
such a pointed way. Your (and many other people's) experience will differ - so
you might see this as an unjustified criticism.

Thanks,

Ingo

2010-02-28 07:14:51

by Ingo Molnar

Subject: Re: linux-next requirements


* Rafael J. Wysocki <[email protected]> wrote:

> > - and without burdening developers to consider cases they have no good
> > ways to test. Why should rare architectures be more important than those
> > other rare forms of Linux usage?
>
> Because the Linus' tree is supposed to build on those architectures. [...]

That's not actually true: Linus on multiple occasions has said that only the
major architectures (x86, powerpc, ARM and a few others) are 'required' to
build and that the others should be left to fail to build and should be
_forced to get their act together_.

> [...] As long as that's the case, linux-next should build on them too.

No, and IMO linux-next is clearly over-interpreting this bit. Linux is not
supposed to build on all architectures. Maybe that's a core bit of a
misunderstanding (on either my or on sfr's side) and it should be clarified
...

Ingo

2010-02-28 07:24:19

by Ingo Molnar

Subject: Re: linux-next requirements


* Rafael J. Wysocki <[email protected]> wrote:

> > In fact those rare ways of building and booting the kernel i mentioned are
> > probably used _more_ than half of the architectures that linux-next
> > build-tests ...
>
> I don't know and you don't know either. That's just pure speculation and
> therefore meaningless.

We know various arch (and hardware) usage stats, such as:

http://smolt.fedoraproject.org/static/stats/stats.html

Today's stats, done amongst users who are willing to opt in to the Smolt
daemon:

x86: 99.7%
powerpc: 0.3%

x86 used to be 99.5% a year ago, so the world has become even more x86-centric.

There's also the kerneloops.org client, which shows in excess of 95% x86 usage
as well. You can also grep the linux-kernel folder for arch signatures, etc.

And yes, there are millions of ARM (and MIPS) CPUs running Linux as well.
(They are only as present as their developers are: the users almost
never show up on linux-kernel.)

Plus, a kernel subsystem maintainer like me who does lots of kernel
infrastructure work can have a pretty good gut feeling about which
architectures are actively helping out Linux, and which are just hanging on to
the bandwagon.

So i respectfully disagree with your 'pure speculation' bit. Yes, it's
somewhat of a guessing game, as so many things in life - but the trend is very
clear.

Ingo

2010-02-28 07:37:34

by Stephen Rothwell

Subject: Re: linux-next requirements

Hi Ingo,

On Sun, 28 Feb 2010 08:14:05 +0100 Ingo Molnar <[email protected]> wrote:
>
> > [...] As long as that's the case, linux-next should build on them too.
>
> No, and IMO linux-next is clearly over-interpreting this bit. Linux is not
> supposed to build on all architectures. Maybe that's a core bit of a
> misunderstanding (on either my or on sfr's side) and it should be clarified
> ...

Well, we have no real problem then. The only architectures for which a
failure will stop new stuff getting into linux-next are the ones I
personally build while constructing the tree (x86, ppc and sparc). Once
something is in linux-next, even if it causes a build failure overnight,
I am loath to remove it again as it can cause pain for Andrew (who bases
-mm on linux-next).

I will still report such failures (if I have time to notice them - I
mostly hope that architecture maintainers will have a glance over the
build results themselves) and others do as well but such failures do not
generally cause any actions on my part (except in rare cases I may
actually fix the problem or put a provided fix patch in linux-next).

I would like to add arm to the mix of the architectures I build during
construction, but there is no wide ranging config that builds for arm and
building a few of the configs would just end up taking too much time.

Thanks for clarifying.
--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2010-02-28 07:51:48

by Ingo Molnar

Subject: Re: linux-next requirements


* Stephen Rothwell <[email protected]> wrote:

> Hi Ingo,
>
> On Sun, 28 Feb 2010 08:14:05 +0100 Ingo Molnar <[email protected]> wrote:
> >
> > > [...] As long as that's the case, linux-next should build on them too.
> >
> > No, and IMO linux-next is clearly over-interpreting this bit. Linux is not
> > supposed to build on all architectures. Maybe that's a core bit of a
> > misunderstanding (on either my or on sfr's side) and it should be clarified
> > ...
>
> Well, we have no real problem then. The only architectures for which a
> failure will stop new stuff getting into linux-next are the ones I
> personally build while constructing the tree (x86, ppc and sparc). Once
> something is in linux-next, even if it causes a build failure overnight, I
> am loath to remove it again as it can cause pain for Andrew (who bases -mm
> on linux-next).

Ok - very good. This has apparently been relaxed some time ago, i know
linux-next used to be more stringent.

> I will still report such failures (if I have time to notice them - I mostly
> hope that architecture maintainers will have a glance over the build results
> themselves) and others do as well but such failures do not generally cause
> any actions on my part (except in rare cases I may actually fix the problem
> or put a provided fix patch in linux-next).

Yeah. Plus it's never black and white - sometimes a rare arch will show some
real crappiness in a commit. So we want to know all bugs.

> I would like to add arm to the mix of the architectures I build during
> construction, but there is no wide ranging config that builds for arm and
> building a few of the configs would just end up taking too much time.

Yeah, ARM is clearly important from a usage share POV IMHO, and it's also
actively driving many areas of interest.

It's also a bit difficult to keep ARM going because there's so many
non-standardized hw variants of ARM, so i'm sure the ARM folks will appreciate
us not breaking them ...

( Alas, ARM doesnt tend to be a big problem, at least as far as the facilities
i'm concerned about go: it has implemented most of the core kernel
infrastructures so there's few if any 'self inflicted' breakages that i can
remember. )

> Thanks for clarifying.

Thanks,

Ingo

2010-02-28 08:20:09

by Al Viro

Subject: Re: linux-next requirements

On Sun, Feb 28, 2010 at 08:51:05AM +0100, Ingo Molnar wrote:

> ( Alas, ARM doesnt tend to be a big problem, at least as far as the facilities
> i'm concerned about go: it has implemented most of the core kernel
> infrastructures so there's few if any 'self inflicted' breakages that i can
> remember. )

FWIW, it might make sense to run cross-builds for many targets and post
the things that crop up + analysis to linux-arch... Any takers?

I haven't run a lot of cross-builds lately, but IME most of the breakage
tends to be less dramatic - somebody relying on indirect includes in
driver *or* forgetting to add "depends on" to Kconfig used to be the
most frequent case.

The "let other targets rot" attitude has a very nasty effect - it snowballs.
At some point people *can't* check that their patches don't break things,
even if they want to. And that, IMO, sucks. At that point the architecture
needs to be either removed or brought to a state where it builds in
mainline.

Note that we have filesystems that are built only on some architectures.
I don't know about you, but I *do* care about not leaving half-converted
interfaces in that area. For entirely rational reasons - people tend
to copy b0rken code from random places in the tree. Playing whack-a-mole
gets old pretty soon.

2010-02-28 08:53:30

by Ingo Molnar

Subject: Re: linux-next requirements


* Al Viro <[email protected]> wrote:

> On Sun, Feb 28, 2010 at 08:51:05AM +0100, Ingo Molnar wrote:
>
> > ( Alas, ARM doesnt tend to be a big problem, at least as far as the facilities
> > i'm concerned about go: it has implemented most of the core kernel
> > infrastructures so there's few if any 'self inflicted' breakages that i can
> > remember. )
>
> FWIW, it might make sense to run cross-builds for many targets and post the
> things that crop up + analysis to linux-arch... Any takers?
>
> I haven't run a lot of cross-builds lately, but IME most of the breakage
> tends to be less dramatic - somebody relying on indirect includes in driver
> *or* forgetting to add "depends on" to Kconfig used to be the most frequent
> case.
>
> The "let other targets rot" attitude has a very nasty effect - it snowballs.
> At some point people *can't* check that their patches don't break things,
> even if they want to. And that, IMO, sucks. At that point the architecture
> needs to be either removed or brought to a state where it builds in mainline.

What is happening right now is that our combined _costs_ snowball: generic
changes are burdened with the overhead of a thousand cuts ...

IMO either there's enough interest in keeping an architecture going, rooted in
_that_ architecture's importance (or the enthusiasm/clue of their developers),
or, after a few years of inactivity it really shouldnt be upstream.

Right now we are socializing all the costs, sometimes even pretending that all
architectures are equal. None of the costs really looks particularly large in
isolation, but the sum of them does exist and adds up in certain places of the
kernel.

Thanks,

Ingo

2010-02-28 10:27:11

by Stephen Rothwell

Subject: Re: linux-next requirements

Hi Al,

On Sun, 28 Feb 2010 08:19:22 +0000 Al Viro <[email protected]> wrote:
>
> FWIW, it might make sense to run cross-builds for many targets and post
> the things that crop up + analysis to linux-arch... Any takers?

See http://kisskb.ellerman.id.au/kisskb/branch/9/ ... we just need
someone to read it regularly and post about them. There is a set of
builds of Linus' tree there as well (look under "Branches").

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2010-02-28 12:19:17

by Rafael J. Wysocki

Subject: Re: linux-next requirements

On Sunday 28 February 2010, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > On Saturday 27 February 2010, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > > > > Lets see. Over the last 60 days, I have reported 37 build errors. Of
> > > > > > these, 16 were reported against x86, 14 against ppc, 7 against other
> > > > > > archs.
> > > > >
> > > > > So only 43% of them were even relevant on the platform that 95+% of the
> > > > > Linux testers use? Seems to support the points i made.
> > > >
> > > > Well, I hope you don't mean that because the majority of bug reporters (vs
> > > > testers, the number of whom is unknown to me at least) use x86, we are free
> > > > to break the other architectures. ;-)
> > >
> > > It means exactly that: just like we 'can' break compilation with gcc296,
> > > ancient versions of binutils, odd bootloaders, can break the boot via odd
> > > hardware, etc. When someone uses those architectures then the 'easy'
> > > bugfixes will actually flow in very quickly and without much fuss
> >
> > Then I don't understand what the problem with getting them in at the
> > linux-next stage is. They are necessary anyway, so we'll need to add them
> > sooner or later and IMO the sooner the better.
>
> The problem is the dynamics and resulting (non-)cleanliness of code. We have
> architectures that have been conceptually broken for 5 years or more, but
> still those problems get blamed on the last change that 'causes' the breakage:
> the core kernel and the developers who try to make a difference.
>
> I think your perspective and your opinion is correct, while my perspective is
> real and correct as well - there's no contradiction really. Let me try to
> explain how i see it:
>
> You are working in a relatively well-designed piece of code which interfaces
> to the kernel in sane ways - kernel/power/* et al. You might break the
> cross-builds sometimes, but it's not very common, and in those cases it's
> usually your own fault and you are grateful for linux-next to have caught that
> stupidity. (i hope this is a fair summary!)

Fair enough.

> I am not criticising that aspect of linux-next _at all_ - it's useful and
> beneficial - and i'd like to thank Stephen for all his hard work. Other
> aspects of linux-next are useful as well, such as the patch conflict mediation
> role.

Great.

> But as it happens so often, people tend to talk more about the things that are
> not so rosy, not about the things that work well.
>
> The area i am worried about is the development of new core kernel facilities
> and the extension of existing ones. _Those_ facilities are
> affected by 'many architectures' in a different way from how you experience
> it: often we can do very correct changes to them, which still 'break' on some
> architecture due to _that architecture's conceptual fault_.
>
> Let me give you an example that happened just yesterday. My cross-testing
> found that a change in the tracing infrastructure code broke m32r and parisc.
>
> The breakage:
>
> /home/mingo/tip/kernel/trace/trace_clock.c:86: error: implicit declaration of function 'raw_local_irq_save'
> /home/mingo/tip/kernel/trace/trace_clock.c:112: error: implicit declaration of function 'raw_local_irq_restore'
> make[3]: *** [kernel/trace/trace_clock.o] Error 1
> make[3]: *** Waiting for unfinished jobs....
>
> It was 'caused by':
>
> 18b4a4d: oprofile: remove tracing build dependency
>
> In linux-next this would be pinned to commit 18b4a4d, which would have to be
> reverted/fixed.
>
> Where does the _real_ blame lie? Clearly in the M32R and HP/PARISC code: why
> do they, four years after it was introduced as a core kernel facility
> in 2006, _still_ not support raw_local_irq_save()?

OK, I see your point.

> ( A similar situation occurred in this very thread as well - before the subject
> of the thread - so it's a real and present problem. We didnt even get _any_
> reaction about that particular breakage from the affected architecture ... )
>
> These situations are magnified by how certain linux-next bugs are reported:
> the 'blame' is put on the new commit that exposes the laggy nature of certain
> architectures. Often the developers even believe this false notion and feel
> guilty for 'having broken' an architecture - often an architecture that has
> not contributed a single core kernel facility _in its whole existence_.
>
> The usual end result is that the path of least resistance is taken: the commit
> is reverted or worked around, while the 'laggy' architecture can continue
> business as usual and cause more similar bugs and hiccups in the future ...
>
> I.e. there is extra overhead put on clearly 'good' efforts, while 'bad'
> behavior (parasitic hanging-on, passivity, indifference) is rewarded.
> Rewarding bad behavior is very clearly harmful to Linux in many regards, and i
> speak up when i see it.
>
> So i wish linux-next balanced these things more fairly towards those areas of
> code that are actually useful: if it ignored build breakages that are due to
> architectures being lazy - in fact if it required architectures to _help out_
> with the development of the kernel.
>
> The majority of build-bugs i see trigger in cross-builds (90% of which i catch
> before they get into linux-next) are of this nature, that's why i raised it in
> such a pointed way. Your (and many other people's) experience will differ - so
> you might see this as an unjustified criticism.

Thanks a lot for the clarification.

Best,
Rafael