2004-10-28 21:06:47

by Blaisorblade

Subject: Why UML often does not build (was: Re: [PATCH] UML: Build fix for TT w/o SKAS)

On Thursday 28 October 2004 21:33, Chris Wedgwood wrote:
> On Thu, Oct 28, 2004 at 09:04:30PM +0200, Blaisorblade wrote:
> > Hmm, this is true for some of them, not for other ones (mostly
> > fixups, but some wrong).

> ive been sending patches out for ages and they are getting nowhere.
"Not getting an answer" does not mean "getting nowhere". I'm not absolutely
able, for instance, to understand the update for generic IRQs. I've seen the
"compile only" fixes from Jeff. And they were "compile-only". But
understanding the other changes from you is too difficult for me.

About Jeff, I still keep CC'ing him every time, but as he has admitted
privately, he does not have the time to answer each patch in detail.
Since I've been here for some time, I'm now using my own judgement on some
little things (e.g. little compile-only fixes, or when a patch is being
rejected for questionable reasons, or when it is reportedly safe).

I don't try to touch the real UML core without getting a review from Jeff, and
anyone who has a clue about what I'm doing is welcome.

> if people have better fixes, these have been weeks (in some cases
> months) to get them in

For instance, Jeff rejected the mconsole-proc rewrite. So, I tried harder,
then updated the patch to just #ifdef out his version, and was going to send
it in.

However, you are not entirely wrong. Jeff cannot keep up with the rate of
kernel changes, and neither can I (I'm only a 1st year university student;
luckily they have not yet started teaching anything new).

> > However, always CC both the -devel list (my request) and the LKML
> > (Andrew's request to me some time ago) when sending UML patches.

> i admit i've missed -devel most of the time, i said ill do that from
> now on

> the fact remains, people have fixes that are weeks old or more and if
> you dont submit them to get them merged, then please let another
> potential suitable fix go in for now

> UML often doesn't build and less often runs correctly --- it probably
> one of the worst architectures for this in a sense (i don't know about
> the obscure stuff, i bet those break too --- but nobody uses them
> which isn't the case for UML)

Well, this is true. The main reasons are these:

1) the Linux Kernel often breaks when using certain GCC versions or certain
binutils, and has to be fixed.

But UML is a binary doing the most unusual things on the world around, so it
must cope also with different versions of libc / binutils / host kernel.

2) Uml is often not cared by mainline developers. It was merged in 2.6.9 and
remained unworking for ages just because Linus ignored UML patches for ages.
And right now, if UML does not compile it's for the Ingo Molnar's hardirq
patch and for a missed silly prototype change for a TTY api change (they
fixed the UML user, ended up changing one UML function prototype, forgot to
do a trivial update to one user. One missed "grep" invocation, in fact).

3) UML *is* strange. The kernel has its own linker script? UML must have a
merged version of the userspace one from binutils and of the kernel one.

Since it must remap its .text section out from under itself, it has to copy
the kernel image and remap the data with a one-shot function, which is
statically linked - so you end up with symbol clashes on some glibc versions
using NPTL, for trivial reasons - and so on.

4) We are too few. The currently active developers (and I mean only the one
which this month have being working on it) are:

- Bodo Stroesser - he came in just now, but he's doing tremendous work on
getting SYSEMU working well.

- you, Chris

- Gerd Knorr, the Suse UML packager and maintainer.

- Jeff and I, for various other stuff.

The number nearly doubles if you also include work done before this summer,
with Henrik Normstrod, M.A. Young and Ingo Molnar coming in. But those are
the facts.

I.e., if after 2.6.9-rc4 I had not sent the fixes for any reason (being
overloaded, or away from the net), Jeff would probably have sent them (he was
just about to do it). But had he also been away from the net for a while, or
forgotten some build fixes, even 2.6.9 wouldn't have worked for UML.

That said, with mainline inclusion UML is getting more work from mainline
developers. At least, most API changes are handled by those who submit them.
--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729


2004-10-28 21:50:57

by Chris Wedgwood

Subject: Re: Why UML often does not build (was: Re: [PATCH] UML: Build fix for TT w/o SKAS)

On Thu, Oct 28, 2004 at 10:54:21PM +0200, Blaisorblade wrote:

> I'm not absolutely able, for instance, to understand the update for
> generic IRQs.

arch/um/kernel/irq.c is almost identical to the generic irq code
(ediff shows this nicely if that helps), the difference being
free_irq_by_irq_and_dev in free_irq --- which is why this was (nor
now) pushed into the UML drivers

i suspect the generated code before and after is almost identical

ideally this model needs a rework, but i don't see anyone having the
cycles to do this now so accommodating the generic-code is a sensible
solution for the interim

> For instance, Jeff rejected the mconsole-proc rewrite. So, I tried
> harder, then updated the patch to just #ifdef out his version, and
> was going to send it in.

it was later ACKed, you were cc'd on that email --- it's been merged

yes, that means we have merged code with slightly different semantics
than before --- but it also means it doesn't crash

you mentioned there is another solution to this --- i'd love to see
that before i rewrite this again to address Jeff's concern about the
requirement for /proc to be mounted (which is legitimate)

> 1) the Linux Kernel often breaks when using certain GCC versions or
> certain binutils, and has to be fixed.

this almost never happens these days, especially for i386 and amd64

i dont think i can recall it happening in a big way for over a year
now

> But UML is a binary doing the most unusual things on the world
> around, so it must cope also with different versions of libc /
> binutils / host kernel.

UML also has to cope with ptrace changes and any semantics related to
this. there has been some flux in this area recently and it's not
over yet sadly

UML is the most complicated ptrace-using code i think ive ever seen :)
and some of the code probably relies on semantics which are not well
defined so keeping up with those changes is important

part of that might be noticing when it breaks and screaming at roland,
i dont know

> 2) Uml is often not cared by mainline developers. It was merged in
> 2.6.9 and remained unworking for ages just because Linus ignored UML
> patches for ages.

was it submitted? i really don't think the merge barrier is that high
from a maintainer for things like linux/arch/<foo> in most cases

the same is true for various drivers (it must be, so many of them are
horribly broken in places)

> And right now, if UML does not compile it's for the Ingo Molnar's
> hardirq patch and for a missed silly prototype change for a TTY api
> change (they fixed the UML user, ended up changing one UML function
> prototype, forgot to do a trivial update to one user.

UML has been described as 'abandonware' amongst other things, this
isn't completely unjustified. Any efforts I think to help keep it
more inline with the rest of the kernel to change this perception I
think are great --- if we can get enough cleanups in, we might even be
able to get more of the mainline people building and checking against
UML so we get less breakage in the future.

I think to do this we should first fix some of the bogons we have
before adding new code and features.

> 4) We are too few. The currently active developers (and I mean only
> the one which this month have being working on it) are:

if we can get things more inline (fixes, track mainline, dont add new
features just yet) i think we can get more people to help out

right now it's too hard (too much effort) for most people to deal with

anyhow, i think enough has been said as we are mostly in violent
agreement, if we can keep poking away at this i think we have a pretty
good shot at making UML less of a second-class citizen

2004-10-29 00:07:44

by Blaisorblade

Subject: Re: [uml-devel] Re: Why UML often does not build (was: Re: [PATCH] UML: Build fix for TT w/o SKAS)

On Thursday 28 October 2004 23:42, Chris Wedgwood wrote:
> On Thu, Oct 28, 2004 at 10:54:21PM +0200, Blaisorblade wrote:
> > I'm not absolutely able, for instance, to understand the update for
> > generic IRQs.

> > 1) the Linux Kernel often breaks when using certain GCC versions or
> > certain binutils, and has to be fixed.
>
> this almost never happens these days, especially for i386 and amd64
>
> i dont think i can recall it happening in a big way for over a year
> now

Yes, it's true. That happens because with tons of developers, beta-gcc
releases are pre-tested every time.

> > But UML is a binary doing the most unusual things on the world
> > around, so it must cope also with different versions of libc /
> > binutils / host kernel.

> UML also has to cope with ptrace changes and any semantics related to
> this. there has been some flux in this area recently and it's not
> over yet sadly

> UML is the most complicated ptrace-using code i think ive ever seen :)
> and some of the code probably relies on semantics which are not well
> defined so keeping up with those changes is important
Yes, it was using SIGKILL instead of PTRACE_KILL; this gets broken by 2.6.9.

> part of that might be noticing when it breaks and screaming at roland,
> i dont know
"screaming at roland"? Ah, ok, I guess you mean Roland McGrath (roland (at)
redhat (dot) com ), who seems to be the ptrace maintainer. At least he was the
author of all the recent ptrace changes I've seen until now.

> > 2) Uml is often not cared by mainline developers. It was merged in
> > 2.6.9 and remained unworking for ages just because Linus ignored UML
> > patches for ages.

> was it submitted? i really don't think the merge barrier is that high
> from a maintainer for things like linux/arch/<foo> in most cases
Yes, it was. It's an old, old story. You can take a look here, starting from
"24 Mar 2003".

http://user-mode-linux.sourceforge.net/diary.html

> the same is true for various drivers (it must be, so many of them are
> horribly broken in places)

> > And right now, if UML does not compile it's for the Ingo Molnar's
> > hardirq patch and for a missed silly prototype change for a TTY api
> > change (they fixed the UML user, ended up changing one UML function
> > prototype, forgot to do a trivial update to one user.

> UML has been described as 'abandonware' amongst other things, this
> isn't completely unjustified.
Well, I've seen Christoph Hellwig not particularly happy about us, see for
instance:

http://linux.bkbits.net:8080/linux-2.5/cset@41752cc9xdFXib-03VDV5akqKJZ-yA?nav=index.html|ChangeSet@-7d

(I must admit I read one of those two emails he talks about).

On the other hand, I've seen Jeff Garzik, for instance, giving UML a try and
reporting back.

> Any efforts I think to help keep it
> more inline with the rest of the kernel to change this perception I
> think are great --- if we can get enough cleanups in, we might even be
> able to get more of the mainline people building and checking against
> UML so we get less breakage in the future.

> I think to do this we should first fix some of the bogons we have
> before adding new code and features.

Well, adding new features could also attract some developers. I want to fix
bugs, but how many people would come to UML just to test CPU-hotplug on a
normal PC? That's possible (after we clean up the SMP code, which is suffering
a lot, as with that spinlock deadlock in the UBD driver we discussed some
time ago).

> > 4) We are too few. The currently active developers (and I mean only
> > the one which this month have being working on it) are:

> if we can get things more inline (fixes, track mainline, dont add new
> features just yet) i think we can get more people to help out

> right now it's too hard (too much effort) for most people to deal with

> anyhow, i think enough has been said as we are mostly in violent
> agreement, if we can keep poking away at this i think we have a pretty
> good shot at making UML less of a second-class citizen

I agree with what you say. In fact, I'm mostly not adding new features right
now. There are a whole lot of things I would like to do (mostly bug fixes),
but I'm mostly working on one-liners. Sending a patch requires at least
proof-reading it and writing a meaningful changelog. Also, managing mails
takes tons of time.

Bye
--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729

2004-10-29 01:18:07

by Chris Wedgwood

Subject: Re: [uml-devel] Re: Why UML often does not build (was: Re: [PATCH] UML: Build fix for TT w/o SKAS)

On Fri, Oct 29, 2004 at 01:49:31AM +0200, Blaisorblade wrote:

> Yes, it was using SIGKILL instead of PTRACE_KILL; this gets broken
> by 2.6.9.

the problem here is that ptrace semantics are not well defined so
anything subtle can and will break from time to time

if we can get UML in a more suitable state and perhaps get some minor
QA stuff merged (a new make target using initramfs maybe?) we could
'encourage' people to test UML more often

> Well, I've seen Christoph Hellwig not particularly happy about us,
> see for instance:
>
> http://linux.bkbits.net:8080/linux-2.5/cset@41752cc9xdFXib-03VDV5akqKJZ-yA?nav=index.html|ChangeSet@-7d

well, he is right in this case

it's hard to find a balance between keeping it working for existing
UML users (which is what i'm trying to make sure is the case) and
doing cleanups which people such as hch point out really are needed

> Sending a patch requires at least proof-reading it and writing a
> meaningful changelog. Also, managing mails takes tons of time.

im happy to take any and all fixes w/o comments in any form for now if
you want to fire them off to me

2004-10-29 06:45:33

by Werner Almesberger

Subject: Re: [uml-devel] Re: Why UML often does not build (was: Re: [PATCH] UML: Build fix for TT w/o SKAS)

Chris Wedgwood wrote:
> the problem here is that ptrace semantics are not well defined so
> anything subtle can and will break from time to time

I wonder what the "correct" solution for this would be: write a
specification for Linux ptrace, or try to get the POSIX folks
interested ?

Given that we get subtle ptrace breakages quite regularly, it
would be nice to see this eventually get resolved. "The
implementation is the specification" doesn't seem to work well
in this case.

BTW, things have improved around UML quite a bit recently, and I
think this is to no small amount due to Paolo's work.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2004-10-29 15:03:02

by Blaisorblade

Subject: Re: [uml-devel] Re: Why UML often does not build (was: Re: [PATCH] UML: Build fix for TT w/o SKAS)

On Friday 29 October 2004 08:44, Werner Almesberger wrote:
> Chris Wedgwood wrote:
> > the problem here is that ptrace semantics are not well defined so
> > anything subtle can and will break from time to time
>
> I wonder what the "correct" solution for this would be: write a
> specification for Linux ptrace, or try to get the POSIX folks
> interested ?

> Given that we get subtle ptrace breakages quite regularly, it
> would be nice to see this eventually get resolved. "The
> implementation is the specification" doesn't seem to work well
> in this case.
Well, you are quite right - Linux aims never to break existing binaries,
and ptrace() does not follow that rule.

However, the problem here is that UML was not behaving correctly. Instead of
using the documented way, PTRACE_KILL, we just sent a SIGKILL and that
happened to work (and since the PTRACE_KILL implementation just sends a
SIGKILL, you would still expect it to work).

In fact, I fixed Gerd Knorr's test program to use PTRACE_KILL and it works
on 2.6.9.
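
To make the two ways of killing a traced child concrete, here is a minimal
sketch in C. It is not Gerd Knorr's test program (which is not shown in this
thread) and not UML code; the helper name kill_traced_child() and its
parameter are made up for illustration. It assumes a Linux host with glibc:
the child asks to be traced and stops itself, and the parent then kills it
either with PTRACE_KILL or with a plain SIGKILL.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from UML or from any test program
 * mentioned in this thread.
 *
 * Forks a child that asks to be traced and stops itself, then kills it
 * either with PTRACE_KILL (use_ptrace_kill != 0) or with a plain SIGKILL.
 * Returns the signal that terminated the child, or -1 on any error.
 */
int kill_traced_child(int use_ptrace_kill)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: become a tracee of the parent, then stop. */
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
			_exit(1);
		raise(SIGSTOP);
		/* Only reached if the tracer resumes us instead of killing us. */
		_exit(0);
	}

	/* Parent: wait until the child reports its stop. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFSTOPPED(status))
		return -1;	/* child already exited and was reaped */

	if (use_ptrace_kill) {
		/* The way discussed above: ask ptrace to kill the tracee. */
		if (ptrace(PTRACE_KILL, pid, NULL, NULL) < 0) {
			kill(pid, SIGKILL);
			waitpid(pid, &status, 0);
			return -1;
		}
	} else if (kill(pid, SIGKILL) < 0) {
		/* The other way: send SIGKILL without going through ptrace. */
		return -1;
	}

	/* Reap the child and report how it died. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling kill_traced_child(1) and kill_traced_child(0) from a small main() and
comparing both results against SIGKILL should show whether the two paths
behave the same on a given host kernel. Note that later versions of the
ptrace(2) man page mark PTRACE_KILL as deprecated and recommend sending
SIGKILL directly, so the sketch reflects this discussion and not current
advice.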

> BTW, things have improved around UML quite a bit recently, and I
> think this is to no small amount due to Paolo's work.
Thanks a lot for that, it's something very important for me, but I'm not the
only one deserving such recognition. See the amount of work done by Bodo
Stroesser in a few weeks - he solved lots of problems which I fought against
without success.

Besides that, I need to do a lot of janitorial work, while holding off on more
advanced stuff - so I think that anybody could help here.
--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729