Hi all.
I'm probably going to regret this, but seeing the current discussion on
binary modules makes me wonder:
What does tainting actually mean?
What I mean is, how does it help to know that a kernel is tainted? When
I'm working on Software Suspend and someone sends me an oops, I don't
really care whether it's marked as tainted or not. For all I know, even if
it's not tainted, they may have thrown in half a dozen different patches
aside from Suspend, any one of which could be playing a role in the
appearance of the oops. It doesn't help me to know that the kernel was
tainted. It helps me to know what the non-standard additions are (and how
the kernel was configured), regardless of whether the additions mark the
kernel tainted or not.
Of course I realise at the same time that maybe tainting has nothing to do
with saying 'This isn't an unmodified tree' and everything to do with
saying 'This kernel has had non-GPL code interacting with it'. If that's
the case, I don't see the relevance of saying (as Paul did a little while
ago):
"You deceived maintainers who receive "untainted" bug reports."
Indeed, the surrounding lines seem to make it clear that the real issue is
not fixing bugs but politics. Thus my question: What does tainting
actually mean?
Regards,
Nigel
--
Nigel Cunningham
C/- Westminster Presbyterian Church Belconnen
61 Templeton Street, Cook, ACT 2614, Australia.
+61 (2) 6251 7727 (wk)
From: Nigel Cunningham <[email protected]>
Date: Wed, Apr 28, 2004 at 02:00:35PM +1000
> Hi all.
>
> I'm probably going to regret this, but seeing the current discussion on
> binary modules makes me wonder:
>
> What does tainting actually mean?
>
It means you can never be sure the bug is _not_ in some binary module.
It may be improbable, you may be able to find a bug in the kernel, but
you're never _sure_.
Jurriaan
--
I am the pimple that forms before a really big date
Darkwing Duck
Debian (Unstable) GNU/Linux 2.6.6-rc2-mm2 2x6062 bogomips 0.05 0.02
Hi.
On Wed, 28 Apr 2004 06:27:42 +0200, Jurriaan <[email protected]> wrote:
> From: Nigel Cunningham <[email protected]>
> Date: Wed, Apr 28, 2004 at 02:00:35PM +1000
>> Hi all.
>>
>> I'm probably going to regret this, but seeing the current discussion on
>> binary modules makes me wonder:
>>
>> What does tainting actually mean?
>>
> It means you can never be sure the bug is _not_ in some binary module.
> It may be improbable, you may be able to find a bug in the kernel, but
> you're never _sure_.
Is that true? We can see where the oops occurs. If it's in the module,
nothing more needs to be said. If it's in the kernel itself, we can check
our source. We could check all the calls the module makes to open source
code and validate that the parameters are correct. We should be able to
say with authority 'the module is doing the wrong thing'. We might not be
able to say exactly what, but we could determine that it is the module.
Nigel
Nigel Cunningham wrote:
> Is that true? We can see where the oops occurs. If it's in the module,
> nothing more needs to be said. If it's in the kernel itself, we can
> check our source. We could check all the calls the module makes to open
> source code and validate that the parameters are correct. We should be
> able to say with authority 'the module is doing the wrong thing'. We
> might not be able to say exactly what, but we could determine that it
> is the module.
If only it were that easy.
There has already been a case mentioned of a binary module that messed up something that was only
visible once that module was unloaded and another one loaded. It all depends totally on usage patterns.
Generally speaking, if a user is technical enough to patch their kernel, they're aware of the
possible problems and will submit bug reports with things like "kernel version blah, with the foo
and bar patches applied". The developers can then say "there's a known issue with foo/bar together".
Binary modules, on the other hand, are often loaded up by users that know just barely enough to
download them and run an install script. In this case, it can be helpful to know up front that
there has been proprietary code running in kernel space, and aside from calls to kernel APIs, we
have no clue what else it was doing, what memory was being trampled, what cpu registers were
whacked, etc.
Chris
Hi.
On Wed, 28 Apr 2004 01:19:32 -0400, Chris Friesen
<[email protected]> wrote:
> If only it were that easy.
>
> There has already been a case mentioned of a binary module that messed
> up something that was only visible once that module was unloaded and
> another one loaded. It all depends totally on usage patterns.
I don't know what module you're talking about, but surely there must be
something that could be done kernel-side to protect against such problems.
Reference counting or such like? I guess if it was a hardware issue, but
then again that might be an issue with too many assumptions being made
about prior state? Maybe I am being too naive :>
> Binary modules, on the other hand, are often loaded up by users that
> know just barely enough to download them and run an install script. In
> this case, it can be helpful to know up front that there has been
> proprietary code running in kernel space, and aside from calls to kernel
> APIs, we have no clue what else it was doing, what memory was being
> trampled, what cpu registers were whacked, etc.
Now I see your point. Of course my previous point about patches is still
valid though: the tainted flag only gives part of the picture. The person
reporting the bug might create just as much of a black box for us by
forgetting to mention that they applied patch foobar.
Regards,
Nigel
Nigel Cunningham wrote:
> What I mean is, how does it help to know that a kernel is tainted? When
> I'm working on Software Suspend and someone sends me an oops, I don't
> really care whether it's marked as tainted or not. For all I know, even
> if it's not tainted, they may have thrown in half a dozen different
> patches aside from Suspend, any one of which could be playing a role in
> the appearance of the oops. It doesn't help me to know that the kernel
The legal/moral implications of taint/binary-mods/etc. aside, I think it
may be worth putting some thought into coming up with a way to identify
which patches were applied to a kernel -- given the wide-spread use of this
method to add/remove/amend kernel functionality. Maybe there should be a
/proc/sys/kernel/patches file at runtime which would provide a list of
applied patches and some characteristics/description? When patches are
applied, there could then be a toplevel .patches file which all patch
submitters/providers/distributors would be strongly encouraged or
<form>insert your favorite coercive method or torture technique here</form>
to amend as they add their code. At build time, the makefile could then
use this file to generate some header used by the code printing out the
/proc/sys/kernel/patches. At oops time, the content of this file would also
be part of the dump.
Note: such a mechanism is not an alternative to "tainting", it's an
additional tool for trying to identify potential problems. I, for one,
would love to be able to find out which patches the various distro vendors
have found useful to include with their kernel. Currently, this takes more
time than I'm willing to invest ... at least until something serious
comes up ...
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
Hi.
On Wed, 28 Apr 2004 02:02:55 -0400, Chris Siebenmann
<[email protected]> wrote:
> What happens when a binary module thinks it knows the size of a
> structure and is wrong? What happens when a binary module has a
> concurrency problem, in any of the many forms they manifest in the Linux
> kernel?
Good points. It could be really difficult to trace the cause of those
issues. But hard/too much effort != impossible. For every entry point to
the module we have a known state of the system prior to and after the
call. We could potentially checksum the whole of memory before and after
and find out exactly what the module has changed.
Anyway, I'm going to drop this conversation now. Work to do :>
Nigel
On Wed, 28 Apr 2004, Karim Yaghmour wrote:
> The legal/moral implications of taint/binary-mods/etc. aside, I think it
> may be worth putting some thought into coming up with a way to identify
> which patches were applied to a kernel -- given the wide-spread use of this
> method to add/remove/amend kernel functionality. Maybe there should be a
> /proc/sys/kernel/patches file at runtime which would provide a list of
> applied patches and some characteristics/description? When patches are
Or maybe we could add a line to the top level Makefile to append something
to the version number to indicate things like whether a kernel is a
prerelease and whether/which extra patches have been added.
We could call it something like "EXTRAVERSION"...
--
Just because it isn't nice doesn't make it any less a miracle.
http://users.albatross.co.nz/~psycho/ O- -><-
On Wed, Apr 28, 2004 at 01:51:20AM -0400, you [Karim Yaghmour] wrote:
>
> Nigel Cunningham wrote:
> >What I mean is, how does it help to know that a kernel is tainted? When
> >I'm working on Software Suspend and someone sends me an oops, I don't
> >really care whether it's marked as tainted or not. For all I know, even
> >if it's not tainted, they may have thrown in half a dozen different
> >patches aside from Suspend, any one of which could be playing a role in
> >the appearance of the oops. It doesn't help me to know that the kernel
>
> The legal/moral implications of taint/binary-mods/etc. aside, I think it
> may be worth putting some thought into coming up with a way to identify
> which patches were applied to a kernel -- given the wide-spread use of this
> method to add/remove/amend kernel functionality. Maybe there should be a
> /proc/sys/kernel/patches file at runtime which would provide a list of
> applied patches and some characteristics/description? When patches are
> applied, there could then be a toplevel .patches file which all patch
> submitters/providers/distributors would be strongly encouraged or
> <form>insert your favorite coercive method or torture technique here</form>
> to amend as they add their code. At build time, the makefile could then
> use this file to generate some header used by the code printing out the
> /proc/sys/kernel/patches. At oops time, the content of this file would also
> be part of the dump.
It has been suggested before:
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=linux.kernel.20020312114234.GF128921%40niksula.cs.hut.fi&rnum=1&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26q%3D%2BRe%253A%2B%255Bmodule%252Fpatch%255D%2Boptional%2B%252Fproc%252Fpatches%2B%253F%253F%2B%26btnG%3DSearch
but it didn't exactly raise enormous interest.
-- v --
[email protected]
On Wednesday 28 of April 2004 08:18, Nigel Cunningham wrote:
> Hi.
>
> On Wed, 28 Apr 2004 02:02:55 -0400, Chris Siebenmann
>
> <[email protected]> wrote:
> > What happens when a binary module thinks it knows the size of a
> > structure and is wrong? What happens when a binary module has a
> > concurrency problem, in any of the many forms they manifest in the Linux
> > kernel?
>
> Good points. It could be really difficult to trace the cause of those
> issues. But hard/too much effort != impossible. For every entry point to
> the module we have a known state of the system prior to and after the
> call. We could potentially checksum the whole of memory before and after
> and find out exactly what the module has changed.
This is not going to work. Data structures can and probably will change.
> Anyway, I'm going to drop this conversation now. Work to do :>
Good. :)
On Wed, Apr 28, 2004 at 03:18:35PM +1000, Nigel Cunningham wrote:
> On Wed, 28 Apr 2004 01:19:32 -0400, Chris Friesen <[email protected]> wrote:
> >
> >There has already been a case mentioned of a binary module that messed
> >up something that was only visible once that module was unloaded and
> >another one loaded. It all depends totally on usage patterns.
>
> I don't know what module you're talking about, but surely there must be
> something that could be done kernel-side to protect against such problems.
> Reference counting or such like? I guess if it was a hardware issue, but
> then again that might be an issue with too many assumptions being made
> about prior state? Maybe I am being too naive :>
The problem is with corrupted data structures, pointers, etc. An
evil/incompetently written driver can screw up data structures long
after it has been unloaded. Historically, there was a time when a
certain set of proprietary drivers from a six-letter video company beginning with 'N'
and ending with 'a' had serious bugs which would corrupt the kernel
and create random kernel panics far removed from the actual source of
the problems.
Stack overflows in a badly written device driver can overwrite task
structures and cause apparent filesystem problems which are blamed on
the hapless filesystem authors instead of where the blame properly
lies, namely the device driver author.
The thing we could do kernel-side is to implement full VM protections.
This is the microkernel approach; the problem though is the
performance overhead of having to go through protection boundaries,
setting up kernel-module-specific VM page tables, etc., etc. At some
level, if people could implement these proprietary code bases in
userspace, then there would be no need to risk corrupting internal
data structures, and no need to "taint" the kernel. But usually there
are performance reasons why the driver authors choose not to go down
that path.
- Ted
Theodore Ts'o <[email protected]> writes:
> On Wed, Apr 28, 2004 at 03:18:35PM +1000, Nigel Cunningham wrote:
>> On Wed, 28 Apr 2004 01:19:32 -0400, Chris Friesen <[email protected]> wrote:
>> >
>> >There has already been a case mentioned of a binary module that messed
>> >up something that was only visible once that module was unloaded and
>> >another one loaded. It all depends totally on usage patterns.
>>
>> I don't know what module you're talking about, but surely there must be
>> something that could be done kernel-side to protect against such problems.
>> Reference counting or such like? I guess if it was a hardware issue, but
>> then again that might be an issue with too many assumptions being made
>> about prior state? Maybe I am being too naive :>
>
> The problem is with corrupted data structures, pointers, etc. An
> evil/incompetently written driver can screw up data structures long
> after it has been unloaded. Historically, there was a time when a
> certain set of proprietary drivers from a six-letter video company beginning with 'N'
> and ending with 'a' had serious bugs which would corrupt the kernel
> and create random kernel panics far removed from the actual source of
> the problems.
>
> Stack overflows in a badly written device driver can overwrite task
> structures and cause apparent filesystem problems which are blamed on
> the hapless filesystem authors instead of where the blame properly
> lies, namely the device driver author.
Wouldn't the problem be just as difficult to pin to a certain module
even if the source code was open? I prefer open source modules (I
have Alpha machines), but I just can't see this argument work.
--
Måns Rullgård
[email protected]
On Wed, Apr 28, 2004 at 02:48:30PM +0200, Måns Rullgård wrote:
> > Stack overflows in a badly written device driver can overwrite task
> > structures and cause apparent filesystem problems which are blamed on
> > the hapless filesystem authors instead of where the blame properly
> > lies, namely the device driver author.
>
> Wouldn't the problem be just as difficult to pin to a certain module
> even if the source code was open? I prefer open source modules (I
> have Alpha machines), but I just can't see this argument work.
No. If the code is open, you can read it and find the bug - just by
reading it. If the code is closed, your only recourse is to observe
the corruption while it happens or read the assembly, which is quite a
lot more difficult.
Cheers,
Muli
--
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/
Muli Ben-Yehuda <[email protected]> writes:
> On Wed, Apr 28, 2004 at 02:48:30PM +0200, Måns Rullgård wrote:
>> > Stack overflows in a badly written device driver can overwrite task
>> > structures and cause apparent filesystem problems which are blamed on
>> > the hapless filesystem authors instead of where the blame properly
>> > lies, namely the device driver author.
>>
>> Wouldn't the problem be just as difficult to pin to a certain module
>> even if the source code was open? I prefer open source modules (I
>> have Alpha machines), but I just can't see this argument work.
>
> No. If the code is open, you can read it and find the bug - just by
> reading it. If the code is closed, your only recourse is to observe
> the corruption while it happens or read the assembly, which is quite a
> lot more difficult.
Something has to hint as to which code to read. The usual way to
find the offending module is to remove modules until the problem goes
away. The availability of source code only matters when you have
found which module actually has the bug. If you have the source you
can fix it, otherwise you can't. If a bug in an open source module
causes random filesystem corruption people will be just as likely to
blame the filesystem code for it as if the buggy module is closed
source. This is pretty obvious, because if you don't know where the
bug actually is, the openness of that source code can't possibly make
a difference.
--
Måns Rullgård
[email protected]
On Wed, Apr 28, 2004 at 03:27:00PM +0200, Måns Rullgård wrote:
> Muli Ben-Yehuda <[email protected]> writes:
>
> > On Wed, Apr 28, 2004 at 02:48:30PM +0200, Måns Rullgård wrote:
> >> > Stack overflows in a badly written device driver can overwrite task
> >> > structures and cause apparent filesystem problems which are blamed on
> >> > the hapless filesystem authors instead of where the blame properly
> >> > lies, namely the device driver author.
> >>
> >> Wouldn't the problem be just as difficult to pin to a certain module
> >> even if the source code was open? I prefer open source modules (I
> >> have Alpha machines), but I just can't see this argument work.
> >
> > No. If the code is open, you can read it and find the bug - just by
> > reading it. If the code is closed, your only recourse is to observe
> > the corruption while it happens or read the assembly, which is quite a
> > lot more difficult.
>
> Something has to hint as to which code to read. The usual way to
> find the offending module is to remove modules until the problem goes
> away. The availability of source code only matters when you have
> found which module actually has the bug.
If it's closed, you may think you have found the bug, but you can't
verify. If it's open, you can.
Cheers,
Muli
--
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/
From Theodore Ts'o on Wednesday, 28 April, 2004:
[mucho mas snipping]
>The thing we could do kernel-side is to implement full VM protections.
>This is the microkernel approach; the problem though is the
>performance overhead of having to go through protection boundaries,
>setting up kernel-module-specific VM page tables, etc., etc. At some
>level, if people could implement these proprietary code bases in
>userspace, then there would be no need to risk corrupting internal
>data structures, and no need to "taint" the kernel. But usually there
>are performance reasons why the driver authors choose not to go down
>that path.
Would it be possible, instead of implementing a full VM system inside
the kernel for device drivers, to provide a way to have the binary
drivers be a userland process?
I'd love to be able to keep binary drivers out of my kernel, and I
know many people harp on how hard it is to maintain binary drivers in
the Linux kernel due to the rapid evolution of Linux (namely, how fast
interfaces and structures change). If there were a way to put these
binary drivers in userspace, we could potentially solve both problems
in one swipe, no?
-Joseph
--
trelane@digitasaru.net--------------------------------------------------
"We continue to live in a world where all our know-how is locked into
binary files in an unknown format. If our documents are our corporate
memory, Microsoft still has us all condemned to Alzheimer's."
--Simon Phipps, http://theregister.com/content/4/30410.html
On Wed, 28 Apr 2004 15:18:35 +1000, Nigel Cunningham said:
> I don't know what module you're talking about, but surely there must be
> something that could be done kernel-side to protect against such problems.
> Reference counting or such like? I guess if it was a hardware issue, but
> then again that might be an issue with too many assumptions being made
> about prior state? Maybe I am being too naive :>
I once had the joy of debugging a memory overlay issue in an X.500 product,
that surfaced while porting from a "working" platform (IBM's AIX/370 product)
to IBM's AIX on the RS6K line.
The problem had the following characteristics:
It worked fine on AIX/370 (due to the way its malloc() worked).
It worked fine on the RS6K if a debugging malloc() was used (and I tried
3 different ones).
It only crashed using the native malloc(), and the actual overlay happened
fairly early on, but the overlay's effects didn't become apparent till some 6
million (yes really) more malloc() calls allocated another 120M (yes really) on
the heap. It was going *way* off the end of an allocated array, and the
canaries allocated by the AIX/370 and debugging mallocs caused the stray store
to hit non-critical data - but it hit a pointer used by the native malloc
(actually hopping over 2 entire other structures in the process), and said
botched pointer didn't surface till free() was called on that specific
structure.
Isn't much you can do kernel-side to protect against that sort of stray
pointer, unless you're using a tagged architecture like the late Intel i432
chipset.