2002-08-02 19:30:44

by Linus Torvalds

Subject: Re: large page patch (fwd) (fwd)


[ linux-kernel cc'd, simply because I don't want to write the same thing
over and over again ]

[ Executive summary: the current large-page-patch is nothing but a magic
interface to the TLB. Don't view it as anything else, or you'll just be
confused by all the smoke and mirrors. ]

On Fri, 2 Aug 2002, Gerrit Huizenga wrote:
> > Because _none_ of the large-page codepaths are shared with _any_ of the
> > normal cases.
>
> Isn't that currently an implementation detail?

Yes and no.

We may well expand the FS layer to bigger pages, but "bigger" is almost
certainly not going to include things like 256MB pages - if for no other
reason than the fact that memory fragmentation really means that the limit
on page sizes in practice is somewhere around 128kB for any reasonable
usage patterns even with gigabytes of RAM.

And _maybe_ we might get to the single-digit megabytes. I doubt it, simply
because even with a good buddy allocator and a memory manager that
actively frees pages to get large contiguous chunks of RAM, it's basically
impossible to have something that can reliably give you chunks that big
without making normal performance go totally down the toilet.

(Yeah, once you have terabytes of memory, that worry probably ends up
largely going away. I don't think that is going to be a common enough
platform for Linux to care about in the next ten years, though).

So there are implementation issues, yes. In particular, there _is_ a push
for larger pages in the FS and generic MM layers too, but the issues there
are very different and basically have no generality with the TLB and
page table mapping issues of the current push.

What this VM/VFS push means is that we may actually have a _different_
"large page" support on that level, where the most likely implementation
is that the "struct address_space" will at some point have a new member
that specifies the "page allocation order" for that address space. This
will allow us to do per-file allocations, so that some files (or some
filesystems) might want to do all IO in 64kB chunks, and they'd just make
the address_space specify a page allocation order that matches that.
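
As a minimal sketch of the idea (the struct and field names here are
invented for illustration, this is not proposed code):

#include <stdio.h>

/* hypothetical per-file allocation order: 0 = 4kB pages, 4 = 64kB chunks */
struct address_space_sketch {
        unsigned int a_order;
};

static unsigned long io_unit(const struct address_space_sketch *m)
{
        return 4096UL << m->a_order;    /* all IO done in this unit */
}

int main(void)
{
        struct address_space_sketch m = { .a_order = 4 };
        printf("IO unit: %lu bytes\n", io_unit(&m));    /* 65536 */
        return 0;
}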

This is in fact one of the reasons I explicitly _want_ to keep the
interfaces separate - because there are two totally different issues at
play, and I suspect that we'll end up implementing _both_ of them, but
that they will _still_ have no commonalities.

The current largepage patch is really nothing but an interface to the TLB.
Please view it as that - a direct TLB interface that has zero impact on
the VFS or VM layers, and that is meant _purely_ as a way to expose hw
capabilities to the few applications that really really want them.

The important thing to take away from this is that _even_ if we could
change the FS and VM layers to know about a per-address_space variable-
sized PAGE_CACHE_SIZE (which I think is the long-term goal), that doesn't
impact the fact that we _also_ want to have the TLB interface.

Maybe the largepage patch could be improved upon by just renaming it, and
making clear that it's a "TLB_hugepage" thing. That's what a CPU designer
thinks of when you say "largepage" to him. Some of the confusion is
probably because a VM/FS person in an OS group does _not_ necessarily
think the same way, but thinks about doing big-granularity IO.

Linus


2002-08-03 03:16:49

by David Mosberger

Subject: Re: large page patch (fwd) (fwd)

>>>>> On Fri, 2 Aug 2002 12:34:08 -0700 (PDT), Linus Torvalds <[email protected]> said:

Linus> We may well expand the FS layer to bigger pages, but "bigger"
Linus> is almost certainly not going to include things like 256MB
Linus> pages - if for no other reason than the fact that memory
Linus> fragmentation really means that the limit on page sizes in
Linus> practice is somewhere around 128kB for any reasonable usage
Linus> patterns even with gigabytes of RAM.

Linus> And _maybe_ we might get to the single-digit megabytes. I
Linus> doubt it, simply because even with a good buddy allocator and
Linus> a memory manager that actively frees pages to get large
Linus> contiguous chunks of RAM, it's basically impossible to have
Linus> something that can reliably give you chunks that big without
Linus> making normal performance go totally down the toilet.

The Rice people avoided some of the fragmentation problems by
pro-actively allocating a max-order physical page, even when only a
(small) virtual page was being mapped. This should work very well as
long as the total memory usage (including memory lost due to internal
fragmentation of max-order physical pages) doesn't exceed available
memory. That's not a condition which will hold for every system in
the world, but I suspect it is true for lots of systems for large
periods of time. And since superpages quickly become
counter-productive in tight-memory situations anyhow, this seems like
a very reasonable approach.
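
A rough user-space sketch of the reservation idea (my reading of the
paper, not their code; all names and sizes invented for illustration):

#include <stdio.h>
#include <stdlib.h>

#define BASE_PAGE   4096UL
#define MAX_ORDER   6                       /* reserve 64 base pages */
#define FRAME_BYTES (BASE_PAGE << MAX_ORDER)

/* on first touch, reserve a whole aligned max-order frame so later
   faults in the same region can be promoted without copying */
static void *reserve_frame(void)
{
        void *frame = NULL;
        if (posix_memalign(&frame, FRAME_BYTES, FRAME_BYTES))
                return NULL;                /* tight memory: caller falls
                                               back to one base page */
        return frame;
}

int main(void)
{
        void *frame = reserve_frame();
        printf("reserved %lu bytes at %p\n", FRAME_BYTES, frame);
        free(frame);
        return 0;
}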

--david

2002-08-03 03:28:35

by Linus Torvalds

Subject: Re: large page patch (fwd) (fwd)


On Fri, 2 Aug 2002, David Mosberger wrote:
>
> The Rice people avoided some of the fragmentation problems by
> pro-actively allocating a max-order physical page, even when only a
> (small) virtual page was being mapped.

This probably works ok if
- the superpages are only slightly bigger than the small pages
- superpages are a nice optimization.

> And since superpages quickly become
> counter-productive in tight-memory situations anyhow, this seems like
> a very reasonable approach.

Ehh.. The only people who are _really_ asking for the superpages want
almost nothing _but_ superpages. They are willing to use 80% of all memory
for just superpages.

Yes, it's Oracle etc, and the whole point for these users is to avoid
having any OS memory allocation for these areas.

Linus

2002-08-03 04:14:22

by David Mosberger

Subject: Re: large page patch (fwd) (fwd)

>>>>> On Fri, 2 Aug 2002 20:32:10 -0700 (PDT), Linus Torvalds <[email protected]> said:

>> And since superpages quickly become counter-productive in
>> tight-memory situations anyhow, this seems like a very reasonable
>> approach.

Linus> Ehh.. The only people who are _really_ asking for the
Linus> superpages want almost nothing _but_ superpages. They are
Linus> willing to use 80% of all memory for just superpages.

Linus> Yes, it's Oracle etc, and the whole point for these users is
Linus> to avoid having any OS memory allocation for these areas.

My terminology is perhaps a bit too subtle: I use "superpage"
exclusively for the case where multiple pages get coalesced into a
larger page. The "large page" ("huge page") case that you were
talking about is different, since pages never get demoted or promoted.

I wasn't disagreeing with your case for separate large page syscalls.
Those syscalls certainly simplify implementation and, as you point
out, it may well be the case that a transparent superpage scheme never
will be able to replace the former.

--david

2002-08-03 04:22:14

by Linus Torvalds

Subject: Re: large page patch (fwd) (fwd)



On Fri, 2 Aug 2002, David Mosberger wrote:
>
> My terminology is perhaps a bit too subtle: I use "superpage"
> exclusively for the case where multiple pages get coalesced into a
> larger page. The "large page" ("huge page") case that you were
> talking about is different, since pages never get demoted or promoted.

Ahh, ok.

> I wasn't disagreeing with your case for separate large page syscalls.
> Those syscalls certainly simplify implementation and, as you point
> out, it may well be the case that a transparent superpage scheme never
> will be able to replace the former.

Somebody already had patches for the transparent superpage thing for
alpha, which supports it. I remember seeing numbers implying that it helped
noticeably.

But yes, that definitely doesn't work for humongous pages (or whatever we
should call the multi-megabyte-special-case-thing ;).

Linus

2002-08-03 04:36:16

by David Mosberger

Subject: Re: large page patch (fwd) (fwd)

>>>>> On Fri, 2 Aug 2002 21:26:52 -0700 (PDT), Linus Torvalds <[email protected]> said:

>> I wasn't disagreeing with your case for separate large page
>> syscalls. Those syscalls certainly simplify implementation and,
>> as you point out, it may well be the case that a transparent
>> superpage scheme never will be able to replace the former.

Linus> Somebody already had patches for the transparent superpage
Linus> thing for alpha, which supports it. I remember seeing numbers
Linus> implying that it helped noticeably.

Yes, I saw those. I still like the Rice work a _lot_ better. It's
just a thing of beauty, from a design point of view (disclaimer: I
haven't seen the implementation, so there may be ugly things
lurking...).

Linus> But yes, that definitely doesn't work for humongous pages (or
Linus> whatever we should call the multi-megabyte-special-case-thing
Linus> ;).

Yes, you're probably right. 2MB was reported to be fine in the Rice
experiments, but I doubt 256MB (and much less 4GB, as supported by
some CPUs) would fly.

--david

2002-08-03 05:30:26

by David Miller

Subject: Re: large page patch (fwd) (fwd)

From: David Mosberger <[email protected]>
Date: Fri, 2 Aug 2002 21:39:36 -0700

>>>>> On Fri, 2 Aug 2002 21:26:52 -0700 (PDT), Linus Torvalds <[email protected]> said:

>> I wasn't disagreeing with your case for separate large page
>> syscalls. Those syscalls certainly simplify implementation and,
>> as you point out, it may well be the case that a transparent
>> superpage scheme never will be able to replace the former.

Linus> Somebody already had patches for the transparent superpage
Linus> thing for alpha, which supports it. I remember seeing numbers
Linus> implying that it helped noticeably.

Yes, I saw those. I still like the Rice work a _lot_ better.

Now here's the thing. To me, we should be adding these superpage
syscalls to things like the implementation of malloc() :-) If you
allocate enough anonymous pages together, you should get a superpage
in the TLB if that is easy to do. Once any hint of memory pressure
occurs, you just break up the large page clusters as you hit such
ptes. This is what one of the Linux large-page implementations did
and I personally find it the most elegant way to handle the so-called
"paging complexity" of transparent superpages.

At that point it's like "why the system call". If it would rather be
more of a large-page reservation system than an "optimization hint"
then these syscalls would sit better with me. Currently I think they
are superfluous. To me the hint to use large-pages is a given :-)

Stated another way, if these syscalls said "gimme large pages for this
area and lock them into memory", this would be fine. If the syscalls
say "use large pages if you can", that's crap. And in fact we could
use mmap() attribute flags if we really thought that stating this was
necessary.

2002-08-03 17:44:09

by Linus Torvalds

Subject: Re: large page patch (fwd) (fwd)



On Fri, 2 Aug 2002, David S. Miller wrote:
>
> Now here's the thing. To me, we should be adding these superpage
> syscalls to things like the implementation of malloc() :-) If you
> allocate enough anonymous pages together, you should get a superpage
> in the TLB if that is easy to do.

For architectures that have these "small" superpages, we can just do it
transparently. That's what the alpha patches did.

The problem space is roughly the same as just page coloring.

> At that point it's like "why the system call". If it would rather be
> more of a large-page reservation system than an "optimization hint"
> then these syscalls would sit better with me. Currently I think they
> are superfluous. To me the hint to use large-pages is a given :-)

Yup.

David, you did page coloring once.

I bet your patches worked reasonably well to color into 4 or 8 colors.

How well do you think something like your old patches would work if

- you _require_ 1024 colors in order to get the TLB speedup on some
hypothetical machine (the same hypothetical machine that might
hypothetically run on 95% of all hardware ;)

- the machine is under heavy load, and heavy load is exactly when you
want this optimization to trigger.

Can you explain this difficulty to people?

> Stated another way, if these syscalls said "gimme large pages for this
> area and lock them into memory", this would be fine. If the syscalls
> say "use large pages if you can", that's crap. And in fact we could
> use mmap() attribute flags if we really thought that stating this was
> necessary.

I agree 100%.

I think we can at some point do the small cases completely transparently,
with no need for a new system call, and not even any new hint flags. We'll
just silently do 4/8-page superpages and be done with it. Programs don't
need to know about it to take advantage of better TLB usage.

Linus

2002-08-03 18:41:20

by Hubertus Franke

Subject: Re: large page patch (fwd) (fwd)

On Saturday 03 August 2002 12:39 am, David Mosberger wrote:
> >>>>> On Fri, 2 Aug 2002 21:26:52 -0700 (PDT), Linus Torvalds
> >>>>> <[email protected]> said:
> >>
> >> I wasn't disagreeing with your case for separate large page
> >> syscalls. Those syscalls certainly simplify implementation and,
> >> as you point out, it may well be the case that a transparent
> >> superpage scheme never will be able to replace the former.
>
> Linus> Somebody already had patches for the transparent superpage
> Linus> thing for alpha, which supports it. I remember seeing numbers
> Linus> implying that it helped noticeably.
>
> Yes, I saw those. I still like the Rice work a _lot_ better. It's
> just a thing of beauty, from a design point of view (disclaimer: I
> haven't seen the implementation, so there may be ugly things
> lurking...).
>

I agree, the Rice solution is elegant in the promotion and demotion.

> Linus> But yes, that definitely doesn't work for humongous pages (or
> Linus> whatever we should call the multi-megabyte-special-case-thing
> Linus> ;).
>
> Yes, you're probably right. 2MB was reported to be fine in the Rice
> experiments, but I doubt 256MB (and much less 4GB, as supported by
> some CPUs) would fly.
>
> --david

As for the page coloring, it certainly helps.
But I'd like to point out that superpages are there to reduce the number of
TLB misses by providing larger coverage. Simply providing page coloring
will not get you there.


--
-- Hubertus Franke ([email protected])

2002-08-03 19:26:49

by David Mosberger

Subject: Re: large page patch (fwd) (fwd)

>>>>> On Sat, 3 Aug 2002 10:35:00 -0700 (PDT), Linus Torvalds <[email protected]> said:

Linus> How well do you think something like your old patches would
Linus> work if

Linus> - you _require_ 1024 colors in order to get the TLB speedup
Linus> on some hypothetical machine (the same hypothetical machine
Linus> that might hypothetically run on 95% of all hardware ;)

Linus> - the machine is under heavy load, and heavy load is exactly
Linus> when you want this optimization to trigger.

Your point about wanting databases to have access to giant pages even
under memory pressure is a good one. I had not considered that
before. However, what we really are talking about then is a security
or resource policy as to who gets to allocate from a reserved and
pinned pool of giant physical pages. You don't need separate system
calls for that: with a transparent superpage framework and a
privileged & reserved giant-page pool, it's trivial to set up things
such that your favorite data base will always be able to get the giant
pages (and hence the giant TLB mappings) it wants. The only thing you
lose in the transparent case is control over _which_ pages need to use
the pinned giant pages. I can certainly imagine cases where this
would be an issue, but I kind of doubt it would be an issue for
databases.

As Dave Miller justly pointed out, it's stupid for a task not to ask
for giant pages for anonymous memory. The only reason this is not a
smart thing overall is that globally it's not optimal (it is optimal
only locally, from the task's point of view). So if the only barrier
to getting the giant pinned pages is needing to know about the new
system calls, I'll predict that very soon we'll have EVERY task in the
system allocating such pages (and LD_PRELOAD tricks make that pretty
much trivial). Then we're back to square one, because the favorite
database may not even be able to start up, because all the "reserved"
memory is already used up by the other tasks.

Clearly there need to be some additional policies in effect, no
matter what the implementation is (the normal VM policies don't work,
because, by definition, the pinned giant pages are not pageable).

In my opinion, the primary benefit of the separate syscalls is still
ease-of-implementation (which isn't unimportant, of course).

--david

2002-08-03 19:38:06

by David Mosberger

Subject: Re: large page patch (fwd) (fwd)

>>>>> On Sat, 3 Aug 2002 14:41:29 -0400, Hubertus Franke <[email protected]> said:

Hubertus> But I'd like to point out that superpages are there to
Hubertus> reduce the number of TLB misses by providing larger
Hubertus> coverage. Simply providing page coloring will not get you
Hubertus> there.

Yes, I agree.

It appears that Juan Navarro, the primary author behind the Rice
project, is working on breaking down the superpage benefits they
observed. That would tell us how much benefit is due to page-coloring
and how much is due to TLB effects. Here in our lab, we do have some
(weak) empirical evidence that some of the SPECint benchmarks benefit
primarily from page-coloring, but clearly there are others that are
TLB limited.

--david

2002-08-03 19:48:19

by Linus Torvalds

Subject: Re: large page patch (fwd) (fwd)



On Sat, 3 Aug 2002, Hubertus Franke wrote:
>
> But I'd like to point out that superpages are there to reduce the number of
> TLB misses by providing larger coverage. Simply providing page coloring
> will not get you there.

Superpages can, from a memory allocation angle, be seen as a very strict
form of page coloring - the problems are fairly closely related, I think
(superpages are just a lot stricter, in that it's not enough to get "any
page of color X", you have to get just the _right_ page).

Doing superpages will automatically do coloring (while the reverse is
obviously not true). And the way David did coloring a long time ago (if
I remember his implementation correctly) was the same way you'd do
superpages: just do higher order allocations.

Linus

2002-08-03 19:53:54

by Linus Torvalds

Subject: Re: large page patch (fwd) (fwd)



On Sat, 3 Aug 2002, David Mosberger wrote:
>
> Your point about wanting databases to have access to giant pages even
> under memory pressure is a good one. I had not considered that
> before. However, what we really are talking about then is a security
> or resource policy as to who gets to allocate from a reserved and
> pinned pool of giant physical pages.

Absolutely. We can't allow just anybody to allocate giant pages, since
they are a scarce resource (set up at boot time in both Ingo's and Intels
patches - with the potential to move things around later with additional
interfaces).

> You don't need separate system
> calls for that: with a transparent superpage framework and a
> privileged & reserved giant-page pool, it's trivial to set up things
> such that your favorite data base will always be able to get the giant
> pages (and hence the giant TLB mappings) it wants. The only thing you
> lose in the transparent case is control over _which_ pages need to use
> the pinned giant pages. I can certainly imagine cases where this
> would be an issue, but I kind of doubt it would be an issue for
> databases.

That's _probably_ true. There aren't that many allocations that ask for
megabytes of consecutive memory that wouldn't want to do it. However,
there might certainly be non-critical maintenance programs (with the same
privileges as the database program proper) that _do_ do large allocations,
and that we don't want to give large pages to.

Guessing is always bad, especially since the application certainly does
know what it wants.

Linus

2002-08-03 20:53:32

by Hubertus Franke

Subject: Re: large page patch (fwd) (fwd)

On Saturday 03 August 2002 03:41 pm, David Mosberger wrote:
> >>>>> On Sat, 3 Aug 2002 14:41:29 -0400, Hubertus Franke
> >>>>> <[email protected]> said:
>
> Hubertus> But I'd like to point out that superpages are there to
> Hubertus> reduce the number of TLB misses by providing larger
> Hubertus> coverage. Simply providing page coloring will not get you
> Hubertus> there.
>
> Yes, I agree.
>
> It appears that Juan Navarro, the primary author behind the Rice
> project, is working on breaking down the superpage benefits they
> observed. That would tell us how much benefit is due to page-coloring
> and how much is due to TLB effects. Here in our lab, we do have some
> (weak) empirical evidence that some of the SPECint benchmarks benefit
> primarily from page-coloring, but clearly there are others that are
> TLB limited.
>
> --david

Cool.
Does that mean that BSD already has page coloring implemented?

The agony is:
Page coloring helps to reduce cache conflicts in low-associativity caches
while large pages may reduce TLB overhead.

One shouldn't rule out one in favor of the other; there is a place for both.

How did you arrive at the (weak) empirical evidence?
You checked TLB misses and cache misses and turned
page coloring on and off and large pages on and off?

--
-- Hubertus Franke ([email protected])

2002-08-03 21:14:57

by David Mosberger

Subject: Re: large page patch (fwd) (fwd)

>>>>> On Sat, 3 Aug 2002 12:43:47 -0700 (PDT), Linus Torvalds <[email protected]> said:

>> You don't need separate system calls for that: with a transparent
>> superpage framework and a privileged & reserved giant-page pool,
>> it's trivial to set up things such that your favorite data base
>> will always be able to get the giant pages (and hence the giant
>> TLB mappings) it wants. The only thing you lose in the
>> transparent case is control over _which_ pages need to use the
>> pinned giant pages. I can certainly imagine cases where this
>> would be an issue, but I kind of doubt it would be an issue for
>> databases.

Linus> That's _probably_ true. There aren't that many allocations
Linus> that ask for megabytes of consecutive memory that wouldn't
Linus> want to do it. However, there might certainly be non-critical
Linus> maintenance programs (with the same privileges as the
Linus> database program proper) that _do_ do large allocations, and
Linus> that we don't want to give large pages to.

Linus> Guessing is always bad, especially since the application
Linus> certainly does know what it wants.

Yes, but that applies even to a transparent superpage scheme: in those
instances where an application knows what page size is optimal, it's
better if the application can express that (saves time
promoting/demoting pages needlessly). It's not unlike madvise() or
the readahead() syscall: use reasonable policies for the ordinary
apps, and provide the means to let the smart apps tell the kernel
exactly what they need.

--david

2002-08-03 21:22:59

by David Mosberger

Subject: Re: large page patch (fwd) (fwd)

>>>>> On Sat, 3 Aug 2002 16:53:39 -0400, Hubertus Franke <[email protected]> said:

Hubertus> Cool. Does that mean that BSD already has page coloring
Hubertus> implemented?

FreeBSD (at least on Alpha) makes some attempts at page-coloring, but
it's said to be far from perfect.

Hubertus> The agony is: Page Coloring helps to reduce cache
Hubertus> conflicts in low associative caches while large pages may
Hubertus> reduce TLB overhead.

Why agony? The latter helps the TLB _and_ solves the page coloring
problem (assuming the largest page size is bigger than the largest
cache; yeah, I see that could be a problem on some Power 4
machines... ;-)

Hubertus> One shouldn't rule out one for the other, there is a place
Hubertus> for both.

Hubertus> How did you arrive to the (weak) empirical evidence? You
Hubertus> checked TLB misses and cache misses and turned page
Hubertus> coloring on and off and large pages on and off?

Yes, that's basically what we did (there is a patch implementing a
page coloring kernel module floating around).

--david

2002-08-03 21:49:52

by Hubertus Franke

Subject: Re: large page patch (fwd) (fwd)

On Saturday 03 August 2002 05:26 pm, David Mosberger wrote:
> >>>>> On Sat, 3 Aug 2002 16:53:39 -0400, Hubertus Franke
> >>>>> <[email protected]> said:
>
> Hubertus> Cool. Does that mean that BSD already has page coloring
> Hubertus> implemented?
>
> FreeBSD (at least on Alpha) makes some attempts at page-coloring, but
> it's said to be far from perfect.
>
> Hubertus> The agony is: Page Coloring helps to reduce cache
> Hubertus> conflicts in low associative caches while large pages may
> Hubertus> reduce TLB overhead.
>
> Why agony? The latter helps the TLB _and_ solves the page coloring
> problem (assuming the largest page size is bigger than the largest
> cache; yeah, I see that could be a problem on some Power 4
> machines... ;-)
>

In essence, remember page coloring preserves the same bits used
for cache indexing from virtual to physical. If these bits are covered
by the large page, then of course you will get page coloring for free;
otherwise you won't.
Also, page coloring is mainly helpful in low-associativity caches.
From my recollection of the literature, for 4-way or higher it's not
worth the trouble.

Just to rephrase:
- Large pages almost always solve your page coloring problem.
- Page coloring never solves your TLB coverage problem.
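
To make the bit picture concrete, a toy sketch (the cache geometry here
is an assumption for illustration, not any particular CPU's):

#include <stdio.h>

#define PAGE_SHIFT 12
#define COLORS     128     /* e.g. 1MB 2-way cache, 32B lines:
                              index bits 5..18, bits 12..18 above
                              the page offset => 2^7 colors */

static unsigned int color(unsigned long addr)
{
        return (addr >> PAGE_SHIFT) & (COLORS - 1);
}

int main(void)
{
        unsigned long vaddr = 0x40023000UL;
        /* the allocator must pick a physical page of the same color */
        printf("color(%#lx) = %u\n", vaddr, color(vaddr));
        return 0;
}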

> Hubertus> One shouldn't rule out one for the other, there is a place
> Hubertus> for both.
>
> Hubertus> How did you arrive to the (weak) empirical evidence? You
> Hubertus> checked TLB misses and cache misses and turned page
> Hubertus> coloring on and off and large pages on and off?
>
> Yes, that's basically what we did (there is a patch implementing a
> page coloring kernel module floating around).
>
> --david

--
-- Hubertus Franke ([email protected])

2002-08-03 21:53:05

by Hubertus Franke

Subject: Re: large page patch (fwd) (fwd)

On Saturday 03 August 2002 05:18 pm, David Mosberger wrote:
> >>>>> On Sat, 3 Aug 2002 12:43:47 -0700 (PDT), Linus Torvalds
> >>>>> <[email protected]> said:
> >>
> >> You don't need separate system calls for that: with a transparent
> >> superpage framework and a privileged & reserved giant-page pool,
> >> it's trivial to set up things such that your favorite data base
> >> will always be able to get the giant pages (and hence the giant
> >> TLB mappings) it wants. The only thing you lose in the
> >> transparent case is control over _which_ pages need to use the
> >> pinned giant pages. I can certainly imagine cases where this
> >> would be an issue, but I kind of doubt it would be an issue for
> >> databases.
>
> Linus> That's _probably_ true. There aren't that many allocations
> Linus> that ask for megabytes of consecutive memory that wouldn't
> Linus> want to do it. However, there might certainly be non-critical
> Linus> maintenance programs (with the same privileges as the
> Linus> database program proper) that _do_ do large allocations, and
> Linus> that we don't want to give large pages to.
>
> Linus> Guessing is always bad, especially since the application
> Linus> certainly does know what it wants.
>
> Yes, but that applies even to a transparent superpage scheme: in those
> instances where an application knows what page size is optimal, it's
> better if the application can express that (saves time
> promoting/demoting pages needlessly). It's not unlike madvise() or
> the readahead() syscall: use reasonable policies for the ordinary
> apps, and provide the means to let the smart apps tell the kernel
> exactly what they need.
>
> --david

So that's what is/can be done through the madvise() call or a flag on mmap().
Force a specific size and policy. Why do you need a new system call?

The Rice paper solved this reasonably elegantly: reservation, and a check
after a while. If you didn't use the reserved memory, you lose it; this is
the auto promotion/demotion.

For special apps one provides the interface using madvise().
--
-- Hubertus Franke ([email protected])

2002-08-04 00:38:26

by David Miller

Subject: Re: large page patch (fwd) (fwd)

From: Linus Torvalds <[email protected]>
Date: Sat, 3 Aug 2002 10:35:00 -0700 (PDT)

David, you did page coloring once.

I bet your patches worked reasonably well to color into 4 or 8 colors.

How well do you think something like your old patches would work if

- you _require_ 1024 colors in order to get the TLB speedup on some
hypothetical machine (the same hypothetical machine that might
hypothetically run on 95% of all hardware ;)

- the machine is under heavy load, and heavy load is exactly when you
want this optimization to trigger.

Can you explain this difficulty to people?

Actually, we need some clarification here. I tried coloring several
times; the problem with my diffs is that I tried to do the coloring
all the time no matter what.

I wanted strict coloring on the 2-color level for broken L1 caches
that have aliasing problems. If I could make this work, all of the
dumb cache flushing I have to do on Sparcs could be deleted. Because
of this, I couldn't legitimately change the cache flushing rules
unless I had absolutely strict coloring done on all pages where it
mattered (basically anything that could end up in the user's address
space).

So I kept track of color existence precisely in the page lists. The
implementation was fast, but things got really bad fragmentation-wise.

No matter how I tweaked things, just running a kernel build 40 or 50
times would fragment the free page lists to shreds such that 2-order
and up pages simply did not exist.

Another person did an implementation of coloring which basically
worked by allocating a big-order chunk and slicing that up. It's not
strictly done and that is why his version works better. In fact I
like that patch a lot and it worked quite well for L2 coloring on
sparc64. Any time there is page pressure, he tosses away all of the
color carving big-order pages.
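
That carving approach, as a toy sketch (invented names, not his actual
patch):

#include <stdio.h>

#define ORDER 3                      /* carve 8-page chunks: 8 colors */
#define PAGES (1U << ORDER)

/* pages inside an aligned order-3 chunk cycle through all 8 colors,
   so handing out page (chunk + color) gives strict coloring for free */
static unsigned long page_of_color(unsigned long chunk_pfn, unsigned int c)
{
        return chunk_pfn + (c & (PAGES - 1));
}

int main(void)
{
        unsigned long chunk = 0x1000;     /* pfn of an aligned order-3 chunk */
        for (unsigned int c = 0; c < PAGES; c++)
                printf("color %u -> pfn %#lx\n", c, page_of_color(chunk, c));
        return 0;
}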

I think we can at some point do the small cases completely transparently,
with no need for a new system call, and not even any new hint flags. We'll
just silently do 4/8-page superpages and be done with it. Programs don't
need to know about it to take advantage of better TLB usage.

Ok. I think even 64-page ones are viable to attempt but we'll see.
Most TLBs that do superpages seem to have a range from the base
page size to the largest supported superpage, with each supported size
a fixed power-of-two factor larger than the previous one.

For example on Sparc64 this is:

8K PAGE_SIZE
64K PAGE_SIZE * 8
512K PAGE_SIZE * 64
4M PAGE_SIZE * 512

One of the transparent large page implementations just defined a
small array that the core code used to try and see "hey how big
a superpage can we try" and if the largest for the area failed
(because page orders that large weren't available) it would simply
fall back to the next smallest superpage size.
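
Roughly (a user-space sketch with invented helpers, using the sparc64
orders above):

#include <stdio.h>

/* 4M, 512K, 64K, 8K in units of the 8K base page */
static const unsigned int sp_order[] = { 9, 6, 3, 0 };

static int try_alloc(unsigned int order)    /* stand-in for the real
                                               non-blocking allocator */
{
        return order <= 6;                  /* pretend 4M chunks are gone */
}

static unsigned int pick_order(unsigned int max_fit)
{
        for (int i = 0; i < 4; i++)
                if (sp_order[i] <= max_fit && try_alloc(sp_order[i]))
                        return sp_order[i];
        return 0;
}

int main(void)
{
        printf("got order %u\n", pick_order(9));   /* prints 6 -> 512K */
        return 0;
}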

2002-08-04 00:46:15

by David Miller

Subject: Re: large page patch (fwd) (fwd)

From: Hubertus Franke <[email protected]>
Date: Sat, 3 Aug 2002 17:54:30 -0400

The Rice paper solved this reasonably elegantly: reservation, and a check
after a while. If you didn't use the reserved memory, you lose it; this is
the auto promotion/demotion.

I keep seeing this Rice stuff being mentioned over and over,
can someone post a URL pointer to this work?

2002-08-04 00:43:48

by David Miller

Subject: Re: large page patch (fwd) (fwd)

From: Hubertus Franke <[email protected]>
Date: Sat, 3 Aug 2002 16:53:39 -0400

Does that mean that BSD already has page coloring implemented?

FreeBSD has had page coloring for quite some time.

Because they don't use buddy lists and don't allow higher-order
allocations fundamentally in the page allocator, they don't have
to deal with all the buddy fragmentation issues we do.

On the other hand, since higher-order page allocations are not
a fundamental operation it might be more difficult for FreeBSD
to implement superpage support efficiently like we can with
the buddy lists.

2002-08-04 00:41:49

by David Miller

Subject: Re: large page patch (fwd) (fwd)

From: Linus Torvalds <[email protected]>
Date: Sat, 3 Aug 2002 12:39:40 -0700 (PDT)

And the way David did coloring a long time ago (if
I remember his implementation correctly) was the same way you'd do
superpages: just do higher order allocations.

Although it wasn't my implementation which did this,
one of them did do it this way. I agree that it is
the nicest way to do coloring.

2002-08-04 00:41:57

by David Miller

Subject: Re: large page patch (fwd) (fwd)

From: David Mosberger <[email protected]>
Date: Sat, 3 Aug 2002 12:41:33 -0700

It appears that Juan Navarro, the primary author behind the Rice
project, is working on breaking down the superpage benefits they
observed. That would tell us how much benefit is due to page-coloring
and how much is due to TLB effects. Here in our lab, we do have some
(weak) empirical evidence that some of the SPECint benchmarks benefit
primarily from page-coloring, but clearly there are others that are
TLB limited.

There was some comparison done between large-page vs. plain
page coloring for a bunch of scientific number crunchers.

Only one benefitted from page coloring and not from TLB
superpage use.

The ones that benefitted from both coloring and superpages, the
superpage gain was about equal to the coloring gain. Basically,
superpages ended up giving the necessary coloring :-)

Search for the topic "Areas for superpage discussion" in the
[email protected] list archives, it has pointers to
all the patches and test programs involved.

2002-08-04 02:22:17

by David Mosberger

Subject: Re: large page patch (fwd) (fwd)

>>>>> On Sat, 03 Aug 2002 17:35:30 -0700 (PDT), "David S. Miller" <[email protected]> said:

DaveM> From: Hubertus Franke <[email protected]> Date: Sat,
DaveM> 3 Aug 2002 17:54:30 -0400

DaveM> The Rice paper solved this reasonably elegantly: reservation,
DaveM> and a check after a while. If you didn't use the reserved memory,
DaveM> you lose it; this is the auto promotion/demotion.

DaveM> I keep seeing this Rice stuff being mentioned over and over,
DaveM> can someone post a URL pointer to this work?

Sure thing. It's the first link under "Publications" at this URL:

http://www.cs.rice.edu/~jnavarro/

--david

2002-08-04 17:17:54

by Hubertus Franke

Subject: Re: large page patch (fwd) (fwd)

On Saturday 03 August 2002 10:25 pm, David Mosberger wrote:
> >>>>> On Sat, 03 Aug 2002 17:35:30 -0700 (PDT), "David S. Miller"
> >>>>> <[email protected]> said:
>
> DaveM> From: Hubertus Franke <[email protected]> Date: Sat,
> DaveM> 3 Aug 2002 17:54:30 -0400
>
> DaveM> The Rice paper solved this reasonably elegantly: reservation,
> DaveM> and a check after a while. If you didn't use the reserved memory,
> DaveM> you lose it; this is the auto promotion/demotion.
>
> DaveM> I keep seeing this Rice stuff being mentioned over and over,
> DaveM> can someone post a URL pointer to this work?
>
> Sure thing. It's the first link under "Publications" at this URL:
>
> http://www.cs.rice.edu/~jnavarro/
>
> --david

Also in this context:

"Implemenation of Multiple Pagesize Support in HP-UX"
http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/subramanian/subramanian.pdf

"General Purpose Operating System Support for Multiple Page Sizes"
http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf

--
-- Hubertus Franke ([email protected])

2002-08-04 17:24:20

by Hubertus Franke

Subject: Re: large page patch (fwd) (fwd)

On Saturday 03 August 2002 08:31 pm, David S. Miller wrote:
> From: David Mosberger <[email protected]>
> Date: Sat, 3 Aug 2002 12:41:33 -0700
>
> It appears that Juan Navarro, the primary author behind the Rice
> project, is working on breaking down the superpage benefits they
> observed. That would tell us how much benefit is due to page-coloring
> and how much is due to TLB effects. Here in our lab, we do have some
> (weak) empirical evidence that some of the SPECint benchmarks benefit
> primarily from page-coloring, but clearly there are others that are
> TLB limited.
>
> There was some comparison done between large-page vs. plain
> page coloring for a bunch of scientific number crunchers.
>
> Only one benefitted from page coloring and not from TLB
> superpage use.
>

I would expect that from scientific apps, which often go through their
dataset in a fairly regular pattern. If sequential, then page coloring
is at its best, because your cache can become the limiting factor if
you can't squeeze the data into the cache due to conflicts in the same
cache class.

The way I see page coloring is that any hard work done in virtual space
(either by the compiler or by the app writer [the latter holds for numerical
apps]) to be cache friendly is not circumvented by a <stupid> physical page
assignment by the OS that leads to less-than-complete cache utilization.
That's why the cache index bits from the address are carried over, or
are kept the same, in the virtual and physical address. That's the purpose
of page coloring.

This regular access pattern is not necessarily present in apps like JVMs or
other object-oriented code where data accesses can be less predictable. There
page coloring might not help you at all.

> The ones that benefitted from both coloring and superpages, the
> superpage gain was about equal to the coloring gain. Basically,
> superpages ended up giving the necessary coloring :-)
>
> Search for the topic "Areas for superpage discussion" in the
> [email protected] list archives, it has pointers to
> all the patches and test programs involved.


--
-- Hubertus Franke ([email protected])

2002-08-04 17:35:09

by Hubertus Franke

Subject: Re: large page patch (fwd) (fwd)

On Saturday 03 August 2002 08:28 pm, David S. Miller wrote:
> From: Linus Torvalds <[email protected]>
> Date: Sat, 3 Aug 2002 10:35:00 -0700 (PDT)
>
> David, you did page coloring once.
>
> I bet your patches worked reasonably well to color into 4 or 8 colors.
>
> How well do you think something like your old patches would work if
>
> - you _require_ 1024 colors in order to get the TLB speedup on some
> hypothetical machine (the same hypothetical machine that might
> hypothetically run on 95% of all hardware ;)
>
> - the machine is under heavy load, and heavy load is exactly when you
> want this optimization to trigger.
>
> Can you explain this difficulty to people?
>
> Actually, we need some clarification here. I tried coloring several
> times, the problem with my diffs is that I tried to do the coloring
> all the time no matter what.
>
> I wanted strict coloring on the 2-color level for broken L1 caches
> that have aliasing problems. If I could make this work, all of the
> dumb cache flushing I have to do on Sparcs could be deleted. Because
> of this, I couldn't legitimately change the cache flushing rules
> unless I had absolutely strict coloring done on all pages where it
> mattered (basically anything that could end up in the user's address
> space).
>
> So I kept track of color existence precisely in the page lists. The
> implementation was fast, but things got really bad fragmentation wise.
>
> No matter how I tweaked things, just running a kernel build 40 or 50
> times would fragment the free page lists to shreds such that 2-order
> and up pages simply did not exist.
>
> Another person did an implementation of coloring which basically
> worked by allocating a big-order chunk and slicing that up. It's not
> strictly done and that is why his version works better. In fact I
> like that patch a lot and it worked quite well for L2 coloring on
> sparc64. Any time there is page pressure, he tosses away all of the
> color carving big-order pages.
>
> I think we can at some point do the small cases completely
> transparently, with no need for a new system call, and not even any new
> hint flags. We'll just silently do 4/8-page superpages and be done with it.
> Programs don't need to know about it to take advantage of better TLB usage.
>
> Ok. I think even 64-page ones are viable to attempt but we'll see.
> Most TLBs that do superpages seem to have a range from the base
> page size to the largest supported superpage, with each supported size
> a fixed power-of-two factor larger than the previous one.
>
> For example on Sparc64 this is:
>
> 8K PAGE_SIZE
> 64K PAGE_SIZE * 8
> 512K PAGE_SIZE * 64
> 4M PAGE_SIZE * 512
>
> One of the transparent large page implementations just defined a
> small array that the core code used to try and see "hey how big
> a superpage can we try" and if the largest for the area failed
> (because page orders that large weren't available) it would simply
> fall back to the next smallest superpage size.


Well, that's exactly what we do!!!!

We also ensure that if one process opens with the basic page size and
the next one opens with the super page size, we appropriately map
the second one to smaller pages to avoid conflicts in the case of shared
memory or memory-mapped files.

As for the page coloring:
Can we tweak the buddy allocator to give us this additional functionality?
Seems like we can have a free list per color, and if that's empty we go back
to the buddy system. There we should be able to do some magic based on the
bitmaps to figure out which page of the right color should be used?

Fragmentation is an issue.

--
-- Hubertus Franke ([email protected])


2002-08-04 18:48:08

by Linus Torvalds

Subject: Re: large page patch (fwd) (fwd)


On Sun, 4 Aug 2002, Hubertus Franke wrote:
>
> > As for the page coloring:
> Can we tweak the buddy allocator to give us this additional functionality?

I would really prefer to avoid this, and get "95% coloring" by just doing
read-ahead with higher-order allocations instead of the current "loop
allocation of one block".

I bet that you will get _practically_ perfect coloring with just two small
changes:

- do_anonymous_page() looks to see if the page tables are empty around
the faulting address (and check vma ranges too, of course), and
optimistically does a non-blocking order-X allocation.

If the order-X allocation fails, we're likely low on memory (this is
_especially_ true since the very fact that we do lots of order-X
allocations will probably actually help keep fragmentation down
normally), and we just allocate one page (with a regular GFP_USER this
time).

Map in all pages.

- do the same for page_cache_readahead() (this, btw, is where radix trees
will kick some serious ass - we'd have had a hard time doing the "is
this range of order-X pages populated" efficiently with the old hashes.)

I bet just those fairly small changes will give you effective coloring,
_and_ they are also what you want for doing small superpages.
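
In sketch form (user-space stand-ins with invented names; the real thing
would be a non-blocking order-X page allocation in do_anonymous_page()):

#include <stdio.h>
#include <stdlib.h>

#define ORDER 3                            /* try an 8-page cluster */

static void *fault_alloc(int neighbours_empty)
{
        if (neighbours_empty) {
                void *p = NULL;
                /* optimistic, non-blocking order-X attempt */
                if (!posix_memalign(&p, 4096UL << ORDER, 4096UL << ORDER))
                        return p;          /* map in all 8 pages */
        }
        return malloc(4096);               /* low memory: just one page */
}

int main(void)
{
        void *p = fault_alloc(1);
        printf("got %p\n", p);
        free(p);
        return 0;
}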

And no, I do not want separate coloring support in the allocator. I think
coloring without superpage support is stupid and worthless (and
complicates the code for no good reason).

Linus

2002-08-04 19:28:59

by Hubertus Franke

Subject: Re: large page patch (fwd) (fwd)

On Sunday 04 August 2002 02:38 pm, Linus Torvalds wrote:
> On Sun, 4 Aug 2002, Hubertus Franke wrote:
> > As for the page coloring:
> > Can we tweak the buddy allocator to give us this additional
> > functionality?
>
> I would really prefer to avoid this, and get "95% coloring" by just doing
> read-ahead with higher-order allocations instead of the current "loop
> allocation of one block".
>
Yes, if we (correctly) assume that page coloring only buys you significant
benefits for low-associativity caches (e.g. <=4 or <=8 ways).

> I bet that you will get _practically_ perfect coloring with just two small
> changes:
>
> - do_anonymous_page() looks to see if the page tables are empty around
> the faulting address (and check vma ranges too, of course), and
> optimistically does a non-blocking order-X allocation.
>
As long as the alignments are observed, which I guess you imply by the range.

> If the order-X allocation fails, we're likely low on memory (this is
> _especially_ true since the very fact that we do lots of order-X
> allocations will probably actually help keep fragmentation down
> normally), and we just allocate one page (with a regular GFP_USER this
> time).
>
Correct.

> Map in all pages.
>
> - do the same for page_cache_readahead() (this, btw, is where radix trees
> will kick some serious ass - we'd have had a hard time doing the "is
> this range of order-X pages populated" efficiently with the old hashes.
>

Hey, we use the radix tree to track page cache mappings for large pages
particularly for this reason...

> I bet just those fairly small changes will give you effective coloring,
> _and_ they are also what you want for doing small superpages.
>

Well, in what you described above there is no concept of superpages
the way it is defined for the purpose of <tracking> and <TLB overhead
reduction>.
If you don't know about super pages at the VM level, then you need to
deal with them at TLB fault level to actually create the <large TLB>
entry. That's what the INTC patch will do, namely throwing all the
complexity over the fence to the page fault handler.
In your case, not keeping track of the super pages in the
VM layer and PT layer requires discovering the large page at soft-TLB
time by scanning the PT proximity for contiguous pages, if we are talking
now about the read-ahead....
In our case, we store the same physical address of the super page
in the PTEs spanning the superpage, together with the page order.
At software-TLB time we simply extract the single PTE from the PT based
on the faulting address and move it into the TLB. This of course works only
for software TLBs (PowerPC, MIPS, IA64). For HW TLBs (x86) the PT structure
by definition overlaps the large page size support.
The HW TLB case can be extended to not store the same PA in all the PTEs,
but conceptually carry the superpage concept for the purpose described above.

We have that concept exactly the way you want it, but the dress code
seems to be wrong. That can be worked on.
Our goal was, in the long run (2.7), to explore the Rice approach to see
whether it yields benefits or whether we end up going down the road of
fragmentation-reduction overhead that will kill all the benefits we get
from reduced TLB overhead. Time will tell.

But to go down this route we need the concept of a superpage in the VM,
not just at TLB time or a hack that throws these things over the fence.


> And no, I do not want separate coloring support in the allocator. I think
> coloring without superpage support is stupid and worthless (and
> complicates the code for no good reason).
>
> Linus

That <stupid> seems premature. You are mixing the concept of
superpage from a TLB miss reduction perspective
with the concept of superpage for page coloring.

In a low-associativity cache (<=4) you have a large number of colors (~100s).
To be reasonably effective you need to provide this large
number of colors, which could be quite a waste of memory if you do it
only through super pages.
On the other hand, if you simply try to get a page from a targeted class X
you can solve this problem one page at a time. This still makes sense.
Lastly, you can bring these two approaches together by providing small
conceptual super pages (not necessarily anything to do with your
TLB at this point) and providing a smaller number of classes from which
superpages will be allocated. I hope you meant the latter one when
referring to <stupid>.
Either way, you need the concept of a superpage IMHO in the VM to
support all this stuff.

And we got just the right stuff for you :-).
Again the final dress code and capabilities are still up for discussion.

Bill Irwin and I are working on moving Simon's 2.4.18 patch up to 2.5.30.
We are cleaning up some of the stuff and making sure that the integration with the latest
radix tree and writeback functionality is proper.
There aren't that many major changes. We hope to have something for
review soon.

Cheers.
--
-- Hubertus Franke ([email protected])

2002-08-04 19:38:45

by Rik van Riel

Subject: Re: large page patch (fwd) (fwd)

On Sun, 4 Aug 2002, Linus Torvalds wrote:
> On Sun, 4 Aug 2002, Hubertus Franke wrote:
> >
> > As for the page coloring:
> > Can we tweak the buddy allocator to give us this additional functionality?
>
> I would really prefer to avoid this, and get "95% coloring" by just doing
> read-ahead with higher-order allocations instead of the current "loop
> allocation of one block".

OK, now I'm really going to start on some code to try and free
physically contiguous pages when a higher-order allocation comes
in ;)

(well, after this hamradio rpm I started)

cheers,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-08-04 20:23:04

by William Lee Irwin III

Subject: Re: large page patch (fwd) (fwd)

On Sun, Aug 04, 2002 at 03:30:24PM -0400, Hubertus Franke wrote:
> As long as the alignments are observed, which I guess you imply by the range.

On Sunday 04 August 2002 02:38 pm, Linus Torvalds wrote:
>> If the order-X allocation fails, we're likely low on memory (this is
>> _especially_ true since the very fact that we do lots of order-X
>> allocations will probably actually help keep fragmentation down
>> normally), and we just allocate one page (with a regular GFP_USER this
>> time).

Later on I can redo one of the various online defragmentation things
that went around last October or so if it would help with this.


On Sunday 04 August 2002 02:38 pm, Linus Torvalds wrote:
>> Map in all pages.
>> - do the same for page_cache_readahead() (this, btw, is where radix trees
>> will kick some serious ass - we'd have had a hard time doing the "is
>> this range of order-X pages populated" efficiently with the old hashes.

On Sun, Aug 04, 2002 at 03:30:24PM -0400, Hubertus Franke wrote:
> Hey, we use the radix tree to track page cache mappings for large pages
> particularly for this reason...

Proportion of radix tree populated beneath a given node can be computed
by means of traversals adding up ->count or by incrementally maintaining
a secondary counter for ancestors within the radix tree node. I can look
into this when I go over the path compression heuristics, which would
help the space consumption for access patterns fooling the current one.
Getting physical contiguity out of that is another matter, but the code
can be used for other things (e.g. exec()-time prefaulting) until that's
worked out, and it's not a focus or requirement of this code anyway.


On Sunday 04 August 2002 02:38 pm, Linus Torvalds wrote:
>> I bet just those fairly small changes will give you effective coloring,
>> _and_ they are also what you want for doing small superpages.

On Sun, Aug 04, 2002 at 03:30:24PM -0400, Hubertus Franke wrote:
> The HW TLB case can be extended to not store the same PA in all the PTEs,
> but conceptually carry the superpage concept for the purpose described above.

Pagetable walking gets a tiny hook, not much interesting goes on there.
A specialized wrapper for extracting physical pfn's from the pmd's like
the one for testing whether they're terminal nodes might look more
polished, but that's mostly cosmetic.

Hmm, from looking at the "small" vs. "large" page bits, I have an
inkling this may be relative to the machine size. 256GB boxen will
probably think of 4MB pages as small.


On Sun, Aug 04, 2002 at 03:30:24PM -0400, Hubertus Franke wrote:
> But to go down this route we need the concept of a superpage in the VM,
> not just at TLB time or a hack that throws these things over the fence.

The bit throwing it over the fence is probably still useful, as Oracle
knows what it's doing and I suspect it's largely to dodge pagetable
space consumption OOM'ing machines as opposed to optimizing anything.
It pretty much wants the kernel out of the way aside from as a big bag
of device drivers, so I'm not surprised they're more than happy to have
the MMU in their hands too. The more I think about it, the less related
to superpages it seems. The motive for superpages is 100% TLB, not a
workaround for pagetable OOM.


Cheers,
Bill

2002-08-05 05:50:38

by David Miller

Subject: Re: large page patch (fwd) (fwd)

From: Hubertus Franke <[email protected]>
Date: Sun, 4 Aug 2002 13:31:24 -0400

Can we tweak the buddy allocator to give us this additional functionality?

Absolutely not, it's a total lose.

I have tried at least 5 times to make it work without fragmenting the
buddy lists to shit. I challenge you to code one up that works without
fragmenting things to shreds. Just run an endless kernel build over
and over in a loop for a few hours to a day. If the buddy lists are
not fragmented after these runs, then you have succeeded in my
challenge.

Do not even reply to this email without meeting the challenge as it
will fall on deaf ears. I've been there and I've done that, and at
this point code talks, bullshit walks when it comes to trying to
colorize the buddy allocator in a way that actually works and isn't
disgusting.

2002-08-05 16:55:54

by David Mosberger

Subject: Re: large page patch (fwd) (fwd)

>>>>> On Sun, 4 Aug 2002 15:30:24 -0400, Hubertus Franke <[email protected]> said:

Hubertus> Yes, if we (correctly) assume that page coloring only buys
Hubertus> you significant benefits for low-associativity caches
Hubertus> (e.g. <=4 or <=8 ways).

This seems to be a popular misconception. Yes, page-coloring
obviously plays no role as long as your cache no bigger than
PAGE_SIZE*ASSOCIATIVITY. IIRC, Xeon can have up to 1MB of cache and I
bet that it doesn't have a 1MB/4KB=256-way associative cache. Thus,
I'm quite confident that it's possible to observe significant
page-coloring effects even on a Xeon.
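
To put numbers on it (the 8-way associativity here is an assumption for
illustration): a 1MB, 8-way cache with 4KB pages still has
1MB / (8 * 4KB) = 32 colors, so physical page placement can matter long
before you reach the 256-way associativity that would make it moot.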

--david

2002-08-05 17:19:48

by Hubertus Franke

Subject: Re: large page patch (fwd) (fwd)

On Monday 05 August 2002 12:59 pm, David Mosberger wrote:
> >>>>> On Sun, 4 Aug 2002 15:30:24 -0400, Hubertus Franke
> >>>>> <[email protected]> said:
>
> Hubertus> Yes, if we (correctly) assume that page coloring only buys
> Hubertus> you significant benefits for low-associativity caches
> Hubertus> (e.g. <=4 or <=8 ways).
>
> This seems to be a popular misconception. Yes, page-coloring
> obviously plays no role as long as your cache no bigger than
> PAGE_SIZE*ASSOCIATIVITY. IIRC, Xeon can have up to 1MB of cache and I
> bet that it doesn't have a 1MB/4KB=256-way associative cache. Thus,
> I'm quite confident that it's possible to observe significant
> page-coloring effects even on a Xeon.
>
> --david

The wording was "significant" benefits.
The point is/was that as your associativity goes up, the likelihood of
full cache occupancy increases, with cache thrashing in each class decreasing.
Would have to dig through the literature to figure out at what point
the benefits are insignificant (<1%) wrt page coloring.

I am probably missing something in your argument?
How is the Xeon cache indexed (bits), what's the cache line size?
My assumptions are as follows.

Take the bits of an address to be two different bit assignments.

< PG, PGOFS > with PG = <V, X> and PGOFS = <Y, Z>  =>  < V, X, Y, Z >

where Z covers the cache line size, and
<X, Y> is used to index the cache (the index is not strictly required to be
contiguous, but apparently many architectures do it that way).
Page coloring should guarantee that X remains the same in the virtual and the
physical address assigned to it.
As your associativity goes up, the number of rows (colors) in the cache comes
down!!

We can take this offline so as not to bother the rest; your call. Just interested
in flushing out the arguments.

--
-- Hubertus Franke ([email protected])

2002-08-05 21:09:13

by Jamie Lokier

Subject: Re: large page patch (fwd) (fwd)

Hubertus Franke wrote:
> The wording was "significant" benefits. The point is/was that as your
> associativity goes up, the likelihood of full cache occupancy
> increases, with cache thrashing in each class decreasing.
> Would have to dig through the literature to figure out at what point
> the benefits are insignificant (<1 %) wrt page coloring.

One of the benefits of page colouring may be that a program's run time
varies less from run to run.

In the old days (6 years ago), I found that a video game I was working
on would vary in its peak frame rate by about 3-5% (I don't recall
exactly). Once the program was started, it would remain operating at
the peak frame rate it had selected, and killing and restarting the
program didn't often make a difference either. In DOS, the same program
always ran at a consistent frame rate (higher than Linux as it happens).
The actual number of objects executing in the program, and the amount of
memory allocated, were deterministic in these tests.

This is pointing at a cache colouring issue to me -- although quite
which cache I am not sure. I suppose it could have been something to do
with Linux' VM page scanner access patterns into the page array instead.

-- Jamie

2002-08-09 15:18:57

by Daniel Phillips

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> "General Purpose Operating System Support for Multiple Page Sizes"
> http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf

This reference describes roughly what I had in mind for active
defragmentation, which depends on reverse mapping. The main additional
wrinkle I'd contemplated is introducing a new ZONE_LARGE, and GFP_LARGE,
which means the caller promises not to pin the allocation unit for long
periods and does not mind if the underlying physical page changes
spontaneously. Defragmenting in this zone is straightforward.

--
Daniel
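
A userspace toy of the contract Daniel describes -- callers hold a handle
rather than a raw pointer, so a defragmenter is free to move the backing
memory underneath them. Every name here is invented for illustration;
this is not the patch's code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct handle {
        void *mem;      /* may be repointed at any time */
        size_t size;
    };

    static struct handle *movable_alloc(size_t size)
    {
        struct handle *h = malloc(sizeof(*h));

        h->mem = malloc(size);
        h->size = size;
        return h;
    }

    /* What active defragmentation amounts to: copy, then update the
     * references -- in the kernel, reverse mapping is what lets you
     * find every PTE that needs the update. */
    static void defrag_move(struct handle *h)
    {
        void *newmem = malloc(h->size);

        memcpy(newmem, h->mem, h->size);
        free(h->mem);
        h->mem = newmem;
    }

    int main(void)
    {
        struct handle *h = movable_alloc(64);

        strcpy(h->mem, "contents survive relocation");
        defrag_move(h);             /* backing memory changed */
        puts(h->mem);
        free(h->mem);
        free(h);
        return 0;
    }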

2002-08-09 16:04:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)


On Fri, 9 Aug 2002, Daniel Phillips wrote:
>
> This reference describes roughly what I had in mind for active
> defragmentation, which depends on reverse mapping.

Note that even active defrag won't be able to handle the case where you
want to have lots of big pages, constituting a large percentage of available
memory.

Not unless you think I am crazy enough to do garbage collection on kernel
data structures (repeat after me: "garbage collection is stupid, slow, bad
for caches, and only for people who cannot count").

Also, I think the jury (ie Andrew) is still out on whether rmap is worth
it.

Linus

2002-08-09 16:14:13

by Daniel Phillips

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Friday 09 August 2002 17:56, Linus Torvalds wrote:
> On Fri, 9 Aug 2002, Daniel Phillips wrote:
> > This reference describes roughly what I had in mind for active
> > defragmentation, which depends on reverse mapping.
>
> Note that even active defrag won't be able to handle the case where you
> want to have lots of big pages, constituting a large percentage of available
> memory.

Perhaps I'm missing something, but I don't see why.

> Not unless you think I am crazy enough to do garbage collection on kernel
> data structures (repeat after me: "garbage collection is stupid, slow, bad
> for caches, and only for people who cannot count").

Slab allocations would not have GFP_DEFRAG (I mistakenly wrote GFP_LARGE
earlier) and so would be allocated outside ZONE_LARGE.

> Also, I think the jury (ie Andrew) is still out on whether rmap is worth
> it.

Tell me about it. Well, I feel strongly enough about it to spend the next
week coding yet another pte chain optimization.

--
Daniel

2002-08-09 16:24:30

by Rik van Riel

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Fri, 9 Aug 2002, Linus Torvalds wrote:
> On Fri, 9 Aug 2002, Daniel Phillips wrote:
> >
> > This reference describes roughly what I had in mind for active
> > defragmentation, which depends on reverse mapping.
>
> Note that even active defrag won't be able to handle the case where you
> want to have lots of big pages, constituting a large percentage of available
> memory.
>
> Not unless you think I am crazy enough to do garbage collection on kernel
> data structures (repeat after me: "garbage collection is stupid, slow, bad
> for caches, and only for people who cannot count").

It's also necessary if you want to prevent death by physical
memory exhaustion since it's pretty easy to construct workloads
where the page table memory requirement is larger than physical
memory.
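
To make that concrete (numbers picked purely for illustration): with 4kB
pages and 8-byte PTEs, 1024 processes that each mmap the same 2GB shared
region need 2GB / 4kB = 512K PTEs, i.e. 4MB of page tables, apiece --
4GB of page tables in total to map just 2GB of actual data.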

OTOH, I also think that it's (probably, almost certainly) not
worth doing active defragmenting for really huge superpages.
This category of garbage collection just gets into the 'ridiculous'
class ;)

> Also, I think the jury (ie Andrew) is still out on whether rmap is worth
> it.

One problem we're running into here is that there are absolutely
no tools to measure some of the things rmap is supposed to fix,
like page replacement.

Sure, Craig Kulesa's tests all went faster on rmap than on the
virtual scanning VM, but that's just one application. There doesn't
seem to be any kind of tool to quantify things like "quality
of page replacement" or even "efficiency of page replacement" ...

I suspect this is true for many pieces of the kernel: no tools
available to measure the benefits of the code, but only tools
to microbenchmark the _overhead_ of the code...

kind regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-08-09 16:28:19

by Rik van Riel

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Fri, 9 Aug 2002, Daniel Phillips wrote:
> On Friday 09 August 2002 17:56, Linus Torvalds wrote:

> > Also, I think the jury (ie Andrew) is still out on whether rmap is worth
> > it.
>
> Tell me about it. Well, I feel strongly enough about it to spend the
> next week coding yet another pte chain optimization.

Well yes, we've _seen_ that 2.4 -rmap improves system behaviour,
but we don't have any tools to _quantify_ that improvement.

As long as the only measurable thing is the overhead (which may
get close to zero, but will never become zero) the numbers will
continue being against rmap. Not because of rmap, but just
because the overhead is the only thing being measured ;)

Personally I'll spend some more time just improving the behaviour
of the VM, even if we don't have tools to quantify the improvement.

Somehow there seems to be a lack of meaningful "macrobenchmarks" ;)

(as opposed to microbenchmarks, which don't always have a
relation to how the performance of the system as a whole will
be influenced by some code change)

kind regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-08-09 17:01:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)


On Fri, 9 Aug 2002, Rik van Riel wrote:
>
> > Also, I think the jury (ie Andrew) is still out on whether rmap is worth
> > it.
>
> One problem we're running into here is that there are absolutely
> no tools to measure some of the things rmap is supposed to fix,
> like page replacement.

Read up on positivism.

"If it can't be measured, it doesn't exist".

The point being that there are things we can measure, and until anything
else comes around, those are the things that will have to guide us.

Linus

2002-08-09 17:00:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)


On Fri, 9 Aug 2002, Daniel Phillips wrote:
> >
> > Note that even active defrag won't be able to handle the case where you
> > want to have lots of big pages, constituting a large percentage of available
> > memory.
>
> Perhaps I'm missing something, but I don't see why.

The statistics are against you. rmap won't help at all with all the other
kernel allocations, and the dcache/icache is often large, and on big
machines while there may be tens of thousands of idle entries, there will
also be hundreds of _non_idle entries that you can't just remove.

> Slab allocations would not have GFP_DEFRAG (I mistakenly wrote GFP_LARGE
> earlier) and so would be allocated outside ZONE_LARGE.

.. at which point you then get zone balancing problems.

Or we end up with the same kind of special zone that we have _anyway_ in
the current large-page patch, in which case the point of doing this is
what?

Linus

2002-08-09 17:10:21

by Daniel Phillips

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Friday 09 August 2002 18:51, Linus Torvalds wrote:
> On Fri, 9 Aug 2002, Daniel Phillips wrote:
> > Slab allocations would not have GFP_DEFRAG (I mistakenly wrote GFP_LARGE
> > earlier) and so would be allocated outside ZONE_LARGE.
>
> .. at which point you then get zone balancing problems.
>
> Or we end up with the same kind of special zone that we have _anyway_ in
> the current large-page patch, in which case the point of doing this is
> what?

The current large-page patch doesn't have any kind of defragmentation in the
special zone and that memory is just not available for other uses. The thing
is, when demand for large pages is low the zone should be allowed to fragment.

All of highmem also qualifies as defraggable memory, so certainly on these
big memory machines we can easily get a majority of memory in large pages.

I don't see a fundamental reason for new zone balancing problems. The fact
that balancing has sucked by tradition is not a fundamental reason ;-)

--
Daniel

2002-08-09 17:43:02

by Victor Yodaiken

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

> On Fri, 9 Aug 2002, Rik van Riel wrote:
> One problem we're running into here is that there are absolutely
> no tools to measure some of the things rmap is supposed to fix,
> like page replacement.

But page replacement is a means to an end. One thing that would be
very interesting to know is how well the basic VM assumptions about
locality work in a Linux server, desktop, and embedded environment.

You have an LRU approximation that is supposed to approximate working
sets that were originally understood and measured on < 1Meg machines
with static libraries, tiny cache, no GUI and no mmap.

L.T. writes:

> Read up on positivism.

It's been discredited as recursively unsound reasoning.

---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
http://www.fsmlabs.com http://www.rtlinux.com
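
For concreteness, the flavor of LRU approximation being questioned here
is the second-chance ("clock") family: evict the first page whose
referenced bit has not been set since the scan last swept past it. A
standalone toy, not the kernel's actual scanner:

    #include <stdbool.h>
    #include <stdio.h>

    #define NPAGES 8

    static bool referenced[NPAGES]; /* hardware-set, cleared by the sweep */
    static int hand;

    int clock_evict(void)
    {
        for (;;) {
            if (!referenced[hand]) {
                int victim = hand;

                hand = (hand + 1) % NPAGES;
                return victim;
            }
            referenced[hand] = false;       /* second chance */
            hand = (hand + 1) % NPAGES;
        }
    }

    int main(void)
    {
        referenced[0] = referenced[1] = true;   /* recently used */
        printf("evict page %d\n", clock_evict()); /* picks page 2 */
        return 0;
    }

The working-set premise is that pages untouched within one sweep are
unlikely to be touched soon -- the very assumption Victor notes was
validated on very different machines and workloads.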

2002-08-09 17:43:18

by Bill Rugolsky Jr.

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Fri, Aug 09, 2002 at 09:52:53AM -0700, Linus Torvalds wrote:
> Read up on positivism.

Please don't. Read Karl Popper instead.

> "If it can't be measured, it doesn't exist".

The positivist Copenhagen interpretation stifled important areas of
physics for half a century. There is a distinction to be made between
an explanatory construct (whereby I mean to imply nothing fancy, no
quarks, just a brick), and the evidence that supports that construct
in the form of observable quantities. It's all there in Popper's work.

> The point being that there are things we can measure, and until anything
> else comes around, those are the things that will have to guide us.

True, as far as it goes. Measurement=good, idle-speculation=bad.

But it pays to keep in mind that progress is nonlinear. In 1988, Van
Jacobson noted (http://www.kohala.com/start/vanj.88jul20.txt):

(I had one test case that went like

Basic system: 600 KB/s
add feature A: 520 KB/s
drop A, add B: 530 KB/s
add both A & B: 700 KB/s

Obviously, any statement of the form "feature A/B is good/bad"
is bogus.) But, in spite of the ambiguity, some of the network
design folklore I've heard seems to be clearly wrong.

Such anomalies abound.

Regards,

Bill Rugolsky

2002-08-09 18:06:49

by Daniel Phillips

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Friday 09 August 2002 18:31, Rik van Riel wrote:
> On Fri, 9 Aug 2002, Daniel Phillips wrote:
> > On Friday 09 August 2002 17:56, Linus Torvalds wrote:
>
> > > Also, I think the jury (ie Andrew) is still out on whether rmap is worth
> > > it.
> >
> > Tell me about it. Well, I feel strongly enough about it to spend the
> > next week coding yet another pte chain optimization.
>
> Well yes, we've _seen_ that 2.4 -rmap improves system behaviour,
> but we don't have any tools to _quantify_ that improvement.
>
> As long as the only measurable thing is the overhead (which may
> get close to zero, but will never become zero) the numbers will
> continue being against rmap. Not because of rmap, but just
> because the overhead is the only thing being measured ;)

You know what to do, instead of moaning about it. Just code up a test load
that blatantly favors rmap and post the results. In effect, that's what
Andrew's 'doitlots' benchmark does, in the other direction.

--
Daniel

2002-08-09 18:31:11

by Hubertus Franke

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Friday 09 August 2002 11:20 am, Daniel Phillips wrote:
> On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> > "General Purpose Operating System Support for Multiple Page Sizes"
> > http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf
>
> This reference describes roughly what I had in mind for active
> defragmentation, which depends on reverse mapping. The main additional
> wrinkle I'd contemplated is introducing a new ZONE_LARGE, and GFP_LARGE,
> which means the caller promises not to pin the allocation unit for long
> periods and does not mind if the underlying physical page changes
> spontaneously. Defragmenting in this zone is straightforward.

I think the objection to that is that in many cases the cost of
defragmentation is too heavy to be recovered through TLB miss savings
alone.
What the above paper does is a reservation protocol with timeouts
which decides that either (a) the reserved mem was used in time and hence
the page is upgraded to a large page OR (b) the reserved mem is not used and
hence unused parts are released.
It relies on the fact that within the given timeout, most/many pages are
typically referenced.

In our patch we have the ZONE_LARGE into which we allocate the
large pages. Currently they are effectively pinned down, but in 2.4.18
we had them backed by the page cache.

My gut feeling right now would be to follow the reservation-based scheme,
but as I said, it's a gut feeling.
Defragmenting to me seems a matter of last resort; copying pages is expensive.
If you however simply target the superpages for smaller clusters, then it's an
option. But at the same time one might contemplate simply making
the base page 16K or 32K and at page fault time simply map / swap / read /
writeback the whole cluster.
What studies have been done on the benefits of such an approach?
I talked to Ted Ts'o, who would really like small superpages for better I/O
performance...

--
-- Hubertus Franke ([email protected])
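
A toy rendering of the reservation protocol as summarized above; the
structure, names and threshold are invented, and the paper's real
bookkeeping is more involved:

    #include <stdbool.h>
    #include <stdio.h>

    #define SMALL_PER_LARGE 16      /* e.g. a 64kB superpage of 4kB pages */

    struct reservation {
        bool touched[SMALL_PER_LARGE];  /* which base pages got faulted in */
        int ticks_left;                 /* timeout countdown */
    };

    enum outcome { PENDING, PROMOTE, RELEASE_UNUSED };

    /* Called periodically: (a) fully used in time -> upgrade to a large
     * page; (b) timeout expires -> release the untouched parts. */
    static enum outcome reservation_tick(struct reservation *r)
    {
        int i, used = 0;

        for (i = 0; i < SMALL_PER_LARGE; i++)
            used += r->touched[i];

        if (used == SMALL_PER_LARGE)
            return PROMOTE;
        if (--r->ticks_left <= 0)
            return RELEASE_UNUSED;
        return PENDING;
    }

    int main(void)
    {
        struct reservation r = { .ticks_left = 3 };
        int i;

        for (i = 0; i < SMALL_PER_LARGE; i++)
            r.touched[i] = true;    /* pretend every base page was hit */
        printf("%s\n", reservation_tick(&r) == PROMOTE ? "promote" : "wait");
        return 0;
    }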

2002-08-09 18:38:30

by Daniel Phillips

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Friday 09 August 2002 20:32, Hubertus Franke wrote:
> On Friday 09 August 2002 11:20 am, Daniel Phillips wrote:
> > On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> > > "General Purpose Operating System Support for Multiple Page Sizes"
> > > http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf
> >
> > This reference describes roughly what I had in mind for active
> > defragmentation, which depends on reverse mapping. The main additional
> > wrinkle I'd contemplated is introducing a new ZONE_LARGE, and GFP_LARGE,
> > which means the caller promises not to pin the allocation unit for long
> > periods and does not mind if the underlying physical page changes
> > spontaneously. Defragmenting in this zone is straightforward.
>
> I think the objection to that is that in many cases the cost of
> defragmentation is too heavy to be recovered through TLB miss savings
> alone.

You pay the cost only on transition from a load that doesn't use many large
pages to one that does; it is not an ongoing cost.

> [...]
>
> Defragmenting to me seems a matter of last resort, Copying pages is expensive.

It is the only way to ever have a seamless implementation. Really, I don't
understand this fear of active defragmentation. Oh well, like davem said,
code talks.

--
Daniel

2002-08-09 19:11:53

by Rik van Riel

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Fri, 9 Aug 2002 [email protected] wrote:
> > On Fri, 9 Aug 2002, Rik van Riel wrote:
> > One problem we're running into here is that there are absolutely
> > no tools to measure some of the things rmap is supposed to fix,
> > like page replacement.
>
> But page replacement is a means to an end. One thing that would be
> very interesting to know is how well the basic VM assumptions about
> locality work in a Linux server, desktop, and embedded environment.
>
> You have a LRU approximation that is supposed to approximate working
> sets that were originally understood and measured on < 1Meg machines
> with static libraries, tiny cache, no GUI and no mmap.

Absolutely, it would be interesting to know this.
However, up to now I haven't seen any programs that
measure this.

In this case we know what we want to measure, know we
want to measure it for all workloads, but don't know
how to do this in a quantifiable way.

> L.T. writes:
>
> > Read up on positivism.
>
> It's been discredited as recursively unsound reasoning.

To further this point, by how much has the security number
of Linux improved as a result of the inclusion of the Linux
Security Module framework ? ;)

I'm sure even Linus will agree that the security potential
has increased, even though he can't measure or quantify it.

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-08-09 19:17:48

by Hubertus Franke

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Friday 09 August 2002 02:43 pm, Daniel Phillips wrote:
> On Friday 09 August 2002 20:32, Hubertus Franke wrote:
> > On Friday 09 August 2002 11:20 am, Daniel Phillips wrote:
> > > On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> > > > "General Purpose Operating System Support for Multiple Page Sizes"
> > > > http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf
> > >
> > > This reference describes roughly what I had in mind for active
> > > defragmentation, which depends on reverse mapping. The main additional
> > > wrinkle I'd contemplated is introducing a new ZONE_LARGE, and
> > > GFP_LARGE, which means the caller promises not to pin the allocation
> > > unit for long periods and does not mind if the underlying physical page
> > > changes spontaneously. Defragmenting in this zone is straightforward.
> >
> > I think the objection to that is that in many cases the cost of
> > defragmentation is too heavy to be recovered through TLB miss savings
> > alone.
>
> You pay the cost only on transition from a load that doesn't use many large
> pages to one that does, it is not an ongoing cost.
>

Correct. Maybe I misunderstood: when are you doing the coallocation of
adjacent pages (page clusters, superpages)?
Our intent was to do it at page fault time and break up only during
memory pressure.

> > [...]
> >
> > Defragmenting to me seems a matter of last resort, Copying pages is
> > expensive.
>
> It is the only way to ever have a seamless implementation. Really, I don't
> understand this fear of active defragmentation. Oh well, like davem said,
> code talks.

--
-- Hubertus Franke ([email protected])

2002-08-09 21:15:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)


On Fri, 9 Aug 2002, Rik van Riel wrote:
>
> To further this point, by how much has the security number
> of Linux improved as a result of the inclusion of the Linux
> Security Module framework ? ;)
>
> I'm sure even Linus will agree that the security potential
> has increased, even though he can't measure or quantify it.

Actually, the security number is irrelevant to me - the "noise index" from
people who think security protocols are interesting is what drove that
patch (and that one is definitely measurable).

This way, the security noise is now in somebody else's court ;)

Linus

2002-08-11 19:06:59

by Alan

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Fri, 2002-08-09 at 16:20, Daniel Phillips wrote:
> On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> > "General Purpose Operating System Support for Multiple Page Sizes"
> > http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf
>
> This reference describes roughly what I had in mind for active
> defragmentation, which depends on reverse mapping. The main additional
> wrinkle I'd contemplated is introducing a new ZONE_LARGE, and GFP_LARGE,
> which means the caller promises not to pin the allocation unit for long
> periods and does not mind if the underlying physical page changes
> spontaneously. Defragmenting in this zone is straightforward.

Slight problem. This paper is about a patented SGI method for handling
defragmentation into large pages (6,182,089). They patented it before
the presentation.

They also hold patents on the other stuff that you've recently been
discussing, about not keeping separate rmap structures until there are
more than some value 'n', when they switch from direct to indirect lists
of reverse mappings (6,112,286).

If you are going to read and propose things you find on Usenix, at least
check what the authors' policies on patents are.

Perhaps someone should first of all ask SGI to give the Linux community
permission to use it in a GPL'd operating system ?


2002-08-11 22:32:15

by Daniel Phillips

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Sunday 11 August 2002 22:30, Alan Cox wrote:
> On Fri, 2002-08-09 at 16:20, Daniel Phillips wrote:
> > On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> > > "General Purpose Operating System Support for Multiple Page Sizes"
> > > http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf
> >
> > This reference describes roughly what I had in mind for active
> > defragmentation, which depends on reverse mapping. The main additional
> > wrinkle I'd contemplated is introducing a new ZONE_LARGE, and GFP_LARGE,
> > which means the caller promises not to pin the allocation unit for long
> > periods and does not mind if the underlying physical page changes
> > spontaneously. Defragmenting in this zone is straightforward.
>
> Slight problem. This paper is about a patented SGI method for handling
> defragmentation into large pages (6,182,089). They patented it before
> the presentation.

See 'straightforward' above, i.e., obvious to a practitioner of the art.
This is another one-click patent.

Look at claim 16, it covers our buddy allocator quite nicely:

http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1='6182089'.WKU.&OS=PN/6182089&RS=PN/6182089

Claim 1 covers the idea of per-size freelist thresholds, below which no
coalescing is done.

Claim 13 covers the idea of having a buddy system on each node of a numa
system. Bill is going to be somewhat disappointed to find out he can't do
that any more.

It goes on in this vein. I suggest all vm hackers have a close look at
this. Yes, it's stupid, but we can't just ignore it.
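
For reference, the mechanism claim 1 describes is roughly the following
shape: a buddy free path where coalescing upward is gated by a per-order
low-water mark. All names here are invented; this is the generic
textbook idea, not the patent's text or Linux's code:

    #include <stdio.h>

    #define MAX_ORDER 10

    struct free_area {
        unsigned long nr_free;
        unsigned long min_free; /* keep at least this many blocks here */
    };

    static struct free_area area[MAX_ORDER];

    /* Stubs standing in for real free-list bookkeeping */
    static int buddy_is_free(unsigned long pfn, int order) { return 0; }
    static void list_del_buddy(unsigned long pfn, int order) { area[order].nr_free--; }
    static void list_add(unsigned long pfn, int order) { area[order].nr_free++; }

    void free_block(unsigned long pfn, int order)
    {
        while (order < MAX_ORDER - 1 &&
               area[order].nr_free >= area[order].min_free && /* threshold gate */
               buddy_is_free(pfn ^ (1UL << order), order)) {
            list_del_buddy(pfn ^ (1UL << order), order);
            pfn &= ~(1UL << order);         /* merged block's base */
            order++;
        }
        list_add(pfn, order);
    }

    int main(void)
    {
        free_block(0, 0);
        printf("order-0 free blocks: %lu\n", area[0].nr_free);
        return 0;
    }

With every min_free set to zero this reduces to plain buddy coalescing,
which is why the claim reads as covering the stock allocator.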

> They also hold patents on the other stuff that you've recently been
> discussing about not keeping seperate rmap structures until there are
> more than some value 'n' when they switch from direct to indirect lists
> of reverse mappings (6,112,286)

This is interesting. By setting their 'm' to 1, you get essentially the
scheme implemented by Dave a few weeks ago, and by setting 'm' to 0, the
patent covers pretty much every imaginable reverse mapping scheme. Gee,
so SGI thought of reverse mapping in 1997 or thereabouts, and nobody ever
did before?

http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1='6112286'.WKU.&OS=PN/6112286&RS=PN/6112286

Claim 2 covers use of their reverse mapping scheme, which, as we have seen,
includes all reverse mapping schemes, for migrating the data content of
pages, and updating the page table pointers.

Claim 4 goes on to cover migration of data pages between nodes of a numa
system. (Got that wli?)

This patent goes on to claim just about everything you can do with a
reverse map. It's sure lucky for SGI that they were the first to think
of the idea of reverse mapping.

> If you are going read and propose things you find on Usenix at least
> check what the authors policies on patents are.

As always, I developed my ideas from first principles. I never saw or
heard of the paper until a few days ago. I don't need their self-serving
paper to figure this stuff out, and if they are going to do blatantly
commercial stuff like that, I'd rather the paper were not published at
all. Perhaps Usenix needs to establish a policy about that.

> Perhaps someone should first of all ask SGI to give the Linux community
> permission to use it in a GPL'd operating system ?

Yes, we should ask nicely, if we run into something that matters. Asking
nicely isn't the only option though.

And yes, I'm trying to be polite. It's just so stupid.

--
Daniel

2002-08-11 23:03:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)


On Sun, 11 Aug 2002, Linus Torvalds wrote:
>
> If somebody sues you, you change the algorithm or you just hire a
> hit-man to whack the stupid git.

Btw, I'm not a lawyer, and I suspect this may not be legally tenable
advice. Whatever. I refuse to bother with the crap.

Linus

2002-08-11 23:02:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)


On Mon, 12 Aug 2002, Daniel Phillips wrote:
>
> It goes on in this vein. I suggest all vm hackers have a close look at
> this. Yes, it's stupid, but we can't just ignore it.

Actually, we can, and I will.

I do not look up any patents on _principle_, because (a) it's a horrible
waste of time and (b) I don't want to know.

The fact is, technical people are better off not looking at patents. If
you don't know what they cover and where they are, you won't be knowingly
infringing on them. If somebody sues you, you change the algorithm or you
just hire a hit-man to whack the stupid git.

Linus

2002-08-11 23:12:46

by Larry McVoy

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Sun, Aug 11, 2002 at 03:55:08PM -0700, Linus Torvalds wrote:
>
> On Mon, 12 Aug 2002, Daniel Phillips wrote:
> >
> > It goes on in this vein. I suggest all vm hackers have a close look at
> > this. Yes, it's stupid, but we can't just ignore it.
>
> Actually, we can, and I will.
>
> I do not look up any patents on _principle_, because (a) it's a horrible
> waste of time and (b) I don't want to know.
>
> The fact is, technical people are better off not looking at patents. If
> you don't know what they cover and where they are, you won't be knowingly
> infringing on them. If somebody sues you, you change the algorithm or you
> just hire a hit-man to whack the stupid git.

This issue is more complicated than you might think. Big companies with
big pockets are very nervous about being too closely associated with
Linux because of this problem. Imagine that IBM, for example, starts
shipping IBM Linux. Somewhere in the code there is something that
infringes on a patent. Given that it is IBM Linux, people can make
the case that IBM should have known and should have fixed it and
since they didn't, they get sued. Notice that IBM doesn't ship
their own version of Linux, they ship / support Red Hat or Suse
(maybe others, doesn't matter). So if they ever get hassled, they'll
vector the problem to those little guys and the issue will likely
get dropped because the little guys have no money to speak of.

Maybe this is all good, I dunno, but be aware that the patents
have long arms and effects.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2002-08-11 23:22:31

by Alan

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Sun, 2002-08-11 at 23:56, Linus Torvalds wrote:
>
> On Sun, 11 Aug 2002, Linus Torvalds wrote:
> >
> > If somebody sues you, you change the algorithm or you just hire a
> > hit-man to whack the stupid git.
>
> Btw, I'm not a lawyer, and I suspect this may not be legally tenable
> advice. Whatever. I refuse to bother with the crap.

In which case you might as well do the rest of the world a favour and
restrict US usage of Linux in the license file while you are at it.
Unfortunately the USA forces people to deal with this crap. I'd hope SGI
would be decent enough to explicitly state they will license this stuff
freely for GPL use (although having shipped Linux themselves the
question is partly moot as the GPL says they can't impose additional
restrictions)

Alan

2002-08-11 23:40:11

by Rik van Riel

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On 12 Aug 2002, Alan Cox wrote:

> Unfortunately the USA forces people to deal with this crap. I'd hope SGI
> would be decent enough to explicitly state they will license this stuff
> freely for GPL use

I seem to remember Apple having a clause for this in
their Darwin sources, forbidding people who contribute
code from suing them about patent violations due to
the code they themselves contributed.

kind regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-08-11 23:38:40

by William Lee Irwin III

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Sun, 11 Aug 2002, Linus Torvalds wrote:
>> If somebody sues you, you change the algorithm or you just hire a
>> hit-man to whack the stupid git.

On Sun, Aug 11, 2002 at 03:56:10PM -0700, Linus Torvalds wrote:
> Btw, I'm not a lawyer, and I suspect this may not be legally tenable
> advice. Whatever. I refuse to bother with the crap.

I'm not really sure what to think of all this patent stuff myself, but
I may need to get some directions from lawyerish types before moving on
here. OTOH I certainly like the suggested approach more than my
conservative one, even though I'm still too chicken to follow it. =)

On a more practical note, though, someone left out an essential 'h'
from my email address. Please adjust the cc: list. =)


Thanks,
Bill

2002-08-11 23:39:25

by Daniel Phillips

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Monday 12 August 2002 02:46, Alan Cox wrote:
> On Sun, 2002-08-11 at 23:56, Linus Torvalds wrote:
> >
> > On Sun, 11 Aug 2002, Linus Torvalds wrote:
> > >
> > > If somebody sues you, you change the algorithm or you just hire a
> > > hit-man to whack the stupid git.
> >
> > Btw, I'm not a lawyer, and I suspect this may not be legally tenable
> > advice. Whatever. I refuse to bother with the crap.
>
> In which case you might as well do the rest of the world a favour and
> restrict US usage of Linux in the license file while you are at it.
> Unfortunately the USA forces people to deal with this crap. I'd hope SGI
> would be decent enough to explicitly state they will license this stuff
> freely for GPL use (although having shipping Linux themselves the
> question is partly moot as the GPL says they can't impose additional
> restrictions)

I do not agree that it is enough to license it for 'GPL' use. If there is
a license, it should impose no restrictions that the GPL does not. There
is a big distinction. Anything else, and the licensor is sending the message
that they reserve the right to enforce against Linux users.

In other words, a license grant has to cover *all* uses of Linux and not just
GPL uses.

In my opinion, RedHat has set a bad example by stopping short of promising
free use of Ingo's patents for all Linux users. We are entering a difficult
time, and such a wrong-footed move simply makes it more difficult.

--
Daniel

2002-08-11 23:47:46

by Larry McVoy

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Sun, Aug 11, 2002 at 08:42:16PM -0300, Rik van Riel wrote:
> On 12 Aug 2002, Alan Cox wrote:
>
> > Unfortunately the USA forces people to deal with this crap. I'd hope SGI
> > would be decent enough to explicitly state they will license this stuff
> > freely for GPL use
>
> I seem to remember Apple having a clause for this in
> their Darwin sources, forbidding people who contribute
> code from suing them about patent violations due to
> the code they themselves contributed.

IBM has a fantastic clause in their open source license. The license grants
you various rights to use, etc., and then goes on to say something in
the termination section (I think) along the lines of

In the event that You or your affiliates instigate patent, trademark,
and/or any other intellectual property suits, this license terminates
as of the filing date of said suit[s].

You get the idea. It's basically "screw me, OK, then screw you too" language.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2002-08-12 01:34:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)


On Sun, 11 Aug 2002, Larry McVoy wrote:
>
> This issue is more complicated than you might think.

No, it's not. You miss the point.

> Big companies with
> big pockets are very nervous about being too closely associated with
> Linux because of this problem.

The point being that that is _their_ problem, and at a level that has
nothing to do with technology.

I'm saying that technical people shouldn't care. I certainly don't. The
people who _should_ care are patent attorneys etc, since they actually
get paid for it, and can better judge the matter anyway.

Everybody in the whole software industry knows that any non-trivial
program (and probably most trivial programs too, for that matter) will
infringe on _some_ patent. Ask anybody. It's apparently an accepted fact,
or at least a saying that I've heard too many times.

I just don't care. Clearly, if all significant programs infringe on
something, the issue is no longer "do we infringe", but "is it an issue"?

And that's _exactly_ why technical people shouldn't care. The "is it an
issue" is not something a technical guy can answer, since the answer
depends on totally non-technical things.

Ask your legal counsel, and I strongly suspect that if he is any good, he
will tell you the same thing. Namely that it's _his_ problem, and that
your engineers should not waste their time trying to find existing
patents.

Linus

2002-08-12 05:03:41

by Larry McVoy

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

> Ask your legal counsel, and I strongly suspect that if he is any good, he
> will tell you the same thing. Namely that it's _his_ problem, and that
> your engineers should not waste their time trying to find existing
> patents.

Partially true for us. We do do patent searches to make sure we aren't
doing anything blatantly stupid.

I do agree with you 100% that it is impossible to ship any software that
does not infringe on some patent. It's a big point of contention in
contract negotiations because everyone wants you to warrant that your
software doesn't infringe and indemnify them if it does.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2002-08-12 08:21:57

by Daniel Phillips

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Monday 12 August 2002 01:50, Larry McVoy wrote:
> On Sun, Aug 11, 2002 at 08:42:16PM -0300, Rik van Riel wrote:
> > On 12 Aug 2002, Alan Cox wrote:
> >
> > > Unfortunately the USA forces people to deal with this crap. I'd hope SGI
> > > would be decent enough to explicitly state they will license this stuff
> > > freely for GPL use
> >
> > I seem to remember Apple having a clause for this in
> > their Darwin sources, forbidding people who contribute
> > code from suing them about patent violations due to
> > the code they themselves contributed.
>
> IBM has a fantastic clause in their open source license. The license grants
> you various rights to use, etc., and then goes on to say something in
> the termination section (I think) along the lines of
>
> In the event that You or your affiliates instigate patent, trademark,
> and/or any other intellectual property suits, this license terminates
> as of the filing date of said suit[s].
>
> You get the idea. It's basically "screw me, OK, then screw you too" language.

Yes. I would like to add my current rmap optimization work, if it is worthy
for the usual reasons, to the kernel under a DPL license which is in every
respect the GPL, except that it adds one additional restriction along the
lines:

"If you enforce a patent against a user of this code, or you have a
beneficial relationship with someone who does, then your licence to
use or distribute this code is automatically terminated"

with more language to extend the protection to the aggregate work, and to
specify that we are talking about enforcement of patents concerned with any
part of the aggregate work. Would something like that fly?

In other words, use copyright law as a lever against patent law.

This would tend to provide protection against 'our friends', who on the one
hand, depend on Linux in their businesses, and on the other hand, do seem to
be holding large portfolios of equivalently stupid patents.

As far as protection against those who would have no intention or need to use
the aggregate work anyway, that's an entirely separate question. Frankly, I
enjoy the sport of undermining a patent much more when it is held by someone
who is not a friend.

--
Daniel

2002-08-12 09:07:57

by Alan

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Mon, 2002-08-12 at 02:26, Linus Torvalds wrote:
> Ask your legal counsel, and I strongly suspect that if he is any good, he
> will tell you the same thing. Namely that it's _his_ problem, and that
> your engineers should not waste their time trying to find existing
> patents.

Wasn't a case of wasting time. That one is extremely well known because
there were upset people when SGI patented it and then submitted a usenix
paper on it.


2002-08-13 13:38:05

by Rob Landley

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Monday 12 August 2002 04:22 am, Daniel Phillips wrote:

> Yes. I would like to add my current rmap optimization work, if it is
> worthy for the usual reasons, to the kernel under a DPL license which is in
> every respect the GPL, except that it adds one additional restriction along
> the lines:
>
> "If you enforce a patent against a user of this code, or you have a
> beneficial relationship with someone who does, then your licence to
> use or distribute this code is automatically terminated"
>
> with more language to extend the protection to the aggregate work, and to
> specify that we are talking about enforcement of patents concerned with any
> part of the aggregate work. Would something like that fly?
>
> In other words, use copyright law as a lever against patent law.

More than that, the GPL could easily be used to form a "patent pool". Just
say "This patent is licensed for use in GPL code. If you want to use it
outside of GPL code, you need a seperate license."

The purpose of modern patents is Mutually Assured Destruction: If you sue me,
I have 800 random patents you're bound to have infringed just by breathing,
and even though they won't actually hold up to scrutiny I can keep you tied
up in court for years and force you to spend millions on legal fees. So why
don't you just cross-license your entire patent portfolio with us, and that
way we can make the whole #*%(&#% patent issue just go away. (Notice: when
anybody DOES sue, the result is usually a cross-licensing agreement of the
entire patent portfolio. Even in those rare cases when the patent
infringement is LEGITIMATE, the patent system is too screwed up to function
against large corporations due to the zillions of frivolous patents and the
tendency for corporations to have lawyers on staff so defending a lawsuit
doesn't really cost them anything.)

This is how companies like IBM and even Microsoft think. They get as many
patents as possible to prevent anybody ELSE from suing them, because the
patent office is stupid enough to give out a patent on scrollbars a decade
after the fact and they don't want to be on the receiving end of this
nonsense. And then they blanket cross-license with EVERYBODY, so nobody can
sue them.

People do NOT want to give a blanket license to everybody for any use on
these patents because it gives up the one thing they're good for: mutually
assured destruction. Licensing for "open source licenses" could mean "BSD
license but we never gave anybody any source code, so ha ha."

But if people with patents were to license all their patents FOR USE IN GPL
CODE, then any proprietary infringement (or attempt to sue) still gives them
leverage for a counter-suit. (IBM retained counter-suit ability in a
different way: you sue, the license terminates. That's not bad, but I think
sucking the patent system into the GPL the same way copyright gets inverted
would be more useful.)

This is more or less what Red Hat's done with its patents, by the way.
Blanket license for use under GPL-type licenses, but not BSD because that
would disarm mutually assured destruction. Now if we got somebody like IBM
on board a GPL patent pool (with more patents than anybody else, as far as I
know), that would really mean something...

Unfortunately, the maintainer of the GPL is Stallman, so he's the logical guy
to spearhead a "GPL patent pool" project, but any time anybody mentions the
phrase "intellectual property" to him he goes off on a tangent about how you
shouldn't call anything "intellectual property", so how can you have a
discussion about it, and nothing ever gets done. It's FRUSTRATING to see
somebody with such brilliant ideas hamstrung not just by idealism, but
PEDANTIC idealism.

Sigh...

Rob

2002-08-13 13:48:50

by Rob Landley

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Sunday 11 August 2002 07:44 pm, Daniel Phillips wrote:

> In other words, a license grant has to cover *all* uses of Linux and not
> just GPL uses.

Including a BSD license where source code is never released? Or dot-net
application servers hosted on a Linux system under lock and key in a vault
somewhere? And no termination clause, so this jerk can still sue you over
other frivolous patents?

So you would object to Microsoft granting rights to its patents saying "you
can use this patent in software that runs on windows, but use it on any other
platform and we'll sue you", but you don't mind going the other way?

Either way BSD gets the shaft, of course. But then BSDI was doing that to
them a decade ago, and Sun hired away Bill Joy and forked off SunOS years before
that, so they should be used to it by now... :) (And BSD runs plenty of GPL
application code...)

> In my opinion, RedHat has set a bad example by stopping short of promising
> free use of Ingo's patents for all Linux users. We are entering a
> difficult time, and such a wrong-footed move simply makes it more
> difficult.

Imagine a slimeball company that puts out proprietary software, gets a patent
on turning a computer on, and sues everybody in the northern hemisphere ala
rambus. They run a Linux system in the corner in their office, therefore
they are "a linux user". How do you stop somebody with that mindset from
finding a similarly trivial loophole in your language? (Think Spamford
Wallace. Think the CEO of Rambus. Think Unisys and the gif patent. Think
the people who recently got a patent on JPEG. Think the british telecom
idiots trying to patent hyperlinking a decade after Tim Berners-Lee's first
code drop to usenet...)

Today, all these people do NOT sue IBM, unless they're really stupid. (And
if they do, they will have cross-licensed their patent portfolio with IBM in
a year or two. Pretty much guaranteed.)

Rob

2002-08-13 15:08:39

by Alan

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tue, 2002-08-13 at 09:40, Rob Landley wrote:
> Unfortunately, the maintainer of the GPL is Stallman, so he's the logical guy
> to spearhead a "GPL patent pool" project, but any time anybody mentions the
> phrase "intellectual property" to him he goes off on a tangent about how you
> shouldn't call anything "intellectual property", so how can you have a
> discussion about it, and nothing ever gets done. It's FRUSTRATING to see
> somebody with such brilliant ideas hamstrung not just by idealism, but
> PEDANTIC idealism.
>

Richard isn't daft on this one. The FSF does not have the 30 million
dollars needed to fight a *single* US patent lawsuit. The problem also
reflects back on things like Debian, because Debian certainly cannot
afford to play the patent game either.

2002-08-13 16:33:42

by Rob Landley

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tuesday 13 August 2002 11:06 am, Alan Cox wrote:
> On Tue, 2002-08-13 at 09:40, Rob Landley wrote:
> > Unfortunately, the maintainer of the GPL is Stallman, so he's the logical
> > guy to spearhead a "GPL patent pool" project, but any time anybody
> > mentions the phrase "intellectual property" to him he goes off on a
> > tangent about how you shouldn't call anything "intellectual property", so
> > how can you have a discussion about it, and nothing ever gets done. It's
> > FRUSTRATING to see somebody with such brilliant ideas hamstrung not just
> > by idealism, but PEDANTIC idealism.
>
> Richard isnt daft on this one. The FSF does not have the 30 million
> dollars needed to fight a *single* US patent lawsuit. The problem also
> reflects back on things like Debian, because Debian certainly cannot
> afford to play the patent game either.

Agreed, but they can try to give standing to companies that have either the
resources or the need to do it themselves, and also to placate people who see
patent applications by SGI and Red Hat as evil proprietary encroachment
rather than an attempt to scrape together some kind of defense against the
insanity of the patent system.

Like politics: it's a game you can't win by ignoring, you can only try to use
it against itself. The GPL did a great job of this with copyright law: it
doesn't abandon stuff into the public domain for other people to copyright
and claim, but keeps it copyrighted and uses that copyright against the
copyright system. But at the time software patents weren't enforceable yet
and I'm guessing the wording of the license didn't want to lend credibility
to the concept. This situation has changed since: now software patents are
themselves an IP threat to free software that needs a copyleft solution.

Releasing a GPL 2.1 with an extra clause about a patent pool wouldn't cost
$30 million. (I.e. patents used in GPL code are copyleft-style licensed and
not BSD-style licensed: they can be used in GPL code but use outside it
requires a separate license. Right now it says something like "free for use
by all" which makes the mutually assured destruction people cringe.)

By the way, the average figure I've heard to defend against a patent suit is
about $2 1/2 million. That's defend and not pursue, and admittedly that's
not near the upper limit, but it CAN be done for less. And what you're
looking for in a patent pool is something to countersue with in a defense,
not something to initiate action with. (Obviously, I'm not a professional
intellectual property lawyer. I know who to ask, but to get more than an off
the cuff remark I'd have to sponsor some research...)

Last time I really looked into all this, Stallman was trying to do an
enormous new GPL 3.0, addressing application service providers. That seems
to have fallen through (as has the ASP business model), but the patent issue
remains unresolved.

Red Hat would certainly be willing to play in a GPL patent pool. The
statement on their website already gives blanket permission to use patents in
GPL code (and a couple similar licenses; this would be a subset of the
permission they've already given). Red Hat's participation might convince
other distributors to do a "me too" thing (there's certainly precedent for
it). SGI could probably be talked into it as well, since they need the
goodwill of the Linux community unless they want to try to resurrect Irix.
IBM would take some convincing, it took them a couple years to get over their
distaste for the GPL in the first place, and they hate to be first on
anything, but if they weren't first... HP I haven't got a CLUE about with
Fiorina at the helm. Dell is being weird too...

Dunno. But ANY patent pool is better than none. If suing somebody for the
use of a patent in GPL code terminates your right to participate in a GPL
patent pool and makes you vulnerable to a suit over violating any patent in
the pool, then the larger the pool is the more incentive there is NOT to
sue...

Rob

2002-08-13 16:46:01

by Daniel Phillips

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tuesday 13 August 2002 10:51, Rob Landley wrote:
> On Sunday 11 August 2002 07:44 pm, Daniel Phillips wrote:
> So you would object to Microsoft granting rights to its patents saying "you
> can use this patent in software that runs on windows, but use it on any other
> platform and we'll sue you", but you don't mind going the other way?

You missed the point. I was talking about using copyright against patents,
and specifically in the case where patents are held by people who also want
to use the copyrighted code. The intention is to help keep our friends
honest.

Dealing with Microsoft, or anyone else whose only motivation is to obstruct,
is an entirely separate issue.

--
Daniel

2002-08-13 17:00:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)


On Tue, 13 Aug 2002, Rob Landley wrote:
>
> Last time I really looked into all this, Stallman was trying to do an
> enormous new GPL 3.0, addressing application service providers. That seems
> to have fallen though (as has the ASP business model), but the patent issue
> remains unresolved.

At least one problem is exactly the politics played by the FSF, which
means that a lot of people (not just me), do not trust such new versions
of the GPL. Especially since the last time this happened, it all happened
in dark back-rooms, and I got to hear about it not off any of the lists,
but because I had an insider snitch on it.

I lost all respect I had for the FSF due to its sneakiness.

The kernel explicitly states that it is under the _one_ particular version
of the "GPL v2" that is included with the kernel. Exactly because I do not
want to have politics dragged into the picture by an external party (and
I'm anal enough that I made sure that "version 2" cannot be misconstrued
to include "version 2.1".

Also, a license is a two-way street. I do not think it is morally right to
change an _existing_ license for any other reason than the fact that it
has some technical legal problem. I intensely dislike the fact that many
people seem to want to extend the current GPL as a way to take advantage
of people who used the old GPL and agreed with _that_ - but not
necessarily the new one.

As a result, every time this comes up, I ask for any potential new
"patent-GPL" to be a _new_ license, and not try to feed off existing
works. Please dopn't make it "GPL". Make it the GPPL for "General Public
Patent License" or something. And let people buy into it on its own
merits, not on some "the FSF decided unilaterally to make this decision
for us".

I don't like patents. But I absolutely _hate_ people who play politics
with other peoples code. Be up-front, not sneaky after-the-fact.

Linus

2002-08-13 17:13:56

by Ruth Ivimey-Cook

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tue, 13 Aug 2002, Linus Torvalds wrote:
>I don't like patents. But I absolutely _hate_ people who play politics
>with other peoples code. Be up-front, not sneaky after-the-fact.


Well said :-)

Ruth

--
Ruth Ivimey-Cook
Software engineer and technical writer.

2002-08-13 17:27:14

by Rik van Riel

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tue, 13 Aug 2002, Linus Torvalds wrote:

> Also, a license is a two-way street. I do not think it is morally right
> to change an _existing_ license for any other reason than the fact that
> it has some technical legal problem.

Agreed, but we might be running into one of these.

> I don't like patents. But I absolutely _hate_ people who play politics
> with other peoples code. Be up-front, not sneaky after-the-fact.

Suppose somebody sends you a patch which implements a nice
algorithm that just happens to be patented by that same
somebody. You don't know about the patent.

You integrate the patch into the kernel and distribute it,
one year later you get sued by the original contributor of
that patch because you distribute code that is patented by
that person.

Not having some protection in the license could open you
up to sneaky after-the-fact problems.

Having a license that explicitly states that people who
contribute and use Linux shouldn't sue you over it might
prevent some problems.

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-08-13 17:43:13

by Alexander Viro

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)



On Tue, 13 Aug 2002, Rik van Riel wrote:

> Suppose somebody sends you a patch which implements a nice
> algorithm that just happens to be patented by that same
> somebody. You don't know about the patent.
>
> You integrate the patch into the kernel and distribute it,
> one year later you get sued by the original contributor of
> that patch because you distribute code that is patented by
> that person.
>
> Not having some protection in the license could open you
> up to sneaky after-the-fact problems.

Accepting non-trivial patches from malicious source means running code
from malicious source on your boxen. In kernel mode. And in that case
patents are the least of your troubles...

2002-08-13 17:50:11

by Rob Landley

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tuesday 13 August 2002 12:51 pm, Linus Torvalds wrote:
> On Tue, 13 Aug 2002, Rob Landley wrote:
> > Last time I really looked into all this, Stallman was trying to do an
> > enormous new GPL 3.0, addressing application service providers. That
> > seems to have fallen though (as has the ASP business model), but the
> > patent issue remains unresolved.
>
> At least one problem is exactly the politics played by the FSF, which
> means that a lot of people (not just me), do not trust such new versions
> of the GPL. Especially since the last time this happened, it all happened
> in dark back-rooms, and I got to hear about it not off any of the lists,
> but because I had an insider snitch on it.
>
> I lost all respect I had for the FSF due to its sneakiness.

Exactly why I was thinking a minimalist version (i.e. one more paragraph) was
about the biggest change the community would be likely to accept. I strongly
suspected GPL 3.0 was going nowhere long before it actually got bogged down...

And the politics being played by the FSF seem to be why they're NOT
interested in a specific patch to fix a specific problem (lack of addressing
patents). If you want bug fixes, they want to log-roll huge new
infrastructure changes and force you to swallow the whole upgrade. That's
been a problem on this list before. :)

> The kernel explicitly states that it is under the _one_ particular version
> of the "GPL v2" that is included with the kernel. Exactly because I do not
> want to have politics dragged into the picture by an external party (and
> I'm anal enough that I made sure that "version 2" cannot be misconstrued
> to include "version 2.1".

Sure. But it's been re-licensed before. Version 0.12, if I recall. (And
the statement of restriction to 2.0 could also be considered a re-licensing,
albeit a minor one.) How much leverage even YOU have to fiddle with the
license at this point is an open question, but if a version 2.1 WAS
acceptable (if, if, important word that, and obviously this would be after
seeing it), and you decided to relax the 2.0 restriction to a 2.1 restriction
(still operating under the "if" here, I can include semicolons if you
like...), it probably wouldn't muddy the legal waters too much if the sucker
later had to be upheld in court (<- nested if).

"Probably" meaning "ask a lawyer", of course...

> Also, a license is a two-way street. I do not think it is morally right to
> change an _existing_ license for any other reason than the fact that it
> has some technical legal problem. I intensely dislike the fact that many
> people seem to want to extend the current GPL as a way to take advantage
> of people who used the old GPL and agreed with _that_ - but not
> necessarily the new one.

The only reason I'd worry about trying to integrate it is to ensure that a
"patent pool" adendum was compatible with the GPL itself. It's not an
additional restricition that would violate the GPL, it's a grant of license
on an area not explicitly addressed by the GPL, and it's a grant of
permissions giving you rights you wouldn't otherwise necessarily have.

The problem comes with the "if you sue, your rights terminate" clause. On
the one hand, the GPL is generally incompatible with additional termination
clauses. On the other hand, it's a termination clause only of the additional
rights granted by the patent license, not of the rights granted by the GPL
itself, which is a copyright license...

It's a bit of legal hacking that would definitely require vetting by a
professional...

On the other hand, cross-licensing ALL your patents with a GPL patent pool
would probably have to be a separate statement from the license; that's a
bigger decision than simply releasing GPL code that might use one or two
patents, and it's best to have that decision explicitly made and explicitly
stated. (The GPL only applies to what you specifically release under it...)
So making an external statement compatible with the GPL is definitely a
good thing anyway.

A case could be made that section 7 sort of implies an intent that enforcing
patent restrictions violates the license and thus terminates your rights to
distribute under section 4, and could be argued to mean that you can't put
code under the GPL without at least implying a license to your own patents.
But that doesn't solve the "third party who never contributed" problem.
(That's what requires the patent license termination clause, thus making you
vulnerable to suits for infringing other patents in the pool...)

I THINK it could be made to work as a separate supplementary licensing
statement, compatible with the GPL. I know it could be made to work as an
upgrade to the GPL, but you're right that there are huge problems with that
approach...

Either way, it's vaporware until acceptable language is stitched together and
run by a competent IP attorney...

> As a result, every time this comes up, I ask for any potential new
> "patent-GPL" to be a _new_ license, and not try to feed off existing
> works. Please don't make it "GPL". Make it the GPPL for "General Public
> Patent License" or something. And let people buy into it on its own
> merits, not on some "the FSF decided unilaterally to make this decision
> for us".

GPL+, possibly...

In either case it would be a new license. The people putting "or later" in
their copyright notices trust the FSF and thus the FSF's new licenses (if
any). The people who specify a specific version don't. The license seems to
have been intentionally written to leave the option of making this
distinction open.

> I don't like patents. But I absolutely _hate_ people who play politics
> with other peoples code. Be up-front, not sneaky after-the-fact.

Well, GPL section 9 did plant this particular land mine in 1991, so this is
probably a case of being sneaky up front. :) But it's still being sneaky...

That said, section 9 just states that the FSF will put out new versions and
that code which says a version number "or later", or which doesn't specify any
version, can automatically be used under the new version. The ones that
specify a specific version don't automatically get re-licensed in future by
section 9, so the Linux case is pretty clear. (Well, disregarding the binary
module thing, anyway. 8)

> Linus

Rob

P.S. Yes everybody, RTFL: http://www.gnu.org/copyleft/gpl.html

2002-08-13 18:03:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)


On Tue, 13 Aug 2002, Rik van Riel wrote:
>
> Having a license that explicitly states that people who
> contribute and use Linux shouldn't sue you over it might
> prevent some problems.

The thing is, if you own the patent, and you sneaked the code into the
kernel, you will almost certainly be laughed out of court for trying to
enforce it.

And if somebody else owns the patent, no amount of copyright license makes
any difference.

Linus

2002-08-13 17:56:50

by Rik van Riel

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tue, 13 Aug 2002, Linus Torvalds wrote:
> On Tue, 13 Aug 2002, Rik van Riel wrote:
> >
> > Having a license that explicitly states that people who
> > contribute and use Linux shouldn't sue you over it might
> > prevent some problems.
>
> The thing is, if you own the patent, and you sneaked the code into the
> kernel, you will almost certainly be laughed out of court for trying to
> enforce it.

Apparently not everybody agrees on this:

http://zdnet.com.com/2100-1106-884681.html

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-08-13 18:16:00

by Rob Landley

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tuesday 13 August 2002 01:29 pm, Rik van Riel wrote:
> On Tue, 13 Aug 2002, Linus Torvalds wrote:
> > Also, a license is a two-way street. I do not think it is morally right
> > to change an _existing_ license for any other reason than the fact that
> > it has some technical legal problem.
>
> Agreed, but we might be running into one of these.
>
> > I don't like patents. But I absolutely _hate_ people who play politics
> > with other peoples code. Be up-front, not sneaky after-the-fact.
>
> Suppose somebody sends you a patch which implements a nice
> algorithm that just happens to be patented by that same
> somebody. You don't know about the patent.

That would be entrapment. When they submit the patch, they're giving you an
implied license to use it, even if they don't SAY so, just because they
voluntarily submitted it and can't claim to be surprised it was then used, or
that they didn't want it to be. You could put up a heck of a defense in
court on that one.

It's people who submit patches that use OTHER people's patents you have to
worry about, and that's something you just can't filter for, with patent
numbers rapidly approaching, what, eight digits?

> Having a license that explicitly states that people who
> contribute and use Linux shouldn't sue you over it might
> prevent some problems.

Such a clause is what IBM insisted on having in ITS open source license. You
sue, your rights under this license terminate, which is basically automatic
grounds for a countersuit for infringement.

(IBM has a lot of lawyers, and they pay them a lot of money. It's
conceivable they may actually have a point from time to time... :)

> regards,
>
> Rik

Rob

2002-08-13 18:06:21

by Rob Landley

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tuesday 13 August 2002 12:47 pm, Daniel Phillips wrote:
> On Tuesday 13 August 2002 10:51, Rob Landley wrote:
> > On Sunday 11 August 2002 07:44 pm, Daniel Phillips wrote:
> > So you would object to Microsoft granting rights to its patents saying
> > "you can use this patent in software that runs on windows, but use it on
> > any other platform and we'll sue you", but you don't mind going the other
> > way?
>
> You missed the point. I was talking about using copyright against patents,
> and specifically in the case where patents are held by people who also want
> to use the copyrighted code. The intention is to help keep our friends
> honest.

Does the little company that recently got a patent on JPEG actually use open
source code in-house? They might be a Windows-only shop. I don't know.

> Dealing with Microsoft, or anyone else whose only motivation is to
> obstruct, is an entirely separate issue.

Oddly enough, Microsoft isn't a major threat here. They don't seem to want
to lob the first nuke any more than anybody else here. They have too much to
lose. If they unleashed their patent portfolio upon the Linux community,
there are enough big players with their own patent portfolios and a vested
interest in Linux to respond in kind. (Microsoft is happy to rattled their
saber about the unenforceability of the GPL, and threaten to use patents to
stop it, but you'll notice they haven't DONE it yet. Threats are cheap, in
the legal world. As far as I can tell, at least 90% of any legal maneuvering
is posturing and seeing if the other guy blinks. It's mostly a game of
chicken, you never know WHAT a judge or jury will actually say, when it comes
down to it.)

They can't go after the big players with patents anyway, they've already got
cross-licensing agreements with most of them (which is the point of patent
portfolios in the first place). So it's only the small players they could
really go up against, and they simply don't see those as their real
competition except as allies to big players like IBM, HPaq, Dell...

And if they're going after the small fry, having already been convicted in
court of being an abusive monopoly, they open themselves to a class-action
suit by ambulance chasers working on contingency against the prospect of tapping
Microsoft's deep pockets in a judgement or settlement. (Sort of like suing
the tobacco industry: it's not easy but lawyers still sign on because there's
so much MONEY to be gained if they win...) An explicit patent infringement
suit does NOT give plausible deniability of the "you can't prove we didn't
win simply because we were better in the marketplace" kind. (You can't prove
the tooth fairy doesn't exist, either. Hard to prove a negative.)

Other than FUD, the more likely pragmatic problem is some small fry with no
stake in anything who thinks he can get rich quick by being really slimy.
Anybody remember the origin of the Linux trademark? THAT is the most
annoying problem patents pose, being nibbled to death by ants...

Rob

2002-08-13 18:48:08

by Rob Landley

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tuesday 13 August 2002 02:32 pm, Linus Torvalds wrote:
> On Tue, 13 Aug 2002, Rob Landley wrote:
> > > Having a license that explicitly states that people who
> > > contribute and use Linux shouldn't sue you over it might
> > > prevent some problems.
> >
> > Such a clause is what IBM insisted on having in ITS open source license.
> > You sue, your rights under this license terminate, which is basically
> > automatic grounds for a countersuit for infringement.
>
> Note that I personally think the "you screw with me, I screw with you"
> approach is a fine one. After all, the GPL is based on "you help me, I'll
> help you", so it fits fine.
>
> However, it doesn't work due to the distributed nature of the GPL. The FSF
> tried to do something like it in the GPL 3.0 discussions, and the end
> result was a total disaster. The GPL 3.0 suggestion was something along
> the lines of "you sue any GPL project, you lose all GPL rights". Which to
> me makes no sense at all - I could imagine that there might be some GPL
> project out there that _deserves_ getting sued(*) and it has nothing to do
> with Linux.

So this is another argument in favor of having the patent addendum be
separate then. Software patents as a class are basically evil, and valid
ones are clearly the exception. Copyrights are NOT evil (or at least are
inherently more tightly focused), and valid ones are the rule.

There is also the legal precedent of patent pools, which are an established
legal concept as far as I know. Joining a patent pool means you license all
your patents to get a license to all their patents, and bringing a patent
suit within the pool would violate your agreement and cut you off from the
pool. (If I'm wrong, somebody correct me on this please.)

The open source community's problem is that it historically hasn't had the
entry fee to participate in this sort of arrangement, and solving it on a
company by company basis doesn't help the community. These days open source
has a lot more resources than it used to.

I think Red Hat is actually trying to help on this front by getting patents
and licensing them for use in GPL code. By itself, this is not a solution,
but it could be the seed of one...

Right, at this point I need to go bug a lawyer, I think...

> Linus
>
> (*) "GNU Emacs, the defendent, did inefariously conspire to play
> towers-of-hanoy, while under the guise of a harmless editor".

But remember, you can't spell "evil" without "vi"... :)

Rob

2002-08-13 18:31:49

by Rob Landley

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tuesday 13 August 2002 01:59 pm, Rik van Riel wrote:
> On Tue, 13 Aug 2002, Linus Torvalds wrote:
> > On Tue, 13 Aug 2002, Rik van Riel wrote:
> > > Having a license that explicitly states that people who
> > > contribute and use Linux shouldn't sue you over it might
> > > prevent some problems.
> >
> > The thing is, if you own the patent, and you sneaked the code into the
> > kernel, you will almost certainly be laughed out of court for trying to
> > enforce it.
>
> Apparently not everybody agrees on this:
>
> http://zdnet.com.com/2100-1106-884681.html

This is just a case of IBM's left hand not knowing what the right hand is
doing. An official representative of IBM gave statements to the committee
that their contributions were unencumbered. If he honestly was acting in his
capacity as a representative of IBM, and had the authority to make that
statement, then that statement IS permission equivalent to a royalty-free
license to use the patent.

Going through court to prove this could, of course, take years and millions
of dollars, and nobody's going to use the standard until it's resolved, which
is why everybody's groaning that Big Blue is being either evil or really
really stupid by not just giving in on this one.

It's a PR black eye for IBM ("We're big, we're blue, we're dumb") but doesn't
change the nature of the legal arguments...

Any time ANYBODY sues you, no matter how frivolous, it could easily be long
and expensive. That's why you countersue for damages and get them to pay your
costs for the trial if you win, plus punitive damages, plus pain and
suffering, plus a stupidity tax, plus...

This topic's wandering a bit far afield. CC: list trimmed...

Rob

2002-08-13 18:41:09

by Linus Torvalds

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)


On Tue, 13 Aug 2002, Rob Landley wrote:
>
> > Having a license that explicitly states that people who
> > contribute and use Linux shouldn't sue you over it might
> > prevent some problems.
>
> Such a clause is what IBM insisted on having in ITS open source license. You
> sue, your rights under this license terminate, which is basically automatic
> grounds for a countersuit for infringement.

Note that I personally think the "you screw with me, I screw with you"
approach is a fine one. After all, the GPL is based on "you help me, I'll
help you", so it fits fine.

However, it doesn't work due to the distributed nature of the GPL. The FSF
tried to do something like it in the GPL 3.0 discussions, and the end
result was a total disaster. The GPL 3.0 suggestion was something along
the lines of "you sue any GPL project, you lose all GPL rights". Which to
me makes no sense at all - I could imagine that there might be some GPL
project out there that _deserves_ getting sued(*) and it has nothing to do
with Linux.

Linus

(*) "GNU Emacs, the defendent, did inefariously conspire to play
towers-of-hanoy, while under the guise of a harmless editor".

2002-08-13 18:45:36

by Mike Galbraith

[permalink] [raw]
Subject: Re: large page patch (fwd)


>Also, a license is a two-way street. I do not think it is morally right to
>change an _existing_ license for any other reason than the fact that it
>has some technical legal problem. I intensely dislike the fact that many
>people seem to want to extend the current GPL as a way to take advantage
>of people who used the old GPL and agreed with _that_ - but not
>necessarily the new one.

Amen.

2002-08-13 19:10:18

by Daniel Phillips

[permalink] [raw]
Subject: Re: large page patch (fwd) (fwd)

On Tuesday 13 August 2002 19:55, Linus Torvalds wrote:
> On Tue, 13 Aug 2002, Rik van Riel wrote:
> >
> > Having a license that explicitly states that people who
> > contribute and use Linux shouldn't sue you over it might
> > prevent some problems.
>
> The thing is, if you own the patent, and you sneaked the code into the
> kernel, you will almost certainly be laughed out of court for trying to
> enforce it.
>
> And if somebody else owns the patent, no amount of copyright license makes
> any difference.

I don't think that's correct. SGI needs to use and distribute Linux more
than they need to enforce their reverse mapping patents against Linux
users.

--
Daniel