LinuxLists.cc - Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

2008-10-02 22:24:47

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

Ping DaveM. Does this look ok? What else would we need for you to remove
your range checking from sparc?

Thanks,
Jesse

On Monday, September 29, 2008 8:19 pm Jesse Brandeburg wrote:
> From: Jesse Barnes <[email protected]>
>
> > I did some snooping around, and while doing so I noticed that the PCI
> > mmap code for x86 doesn't do one bit of range checking on the size, or
> > any other aspect of the request, wrt. the MMIO regions actually mapped
> > in the BARs of the PCI device.
>
> Here's a patch that adds range checking to the sysfs mappings at
> least. This patch should catch the case where X (or some other
> process) tries to map beyond the specific BAR it's (supposedly)
> trying to access, making things safer in general. FWIW both my
> F9 and development versions of X start up fine with this patch
> applied.
>
> DaveM, will this work for you on sparc? It looked like your code
> was allowing bridge window mappings, but that behavior should be
> preserved as long as your bridge devices reflect their window
> sizes correctly in their pdev->resources?
>
> If we add similar code to the procfs stuff we wouldn't need to do
> any checking in the arches.
>
> Signed-off-by: Jesse Barnes <[email protected]>
> Signed-off-by: Jesse Brandeburg <[email protected]>
> ---
>
> drivers/pci/pci-sysfs.c | 14 ++++++++++++++
> 1 files changed, 14 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 9c71858..4d1aa6e 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -16,6 +16,7 @@
>
>
> #include <linux/kernel.h>
> +#include <linux/sched.h>
> #include <linux/pci.h>
> #include <linux/stat.h>
> #include <linux/topology.h>
> @@ -502,6 +503,8 @@ pci_mmap_resource(struct kobject *kobj, struct
> bin_attribute *attr, struct resource *res = (struct resource
> *)attr->private;
> enum pci_mmap_state mmap_type;
> resource_size_t start, end;
> + unsigned long map_len = vma->vm_end - vma->vm_start;
> + unsigned long map_offset = vma->vm_pgoff << PAGE_SHIFT;
> int i;
>
> for (i = 0; i < PCI_ROM_RESOURCE; i++)
> @@ -510,6 +513,17 @@ pci_mmap_resource(struct kobject *kobj, struct
> bin_attribute *attr, if (i >= PCI_ROM_RESOURCE)
> return -ENODEV;
>
> + /*
> + * Make sure the range the user is trying to map falls within
> + * the resource
> + */
> + if (map_offset + map_len > pci_resource_len(pdev, i)) {
> + WARN(1, "process \"%s\" tried to map 0x%08lx-0x%08lx on BAR %d (size
> 0x%08lx)\n", + current->comm, map_offset, map_offset + map_len, i,
> + (unsigned long)pci_resource_len(pdev, i));
> + return -EINVAL;
> + }
> +
> /* pci_mmap_page_range() expects the same kind of entry as coming
> * from /proc/bus/pci/ which is a "user visible" value. If this is
> * different from the resource itself, arch will do necessary fixup.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Jesse Barnes, Intel Open Source Technology Center

2008-10-03 20:46:54

by David Miller

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

From: Jesse Barnes <[email protected]>
Date: Thu, 2 Oct 2008 15:23:43 -0700

> Ping DaveM. Does this look ok? What else would we need for you to remove
> your range checking from sparc?

Sorry, I've been in Paris for more than a week otherwise
I would have looked at this already.

I fly home tomorrow, so I'll try to look at it by next
week.

2008-10-03 21:30:28

by Jesse Barnes

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Friday, October 3, 2008 1:46 pm David Miller wrote:
> From: Jesse Barnes <[email protected]>
> Date: Thu, 2 Oct 2008 15:23:43 -0700
>
> > Ping DaveM. Does this look ok? What else would we need for you to
> > remove your range checking from sparc?
>
> Sorry, I've been in Paris for more than a week otherwise
> I would have looked at this already.
>
> I fly home tomorrow, so I'll try to look at it by next
> week.

Ok, thanks. You'll have to check Linus' tree for sanity though, he just
merged a variant on my patch for 2.6.27. See
b5ff7df3df9efab511244d5a299fce706c71af48 and yell if it broke something for
you.

Thanks,
--
Jesse Barnes, Intel Open Source Technology Center

2008-10-03 21:45:22

by Jiri Kosina

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Fri, 3 Oct 2008, Jesse Barnes wrote:

> Ok, thanks. You'll have to check Linus' tree for sanity though, he just
> merged a variant on my patch for 2.6.27. See
> b5ff7df3df9efab511244d5a299fce706c71af48 and yell if it broke something
> for you.

Karsten has been testing kernel with these three patches from the series
applied:

e1000e: reset swflag after resetting hardware
e1000e: fix lockdep issues
e1000e: debug contention on NVM SWFLAG

This was done on a hardware which previously triggered the bug in just a
few test iterations in quite a reliable way. Now, with these patches
applied, the EEPROM corruption didn't happen after several tens of
iterations.

Please note, that the patch that disables the writes to EEPROM on hardware
level was *not* involved in this testing.

Therefore it currently seems that these three patches really address the
race condition issue that was present in the e1000e driver.

It is still not clear why the bug started triggering all of a sudden for
so many people though.

--
Jiri Kosina
SUSE Labs

2008-10-03 23:28:20

by Jesse Brandeburg

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Fri, Oct 3, 2008 at 2:45 PM, Jiri Kosina <[email protected]> wrote:
> Karsten has been testing kernel with these three patches from the series
> applied:
>
> e1000e: reset swflag after resetting hardware
> e1000e: fix lockdep issues
> e1000e: debug contention on NVM SWFLAG
>
> This was done on a hardware which previously triggered the bug in just a
> few test iterations in quite a reliable way. Now, with these patches
> applied, the EEPROM corruption didn't happen after several tens of
> iterations.
>
> Please note, that the patch that disables the writes to EEPROM on hardware
> level was *not* involved in this testing.
>
> Therefore it currently seems that these three patches really address the
> race condition issue that was present in the e1000e driver.

Our experience is different. We are also testing with the "protection
patch" reverted.

We see that the problem specifically comes and goes when
removing/adding the use of set_memory_ro/set_memory_rw to the driver.

I'm working to catch the bad element in the act with a hardware
breakpoint or an ITP (we're trying both)

> It is still not clear why the bug started triggering all of a sudden for
> so many people though.

we plan to keep on working on this until we understand what is going on.

2008-10-03 23:30:23

by Jesse Brandeburg

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Fri, Oct 3, 2008 at 4:28 PM, Jesse Brandeburg
<[email protected]> wrote:
> On Fri, Oct 3, 2008 at 2:45 PM, Jiri Kosina <[email protected]> wrote:
>> Karsten has been testing kernel with these three patches from the series
>> applied:
>>
>> e1000e: reset swflag after resetting hardware
>> e1000e: fix lockdep issues
>> e1000e: debug contention on NVM SWFLAG
>>
>> This was done on a hardware which previously triggered the bug in just a
>> few test iterations in quite a reliable way. Now, with these patches
>> applied, the EEPROM corruption didn't happen after several tens of
>> iterations.
>>
>> Please note, that the patch that disables the writes to EEPROM on hardware
>> level was *not* involved in this testing.
>>
>> Therefore it currently seems that these three patches really address the
>> race condition issue that was present in the e1000e driver.
>
> Our experience is different. We are also testing with the "protection
> patch" reverted.
>
> We see that the problem specifically comes and goes when
> removing/adding the use of set_memory_ro/set_memory_rw to the driver.
>
> I'm working to catch the bad element in the act with a hardware
> breakpoint or an ITP (we're trying both)
>
>> It is still not clear why the bug started triggering all of a sudden for
>> so many people though.
>
> we plan to keep on working on this until we understand what is going on.

I removed the bad addresses from the cc: list

2008-10-04 10:21:50

by Jiri Kosina

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Fri, 3 Oct 2008, Jesse Brandeburg wrote:

> Our experience is different. We are also testing with the "protection
> patch" reverted.
> We see that the problem specifically comes and goes when
> removing/adding the use of set_memory_ro/set_memory_rw to the driver.

But if this patch (which is an obvious workaround, compared to the other
patches which fix real bugs, right?) would be catching some malicious
accessess to the mapped EEPROM, there should be stacktraces present in the
kernel log, right?

Thanks,

--
Jiri Kosina
SUSE Labs

2008-10-04 11:04:18

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Sat, 4 Oct 2008, Jiri Kosina wrote:
> On Fri, 3 Oct 2008, Jesse Brandeburg wrote:
> > Our experience is different. We are also testing with the "protection
> > patch" reverted.
> > We see that the problem specifically comes and goes when
> > removing/adding the use of set_memory_ro/set_memory_rw to the driver.
>
> But if this patch (which is an obvious workaround, compared to the other
> patches which fix real bugs, right?) would be catching some malicious
> accessess to the mapped EEPROM, there should be stacktraces present in the
> kernel log, right?

Exactly. The access to a ro region results in a fault. I have nowhere
seen that trigger, but I can reproduce the trylock() WARN_ON, which
confirms that there is concurrent access to the NVRAM registers. The
backtrace pattern is similar to the one you have seen.

There are two possible bad results from that concurrent access:

1) Task A issues command A
Task B issues command B
Task A writes data for A
which end up in B

2) Task A acquires the software flag
......

Task B acquires the software flag

Task A releases the software flag

The firmware accesses NVRAM Task B accesses the NVRAM

Both are probably serious enough to result in random NVRAM corruption.
There is no doubt: The missing serialization is a real bug.

Your question why this just happens now, while the bug is there for
ever, is definitely a good one. My opinion on that is that we just
have been lucky or some minor modification somewhere else in the
e1000e code or even in the generic/architecture code removed an
accidental serializing effect.

I was not able to reproduce the trylock warning on Fedora 8, but
Fedora 10-Beta triggers it once in 50 boots. I'm not going to remove
the mutex to verify whether it actually would corrupt the NVRAM :)

In theory we should be able to reproduce the problem with older kernel
versions as well. Maybe not the corruption, but we might see the
mutex_trylock check trigger.

Thanks,

tglx

2008-10-05 01:24:23

by Jesse Brandeburg

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Sat, Oct 4, 2008 at 4:02 AM, Thomas Gleixner <[email protected]> wrote:
> On Sat, 4 Oct 2008, Jiri Kosina wrote:
>> On Fri, 3 Oct 2008, Jesse Brandeburg wrote:
>> > Our experience is different. We are also testing with the "protection
>> > patch" reverted.
>> > We see that the problem specifically comes and goes when
>> > removing/adding the use of set_memory_ro/set_memory_rw to the driver.
>>
>> But if this patch (which is an obvious workaround, compared to the other
>> patches which fix real bugs, right?) would be catching some malicious
>> accessess to the mapped EEPROM, there should be stacktraces present in the
>> kernel log, right?

yes, but I think it is just changing timing, i don't see any backtraces either.

> Exactly. The access to a ro region results in a fault. I have nowhere
> seen that trigger, but I can reproduce the trylock() WARN_ON, which
> confirms that there is concurrent access to the NVRAM registers. The
> backtrace pattern is similar to the one you have seen.

are you still getting WARN_ON *with* all the mutex based fixes already applied?

with the mutex patches in place (without protection patch) we are
still reproducing the issue, until we apply the set_memory_ro patch.
I had no luck on friday setting a hardware breakpoint on memory access
with kgdb to catch the writer with a breakpoint.

2008-10-05 08:52:32

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Sat, 4 Oct 2008, Jesse Brandeburg wrote:
> > Exactly. The access to a ro region results in a fault. I have nowhere
> > seen that trigger, but I can reproduce the trylock() WARN_ON, which
> > confirms that there is concurrent access to the NVRAM registers. The
> > backtrace pattern is similar to the one you have seen.
>
> are you still getting WARN_ON *with* all the mutex based fixes already applied?

The WARN_ON triggers with current mainline. Is there any fixlet in
Linus tree missing ?

> with the mutex patches in place (without protection patch) we are
> still reproducing the issue, until we apply the set_memory_ro patch.

That does not make sense to me. If the memory_ro patch is providing
_real_ protection then you _must_ run into an access violation. If not,
then the patch just papers over the real problem in some mysterious
way.

The patch does:

+ set_memory_rw((unsigned long)hw->flash_address,
+ hw->flash_len >> PAGE_SHIFT);
writew(val, hw->flash_address + reg);
+ set_memory_ro((unsigned long)hw->flash_address,
+ hw->flash_len >> PAGE_SHIFT);

This changes massively the timing of the flash access. Could this be
the problem on the machine which needs the set_memory_ro patch to
survive ?

> I had no luck on friday setting a hardware breakpoint on memory access
> with kgdb to catch the writer with a breakpoint.

Well, why should you get a hardware breakpoint when the _ro protection
does not trigger in the first place ?

Granted there could be a _rw alias mapping, but then the problem must
be still visible with the _ro patch applied.

Thanks,

tglx

2008-10-05 15:06:50

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

Thomas Gleixner wrote:
> On Sat, 4 Oct 2008, Jesse Brandeburg wrote:
>>> Exactly. The access to a ro region results in a fault. I have nowhere
>>> seen that trigger, but I can reproduce the trylock() WARN_ON, which
>>> confirms that there is concurrent access to the NVRAM registers. The
>>> backtrace pattern is similar to the one you have seen.
>> are you still getting WARN_ON *with* all the mutex based fixes already applied?
>
> The WARN_ON triggers with current mainline. Is there any fixlet in
> Linus tree missing ?
>
>> with the mutex patches in place (without protection patch) we are
>> still reproducing the issue, until we apply the set_memory_ro patch.
>
> That does not make sense to me. If the memory_ro patch is providing
> _real_ protection then you _must_ run into an access violation. If not,
> then the patch just papers over the real problem in some mysterious
> way.
>

not if the bad code is doing copy_to_user .... (or similar)

2008-10-05 15:57:15

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Sun, 5 Oct 2008, Arjan van de Ven wrote:
> Thomas Gleixner wrote:
> > On Sat, 4 Oct 2008, Jesse Brandeburg wrote:
> > > > Exactly. The access to a ro region results in a fault. I have nowhere
> > > > seen that trigger, but I can reproduce the trylock() WARN_ON, which
> > > > confirms that there is concurrent access to the NVRAM registers. The
> > > > backtrace pattern is similar to the one you have seen.
> > > are you still getting WARN_ON *with* all the mutex based fixes already
> > > applied?
> >
> > The WARN_ON triggers with current mainline. Is there any fixlet in
> > Linus tree missing ?
> >
> > > with the mutex patches in place (without protection patch) we are
> > > still reproducing the issue, until we apply the set_memory_ro patch.
> >
> > That does not make sense to me. If the memory_ro patch is providing
> > _real_ protection then you _must_ run into an access violation. If not,
> > then the patch just papers over the real problem in some mysterious
> > way.
> >
>
> not if the bad code is doing copy_to_user .... (or similar)

You mean: copy_from_user :) This would require that the e1000e
nvram region is writable via copy_from_user by an e1000e user space
interface. A quick grep does not reviel such a horrible interface.

Thanks,

tglx

2008-10-05 16:02:55

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Sun, 5 Oct 2008 17:55:14 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> On Sun, 5 Oct 2008, Arjan van de Ven wrote:
> > Thomas Gleixner wrote:
> > > On Sat, 4 Oct 2008, Jesse Brandeburg wrote:
> > > > > Exactly. The access to a ro region results in a fault. I have
> > > > > nowhere seen that trigger, but I can reproduce the trylock()
> > > > > WARN_ON, which confirms that there is concurrent access to
> > > > > the NVRAM registers. The backtrace pattern is similar to the
> > > > > one you have seen.
> > > > are you still getting WARN_ON *with* all the mutex based fixes
> > > > already applied?
> > >
> > > The WARN_ON triggers with current mainline. Is there any fixlet in
> > > Linus tree missing ?
> > >
> > > > with the mutex patches in place (without protection patch) we
> > > > are still reproducing the issue, until we apply the
> > > > set_memory_ro patch.
> > >
> > > That does not make sense to me. If the memory_ro patch is
> > > providing _real_ protection then you _must_ run into an access
> > > violation. If not, then the patch just papers over the real
> > > problem in some mysterious way.
> > >
> >
> > not if the bad code is doing copy_to_user .... (or similar)
>
> You mean: copy_from_user :) This would require that the e1000e
> nvram region is writable via copy_from_user by an e1000e user space
> interface. A quick grep does not reviel such a horrible interface.

I meant a "copy_to_user" to a duff pointer, somewhere in the kernel.

2008-10-05 16:21:34

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Sun, 5 Oct 2008, Arjan van de Ven wrote:
> On Sun, 5 Oct 2008 17:55:14 +0200 (CEST)
> Thomas Gleixner <[email protected]> wrote:
>
> > On Sun, 5 Oct 2008, Arjan van de Ven wrote:
> > > Thomas Gleixner wrote:
> > > > On Sat, 4 Oct 2008, Jesse Brandeburg wrote:
> > > > > > Exactly. The access to a ro region results in a fault. I have
> > > > > > nowhere seen that trigger, but I can reproduce the trylock()
> > > > > > WARN_ON, which confirms that there is concurrent access to
> > > > > > the NVRAM registers. The backtrace pattern is similar to the
> > > > > > one you have seen.
> > > > > are you still getting WARN_ON *with* all the mutex based fixes
> > > > > already applied?
> > > >
> > > > The WARN_ON triggers with current mainline. Is there any fixlet in
> > > > Linus tree missing ?
> > > >
> > > > > with the mutex patches in place (without protection patch) we
> > > > > are still reproducing the issue, until we apply the
> > > > > set_memory_ro patch.
> > > >
> > > > That does not make sense to me. If the memory_ro patch is
> > > > providing _real_ protection then you _must_ run into an access
> > > > violation. If not, then the patch just papers over the real
> > > > problem in some mysterious way.
> > > >
> > >
> > > not if the bad code is doing copy_to_user .... (or similar)
> >
> > You mean: copy_from_user :) This would require that the e1000e
> > nvram region is writable via copy_from_user by an e1000e user space
> > interface. A quick grep does not reviel such a horrible interface.
>
> I meant a "copy_to_user" to a duff pointer, somewhere in the kernel.

Hmm, don't we check the *to address on copy_to_user ?

Thanks,

tglx

2008-10-05 17:01:55

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

On Sun, 5 Oct 2008 18:16:29 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> On Sun, 5 Oct 2008, Arjan van de Ven wrote:
> > On Sun, 5 Oct 2008 17:55:14 +0200 (CEST)
> > Thomas Gleixner <[email protected]> wrote:
> >
> > > On Sun, 5 Oct 2008, Arjan van de Ven wrote:
> > > > Thomas Gleixner wrote:
> > > > > On Sat, 4 Oct 2008, Jesse Brandeburg wrote:
> > > > > > > Exactly. The access to a ro region results in a fault. I
> > > > > > > have nowhere seen that trigger, but I can reproduce the
> > > > > > > trylock() WARN_ON, which confirms that there is
> > > > > > > concurrent access to the NVRAM registers. The backtrace
> > > > > > > pattern is similar to the one you have seen.
> > > > > > are you still getting WARN_ON *with* all the mutex based
> > > > > > fixes already applied?
> > > > >
> > > > > The WARN_ON triggers with current mainline. Is there any
> > > > > fixlet in Linus tree missing ?
> > > > >
> > > > > > with the mutex patches in place (without protection patch)
> > > > > > we are still reproducing the issue, until we apply the
> > > > > > set_memory_ro patch.
> > > > >
> > > > > That does not make sense to me. If the memory_ro patch is
> > > > > providing _real_ protection then you _must_ run into an access
> > > > > violation. If not, then the patch just papers over the real
> > > > > problem in some mysterious way.
> > > > >
> > > >
> > > > not if the bad code is doing copy_to_user .... (or similar)
> > >
> > > You mean: copy_from_user :) This would require that the e1000e
> > > nvram region is writable via copy_from_user by an e1000e user
> > > space interface. A quick grep does not reviel such a horrible
> > > interface.
> >
> > I meant a "copy_to_user" to a duff pointer, somewhere in the kernel.
>
> Hmm, don't we check the *to address on copy_to_user ?
>

fair point

and we do exception catching for copy_from_user as well on the source,
just by how it's implemented

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-10-07 23:19:34

by David Miller

[permalink] [raw]

Subject: Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote:

From: Jesse Barnes <[email protected]>
Date: Thu, 2 Oct 2008 15:23:43 -0700

> Ping DaveM. Does this look ok? What else would we need for you to remove
> your range checking from sparc?

I can't do that until you add similar checking to the other
pci_mmap_page_range() call site in drivers/pci/proc.c