2006-12-14 02:06:56

by Linus Torvalds

[permalink] [raw]
Subject: Linux 2.6.20-rc1


Ok, the two-week merge period is over, and -rc1 is out there.

I'm _really_ hoping that we can keep the 2.6.20 release calmer and without
any of the dragging-out-due-to-core-changes that we've had lately. We
didn't actually merge any really core changes here, with the biggest
conceptual one being the "work_struct" split into regular work and
"delayed" work, so I'm hoping we can really end up with an easy 2.6.20
release.

Some of the commits there are pretty big patches, but more than a couple
of them are due to fairly straightforward search-and-replace things (like
a largely scripted removal of unnecessary casts of the return value of
"kmalloc()", for example, or the switch to "ktermios" for the tty layer,
or the introduction of "struct path" in the VFS layer instead of keeping
the f_{dentry,vfsmnt} entries separate, or indeed the removal of SLAB_xxx
constant names in favour of the standard GFP_xxx ones).

So while the patch itself isn't actually all that much smaller than usual,
at least my personal gut feel is that the actual changes are not as
intrusive, just in some cases have big diffs.

But both the diffstat and the shortlog are still too big to fit in the
kernel mailing list limits, so you'll just have to take my word for it. Or
get the git repo, and do your own delving into things with

git log v2.6.19..v2.6.20-rc1 | git shortlog

There _are_ a few areas of note:

- the aforementioned "workqueue" changes (where we still have some work
to do to finalize the proper actions on all architectures: it's being
somewhat discussed on the arch mailing lists, hopefully we'll have it
all resolved by -rc2, and it doesn't really worry me)

- lockless page cache (RCU lookups of radix trees)

- kvm driver for all those crazy virtualization people to play with

- networking updates (DCCP, address-family agnostic connection tracking
in netfilter, sparse byte order annotations, yadda yadda)

- HID layer separated out of the USB stuff (bluetooth apparently wants
the HID stuff too)

- tons and tons of driver (ftape removal, ATA, pcmcia, i2c,
infiniband, dvb, networking..) and architecture updates (arm, mips,
powerpc, sh)

and probably some I just forgot about entirely.

Linus


2006-12-14 02:46:48

by Gene Heskett

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Wednesday 13 December 2006 21:06, Linus Torvalds wrote:
>Ok, the two-week merge period is over, and -rc1 is out there.
>
Ok, one not so silly Q (IMO) from the resident old fart. I saw, sometime
in the past week, a relatively huge ieee1394 update go by. And I have
some issues with the present 2.6.19 version causeing segfaults and kino
go-aways when trying to capture from my firewire movie camera. Problems
occur when trying to control the camera from kino.

Is this patchset in this -rc1? If it is, I'll see if I can get a build to
work and check it out.

Thanks.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

2006-12-14 03:32:09

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1



On Wed, 13 Dec 2006, Gene Heskett wrote:
>
> Ok, one not so silly Q (IMO) from the resident old fart. I saw, sometime
> in the past week, a relatively huge ieee1394 update go by. And I have
> some issues with the present 2.6.19 version causeing segfaults and kino
> go-aways when trying to capture from my firewire movie camera. Problems
> occur when trying to control the camera from kino.
>
> Is this patchset in this -rc1? If it is, I'll see if I can get a build to
> work and check it out.

-rc1 does include a reasonably big firewire update, but I'm not sure how
it will affect your camera usage. In fact, the ieee1394 people seem to be
trying to deprecate the dv1394 stuff, in favour of just raw1394 and
user-mode libraries.

I think you can tell Kino to use either the DV or the raw interface, but
I'm not sure. If you can, try either. The raw interface seems to be
horribly misdesigned (security problems), but is the one to use.

But by all means, give it a shot, and regardless of whether it works
better or not, you might want to cry on the shoulder of Stefan Richter
<[email protected]> about the issues you see.. Of course, please
talk to the Kino people too (although I have absolutely no idea who they
would be).

Linus

2006-12-14 05:37:15

by Gene Heskett

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Wednesday 13 December 2006 22:32, Linus Torvalds wrote:
>On Wed, 13 Dec 2006, Gene Heskett wrote:
>> Ok, one not so silly Q (IMO) from the resident old fart. I saw,
>> sometime in the past week, a relatively huge ieee1394 update go by.
>> And I have some issues with the present 2.6.19 version causeing
>> segfaults and kino go-aways when trying to capture from my firewire
>> movie camera. Problems occur when trying to control the camera from
>> kino.
>>
>> Is this patchset in this -rc1? If it is, I'll see if I can get a
>> build to work and check it out.
>
>-rc1 does include a reasonably big firewire update, but I'm not sure how
>it will affect your camera usage. In fact, the ieee1394 people seem to
> be trying to deprecate the dv1394 stuff, in favour of just raw1394 and
> user-mode libraries.
>
>I think you can tell Kino to use either the DV or the raw interface, but
>I'm not sure. If you can, try either. The raw interface seems to be
>horribly misdesigned (security problems), but is the one to use.

It is using the raw stuff now.

>
>But by all means, give it a shot,

Will do, sometime in the next few days, its muzzleloading deer season here
ATM. And I wasted half the season on a tv station up in upstate MI.

>and regardless of whether it works
>better or not, you might want to cry on the shoulder of Stefan Richter
><[email protected]> about the issues you see.. Of course, please
>talk to the Kino people too (although I have absolutely no idea who they
>would be).
>
> Linus

That would be primarily Dan Dennedy although Stefan may be the next in
line. Dan tells us that kino will be finalized very shortly as he has
other irons in the fire too.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

2006-12-14 13:59:26

by Alessandro Suardi

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On 12/14/06, Linus Torvalds <[email protected]> wrote:
>
> Ok, the two-week merge period is over, and -rc1 is out there.

Still need this libata-sff.c patch:

http://marc.theaimsgroup.com/?l=linux-kernel&m=116343564202844&q=raw

to have my root device detected, ata_piix probe would otherwise
fail as described in this thread:

http://www.ussg.iu.edu/hypermail/linux/kernel/0612.0/0690.html

--alessandro

"...when I get it, I _get_ it"

(Lara Eidemiller)

2006-12-14 14:18:11

by Steve Wise

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

And the patch was reposted here:

http://marc.theaimsgroup.com/?l=linux-kernel&m=116594961106441&w=2


On Thu, 2006-12-14 at 14:59 +0100, Alessandro Suardi wrote:
> On 12/14/06, Linus Torvalds <[email protected]> wrote:
> >
> > Ok, the two-week merge period is over, and -rc1 is out there.
>
> Still need this libata-sff.c patch:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116343564202844&q=raw
>
> to have my root device detected, ata_piix probe would otherwise
> fail as described in this thread:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0612.0/0690.html
>
> --alessandro
>
> "...when I get it, I _get_ it"
>
> (Lara Eidemiller)
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2006-12-14 15:40:28

by Alan

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Thu, 14 Dec 2006 14:59:23 +0100
"Alessandro Suardi" <[email protected]> wrote:

> On 12/14/06, Linus Torvalds <[email protected]> wrote:
> >
> > Ok, the two-week merge period is over, and -rc1 is out there.
>
> Still need this libata-sff.c patch:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116343564202844&q=raw
>
> to have my root device detected, ata_piix probe would otherwise
> fail as described in this thread:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0612.0/0690.html
>

Yep - sorry about not dealing with this yet but I've not had opportunity
to do much but email. I'm grabbing 20-rc1 atm to check there are no other
outstanding bits.

Alan

2006-12-14 17:48:32

by Stefan Richter

[permalink] [raw]
Subject: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

Gene Heskett wrote:
> On Wednesday 13 December 2006 22:32, Linus Torvalds wrote:
>>On Wed, 13 Dec 2006, Gene Heskett wrote:
>>> Ok, one not so silly Q (IMO) from the resident old fart. I saw,
>>> sometime in the past week, a relatively huge ieee1394 update go by.
>>> And I have some issues with the present 2.6.19 version causeing
>>> segfaults and kino go-aways when trying to capture from my firewire
>>> movie camera. Problems occur when trying to control the camera from
>>> kino.
>>>
>>> Is this patchset in this -rc1? If it is, I'll see if I can get a
>>> build to work and check it out.

This time everything which was in linux1394-2.6.git before the post
2.6.19 merge window went into 2.6.20-rc1. However it wasn't much
substantial stuff; we didn't get much done during the past few months.
Here is my pull message: http://lkml.org/lkml/2006/12/07/323

You can patch the ieee1394 drivers in 2.6.{19,18,16.x} to nearly the
same level as in 2.6.20-rc1 (that is, minus changesets which only
address in-kernel API changes) by means of the patchkit v212 from
http://me.in-berlin.de/~s5r6/linux1394/merged/. However I'm 99.9% sure
that it won't fix the problems you got.

>>-rc1 does include a reasonably big firewire update, but I'm not sure how
>>it will affect your camera usage. In fact, the ieee1394 people seem to
>> be trying to deprecate the dv1394 stuff, in favour of just raw1394 and
>> user-mode libraries.

That's right.

>>I think you can tell Kino to use either the DV or the raw interface, but
>>I'm not sure. If you can, try either. The raw interface seems to be
>>horribly misdesigned (security problems), but is the one to use.

These security issues are partly inherent to the IEEE 1394 architecture,
if I may say so. Dan Dennedy has a patch as work in progress to improve
raw1394's security towards devices as far as possible (while still
allowing non-root users access to /dev/raw1394, unless the administrator
thinks otherwise).

The other issue is security of the local host against attacks from
malicious external devices or other PCs. Here the issue is with
OHCI-1394's physical DMA feature and the fact that the sbp2 driver needs
it enabled. I'm planning to implement filtered physical DMA as a simple
security improvement and, at some day in a /distant/ future, implement a
DMA mapping in sbp2 to work completely without physical DMA.

(Anyway, that's unrelated to Gene's issues.)
--
Stefan Richter
-=====-=-==- ==-- -===-
http://arcgraph.de/sr/

2006-12-14 19:30:09

by Alistair John Strachan

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Hi Linus,

`hddtemp' has stopped working on 2.6.20-rc1:

[root] 19:25 [~] hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd
/dev/sda: ATA WDC WD2500KS-00M: S.M.A.R.T. not available
/dev/sdb: ATA WDC WD2500KS-00M: S.M.A.R.T. not available
/dev/sdc: ATA Maxtor 6B200M0: S.M.A.R.T. not available
/dev/sdd: ATA Maxtor 6B200M0: S.M.A.R.T. not available

Stracing the binary reveals:

open("/dev/sdd", O_RDONLY|O_NONBLOCK) = 3
ioctl(3, SCSI_IOCTL_GET_BUS_NUMBER, 0x7fffe8a0636c) = 0
ioctl(3, SG_IO, 0x7fffe8a06020) = 0
ioctl(3, SG_IO, 0x7fffe8a06040) = 0
ioctl(3, 0x30d, 0x506b80) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(3, SCSI_IOCTL_GET_BUS_NUMBER, 0x7fffe8a06384) = 0
ioctl(3, SG_IO, 0x7fffe8a06240) = 0
open("/dev/sdc", O_RDONLY|O_NONBLOCK) = 4
ioctl(4, SCSI_IOCTL_GET_BUS_NUMBER, 0x7fffe8a0636c) = 0
ioctl(4, SG_IO, 0x7fffe8a06020) = 0
ioctl(4, SG_IO, 0x7fffe8a06040) = 0
ioctl(4, 0x30d, 0x506b80) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(4, SCSI_IOCTL_GET_BUS_NUMBER, 0x7fffe8a06384) = 0
ioctl(4, SG_IO, 0x7fffe8a06240) = 0
open("/dev/sdb", O_RDONLY|O_NONBLOCK) = 5
ioctl(5, SCSI_IOCTL_GET_BUS_NUMBER, 0x7fffe8a0636c) = 0
ioctl(5, SG_IO, 0x7fffe8a06020) = 0
ioctl(5, SG_IO, 0x7fffe8a06040) = 0
ioctl(5, 0x30d, 0x506b80) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(5, SCSI_IOCTL_GET_BUS_NUMBER, 0x7fffe8a06384) = 0
ioctl(5, SG_IO, 0x7fffe8a06240) = 0
open("/dev/sda", O_RDONLY|O_NONBLOCK) = 6
ioctl(6, SCSI_IOCTL_GET_BUS_NUMBER, 0x7fffe8a0636c) = 0
ioctl(6, SG_IO, 0x7fffe8a06020) = 0
ioctl(6, SG_IO, 0x7fffe8a06040) = 0
ioctl(6, 0x30d, 0x506b80) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(6, SCSI_IOCTL_GET_BUS_NUMBER, 0x7fffe8a06384) = 0
ioctl(6, SG_IO, 0x7fffe8a06240) = 0
ioctl(6, SG_IO, 0x7fffe8a05d20) = 0

Is there a known workaround for this?

SMART is enabled in the BIOS and it's available in 2.6.19.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

2006-12-14 19:58:14

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1



On Thu, 14 Dec 2006, Alistair John Strachan wrote:
>
> `hddtemp' has stopped working on 2.6.20-rc1:

Hmm. Can you do the strace on a working kernel too? For example, is it
that the 0x30d ioctl (which is HDIO_GET_IDENTITY) used to work? If it's a
SATA device, and you _used_ to use the PATA drivers, some of the old
IDE-only ioctl's simply don't work when used in native SATA
configurations.

[ Side note: I consider that to be a mis-feature, but it's not a new
regression, it's always been that way: different block subsystems have
had their own "private" ioctl spaces.

We've been moving more and more towards a unified space, and we could
probably make scsi_ioctl.c emulate at least _some_ of the HDIO_xxx calls
too, and try to support all the block ioctl's on all block devices
rather than have some that work only on some certain class of hardware.

But we're not there yet, and in the meantime it will actually make a
difference whether you use your disks through the kernel SCSI layer
(SATA and /dev/sdX) or through the IDE layer (IDE and /dev/hdX) ]

On the other hand, this _sounds_ very much like a bug that should have
been fixed before 2.6.20-rc1, which affected SG_IO.

If you can do a "git bisect" on this, that would help a lot.

(Btw, where is "hddtemp" from, anyway? Doesn't seem to be part of the
standard set of tools I have on any of my systems)

Linus

2006-12-14 20:06:08

by Jeff Garzik

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Linus Torvalds wrote:
>
> On Thu, 14 Dec 2006, Alistair John Strachan wrote:
>> `hddtemp' has stopped working on 2.6.20-rc1:
>
> Hmm. Can you do the strace on a working kernel too? For example, is it
> that the 0x30d ioctl (which is HDIO_GET_IDENTITY) used to work? If it's a
> SATA device, and you _used_ to use the PATA drivers, some of the old
> IDE-only ioctl's simply don't work when used in native SATA
> configurations.
>
> [ Side note: I consider that to be a mis-feature, but it's not a new
> regression, it's always been that way: different block subsystems have
> had their own "private" ioctl spaces.
>
> We've been moving more and more towards a unified space, and we could
> probably make scsi_ioctl.c emulate at least _some_ of the HDIO_xxx calls
> too, and try to support all the block ioctl's on all block devices
> rather than have some that work only on some certain class of hardware.

FWIW, libata generally follows a "implement it, if enough people care
about it" policy for the old HDIO_xxx ioctls.

There are plenty of HDIO_xxx ioctl should that have died back in the
days when people using the 'hd' driver rather than the newfangled IDE
driver.

So this change sorta filters out a lot of those older ioctls.

hddtemp is open source and reasonably well known, so I would certainly
like to support it, if its reasonable. For ATA disks, obtaining the
temperature sometimes requires vendor-specific or firmware-specific
knowledge. hddtemp centralizes all that info in a database.

Jeff


2006-12-14 20:16:41

by Alistair John Strachan

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Thursday 14 December 2006 19:57, Linus Torvalds wrote:
> On Thu, 14 Dec 2006, Alistair John Strachan wrote:
> > `hddtemp' has stopped working on 2.6.20-rc1:
>
> Hmm. Can you do the strace on a working kernel too? For example, is it
> that the 0x30d ioctl (which is HDIO_GET_IDENTITY) used to work? If it's a
> SATA device, and you _used_ to use the PATA drivers, some of the old
> IDE-only ioctl's simply don't work when used in native SATA
> configurations.

I've always been using sata_nv and libata. All the drives in question are SATA
devices, no configuration change other than this kernel has taken place.

Indeed, the configs are very similar. Find the configs, straces on both
kernels, and the hddtemp binary (AMD64, I'm afraid) here:

http://devzero.co.uk/~alistair/2.6.20-rc1-hddtemp/

> [ Side note: I consider that to be a mis-feature, but it's not a new
> regression, it's always been that way: different block subsystems have
> had their own "private" ioctl spaces.
>
> We've been moving more and more towards a unified space, and we could
> probably make scsi_ioctl.c emulate at least _some_ of the HDIO_xxx calls
> too, and try to support all the block ioctl's on all block devices
> rather than have some that work only on some certain class of hardware.
>
> But we're not there yet, and in the meantime it will actually make a
> difference whether you use your disks through the kernel SCSI layer
> (SATA and /dev/sdX) or through the IDE layer (IDE and /dev/hdX) ]
>
> On the other hand, this _sounds_ very much like a bug that should have
> been fixed before 2.6.20-rc1, which affected SG_IO.
>
> If you can do a "git bisect" on this, that would help a lot.

I'll do that if nobody comes up with anything obvious.

> (Btw, where is "hddtemp" from, anyway? Doesn't seem to be part of the
> standard set of tools I have on any of my systems)

http://www.guzu.net/linux/hddtemp.php

I run this in monitor mode, and then use the gkrellm plugin to keep an eye on
the disk temperatures. It's worked well for years, until now.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

2006-12-14 20:26:13

by Erik Andersen

[permalink] [raw]
Subject: [PATCH] support HDIO_GET_IDENTITY in libata

On Thu Dec 14, 2006 at 03:05:52PM -0500, Jeff Garzik wrote:
> FWIW, libata generally follows a "implement it, if enough people care
> about it" policy for the old HDIO_xxx ioctls.

I personally care about HDIO_GET_IDENTITY and find it terribly
useful to quickly find out about a drive. Perhaps enough other
people care about this ioctl that it might make it into the official
libata tree. Well tested with a number of months of use.

Signed-off-by: Erik Andersen <[email protected]>

-Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--


--- orig/drivers/ata/libata-scsi.c 2006-10-16 16:50:50.000000000 -0600
+++ linux-2.6.18/drivers/ata/libata-scsi.c 2006-10-16 16:50:50.000000000 -0600
@@ -303,6 +303,172 @@
return rc;
}

+static void ide_fixstring (u8 *s, const int bytecount)
+{
+ u8 *p = s, *end = &s[bytecount & ~1]; /* bytecount must be even */
+
+#ifndef __BIG_ENDIAN
+# ifdef __LITTLE_ENDIAN
+ /* convert from big-endian to host byte order */
+ for (p = end ; p != s;) {
+ unsigned short *pp = (unsigned short *) (p -= 2);
+ *pp = ntohs(*pp);
+ }
+# else
+# error "Please fix <asm/byteorder.h>"
+# endif
+#endif
+ /* strip leading blanks */
+ while (s != end && *s == ' ')
+ ++s;
+ /* compress internal blanks and strip trailing blanks */
+ while (s != end && *s) {
+ if (*s++ != ' ' || (s != end && *s && *s != ' '))
+ *p++ = *(s-1);
+ }
+ /* wipe out trailing garbage */
+ while (p != end)
+ *p++ = '\0';
+}
+
+static void ide_fix_driveid (struct hd_driveid *id)
+{
+#ifndef __LITTLE_ENDIAN
+# ifdef __BIG_ENDIAN
+ int i;
+ u16 *stringcast;
+
+ id->config = __le16_to_cpu(id->config);
+ id->cyls = __le16_to_cpu(id->cyls);
+ id->reserved2 = __le16_to_cpu(id->reserved2);
+ id->heads = __le16_to_cpu(id->heads);
+ id->track_bytes = __le16_to_cpu(id->track_bytes);
+ id->sector_bytes = __le16_to_cpu(id->sector_bytes);
+ id->sectors = __le16_to_cpu(id->sectors);
+ id->vendor0 = __le16_to_cpu(id->vendor0);
+ id->vendor1 = __le16_to_cpu(id->vendor1);
+ id->vendor2 = __le16_to_cpu(id->vendor2);
+ stringcast = (u16 *)&id->serial_no[0];
+ for (i = 0; i < (20/2); i++)
+ stringcast[i] = __le16_to_cpu(stringcast[i]);
+ id->buf_type = __le16_to_cpu(id->buf_type);
+ id->buf_size = __le16_to_cpu(id->buf_size);
+ id->ecc_bytes = __le16_to_cpu(id->ecc_bytes);
+ stringcast = (u16 *)&id->fw_rev[0];
+ for (i = 0; i < (8/2); i++)
+ stringcast[i] = __le16_to_cpu(stringcast[i]);
+ stringcast = (u16 *)&id->model[0];
+ for (i = 0; i < (40/2); i++)
+ stringcast[i] = __le16_to_cpu(stringcast[i]);
+ id->dword_io = __le16_to_cpu(id->dword_io);
+ id->reserved50 = __le16_to_cpu(id->reserved50);
+ id->field_valid = __le16_to_cpu(id->field_valid);
+ id->cur_cyls = __le16_to_cpu(id->cur_cyls);
+ id->cur_heads = __le16_to_cpu(id->cur_heads);
+ id->cur_sectors = __le16_to_cpu(id->cur_sectors);
+ id->cur_capacity0 = __le16_to_cpu(id->cur_capacity0);
+ id->cur_capacity1 = __le16_to_cpu(id->cur_capacity1);
+ id->lba_capacity = __le32_to_cpu(id->lba_capacity);
+ id->dma_1word = __le16_to_cpu(id->dma_1word);
+ id->dma_mword = __le16_to_cpu(id->dma_mword);
+ id->eide_pio_modes = __le16_to_cpu(id->eide_pio_modes);
+ id->eide_dma_min = __le16_to_cpu(id->eide_dma_min);
+ id->eide_dma_time = __le16_to_cpu(id->eide_dma_time);
+ id->eide_pio = __le16_to_cpu(id->eide_pio);
+ id->eide_pio_iordy = __le16_to_cpu(id->eide_pio_iordy);
+ for (i = 0; i < 2; ++i)
+ id->words69_70[i] = __le16_to_cpu(id->words69_70[i]);
+ for (i = 0; i < 4; ++i)
+ id->words71_74[i] = __le16_to_cpu(id->words71_74[i]);
+ id->queue_depth = __le16_to_cpu(id->queue_depth);
+ for (i = 0; i < 4; ++i)
+ id->words76_79[i] = __le16_to_cpu(id->words76_79[i]);
+ id->major_rev_num = __le16_to_cpu(id->major_rev_num);
+ id->minor_rev_num = __le16_to_cpu(id->minor_rev_num);
+ id->command_set_1 = __le16_to_cpu(id->command_set_1);
+ id->command_set_2 = __le16_to_cpu(id->command_set_2);
+ id->cfsse = __le16_to_cpu(id->cfsse);
+ id->cfs_enable_1 = __le16_to_cpu(id->cfs_enable_1);
+ id->cfs_enable_2 = __le16_to_cpu(id->cfs_enable_2);
+ id->csf_default = __le16_to_cpu(id->csf_default);
+ id->dma_ultra = __le16_to_cpu(id->dma_ultra);
+ id->trseuc = __le16_to_cpu(id->trseuc);
+ id->trsEuc = __le16_to_cpu(id->trsEuc);
+ id->CurAPMvalues = __le16_to_cpu(id->CurAPMvalues);
+ id->mprc = __le16_to_cpu(id->mprc);
+ id->hw_config = __le16_to_cpu(id->hw_config);
+ id->acoustic = __le16_to_cpu(id->acoustic);
+ id->msrqs = __le16_to_cpu(id->msrqs);
+ id->sxfert = __le16_to_cpu(id->sxfert);
+ id->sal = __le16_to_cpu(id->sal);
+ id->spg = __le32_to_cpu(id->spg);
+ id->lba_capacity_2 = __le64_to_cpu(id->lba_capacity_2);
+ for (i = 0; i < 22; i++)
+ id->words104_125[i] = __le16_to_cpu(id->words104_125[i]);
+ id->last_lun = __le16_to_cpu(id->last_lun);
+ id->word127 = __le16_to_cpu(id->word127);
+ id->dlf = __le16_to_cpu(id->dlf);
+ id->csfo = __le16_to_cpu(id->csfo);
+ for (i = 0; i < 26; i++)
+ id->words130_155[i] = __le16_to_cpu(id->words130_155[i]);
+ id->word156 = __le16_to_cpu(id->word156);
+ for (i = 0; i < 3; i++)
+ id->words157_159[i] = __le16_to_cpu(id->words157_159[i]);
+ id->cfa_power = __le16_to_cpu(id->cfa_power);
+ for (i = 0; i < 14; i++)
+ id->words161_175[i] = __le16_to_cpu(id->words161_175[i]);
+ for (i = 0; i < 31; i++)
+ id->words176_205[i] = __le16_to_cpu(id->words176_205[i]);
+ for (i = 0; i < 48; i++)
+ id->words206_254[i] = __le16_to_cpu(id->words206_254[i]);
+ id->integrity_word = __le16_to_cpu(id->integrity_word);
+# else
+# error "Please fix <asm/byteorder.h>"
+# endif
+#endif
+ ide_fixstring(id->model, sizeof(id->model));
+ ide_fixstring(id->fw_rev, sizeof(id->fw_rev));
+ ide_fixstring(id->serial_no, sizeof(id->serial_no));
+}
+
+/**
+ * ata_identify_ioctl - Handler for HDIO_GET_IDENTITY ioctl
+ * @scsidev: Device to which we are issuing command
+ * @id: a SECTOR_SIZE buffer in which to return the ATA identity
+ *
+ * LOCKING:
+ * Defined by the SCSI layer. We don't really care.
+ *
+ * RETURNS:
+ * Zero on success, negative errno on error.
+ */
+int ata_identify_ioctl(struct scsi_device *scsidev, int cmd, u8 *argbuf)
+{
+ int rc = 0;
+ u8 scsi_cmd[MAX_COMMAND_SIZE];
+ struct scsi_sense_hdr sshdr;
+ struct hd_driveid *id;
+
+ memset(scsi_cmd, 0, sizeof(scsi_cmd));
+ scsi_cmd[0] = ATA_16;
+ scsi_cmd[1] = (4 << 1); /* PIO Data-in */
+ scsi_cmd[2] = 0x0e; /* no off.line or cc, read from dev,
+ block count in sector count field */
+ scsi_cmd[14] = cmd; /* WIN_IDENTIFY or WIN_PIDENTIFY */
+
+ /* Good values for timeout and retries? Values below
+ from scsi_ioctl_send_command() for default case... */
+ if (scsi_execute_req(scsidev, scsi_cmd, DMA_FROM_DEVICE,
+ argbuf, SECTOR_SIZE, &sshdr, (10*HZ), 5))
+ rc = -EIO;
+
+ /* Need code to retrieve data from check condition? */
+ id = (struct hd_driveid *) argbuf;
+ ide_fix_driveid(id);
+
+ return rc;
+}
+
int ata_scsi_ioctl(struct scsi_device *scsidev, int cmd, void __user *arg)
{
int val = -EINVAL, rc = -EINVAL;
@@ -330,6 +496,45 @@
return -EACCES;
return ata_task_ioctl(scsidev, arg);

+ case HDIO_GET_IDENTITY:
+ case HDIO_OBSOLETE_IDENTITY:
+ {
+ u8 *argbuf;
+ int ret, idcmd;
+ struct ata_port *ap;
+ struct ata_device *dev;
+
+ if (!capable(CAP_SYS_RAWIO))
+ return -EACCES;
+
+ ap = (struct ata_port *) &scsidev->host->hostdata[0];
+ if (!ap)
+ return -ENODEV;
+
+ dev = ata_scsi_find_dev(ap, scsidev);
+ if (!dev)
+ return -ENODEV;
+
+ argbuf = kmalloc(SECTOR_SIZE, GFP_KERNEL);
+ if (NULL == (void *)argbuf) {
+ return -ENOMEM;
+ }
+
+ idcmd = WIN_IDENTIFY;
+ if (!atapi_enabled && dev->class == ATA_DEV_ATAPI) {
+ idcmd = WIN_PIDENTIFY;
+ }
+ ret = ata_identify_ioctl(scsidev, idcmd, argbuf);
+ if (ret!=0 || copy_to_user((char *)arg, (char *)argbuf,
+ (cmd == HDIO_GET_IDENTITY) ?
+ sizeof(struct hd_driveid) : 142))
+ {
+ ret = -EFAULT;
+ }
+ kfree(argbuf);
+ return ret;
+ }
+
default:
rc = -ENOTTY;
break;

2006-12-14 20:27:30

by Jens Axboe

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Thu, Dec 14 2006, Alistair John Strachan wrote:
> On Thursday 14 December 2006 19:57, Linus Torvalds wrote:
> > On Thu, 14 Dec 2006, Alistair John Strachan wrote:
> > > `hddtemp' has stopped working on 2.6.20-rc1:
> >
> > Hmm. Can you do the strace on a working kernel too? For example, is it
> > that the 0x30d ioctl (which is HDIO_GET_IDENTITY) used to work? If it's a
> > SATA device, and you _used_ to use the PATA drivers, some of the old
> > IDE-only ioctl's simply don't work when used in native SATA
> > configurations.
>
> I've always been using sata_nv and libata. All the drives in question are SATA
> devices, no configuration change other than this kernel has taken place.
>
> Indeed, the configs are very similar. Find the configs, straces on both
> kernels, and the hddtemp binary (AMD64, I'm afraid) here:
>
> http://devzero.co.uk/~alistair/2.6.20-rc1-hddtemp/

Looking at the strace, it would _seem_ that an SG_IO failure could very
well be the reason for the diverged paths. And that would indicate
another bug in that area, outside of what we already fixed for
2.6.20-rc1.

Is the hddtemp source not available?

> > If you can do a "git bisect" on this, that would help a lot.
>
> I'll do that if nobody comes up with anything obvious.

If you can just test 2.6.19-git1, then we'll know if it's the SG_IO
patch again.

--
Jens Axboe

2006-12-14 20:31:33

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] support HDIO_GET_IDENTITY in libata

Erik Andersen wrote:
> + if (!atapi_enabled && dev->class == ATA_DEV_ATAPI) {

This seems like an impossible condition?

Jeff


2006-12-14 20:32:42

by Nicolas Mailhot

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Alistair John Strachan <s0348365 <at> sms.ed.ac.uk> writes:

> `hddtemp' has stopped working on 2.6.20-rc1:

http://bugzilla.kernel.org/show_bug.cgi?id=7581


2006-12-14 20:33:34

by Jeff Garzik

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Jens Axboe wrote:
> Is the hddtemp source not available?


http://www.guzu.net/linux/hddtemp.php says
http://www.guzu.net/files/hddtemp-0.3-beta15.tar.bz2

Jeff


2006-12-14 20:35:20

by Jens Axboe

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Thu, Dec 14 2006, Jeff Garzik wrote:
> Jens Axboe wrote:
> >Is the hddtemp source not available?
>
>
> http://www.guzu.net/linux/hddtemp.php says
> http://www.guzu.net/files/hddtemp-0.3-beta15.tar.bz2

Thanks! I'll await the 2.6.19-git1 test to see how to proceede.

--
Jens Axboe

2006-12-14 20:40:47

by Erik Andersen

[permalink] [raw]
Subject: Re: [PATCH] support HDIO_GET_IDENTITY in libata

On Thu Dec 14, 2006 at 03:31:30PM -0500, Jeff Garzik wrote:
> Erik Andersen wrote:
> >+ if (!atapi_enabled && dev->class == ATA_DEV_ATAPI) {
>
> This seems like an impossible condition?

Hmm, suppose so. Do you think that simply doing:
if (dev->class == ATA_DEV_ATAPI) {
here would be sufficient?

-Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

2006-12-14 20:47:31

by Jens Axboe

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Thu, Dec 14 2006, Jens Axboe wrote:
> > I'll do that if nobody comes up with anything obvious.
>
> If you can just test 2.6.19-git1, then we'll know if it's the SG_IO
> patch again.

Actually, you should test 2.6.19-git1 with this patch applied as well.

From: FUJITA Tomonori <[email protected]>
Date: Mon, 11 Dec 2006 09:01:34 +0000 (+0100)
Subject: [PATCH] fix SG_IO bio leak
X-Git-Url: http://git.home.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h=77d172ce2719b5ad2dc0637452c8871d9cba344c

[PATCH] fix SG_IO bio leak

This patch fixes bio leaks in SG_IO. rq->bio can be changed after io
completion, so we need to reset rq->bio before calling blk_rq_unmap_user()

http://marc.theaimsgroup.com/?l=linux-kernel&m=116570666807983&w=2

Signed-off-by: FUJITA Tomonori <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---

--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -228,6 +228,7 @@ static int sg_io(struct file *file, requ
struct request *rq;
char sense[SCSI_SENSE_BUFFERSIZE];
unsigned char cmd[BLK_MAX_CDB];
+ struct bio *bio;

if (hdr->interface_id != 'S')
return -EINVAL;
@@ -308,6 +309,7 @@ static int sg_io(struct file *file, requ
if (ret)
goto out;

+ bio = rq->bio;
rq->retries = 0;

start_time = jiffies;
@@ -338,6 +340,7 @@ static int sg_io(struct file *file, requ
hdr->sb_len_wr = len;
}

+ rq->bio = bio;
if (blk_rq_unmap_user(rq))
ret = -EFAULT;


--
Jens Axboe

2006-12-14 21:13:31

by Alistair John Strachan

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Hi Jens,

On Thursday 14 December 2006 20:48, Jens Axboe wrote:
> On Thu, Dec 14 2006, Jens Axboe wrote:
> > > I'll do that if nobody comes up with anything obvious.
> >
> > If you can just test 2.6.19-git1, then we'll know if it's the SG_IO
> > patch again.
>
> Actually, you should test 2.6.19-git1 with this patch applied as well.

2.6.19-git1 with FUJITA Tomonori's bio-leak fix doesn't break, and hddtemp
continues to work fine:

[root] 21:10 [~] hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd
/dev/sda: WDC WD2500KS-00MJB0: 29?C
/dev/sdb: WDC WD2500KS-00MJB0: 27?C
/dev/sdc: Maxtor 6B200M0: 28?C
/dev/sdd: Maxtor 6B200M0: 26?C

I've added the strace results to the URL previously posted, with the config.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

2006-12-14 21:19:31

by Jens Axboe

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Thu, Dec 14 2006, Alistair John Strachan wrote:
> Hi Jens,
>
> On Thursday 14 December 2006 20:48, Jens Axboe wrote:
> > On Thu, Dec 14 2006, Jens Axboe wrote:
> > > > I'll do that if nobody comes up with anything obvious.
> > >
> > > If you can just test 2.6.19-git1, then we'll know if it's the SG_IO
> > > patch again.
> >
> > Actually, you should test 2.6.19-git1 with this patch applied as well.
>
> 2.6.19-git1 with FUJITA Tomonori's bio-leak fix doesn't break, and hddtemp
> continues to work fine:
>
> [root] 21:10 [~] hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd
> /dev/sda: WDC WD2500KS-00MJB0: 29?C
> /dev/sdb: WDC WD2500KS-00MJB0: 27?C
> /dev/sdc: Maxtor 6B200M0: 28?C
> /dev/sdd: Maxtor 6B200M0: 26?C
>
> I've added the strace results to the URL previously posted, with the config.

Then it is likely the sata updates, SG_IO is off the hook.

--
Jens Axboe

2006-12-14 21:34:14

by Jeff Garzik

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Alistair John Strachan wrote:
> Hi Jens,
>
> On Thursday 14 December 2006 20:48, Jens Axboe wrote:
>> On Thu, Dec 14 2006, Jens Axboe wrote:
>>>> I'll do that if nobody comes up with anything obvious.
>>> If you can just test 2.6.19-git1, then we'll know if it's the SG_IO
>>> patch again.
>> Actually, you should test 2.6.19-git1 with this patch applied as well.
>
> 2.6.19-git1 with FUJITA Tomonori's bio-leak fix doesn't break, and hddtemp
> continues to work fine:
>
> [root] 21:10 [~] hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd
> /dev/sda: WDC WD2500KS-00MJB0: 29?C
> /dev/sdb: WDC WD2500KS-00MJB0: 27?C
> /dev/sdc: Maxtor 6B200M0: 28?C
> /dev/sdd: Maxtor 6B200M0: 26?C

So can you bisect and see which patch broke things?

I do wonder if its the update to sata_nv ADMA.

Jeff



2006-12-14 21:44:13

by Alistair John Strachan

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Thursday 14 December 2006 21:33, Jeff Garzik wrote:
> Alistair John Strachan wrote:
> > Hi Jens,
> >
> > On Thursday 14 December 2006 20:48, Jens Axboe wrote:
> >> On Thu, Dec 14 2006, Jens Axboe wrote:
> >>>> I'll do that if nobody comes up with anything obvious.
> >>>
> >>> If you can just test 2.6.19-git1, then we'll know if it's the SG_IO
> >>> patch again.
> >>
> >> Actually, you should test 2.6.19-git1 with this patch applied as well.
> >
> > 2.6.19-git1 with FUJITA Tomonori's bio-leak fix doesn't break, and
> > hddtemp continues to work fine:
> >
> > [root] 21:10 [~] hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd
> > /dev/sda: WDC WD2500KS-00MJB0: 29?C
> > /dev/sdb: WDC WD2500KS-00MJB0: 27?C
> > /dev/sdc: Maxtor 6B200M0: 28?C
> > /dev/sdd: Maxtor 6B200M0: 26?C
>
> So can you bisect and see which patch broke things?
>
> I do wonder if its the update to sata_nv ADMA.

Before I proceed with the horrors of an -rc1 bisection, could somebody send me
the ADMA patches so I can eliminate those first?

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

2006-12-14 21:51:05

by Jeff Garzik

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Alistair John Strachan wrote:
> Before I proceed with the horrors of an -rc1 bisection, could somebody send me
> the ADMA patches so I can eliminate those first?

Run

git-whatchanged drivers/ata/sata_nv.c

and that will give you a list of recent changes. To obtain the "diff
-u" patch for a single commit, run

git-diff-tree -p $SHA_HASH > /tmp/patch

Regards,

Jeff


2006-12-14 21:54:08

by Jeff Garzik

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Alistair John Strachan wrote:
> Before I proceed with the horrors of an -rc1 bisection, could somebody send me
> the ADMA patches so I can eliminate those first?


BTW a bisection need not be blindly horrific... You can look at the
commit ids from git-whatchanged output mentioned in the previous email,
and make educated guesses about what might be a good or bad change.

Jeff


2006-12-14 22:32:59

by Alistair John Strachan

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Thursday 14 December 2006 21:50, Jeff Garzik wrote:
> Alistair John Strachan wrote:
> > Before I proceed with the horrors of an -rc1 bisection, could somebody
> > send me the ADMA patches so I can eliminate those first?
>
> Run
>
> git-whatchanged drivers/ata/sata_nv.c
>
> and that will give you a list of recent changes. To obtain the "diff
> -u" patch for a single commit, run
>
> git-diff-tree -p $SHA_HASH > /tmp/patch

I used a variation on this:

git-whatchanged -p v2.6.19.. drivers/ata/sata_nv.c >sata_nv

Reverted it (against v2.6.20-rc1), compiled that kernel, no dice.

[root] 22:32 [~] hddtemp /dev/sda
/dev/sda: ATA WDC WD2500KS-00M: S.M.A.R.T. not available

I'll start the bisection.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

2006-12-14 23:22:16

by Jeff Garzik

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Nicolas Mailhot wrote:
> Alistair John Strachan <s0348365 <at> sms.ed.ac.uk> writes:
>
>> `hddtemp' has stopped working on 2.6.20-rc1:
>
> → http://bugzilla.kernel.org/show_bug.cgi?id=7581

I'm not sure I quite follow your bug report. Are you saying that the
patch you attached causes the problem?

Jeff



2006-12-14 23:55:53

by Nicolas Mailhot

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Le jeudi 14 décembre 2006 à 18:22 -0500, Jeff Garzik a écrit :
> Nicolas Mailhot wrote:
> > Alistair John Strachan <s0348365 <at> sms.ed.ac.uk> writes:
> >
> >> `hddtemp' has stopped working on 2.6.20-rc1:
> >
> > → http://bugzilla.kernel.org/show_bug.cgi?id=7581
>
> I'm not sure I quite follow your bug report. Are you saying that the
> patch you attached causes the problem?

No, I'm saying the hddtemp breakage people report against 2.6.20-rc1
also occurred between 2.6.19-rc5-mm2 and 2.6.19-rc6-mm1, as reported in
this bug.

Also this system sports two different (S)ATA controllers (SiI 3114 and
CK804), disks on both stopped reporting temp at the same time so the
change is not chipset-specific

--
Nicolas Mailhot


Attachments:
signature.asc (197.00 B)
Ceci est une partie de message num?riquement sign

2006-12-15 00:48:25

by Alistair John Strachan

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Thursday 14 December 2006 21:20, Jens Axboe wrote:
> On Thu, Dec 14 2006, Alistair John Strachan wrote:
> > Hi Jens,
> >
> > On Thursday 14 December 2006 20:48, Jens Axboe wrote:
> > > On Thu, Dec 14 2006, Jens Axboe wrote:
> > > > > I'll do that if nobody comes up with anything obvious.
> > > >
> > > > If you can just test 2.6.19-git1, then we'll know if it's the SG_IO
> > > > patch again.
> > >
> > > Actually, you should test 2.6.19-git1 with this patch applied as well.
> >
> > 2.6.19-git1 with FUJITA Tomonori's bio-leak fix doesn't break, and
> > hddtemp continues to work fine:
> >
> > [root] 21:10 [~] hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd
> > /dev/sda: WDC WD2500KS-00MJB0: 29?C
> > /dev/sdb: WDC WD2500KS-00MJB0: 27?C
> > /dev/sdc: Maxtor 6B200M0: 28?C
> > /dev/sdd: Maxtor 6B200M0: 26?C
> >
> > I've added the strace results to the URL previously posted, with the
> > config.
>
> Then it is likely the sata updates, SG_IO is off the hook.

I bisected all the way down to 0e75f9063f5c55fb0b0b546a7c356f8ec186825e, which
git reckons is the culprit. I wasn't able to revert this commit to test,
because it has conflicts.

Any ideas?

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

2006-12-15 01:04:22

by Robert Hancock

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Alistair John Strachan wrote:
> I bisected all the way down to 0e75f9063f5c55fb0b0b546a7c356f8ec186825e, which
> git reckons is the culprit. I wasn't able to revert this commit to test,
> because it has conflicts.
>
> Any ideas?

That would be this one I assume?

[PATCH] block: support larger block pc requests

author Mike Christie <[email protected]>
Fri, 1 Dec 2006 09:40:55 +0000 (10:40 +0100)
committer Jens Axboe <[email protected]>
Fri, 1 Dec 2006 09:40:55 +0000 (10:40 +0100)
commit 0e75f9063f5c55fb0b0b546a7c356f8ec186825e
tree db138f641175403546c2147def4b405f3ff453a8
parent ad2d7225709b11da47e092634cbdf0591829ae9c
[PATCH] block: support larger block pc requests

This patch modifies blk_rq_map/unmap_user() and the cdrom and scsi_ioctl.c
users so that it supports requests larger than bio by chaining them
together.

Signed-off-by: Mike Christie <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-12-15 01:41:48

by Alistair John Strachan

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Friday 15 December 2006 00:48, Alistair John Strachan wrote:
> On Thursday 14 December 2006 21:20, Jens Axboe wrote:
> > On Thu, Dec 14 2006, Alistair John Strachan wrote:
> > > Hi Jens,
> > >
> > > On Thursday 14 December 2006 20:48, Jens Axboe wrote:
> > > > On Thu, Dec 14 2006, Jens Axboe wrote:
> > > > > > I'll do that if nobody comes up with anything obvious.
> > > > >
> > > > > If you can just test 2.6.19-git1, then we'll know if it's the SG_IO
> > > > > patch again.
> > > >
> > > > Actually, you should test 2.6.19-git1 with this patch applied as
> > > > well.
> > >
> > > 2.6.19-git1 with FUJITA Tomonori's bio-leak fix doesn't break, and
> > > hddtemp continues to work fine:
> > >
> > > [root] 21:10 [~] hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd
> > > /dev/sda: WDC WD2500KS-00MJB0: 29?C
> > > /dev/sdb: WDC WD2500KS-00MJB0: 27?C
> > > /dev/sdc: Maxtor 6B200M0: 28?C
> > > /dev/sdd: Maxtor 6B200M0: 26?C
> > >
> > > I've added the strace results to the URL previously posted, with the
> > > config.
> >
> > Then it is likely the sata updates, SG_IO is off the hook.
>
> I bisected all the way down to 0e75f9063f5c55fb0b0b546a7c356f8ec186825e,
> which git reckons is the culprit. I wasn't able to revert this commit to
> test, because it has conflicts.

Whatever the change is, it's subtle. I don't see the problem in git1+patch,
but I know this kernel _includes_ this changeset.

In total isolation, v2.6.19..0e75f9063f5c55fb0b0b546a7c356f8ec186825e it
breaks. Reverting just 0e75f9063f5c55fb0b0b546a7c356f8ec186825e, it works
again.

So I think this is the source, but I can't explain why it "goes away" before
git1 and "comes back" before 2.6.20-rc1.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

2006-12-15 03:17:54

by Gene Heskett

[permalink] [raw]
Subject: Re: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

On Thursday 14 December 2006 12:48, Stefan Richter wrote:
[...]
>
>(Anyway, that's unrelated to Gene's issues.)

And which I haven't had a chance to check yet, the camera is still in the
truck and I've been busier than a one legged man in an ass kicking
contest today. I did get 2.6.20-rc1 built and its whats running, but
that is as far as I got, too many other honeydo's. Tomorrow hopefully.
If I don't wind up using a backhoe for a divining rod, looking for our
sewer which is beginning to nag us occasionally.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

2006-12-15 16:45:07

by Bill Davidsen

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Linus Torvalds wrote:
> Ok, the two-week merge period is over, and -rc1 is out there.
>
> I'm _really_ hoping that we can keep the 2.6.20 release calmer and without
> any of the dragging-out-due-to-core-changes that we've had lately. We
> didn't actually merge any really core changes here, with the biggest
> conceptual one being the "work_struct" split into regular work and
> "delayed" work, so I'm hoping we can really end up with an easy 2.6.20
> release.
>
> Some of the commits there are pretty big patches, but more than a couple
> of them are due to fairly straightforward search-and-replace things (like
> a largely scripted removal of unnecessary casts of the return value of
> "kmalloc()", for example, or the switch to "ktermios" for the tty layer,
> or the introduction of "struct path" in the VFS layer instead of keeping
> the f_{dentry,vfsmnt} entries separate, or indeed the removal of SLAB_xxx
> constant names in favour of the standard GFP_xxx ones).
>
> So while the patch itself isn't actually all that much smaller than usual,
> at least my personal gut feel is that the actual changes are not as
> intrusive, just in some cases have big diffs.
>
> But both the diffstat and the shortlog are still too big to fit in the
> kernel mailing list limits, so you'll just have to take my word for it. Or
> get the git repo, and do your own delving into things with
>
> git log v2.6.19..v2.6.20-rc1 | git shortlog
>
> There _are_ a few areas of note:
>
> - the aforementioned "workqueue" changes (where we still have some work
> to do to finalize the proper actions on all architectures: it's being
> somewhat discussed on the arch mailing lists, hopefully we'll have it
> all resolved by -rc2, and it doesn't really worry me)
>
> - lockless page cache (RCU lookups of radix trees)
>
> - kvm driver for all those crazy virtualization people to play with
>
> - networking updates (DCCP, address-family agnostic connection tracking
> in netfilter, sparse byte order annotations, yadda yadda)
>
> - HID layer separated out of the USB stuff (bluetooth apparently wants
> the HID stuff too)
>
> - tons and tons of driver (ftape removal, ATA, pcmcia, i2c,
> infiniband, dvb, networking..) and architecture updates (arm, mips,
> powerpc, sh)
>
Did I miss an alternate method of handling ftape devices, or are these
old beasts now unsupported? I occasionally have to be able to handle
that media, since the industrial device using ftape for control updates
cost more than a small house.

I can obviously keep an old slow machine to do the job, but I'd like to
know if I need to.

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

2006-12-15 17:20:33

by Alan

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Fri, 15 Dec 2006 11:50:14 -0500
Bill Davidsen <[email protected]> wrote:
> Did I miss an alternate method of handling ftape devices, or are these
> old beasts now unsupported? I occasionally have to be able to handle
> that media, since the industrial device using ftape for control updates
> cost more than a small house.

Do you have hardware and the time to at least test cleanups ?

> I can obviously keep an old slow machine to do the job, but I'd like to
> know if I need to.

The assumption was that since in 2.6 it was so ancient and unloved that
nobody had even seen an ftape device this century. If it is still being
used and you can test cleanups then the removal should be reverted

Alan

2006-12-15 18:45:49

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH] support HDIO_GET_IDENTITY in libata

On Thu, Dec 14, 2006 at 01:26:08PM -0700, Erik Andersen wrote:
> On Thu Dec 14, 2006 at 03:05:52PM -0500, Jeff Garzik wrote:
> > FWIW, libata generally follows a "implement it, if enough people care
> > about it" policy for the old HDIO_xxx ioctls.
>
> I personally care about HDIO_GET_IDENTITY and find it terribly
> useful to quickly find out about a drive. Perhaps enough other
> people care about this ioctl that it might make it into the official
> libata tree. Well tested with a number of months of use.

> --- orig/drivers/ata/libata-scsi.c
> +++ linux-2.6.18/drivers/ata/libata-scsi.c
> @@ -303,6 +303,172 @@
> return rc;
> }
>
> +static void ide_fixstring (u8 *s, const int bytecount)
> +{
> + u8 *p = s, *end = &s[bytecount & ~1]; /* bytecount must be even */
> +
> +#ifndef __BIG_ENDIAN
> +# ifdef __LITTLE_ENDIAN
> + /* convert from big-endian to host byte order */
> + for (p = end ; p != s;) {
> + unsigned short *pp = (unsigned short *) (p -= 2);
> + *pp = ntohs(*pp);
> + }
> +# else
> +# error "Please fix <asm/byteorder.h>"
> +# endif
> +#endif

Ugly. ntohs() will work on BE arches also.

> +static void ide_fix_driveid (struct hd_driveid *id)
> +{
> +#ifndef __LITTLE_ENDIAN
> +# ifdef __BIG_ENDIAN

Ditto.

> + int i;
> + u16 *stringcast;
> +
> + id->config = __le16_to_cpu(id->config);
> + id->cyls = __le16_to_cpu(id->cyls);
> + id->reserved2 = __le16_to_cpu(id->reserved2);
> + id->heads = __le16_to_cpu(id->heads);
> + id->track_bytes = __le16_to_cpu(id->track_bytes);
> + id->sector_bytes = __le16_to_cpu(id->sector_bytes);
> + id->sectors = __le16_to_cpu(id->sectors);
> + id->vendor0 = __le16_to_cpu(id->vendor0);
> + id->vendor1 = __le16_to_cpu(id->vendor1);
> + id->vendor2 = __le16_to_cpu(id->vendor2);
> + stringcast = (u16 *)&id->serial_no[0];
> + for (i = 0; i < (20/2); i++)
> + stringcast[i] = __le16_to_cpu(stringcast[i]);
> + id->buf_type = __le16_to_cpu(id->buf_type);
> + id->buf_size = __le16_to_cpu(id->buf_size);
> + id->ecc_bytes = __le16_to_cpu(id->ecc_bytes);
> + stringcast = (u16 *)&id->fw_rev[0];
> + for (i = 0; i < (8/2); i++)
> + stringcast[i] = __le16_to_cpu(stringcast[i]);
> + stringcast = (u16 *)&id->model[0];
> + for (i = 0; i < (40/2); i++)
> + stringcast[i] = __le16_to_cpu(stringcast[i]);
> + id->dword_io = __le16_to_cpu(id->dword_io);
> + id->reserved50 = __le16_to_cpu(id->reserved50);
> + id->field_valid = __le16_to_cpu(id->field_valid);
> + id->cur_cyls = __le16_to_cpu(id->cur_cyls);
> + id->cur_heads = __le16_to_cpu(id->cur_heads);
> + id->cur_sectors = __le16_to_cpu(id->cur_sectors);
> + id->cur_capacity0 = __le16_to_cpu(id->cur_capacity0);
> + id->cur_capacity1 = __le16_to_cpu(id->cur_capacity1);
> + id->lba_capacity = __le32_to_cpu(id->lba_capacity);
> + id->dma_1word = __le16_to_cpu(id->dma_1word);
> + id->dma_mword = __le16_to_cpu(id->dma_mword);
> + id->eide_pio_modes = __le16_to_cpu(id->eide_pio_modes);
> + id->eide_dma_min = __le16_to_cpu(id->eide_dma_min);
> + id->eide_dma_time = __le16_to_cpu(id->eide_dma_time);
> + id->eide_pio = __le16_to_cpu(id->eide_pio);
> + id->eide_pio_iordy = __le16_to_cpu(id->eide_pio_iordy);
> + for (i = 0; i < 2; ++i)
> + id->words69_70[i] = __le16_to_cpu(id->words69_70[i]);
> + for (i = 0; i < 4; ++i)
> + id->words71_74[i] = __le16_to_cpu(id->words71_74[i]);
> + id->queue_depth = __le16_to_cpu(id->queue_depth);
> + for (i = 0; i < 4; ++i)
> + id->words76_79[i] = __le16_to_cpu(id->words76_79[i]);
> + id->major_rev_num = __le16_to_cpu(id->major_rev_num);
> + id->minor_rev_num = __le16_to_cpu(id->minor_rev_num);
> + id->command_set_1 = __le16_to_cpu(id->command_set_1);
> + id->command_set_2 = __le16_to_cpu(id->command_set_2);
> + id->cfsse = __le16_to_cpu(id->cfsse);
> + id->cfs_enable_1 = __le16_to_cpu(id->cfs_enable_1);
> + id->cfs_enable_2 = __le16_to_cpu(id->cfs_enable_2);
> + id->csf_default = __le16_to_cpu(id->csf_default);
> + id->dma_ultra = __le16_to_cpu(id->dma_ultra);
> + id->trseuc = __le16_to_cpu(id->trseuc);
> + id->trsEuc = __le16_to_cpu(id->trsEuc);
> + id->CurAPMvalues = __le16_to_cpu(id->CurAPMvalues);
> + id->mprc = __le16_to_cpu(id->mprc);
> + id->hw_config = __le16_to_cpu(id->hw_config);
> + id->acoustic = __le16_to_cpu(id->acoustic);
> + id->msrqs = __le16_to_cpu(id->msrqs);
> + id->sxfert = __le16_to_cpu(id->sxfert);
> + id->sal = __le16_to_cpu(id->sal);
> + id->spg = __le32_to_cpu(id->spg);
> + id->lba_capacity_2 = __le64_to_cpu(id->lba_capacity_2);
> + for (i = 0; i < 22; i++)
> + id->words104_125[i] = __le16_to_cpu(id->words104_125[i]);
> + id->last_lun = __le16_to_cpu(id->last_lun);
> + id->word127 = __le16_to_cpu(id->word127);
> + id->dlf = __le16_to_cpu(id->dlf);
> + id->csfo = __le16_to_cpu(id->csfo);
> + for (i = 0; i < 26; i++)
> + id->words130_155[i] = __le16_to_cpu(id->words130_155[i]);
> + id->word156 = __le16_to_cpu(id->word156);
> + for (i = 0; i < 3; i++)
> + id->words157_159[i] = __le16_to_cpu(id->words157_159[i]);
> + id->cfa_power = __le16_to_cpu(id->cfa_power);
> + for (i = 0; i < 14; i++)
> + id->words161_175[i] = __le16_to_cpu(id->words161_175[i]);
> + for (i = 0; i < 31; i++)
> + id->words176_205[i] = __le16_to_cpu(id->words176_205[i]);
> + for (i = 0; i < 48; i++)
> + id->words206_254[i] = __le16_to_cpu(id->words206_254[i]);
> + id->integrity_word = __le16_to_cpu(id->integrity_word);
> +# else
> +# error "Please fix <asm/byteorder.h>"
> +# endif
> +#endif

2006-12-16 16:34:10

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] support HDIO_GET_IDENTITY in libata

Erik Andersen wrote:
> On Thu Dec 14, 2006 at 03:31:30PM -0500, Jeff Garzik wrote:
>> Erik Andersen wrote:
>>> + if (!atapi_enabled && dev->class == ATA_DEV_ATAPI) {
>> This seems like an impossible condition?
>
> Hmm, suppose so. Do you think that simply doing:
> if (dev->class == ATA_DEV_ATAPI) {
> here would be sufficient?

Yep. There is also a selection of helpers in include/linux/libata.h
starting with ata_class_enabled() that might be useful.

Jeff



2006-12-16 16:34:49

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] support HDIO_GET_IDENTITY in libata

Alexey Dobriyan wrote:
> On Thu, Dec 14, 2006 at 01:26:08PM -0700, Erik Andersen wrote:
>> On Thu Dec 14, 2006 at 03:05:52PM -0500, Jeff Garzik wrote:
>>> FWIW, libata generally follows a "implement it, if enough people care
>>> about it" policy for the old HDIO_xxx ioctls.
>> I personally care about HDIO_GET_IDENTITY and find it terribly
>> useful to quickly find out about a drive. Perhaps enough other
>> people care about this ioctl that it might make it into the official
>> libata tree. Well tested with a number of months of use.
>
>> --- orig/drivers/ata/libata-scsi.c
>> +++ linux-2.6.18/drivers/ata/libata-scsi.c
>> @@ -303,6 +303,172 @@
>> return rc;
>> }
>>
>> +static void ide_fixstring (u8 *s, const int bytecount)
>> +{
>> + u8 *p = s, *end = &s[bytecount & ~1]; /* bytecount must be even */
>> +
>> +#ifndef __BIG_ENDIAN
>> +# ifdef __LITTLE_ENDIAN
>> + /* convert from big-endian to host byte order */
>> + for (p = end ; p != s;) {
>> + unsigned short *pp = (unsigned short *) (p -= 2);
>> + *pp = ntohs(*pp);
>> + }
>> +# else
>> +# error "Please fix <asm/byteorder.h>"
>> +# endif
>> +#endif
>
> Ugly. ntohs() will work on BE arches also.
>
>> +static void ide_fix_driveid (struct hd_driveid *id)
>> +{
>> +#ifndef __LITTLE_ENDIAN
>> +# ifdef __BIG_ENDIAN
>
> Ditto.

Agreed...

Jeff



2006-12-16 21:36:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1



On Fri, 15 Dec 2006, Alistair John Strachan wrote:
>
> In total isolation, v2.6.19..0e75f9063f5c55fb0b0b546a7c356f8ec186825e it
> breaks. Reverting just 0e75f9063f5c55fb0b0b546a7c356f8ec186825e, it works
> again.
>
> So I think this is the source, but I can't explain why it "goes away" before
> git1 and "comes back" before 2.6.20-rc1.

Can you see if the kernel state at commit 77d172ce ("[PATCH] fix SG_IO bio
leak") is good? Ie just do something like

git checkout -b test-branch 77d172ce

and compile and test that?

That commit _should_ be the one that fixed whatever problems that commit
0e75f906 introduced. It *did* fix it for other - somewhat similar -
situations.

That said: Jens - I think 0e75f906 was a mistake. "blk_rq_unmap()" really
should be passed the "struct bio", not the "struct request *". Right now
it does something _really_ strange with requests with linked bio's, and I
don't think your and FUJITA's "leak fix" really works. What happens when
the bio was a linked list on the request, and you put the old _head_ on
the request with "rq->bio = bio"? What happens to the other parts of it?

IOW, I think this is broken. I think we should revert 0e75f906. Or at
least you should explain to me why it's not broken, and why clearly people
(eg Alistair) still see problems with it?

Linus

2006-12-16 22:27:52

by Alistair John Strachan

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Saturday 16 December 2006 21:36, Linus Torvalds wrote:
> On Fri, 15 Dec 2006, Alistair John Strachan wrote:
> > In total isolation, v2.6.19..0e75f9063f5c55fb0b0b546a7c356f8ec186825e it
> > breaks. Reverting just 0e75f9063f5c55fb0b0b546a7c356f8ec186825e, it works
> > again.
> >
> > So I think this is the source, but I can't explain why it "goes away"
> > before git1 and "comes back" before 2.6.20-rc1.
>
> Can you see if the kernel state at commit 77d172ce ("[PATCH] fix SG_IO bio
> leak") is good? Ie just do something like
>
> git checkout -b test-branch 77d172ce
>
> and compile and test that?
>
> That commit _should_ be the one that fixed whatever problems that commit
> 0e75f906 introduced. It *did* fix it for other - somewhat similar -
> situations.
>
> That said: Jens - I think 0e75f906 was a mistake. "blk_rq_unmap()" really
> should be passed the "struct bio", not the "struct request *". Right now
> it does something _really_ strange with requests with linked bio's, and I
> don't think your and FUJITA's "leak fix" really works. What happens when
> the bio was a linked list on the request, and you put the old _head_ on
> the request with "rq->bio = bio"? What happens to the other parts of it?
>
> IOW, I think this is broken. I think we should revert 0e75f906. Or at
> least you should explain to me why it's not broken, and why clearly people
> (eg Alistair) still see problems with it?

It could simply be that bisect isn't working here because it's actually broken
by two separate patches. Out of bad luck, I've ended up singling out the one
that already has a "fix", and the "real bug" hasn't been found.

I guess I should repeat the bisection, and when the bio-fix isn't applied,
manually apply it? Is there some magical Git way to do this?

Here's the bisection log, for what it's worth;

[alistair] 22:27 [~/linux-git] git bisect log
git-bisect start
# bad: [cc016448b0bf0764928275d034e367753bde8162] Linux v2.6.20-rc1
git-bisect bad cc016448b0bf0764928275d034e367753bde8162
# good: [c3fe6924620fd733ffe8bc8a9da1e9cde08402b3] Linux 2.6.19
git-bisect good c3fe6924620fd733ffe8bc8a9da1e9cde08402b3
# bad: [b2c03941b50944a268ee4d5823872f220809a3ba] IPMI: Allow hot system interface remove
git-bisect bad b2c03941b50944a268ee4d5823872f220809a3ba
# bad: [29b08d2bae854f66d3cfd5f57aaf2e7c2c7fce32] [S390] pfault code cleanup.
git-bisect bad 29b08d2bae854f66d3cfd5f57aaf2e7c2c7fce32
# bad: [0e11c91e1e912bc4db5b71607d149e7e9a77e756] [AF_PACKET]: annotate
git-bisect bad 0e11c91e1e912bc4db5b71607d149e7e9a77e756
# bad: [5f56bbdf1e35d41b4b3d4c92bdb3e70c63877e4d] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
git-bisect bad 5f56bbdf1e35d41b4b3d4c92bdb3e70c63877e4d
# good: [94fcda1f8ab5e0cacc381c5ca1cc9aa6ad523576] usbcore: remove unused argument in autosuspend
git-bisect good 94fcda1f8ab5e0cacc381c5ca1cc9aa6ad523576
# bad: [4549df891a31b9a05b7d183106c09049b79327be] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
git-bisect bad 4549df891a31b9a05b7d183106c09049b79327be
# good: [5ab699810d46011ad2195c5916f3cbc684bfe3ee] driver core: Introduce device_find_child().
git-bisect good 5ab699810d46011ad2195c5916f3cbc684bfe3ee
# good: [6b44d4e69c6144d0df71ab47ec90d2009237d48f] Fix typos in drivers/isdn/hisax/isdnl2.c
git-bisect good 6b44d4e69c6144d0df71ab47ec90d2009237d48f
# bad: [6b8cc71ab2619a776b02869fd733ac1ead3db4e8] Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
git-bisect bad 6b8cc71ab2619a776b02869fd733ac1ead3db4e8
# bad: [bb37b94c68e7b37eecea8576039ae9396ca07839] [BLOCK] Cleanup unused variable passing
git-bisect bad bb37b94c68e7b37eecea8576039ae9396ca07839
# good: [ad2d7225709b11da47e092634cbdf0591829ae9c] block: kill length alignment test in bio_map_user()
git-bisect good ad2d7225709b11da47e092634cbdf0591829ae9c
# bad: [0e75f9063f5c55fb0b0b546a7c356f8ec186825e] block: support larger block pc requests
git-bisect bad 0e75f9063f5c55fb0b0b546a7c356f8ec186825e

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

2006-12-16 22:31:26

by Jeff Garzik

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Alistair John Strachan wrote:
> It could simply be that bisect isn't working here because it's actually broken
> by two separate patches. Out of bad luck, I've ended up singling out the one
> that already has a "fix", and the "real bug" hasn't been found.
>
> I guess I should repeat the bisection, and when the bio-fix isn't applied,
> manually apply it? Is there some magical Git way to do this?
>
> Here's the bisection log, for what it's worth;


This may be totally subjective on my part, but if I get stuck (and I
have the patience) I sometimes like to look at the log and try to pick
arbitrary 'good' points, to stir the pot a bit.

Jeff


2006-12-16 23:00:07

by Alistair John Strachan

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Saturday 16 December 2006 21:36, Linus Torvalds wrote:
> On Fri, 15 Dec 2006, Alistair John Strachan wrote:
> > In total isolation, v2.6.19..0e75f9063f5c55fb0b0b546a7c356f8ec186825e it
> > breaks. Reverting just 0e75f9063f5c55fb0b0b546a7c356f8ec186825e, it works
> > again.
> >
> > So I think this is the source, but I can't explain why it "goes away"
> > before git1 and "comes back" before 2.6.20-rc1.
>
> Can you see if the kernel state at commit 77d172ce ("[PATCH] fix SG_IO bio
> leak") is good? Ie just do something like
>
> git checkout -b test-branch 77d172ce
>
> and compile and test that?

As predicted, it still doesn't work.

[alistair] 22:59 [~/linux-git] git-status
# On branch refs/heads/test-branch
nothing to commit

[root] 22:59 [~] hddtemp /dev/sda
/dev/sda: ATA WDC WD2500KS-00M: S.M.A.R.T. not available

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

2006-12-17 18:11:43

by Gene Heskett

[permalink] [raw]
Subject: Re: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

On Thursday 14 December 2006 22:17, Gene Heskett wrote:
>On Thursday 14 December 2006 12:48, Stefan Richter wrote:
>[...]
>
>>(Anyway, that's unrelated to Gene's issues.)
>
>And which I haven't had a chance to check yet, the camera is still in
> the truck and I've been busier than a one legged man in an ass kicking
> contest today. I did get 2.6.20-rc1 built and its whats running, but
> that is as far as I got, too many other honeydo's. Tomorrow hopefully.
> If I don't wind up using a backhoe for a divining rod, looking for our
> sewer which is beginning to nag us occasionally.

Stefan, I did get a chance to try it out just now, and while I didn't try
to capture a 2 hour movie, I did use kino to control the camera playback,
rewind etc stuff for about 10 minutes and had no problems whatsoever.
Kino I believe, can only step forward and backwards one frame at a time
in its edit window as nothing happened when I tried those direct to the
camera while it was paused. I even went back and played spastic monkey
on the controls, but was unable to trigger a segfault exit or any other
kind of an error. The only entry in the messages log for all this was:

Dec 17 12:47:13 coyote kernel: WARNING: The dv1394 driver is unsupported
and will be removed from Linux soon. Use raw1394 instead.

Dec 17 12:47:13 coyote kernel: ieee1394: raw1394: /dev/raw1394 device
initialized

So whatever was done to the ieee1394 stuffs between 2.6.19 and 2.6.20-rc1
was a definite, and much appreciated improvement, many thanks to all
concerned, I probably owe somebody a pint (or more).

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

2006-12-17 18:31:57

by Stefan Richter

[permalink] [raw]
Subject: Re: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

Gene Heskett wrote:
...
> while I didn't try
> to capture a 2 hour movie, I did use kino to control the camera playback,
> rewind etc stuff for about 10 minutes and had no problems whatsoever.
...
> The only entry in the messages log for all this was:
>
> Dec 17 12:47:13 coyote kernel: WARNING: The dv1394 driver is unsupported
> and will be removed from Linux soon. Use raw1394 instead.

Is your version of kino still using dv1394 or does it work without
dv1394 loaded too?

> Dec 17 12:47:13 coyote kernel: ieee1394: raw1394: /dev/raw1394 device
> initialized
>
> So whatever was done to the ieee1394 stuffs between 2.6.19 and 2.6.20-rc1
> was a definite, and much appreciated improvement,
...

>From what went into linux/drivers/ieee1394 after 2.6.19 was released,
I'm puzzled how this happened. :-)
--
Stefan Richter
-=====-=-==- ==-- =---=
http://arcgraph.de/sr/

2006-12-17 19:05:24

by Gene Heskett

[permalink] [raw]
Subject: Re: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

On Sunday 17 December 2006 13:31, Stefan Richter wrote:
>Gene Heskett wrote:
>...
>
>> while I didn't try
>> to capture a 2 hour movie, I did use kino to control the camera
>> playback, rewind etc stuff for about 10 minutes and had no problems
>> whatsoever.
>
>...
>
>> The only entry in the messages log for all this was:
>>
>> Dec 17 12:47:13 coyote kernel: WARNING: The dv1394 driver is
>> unsupported and will be removed from Linux soon. Use raw1394 instead.
>
>Is your version of kino still using dv1394 or does it work without
>dv1394 loaded too?
>
AFAIK, its kino is 0.9.3. Ok, while kino is running I did a modprobe -rv
dc1394 and it removed dv1394.ko and ohci1394.ko

Now when going to the capture screen is says the device /dev/raw1394 is
gone or the kernel module isn't loaded. raw1394 is still resident
according to an lsmod, so I'm puzzled as to why the device was removed
when the module is still present. So I modprobe -rv raw1394 and it took
out raw1394 and ieee1394, then reloaded raw1394 which loaded ieee1394.
And kino is still without controls but does not now report the error at
the bottom of its screen. Quitting it and restarting it restores the
error. Then loading dv1394 also brings in ohci1394, and a recycle out and
back to the capture window restored the controls.

So I removed dv1394, and reloaded only ohci1384, but an attempt to change
screens in kino locked the box up tight, not even the x-clock in the
right corner was running and I had to hard reset it.

With 4 modules to play with and 4!=15, its the lack of the dv1394 module
that isn't exactly friendly, the other combo's just kill the controls.

AIUI, it *should* be ohci1394 handling the controls stuff, so this
represents a bit of a puzzle. Or the AIUI is wrong :)

>> Dec 17 12:47:13 coyote kernel: ieee1394: raw1394: /dev/raw1394 device
>> initialized
>>
>> So whatever was done to the ieee1394 stuffs between 2.6.19 and
>> 2.6.20-rc1 was a definite, and much appreciated improvement,
>
>...
>
>>From what went into linux/drivers/ieee1394 after 2.6.19 was released,
>I'm puzzled how this happened. :-)

Me too, but the whole combination does work. Remove only dv1394=locked up
machine & hardware reset.

I hope this is helpfull to the next step in your quest to get rid of
dv1394.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

2006-12-17 20:22:01

by Stefan Richter

[permalink] [raw]
Subject: Re: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

(Cc linux1394-devel, for the record)

Gene Heskett wrote:
> On Sunday 17 December 2006 13:31, Stefan Richter wrote:
>> Is your version of kino still using dv1394 or does it work without
>> dv1394 loaded too?
>>
> AFAIK, its kino is 0.9.3. Ok, while kino is running I did a modprobe -rv
> dc1394 and it removed dv1394.ko and ohci1394.ko

When dv1394 is loaded, the use count of ohci1394 is incremented because
dv1394 uses symbols of ohci1394. Likewise, when dv1394 is removed,
ohci1394's use count is decremented which may render ohci1394 "unused".
It will then be removed too. This is of course not desired in many cases.

> Now when going to the capture screen is says the device /dev/raw1394 is
> gone or the kernel module isn't loaded. raw1394 is still resident
> according to an lsmod, so I'm puzzled as to why the device was removed
> when the module is still present.

raw1394 does not mark FireWire hosts as in use. I think we should add
try_module_get(host->driver->owner) and module_put(host->driver->owner)
either to initiation and release of requests, or to the raw1394_open()
and raw1394_release() hooks for all hosts which are present when the
file is opened, or simply to raw1394's add_host() and remove_host() hooks.

I'll make a note on bugzilla.kernel.org and try to find a proper fix
some time.

> So I modprobe -rv raw1394 and it took
> out raw1394 and ieee1394, then reloaded raw1394 which loaded ieee1394.
> And kino is still without controls but does not now report the error at
> the bottom of its screen. Quitting it and restarting it restores the
> error. Then loading dv1394 also brings in ohci1394, and a recycle out and
> back to the capture window restored the controls.
>
> So I removed dv1394, and reloaded only ohci1384,

That's what was necessary after you removed dv1394.

> but an attempt to change
> screens in kino locked the box up tight, not even the x-clock in the
> right corner was running and I had to hard reset it.

This is, how should I put it, not good. Maybe raw1394 still held
resources on behalf of kino which related to the host that first
vanished, then reappeared as a new one. Anyway; raw1394 is supposed to
handle life addition and removal of hosts, but we obviously missed
something.

> With 4 modules to play with and 4!=15, its the lack of the dv1394 module
> that isn't exactly friendly, the other combo's just kill the controls.
>
> AIUI, it *should* be ohci1394 handling the controls stuff, so this
> represents a bit of a puzzle. Or the AIUI is wrong :)
...

What if you prevent dv1394 from ever being loaded, or don't build it in
the first place? CONFIG_IEEE1394_DV1394=n
--
Stefan Richter
-=====-=-==- ==-- =---=
http://arcgraph.de/sr/

2006-12-17 23:35:11

by Gene Heskett

[permalink] [raw]
Subject: Re: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

On Sunday 17 December 2006 15:21, Stefan Richter wrote:
[...]
>What if you prevent dv1394 from ever being loaded, or don't build it in
>the first place? CONFIG_IEEE1394_DV1394=n

How about '# CONFIG_IEEE1394_DV1394 is not set'?

Hand edited the .config and fired off my makeit script, which does it all
& rebooted.

[root@coyote ~]# lsmod|grep 1394
raw1394 32264 0
ohci1394 39088 0
ieee1394 305624 2 raw1394,ohci1394

The camera has been turned back off, but yes, it works absolutely normally
now. With no dv1394 in memory!

Then with the camera on and kino controlling it:
[root@coyote ~]# lsmod|grep 1394
raw1394 32264 4
ohci1394 39088 0
ieee1394 305624 2 raw1394,ohci1394

So we still don't appear to have any use of/for ohci1394. What the heck
is it supposed to be doing?

Now, do I need dv1394 to do the export if I were to want to re-write the
edited video back to the tape in the camera? This is something I believe
I'm supposed to be able to do, but never have yet as I consider the tape
a capture only medium due to incompatibilities with the vcr Joe Sixpack
has. Just about everyone has a dvd player these days, so that is what I
export to.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

2006-12-18 01:06:09

by Stefan Richter

[permalink] [raw]
Subject: Re: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

Gene Heskett wrote:
> The camera has been turned back off, but yes, it works absolutely normally
> now. With no dv1394 in memory!
>
> Then with the camera on and kino controlling it:
> [root@coyote ~]# lsmod|grep 1394
> raw1394 32264 4
> ohci1394 39088 0
> ieee1394 305624 2 raw1394,ohci1394
>
> So we still don't appear to have any use of/for ohci1394. What the heck
> is it supposed to be doing?

What's missing in our implementation is that the use count of ohci1394
goes up too once a "high-level driver" uses resources of a host driven
by ohci1394.

The FireWire stack has three layers: Low level (ohci1394 and pcilynx;
control the host bus adapter), mid level (the ieee1394 core) and high
level (protocol drivers: dv1394, eth1394, sbp2, video1394; and the
multi-purpose bridge to userspace: raw1394). The mid level is supposed
to isolate high-level drivers from hardware-specific drivers.

However dv1394 and video1394 break this architecture. Parts of them
access ohci1394 directly. And in practice, sbp2 indirectly breaks this
architecture too because it never got decent DMA mapping implemented,
and what it got in this department bitrotted to a degree that it is
currently not really usable with pcilynx. (AFAIK, I don't have a PCILynx
controller.)

BTW, sbp2 had the same problem with missing use count of ohci1394 too
until Linux < 2.6.17. But it's a little bit harder to get right in raw1394.

> Now, do I need dv1394 to do the export if I were to want to re-write the
> edited video back to the tape in the camera?

I suppose Kino is exporting DV via raw1394 nowadays too. Raw1394
definitely has the means for it.

Anyway, I'm glad it sort of works for you now.
--
Stefan Richter
-=====-=-==- ==-- =--=-
http://arcgraph.de/sr/

2006-12-18 04:29:36

by Gene Heskett

[permalink] [raw]
Subject: Re: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

On Sunday 17 December 2006 20:05, Stefan Richter wrote:
>Gene Heskett wrote:
>> The camera has been turned back off, but yes, it works absolutely
>> normally now. With no dv1394 in memory!
>>
>> Then with the camera on and kino controlling it:
>> [root@coyote ~]# lsmod|grep 1394
>> raw1394 32264 4
>> ohci1394 39088 0
>> ieee1394 305624 2 raw1394,ohci1394
>>
>> So we still don't appear to have any use of/for ohci1394. What the
>> heck is it supposed to be doing?
>
>What's missing in our implementation is that the use count of ohci1394
>goes up too once a "high-level driver" uses resources of a host driven
>by ohci1394.

This needs some tlc then I assume?

>The FireWire stack has three layers: Low level (ohci1394 and pcilynx;
>control the host bus adapter),
The hardware
>mid level (the ieee1394 core)
which I assume (fuggly word) steers the high level stuff to the right
entry points in the hardware handler?
>and high level (protocol drivers: dv1394, eth1394, sbp2, video1394; and
the
>multi-purpose bridge to userspace: raw1394). The mid level is supposed
>to isolate high-level drivers from hardware-specific drivers.

ieee1394.ko

>However dv1394 and video1394 break this architecture. Parts of them
>access ohci1394 directly.

Opps. Bad dog, no bisquit.

>And in practice, sbp2 indirectly breaks this
>architecture too because it never got decent DMA mapping implemented,
>and what it got in this department bitrotted to a degree that it is
>currently not really usable with pcilynx. (AFAIK, I don't have a PCILynx
>controller.)

I did, but junked it when it couldn't be made to talk to this camera. It
was an old card I'd bought years ago, came in a Maxtor box IIRC. I could
probably dig it up (I'm an inveterate packrat) and see if I could find an
empty pci slot, but them are precious items in most systems, and test it,
or maybe even fwd it to someone capable of investigating it. If its
worth the effort, I have NDI how many of them there are out there. I
chose to throw money at the problem, it wasn't much, $30 USD IIRC at
WallyWorld, blisterpacked with Belkins logo on it.

>BTW, sbp2 had the same problem with missing use count of ohci1394 too
>until Linux < 2.6.17. But it's a little bit harder to get right in
> raw1394.

Humm, architectural? I wouldn't know right from wrong as my C skills have
rusted away over the last 15 years something awful.

>> Now, do I need dv1394 to do the export if I were to want to re-write
>> the edited video back to the tape in the camera?
>
>I suppose Kino is exporting DV via raw1394 nowadays too. Raw1394
>definitely has the means for it.

That got my curiosity bump itching, so I loaded about the first 5 minutes
of the last wedding into kino from disk, clipped off some junk video at
the begining, and this does use raw1394 to export back to the camera.
Previewed it spends some time preprocessing it, about 33 seconds I'd
guess, and then it shows on the camera's viewfinder. So then I did the
real export, and here it appears kino has a buglet that Dan should be
advised of. It starts the tape rolling while its doing this preparatory
stuff, so its recording dead tape (I don't think its outputting black,
but no data at all) for about 33 seconds after the camera's own preroll
delay of about 3 seconds to load the tape around the drum. Other than
that, it works just fine using raw1394 to move it back to the camera and
hence to the tape.

But, the export function screen has 5 more ways to do it. Next tab to the
right is DV file, with 4 methods available there,
DV avi type 1
DV avi type 2
OpenDML avi
raw DV, which I apparently have used. There is a 5th choice, quicktime,
but its greyed out, presumably because I don't have that library
installed.
Next tab is Stills, with a jpeg quality slider
Next tab is audio with resampling at 3 speeds and encoding with 5 choices
Next tab is mpeg with a plethora of choices for output
Last tab is Other, currently set for std VOB and FFMPEG but has 15 choices
in file format, and 4 in the profile selector.

All told, its a Swiss Army Knife that really needs better docs rather than
doing something, burning it to dvd or svcd and seeing it it will play on
the target friends machinery. You can make a lot of coasters that way. I
just found an Emerson TV with a builtin dvd that had no idea what to do
with an svcd all my stuff played just fine. This latter isn't a kernel
problem of course.

>Anyway, I'm glad it sort of works for you now.

Yes, rather nicely except for the exports start recording dead time, a
kino problem obviously.

Anyway, I'm a happy camper till the next time, but if by the time Dan does
kino-1.0, that particular buglet isn't fixed, I'll still be happy as its
otherwise a heck of a program. It fills a niche gap that none of the
other video processing programs even try to do.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

2006-12-18 15:45:07

by Stefan Richter

[permalink] [raw]
Subject: Re: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

Gene Heskett wrote:
> On Sunday 17 December 2006 20:05, Stefan Richter wrote:
>>What's missing in our implementation is that the use count of ohci1394
>>goes up too once a "high-level driver" uses resources of a host driven
>>by ohci1394.
>
> This needs some tlc then I assume?

Yes. It's now logged at http://bugzilla.kernel.org/show_bug.cgi?id=7701
and will probably stay there until I do something about it myself.

>>The FireWire stack has three layers: Low level (ohci1394 and pcilynx;
>>control the host bus adapter),
> The hardware

Yes, or more precisely the built-in hardware, not hardware at the other
end of the wire.

>>mid level (the ieee1394 core)
> which I assume (fuggly word) steers the high level stuff to the right
> entry points in the hardware handler?

Yes. It implements common management functionality and makes actions
like "write into a register of a remote node" or "receive a stream into
a buffer" independent of the actual host adapter --- or at least that
was the intent when Linux' FireWire stack was designed. Years ago there
was actually another driver for a non-OHCI host adapter chip from
Adaptec; and there is a mostly functional but unmaintained out-of-tree
driver for a non-OHCI/ non-PCI adapter from Texas Instruments (TI GP2Lynx).

IOW the ieee1394 core provides a platform to stick application-level
drivers (protocol drivers) like for DV, IPv4 networking, SBP-2 storage
on top of it without having to care of how particular host adapter chips
are programmed. raw1394 basically extends ieee1394 to stick userspace
drivers on it.

But as mentioned, this layering is partly violated in the actual
implementation. Also, the ieee1394 core is itself needlessly dependent
on a PCI kernel API, making it harder for embedded developers to add
their own low-level drivers than it ought to be. (So I was told; I
actually rarely hear from embedded development projects.)
--
Stefan Richter
-=====-=-==- ==-- =--=-
http://arcgraph.de/sr/

2006-12-18 15:55:11

by Gene Heskett

[permalink] [raw]
Subject: Re: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

On Monday 18 December 2006 10:45, Stefan Richter wrote:
>Gene Heskett wrote:
>> On Sunday 17 December 2006 20:05, Stefan Richter wrote:
>>>What's missing in our implementation is that the use count of ohci1394
>>>goes up too once a "high-level driver" uses resources of a host driven
>>>by ohci1394.
>>
>> This needs some tlc then I assume?
>
>Yes. It's now logged at http://bugzilla.kernel.org/show_bug.cgi?id=7701
>and will probably stay there until I do something about it myself.

Thanks, that pretty well describes what happened here.

>>>The FireWire stack has three layers: Low level (ohci1394 and pcilynx;
>>>control the host bus adapter),
>>
>> The hardware
>
>Yes, or more precisely the built-in hardware, not hardware at the other
>end of the wire.

That was assumed when I wrote it :)

>>>mid level (the ieee1394 core)
>>
>> which I assume (fuggly word) steers the high level stuff to the right
>> entry points in the hardware handler?
>
>Yes. It implements common management functionality and makes actions
>like "write into a register of a remote node" or "receive a stream into
>a buffer" independent of the actual host adapter --- or at least that
>was the intent when Linux' FireWire stack was designed. Years ago there
>was actually another driver for a non-OHCI host adapter chip from
>Adaptec; and there is a mostly functional but unmaintained out-of-tree
>driver for a non-OHCI/ non-PCI adapter from Texas Instruments (TI
> GP2Lynx).
>
>IOW the ieee1394 core provides a platform to stick application-level
>drivers (protocol drivers) like for DV, IPv4 networking, SBP-2 storage
>on top of it without having to care of how particular host adapter chips
>are programmed. raw1394 basically extends ieee1394 to stick userspace
>drivers on it.
>
>But as mentioned, this layering is partly violated in the actual
>implementation. Also, the ieee1394 core is itself needlessly dependent
>on a PCI kernel API, making it harder for embedded developers to add
>their own low-level drivers than it ought to be. (So I was told; I
>actually rarely hear from embedded development projects.)

Typical :-) Don't let me distract you from more important work :)

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

2006-12-18 18:30:30

by Jens Axboe

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Sat, Dec 16 2006, Linus Torvalds wrote:
> That said: Jens - I think 0e75f906 was a mistake. "blk_rq_unmap()" really
> should be passed the "struct bio", not the "struct request *". Right now
> it does something _really_ strange with requests with linked bio's, and I
> don't think your and FUJITA's "leak fix" really works. What happens when
> the bio was a linked list on the request, and you put the old _head_ on
> the request with "rq->bio = bio"? What happens to the other parts of it?

I agree it's fishy and I did think about it. The design isn't exactly
the prettiest, but it should be safe. The reason is that we don't
actually unlink the individual bio from the list, even if we may set
rq->bio to point somewhere further into the list. So as long as the bio
is valid, the bi_next field is still valid as well. We need a reference
on the bio to perform the unmap and blk_rq_unmap_user() drops this
reference on its own, so the bio must be valid.

Taking a rq pointer when we really want a bio is nasty, though. I'll
chance that at least.

> IOW, I think this is broken. I think we should revert 0e75f906. Or at
> least you should explain to me why it's not broken, and why clearly people
> (eg Alistair) still see problems with it?

I'm not so sure it's that patch, the same problem seemed to exist for
some people prior to 2.6.20.

--
Jens Axboe

2006-12-18 18:39:38

by Jens Axboe

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Mon, Dec 18 2006, Jens Axboe wrote:
> On Sat, Dec 16 2006, Linus Torvalds wrote:
> > That said: Jens - I think 0e75f906 was a mistake. "blk_rq_unmap()" really
> > should be passed the "struct bio", not the "struct request *". Right now
> > it does something _really_ strange with requests with linked bio's, and I
> > don't think your and FUJITA's "leak fix" really works. What happens when
> > the bio was a linked list on the request, and you put the old _head_ on
> > the request with "rq->bio = bio"? What happens to the other parts of it?
>
> I agree it's fishy and I did think about it. The design isn't exactly
> the prettiest, but it should be safe. The reason is that we don't
> actually unlink the individual bio from the list, even if we may set
> rq->bio to point somewhere further into the list. So as long as the bio
> is valid, the bi_next field is still valid as well. We need a reference
> on the bio to perform the unmap and blk_rq_unmap_user() drops this
> reference on its own, so the bio must be valid.
>
> Taking a rq pointer when we really want a bio is nasty, though. I'll
> chance that at least.

Something like this. One alternative I did originally consider was to
add a rq->cbio (for lack of a better name) that points to the original
bio and isn't cleared on io completion, just to have the original bio
location stored. That makes the API symmetric and doesn't have any
hidden requirements, at the cost of an extra pointer in struct request.
So I think the included is the better patch, since it's still clear what
needs to be done and it doesn't add extra members to struct request.

-----

The blk_rq_unmap_user() API is not very nice. It expects the caller to
know that rq->bio has to be reset to the original bio, and it will
silently do nothing if that is not done. Instead make it explicit that
we need to pass in the first bio, by expecting a bio argument.

Signed-off-by: Jens Axboe <[email protected]>

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index fb40575..51c828e 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -2323,6 +2323,7 @@ int blk_rq_map_user(request_queue_t *q, struct request *rq, void __user *ubuf,
unsigned long len)
{
unsigned long bytes_read = 0;
+ struct bio *bio = NULL;
int ret;

if (len > (q->max_hw_sectors << 9))
@@ -2349,6 +2350,8 @@ int blk_rq_map_user(request_queue_t *q, struct request *rq, void __user *ubuf,
ret = __blk_rq_map_user(q, rq, ubuf, map_len);
if (ret < 0)
goto unmap_rq;
+ if (!bio)
+ bio = rq->bio;
bytes_read += ret;
ubuf += ret;
}
@@ -2356,7 +2359,7 @@ int blk_rq_map_user(request_queue_t *q, struct request *rq, void __user *ubuf,
rq->buffer = rq->data = NULL;
return 0;
unmap_rq:
- blk_rq_unmap_user(rq);
+ blk_rq_unmap_user(bio);
return ret;
}

@@ -2413,26 +2416,28 @@ EXPORT_SYMBOL(blk_rq_map_user_iov);

/**
* blk_rq_unmap_user - unmap a request with user data
- * @rq: rq to be unmapped
+ * @bio: start of bio list
*
* Description:
- * Unmap a rq previously mapped by blk_rq_map_user().
- * rq->bio must be set to the original head of the request.
+ * Unmap a rq previously mapped by blk_rq_map_user(). The caller must
+ * supply the original rq->bio from the blk_rq_map_user() return, since
+ * the io completion may have changed rq->bio.
*/
-int blk_rq_unmap_user(struct request *rq)
+int blk_rq_unmap_user(struct bio *bio)
{
- struct bio *bio, *mapped_bio;
+ struct bio *mapped_bio;

- while ((bio = rq->bio)) {
- if (bio_flagged(bio, BIO_BOUNCED))
+ while (bio) {
+ mapped_bio = bio;
+ if (unlikely(bio_flagged(bio, BIO_BOUNCED)))
mapped_bio = bio->bi_private;
- else
- mapped_bio = bio;

__blk_rq_unmap_user(mapped_bio);
- rq->bio = bio->bi_next;
- bio_put(bio);
+ mapped_bio = bio;
+ bio = bio->bi_next;
+ bio_put(mapped_bio);
}
+
return 0;
}

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index f322b6a..2528a0c 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,8 +333,7 @@ static int sg_io(struct file *file, request_queue_t *q,
hdr->sb_len_wr = len;
}

- rq->bio = bio;
- if (blk_rq_unmap_user(rq))
+ if (blk_rq_unmap_user(bio))
ret = -EFAULT;

/* may not have succeeded, but output values written to control
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index e4a2f8f..66d028d 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -2139,8 +2139,7 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf,
cdi->last_sense = s->sense_key;
}

- rq->bio = bio;
- if (blk_rq_unmap_user(rq))
+ if (blk_rq_unmap_user(bio))
ret = -EFAULT;

if (ret)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0a801cc..d93f8ea 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -710,7 +710,7 @@ extern void __blk_stop_queue(request_queue_t *q);
extern void blk_run_queue(request_queue_t *);
extern void blk_start_queueing(request_queue_t *);
extern int blk_rq_map_user(request_queue_t *, struct request *, void __user *, unsigned long);
-extern int blk_rq_unmap_user(struct request *);
+extern int blk_rq_unmap_user(struct bio *);
extern int blk_rq_map_kern(request_queue_t *, struct request *, void *, unsigned int, gfp_t);
extern int blk_rq_map_user_iov(request_queue_t *, struct request *,
struct sg_iovec *, int, unsigned int);

--
Jens Axboe

2006-12-18 21:51:31

by Bill Davidsen

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Alan wrote:
> On Fri, 15 Dec 2006 11:50:14 -0500
> Bill Davidsen <[email protected]> wrote:
>
>> Did I miss an alternate method of handling ftape devices, or are these
>> old beasts now unsupported? I occasionally have to be able to handle
>> that media, since the industrial device using ftape for control updates
>> cost more than a small house.
>>
>
> Do you have hardware and the time to at least test cleanups ?
>
>
>> I can obviously keep an old slow machine to do the job, but I'd like to
>> know if I need to.
>>
>
> The assumption was that since in 2.6 it was so ancient and unloved that
> nobody had even seen an ftape device this century. If it is still being
> used and you can test cleanups then the removal should be reverted

As much as I have in the past supported keeping useful features in the
kernel, this one can go from 2.6 as far as I'm concerned. I would hate
to see anyone spend any time maintaining something which is so little
used. I can easily move the hardware to a 2.4 machine, or something
running an early 2.6.

I think "ancient and unloved" is an apt description.

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

2006-12-19 12:39:48

by Jens Axboe

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Thu, Dec 14 2006, Alistair John Strachan wrote:
> On Thursday 14 December 2006 21:50, Jeff Garzik wrote:
> > Alistair John Strachan wrote:
> > > Before I proceed with the horrors of an -rc1 bisection, could somebody
> > > send me the ADMA patches so I can eliminate those first?
> >
> > Run
> >
> > git-whatchanged drivers/ata/sata_nv.c
> >
> > and that will give you a list of recent changes. To obtain the "diff
> > -u" patch for a single commit, run
> >
> > git-diff-tree -p $SHA_HASH > /tmp/patch
>
> I used a variation on this:
>
> git-whatchanged -p v2.6.19.. drivers/ata/sata_nv.c >sata_nv
>
> Reverted it (against v2.6.20-rc1), compiled that kernel, no dice.
>
> [root] 22:32 [~] hddtemp /dev/sda
> /dev/sda: ATA WDC WD2500KS-00M: S.M.A.R.T. not available
>
> I'll start the bisection.

Just noticed that most of the mails I wrote on this thread were
apparently without linux-kernel cc'ed (dunno who removed the cc). So
I'll write a small summary - the problem is that hddtemp includes some
fragile code to check the sense info, and this commit:

http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=f38621b3109068adc8430bc2d170ccea59df4261

broke it. hddtemp expects 14, but it now sees 12. IMHO hddtemp is buggy
and should be fixed, the best option is simply to kill the sense checks
as I think they have little (if any) value. Patch below for that.

So the problem was never the SG_IO changes, the fact that somebody
noticed the same thing in bugzilla for a 2.6.19-rc6-mm kernel backs that
up.

--- hddtemp-0.3-beta15/src/satacmds.c~ 2006-12-19 12:01:10.000000000 +0100
+++ hddtemp-0.3-beta15/src/satacmds.c 2006-12-19 12:01:27.000000000 +0100
@@ -54,7 +54,6 @@
unsigned char cdb[16];
unsigned char sense[32];
int dxfer_direction;
- int ret;

memset(cdb, 0, sizeof(cdb));
cdb[0] = ATA_16;
@@ -78,13 +77,7 @@
cdb[6] = cmd[1];
cdb[14] = cmd[0];

- ret = scsi_SG_IO(device, cdb, sizeof(cdb), buffer, cmd[3] * 512, sense, sizeof(sense), dxfer_direction);
-
- /* Verify SATA magics */
- if (sense[0] != 0x72 || sense[7] != 0x0e || sense[9] != 0x0e || sense[10] != 0x00)
- return 1;
- else
- return ret;
+ return scsi_SG_IO(device, cdb, sizeof(cdb), buffer, cmd[3] * 512, sense, sizeof(sense), dxfer_direction);
}

void sata_fixstring(unsigned char *s, int bytecount)

--
Jens Axboe

2006-12-19 14:32:18

by Robert Hancock

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

Jens Axboe wrote:
> Just noticed that most of the mails I wrote on this thread were
> apparently without linux-kernel cc'ed (dunno who removed the cc). So
> I'll write a small summary - the problem is that hddtemp includes some
> fragile code to check the sense info, and this commit:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=f38621b3109068adc8430bc2d170ccea59df4261
>
> broke it. hddtemp expects 14, but it now sees 12. IMHO hddtemp is buggy
> and should be fixed, the best option is simply to kill the sense checks
> as I think they have little (if any) value. Patch below for that.
>
> So the problem was never the SG_IO changes, the fact that somebody
> noticed the same thing in bugzilla for a 2.6.19-rc6-mm kernel backs that
> up.

From what I've seen it appears that smartctl has the same problem, it
was also reporting the device didn't support SMART..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/


2006-12-19 14:36:33

by Jens Axboe

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Tue, Dec 19 2006, Robert Hancock wrote:
> Jens Axboe wrote:
> >Just noticed that most of the mails I wrote on this thread were
> >apparently without linux-kernel cc'ed (dunno who removed the cc). So
> >I'll write a small summary - the problem is that hddtemp includes some
> >fragile code to check the sense info, and this commit:
> >
> >http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=f38621b3109068adc8430bc2d170ccea59df4261
> >
> >broke it. hddtemp expects 14, but it now sees 12. IMHO hddtemp is buggy
> >and should be fixed, the best option is simply to kill the sense checks
> >as I think they have little (if any) value. Patch below for that.
> >
> >So the problem was never the SG_IO changes, the fact that somebody
> >noticed the same thing in bugzilla for a 2.6.19-rc6-mm kernel backs that
> >up.
>
> From what I've seen it appears that smartctl has the same problem, it
> was also reporting the device didn't support SMART..

Can you check whether reverting the above commit makes SMART work again?

--
Jens Axboe

2006-12-19 14:49:05

by Jens Axboe

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1

On Tue, Dec 19 2006, Jens Axboe wrote:
> On Tue, Dec 19 2006, Robert Hancock wrote:
> > Jens Axboe wrote:
> > >Just noticed that most of the mails I wrote on this thread were
> > >apparently without linux-kernel cc'ed (dunno who removed the cc). So
> > >I'll write a small summary - the problem is that hddtemp includes some
> > >fragile code to check the sense info, and this commit:
> > >
> > >http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=f38621b3109068adc8430bc2d170ccea59df4261
> > >
> > >broke it. hddtemp expects 14, but it now sees 12. IMHO hddtemp is buggy
> > >and should be fixed, the best option is simply to kill the sense checks
> > >as I think they have little (if any) value. Patch below for that.
> > >
> > >So the problem was never the SG_IO changes, the fact that somebody
> > >noticed the same thing in bugzilla for a 2.6.19-rc6-mm kernel backs that
> > >up.
> >
> > From what I've seen it appears that smartctl has the same problem, it
> > was also reporting the device didn't support SMART..
>
> Can you check whether reverting the above commit makes SMART work again?

smartctl fine for me, with and without the patch.

--
Jens Axboe

2006-12-19 17:50:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.20-rc1



On Tue, 19 Dec 2006, Robert Hancock wrote:
>
> From what I've seen it appears that smartctl has the same problem, it was also
> reporting the device didn't support SMART..

No, there were actually two different problems, and to confuse people,
they had the same _symptoms_.

Commit 0e75f9063f5c55fb0b0b546a7c356f8ec186825e introduced a bug where
SG_IO wouldn't copy the data back to user space correctly, which was why
you got various tools like "dvd+rw-mediainfo" (and probably smartctl too)
breaking.

That was also probably why bisection didn't pick out the right commit for
the _other_ bug: it was effectively masked by the same problem, so the
fact that commit f38621b3109068adc8430bc2d170ccea59df4261 fixed the sense
info details and broke hddtemp was not visible as a bug, because the sense
info was _more_ corrupted by the other bug, and thus "git bisect" walked
back to where the _first_ bug was introduced, rather than the second one.

Anyway, sounds like hddtemp was just buggy.

Linus