2007-09-21 08:56:33

by Jens Axboe

Subject: What's in linux-2.6-block.git for 2.6.24

Hi,

This details the items in the block git repo that are bound
for a 2.6.24 merge. The SCSI data buffer accessor patch from Tomo will
also be going in through the block tree, but it's not merged up yet.
That's mainly due to my laziness, not because the code isn't ready. That
will happen sometime today.

Misc bits:
- Various bug fixes from Neil Brown, part of his larger patchset for
  allowing arbitrarily sized bios.
- Various little bug fixes and documentation updates.

Barriers:
- The empty bio barrier patches from me. These allow sending down a bio
  with no data attached, for inserting a barrier into a request queue.
  They are useful for dm and md, but they also clean up the sync
  blkdev_issue_flush() interface: it's no longer an add-on hack, but
  just a natural use of empty bio barriers.
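Here is a deliberately simplified userspace model of the idea. All names are hypothetical. This is not the block layer's code, and real barrier handling depends on the queue's ordering mode. If a barrier is treated as "flush, optional write, flush", a barrier that carries no data collapses to a single flush. In this model, a flush helper therefore becomes nothing more than submitting an empty barrier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: these are not the kernel's bio/request types. */
struct model_req {
    unsigned int nr_bytes;  /* 0 means a data-less (empty) barrier */
    bool barrier;
};

struct model_disk {
    unsigned int cached;    /* writes sitting in the volatile write cache */
    unsigned int stable;    /* writes known to be on stable media */
    unsigned int flushes;   /* cache flush commands sent to the disk */
};

static void model_flush(struct model_disk *d)
{
    d->stable += d->cached;
    d->cached = 0;
    d->flushes++;
}

/* Dispatch one request. In this model a barrier means "everything queued
 * before me is durable before anything after me runs", implemented as
 * flush, optional write, flush. An empty barrier has no write, so it
 * reduces to a single flush. */
static void model_dispatch(struct model_disk *d, const struct model_req *rq)
{
    if (rq->barrier)
        model_flush(d);
    if (rq->nr_bytes)
        d->cached++;
    if (rq->barrier && rq->nr_bytes)
        model_flush(d);
}

/* Hypothetical flush helper: no special path, just an empty barrier. */
static void model_issue_flush(struct model_disk *d)
{
    const struct model_req empty = { .nr_bytes = 0, .barrier = true };

    model_dispatch(d, &empty);
    assert(d->cached == 0);  /* every earlier write is now stable */
}
```

Two plain writes followed by model_issue_flush() leave both writes stable after exactly one flush command.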

SG chaining bits:
- This is the bulk of the patchset. It consists of three major
  components:

    - sglist-core, which adds helpers for iterating sg lists and
      switches the block layer and SCSI to use those. Should not
      have any functional changes.
    - sglist-drivers, which converts drivers to use the sg list
      helpers. Again, should not contain functional changes.
    - sglist-arch, which adds support to most architectures and
      actually enables sg chaining.

The goal of sg chaining is to allow support for very large sgtables,
without requiring that they be allocated from one contiguous piece of
memory.
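Here is a minimal userspace sketch of the chaining idea. It assumes a simplified entry layout; the field names and the model_* helpers are hypothetical. The kernel's struct scatterlist, sg_next() and for_each_sg() differ in detail. Each chunk is a small array whose final slot can be turned into a link to the next chunk. The walker hops between chunks, so no single large contiguous allocation is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of a scatterlist entry. The real struct scatterlist
 * encodes the chain and end markers differently; these fields are
 * hypothetical and only illustrate the idea. */
struct model_sg {
    unsigned int length;     /* bytes described by this entry */
    bool is_chain;           /* slot is a link, not data */
    bool is_last;            /* final data entry of the whole list */
    struct model_sg *chain;  /* next chunk when is_chain is set */
};

/* Return the entry after sg, hopping over a chain link if needed. */
static struct model_sg *model_sg_next(struct model_sg *sg)
{
    if (sg->is_last)
        return NULL;
    sg++;
    if (sg->is_chain)
        sg = sg->chain;
    return sg;
}

/* Visit nents data entries starting at sgl, in the spirit of for_each_sg(). */
#define model_for_each_sg(sgl, sg, nents, i) \
    for ((i) = 0, (sg) = (sgl); (i) < (nents); (i)++, (sg) = model_sg_next(sg))

/* Turn the last slot of prv (prv_nents slots long) into a link to next. */
static void model_sg_chain(struct model_sg *prv, unsigned int prv_nents,
                           struct model_sg *next)
{
    struct model_sg *link;

    assert(prv_nents >= 2);  /* need at least one data slot plus the link */
    link = &prv[prv_nents - 1];
    link->is_chain = true;
    link->is_last = false;
    link->chain = next;
}
```

Two 4-slot chunks chained this way describe 5 data entries (3 + 2). Walking them with model_for_each_sg() never touches the link slot as data.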

Shortlog:

Adrian Bunk (1):
      remove ide_get_error_location()

Andrew Morton (1):
      Fixup u14-34f ENABLE_SG_CHAINING

Dhaval Giani (1):
      Corrections in Documentation/block/ioprio.txt

FUJITA Tomonori (9):
      ips: sg chaining support
      zfcp: sg chaining support
      libata sg chaining support fix
      qla1280: sg chaining fixes
      add use_sg_chaining option to scsi_host_template
      qla1280: enable use_sg_chaining option
      revert sg segment size ifdefs
      remove blk_queue_max_phys_segments in libata
      remove sglist_len

Jens Axboe (45):
      [BLOCK] Fixup rq_for_each_segment() indentation
      Fix warnings with !CONFIG_BLOCK
      block: ll_rw_blk.c: cosmetics
      bio: use memset() in bio_init()
      bio: make freeing of ->bi_io_vec conditional in bio_free()
      block: add end_queued_request() and end_dequeued_request() helpers
      block: factor our bio_check_eod()
      block: Initial support for data-less (or empty) barrier support
      block: convert blkdev_issue_flush() to use empty barriers
      pktcdvd: don't rely on bio_init() preserving bio->bi_io_vec
      crypto: don't pollute the global namespace with sg_next()
      Add sg helpers for iterating over a scatterlist table
      block: convert to using sg helpers
      scsi: convert to using sg helpers
      Add chained sg support to linux/scatterlist.h
      ll_rw_blk: temporarily enable max_segments tweaking
      scsi: simplify scsi_free_sgtable()
      SCSI: support for allocating large scatterlists
      libata: convert to using sg helpers
      scsi_debug: support sg chaining
      scsi generic: sg chaining support
      qla1280: sg chaining support
      aic94xx: sg chaining support
      qlogicpti: sg chaining support
      ide-scsi: sg chaining support
      gdth: sg chaining support
      aha1542: convert to use the data buffer accessors
      advansys: convert to use the data buffer accessors
      infiniband: sg chaining support
      USB storage: sg chaining support
      Fusion: sg chaining support
      i2o: sg chaining support
      IDE: sg chaining support
      i386 dma_map_sg: convert to using sg helpers
      i386: enable sg chaining
      swiotlb: sg chaining support
      x86-64: update calgary iommu to sg helpers
      x86-64: update nommu to sg helpers
      x86-64: update pci-gart iommu to sg helpers
      x86-64: enable sg chaining
      IA64: sg chaining support
      PS3: sg chaining support
      PPC: sg chaining support
      SPARC: sg chaining support
      SPARC64: sg chaining support

Laurent Riffard (1):
      pktcdvd: don't rely on bio_init() preserving bio->bi_destructor

Lee Schermerhorn (1):
      Panic in blk_rq_map_sg() from CCISS driver

Mel Gorman (1):
      Build failure on ppc64 drivers/block/ps3disk.c

NeilBrown (6):
      Merge blk_recount_segments into blk_recalc_rq_segments
      [BLOCK] Introduce rq_for_each_segment replacing rq_for_each_bio
      Fix various abuse of bio fields in umem.c
      New function blk_req_append_bio
      Stop exporting blk_rq_bio_prep
      Share code between init_request_from_bio and blk_rq_bio_prep

Satyam Sharma (1):
      ll_rw_blk: blk_cpu_notifier should be __cpuinitdata

saeed bishara (1):
      use sg helper function in DMA mapping documentation

Documentation/DMA-mapping.txt | 2
Documentation/block/biodoc.txt | 20
Documentation/block/ioprio.txt | 11
arch/ia64/hp/common/sba_iommu.c | 14
arch/ia64/hp/sim/simscsi.c | 1
arch/ia64/sn/pci/pci_dma.c | 11
arch/powerpc/kernel/dma_64.c | 5
arch/powerpc/kernel/ibmebus.c | 11
arch/powerpc/kernel/iommu.c | 23
arch/powerpc/platforms/ps3/system-bus.c | 7
arch/sparc/kernel/ioport.c | 25 -
arch/sparc/mm/io-unit.c | 12
arch/sparc/mm/iommu.c | 10
arch/sparc/mm/sun4c.c | 10
arch/sparc64/kernel/iommu.c | 39 +
arch/sparc64/kernel/pci_sun4v.c | 32 -
arch/x86_64/kernel/pci-calgary.c | 24 -
arch/x86_64/kernel/pci-gart.c | 63 +-
arch/x86_64/kernel/pci-nommu.c | 5
block/elevator.c | 17
block/ll_rw_blk.c | 522 +++++++++++++---------
crypto/digest.c | 2
crypto/scatterwalk.c | 2
crypto/scatterwalk.h | 2
drivers/ata/libata-core.c | 35 -
drivers/ata/libata-scsi.c | 2
drivers/block/cciss.c | 1
drivers/block/floppy.c | 81 +--
drivers/block/lguest_blk.c | 36 -
drivers/block/nbd.c | 59 +-
drivers/block/pktcdvd.c | 7
drivers/block/ps3disk.c | 63 --
drivers/block/umem.c | 38 +
drivers/block/xen-blkfront.c | 32 -
drivers/ide/cris/ide-cris.c | 3
drivers/ide/ide-disk.c | 29 -
drivers/ide/ide-dma.c | 2
drivers/ide/ide-floppy.c | 52 +-
drivers/ide/ide-io.c | 38 -
drivers/ide/ide-probe.c | 2
drivers/ide/ide-taskfile.c | 18
drivers/ide/mips/au1xxx-ide.c | 2
drivers/ide/pci/sgiioc4.c | 3
drivers/ide/ppc/pmac.c | 2
drivers/infiniband/hw/ipath/ipath_dma.c | 10
drivers/infiniband/ulp/iser/iser_memory.c | 76 +--
drivers/md/dm-emc.c | 10
drivers/md/dm-table.c | 28 -
drivers/md/dm.c | 16
drivers/md/dm.h | 1
drivers/md/linear.c | 20
drivers/md/md.c | 1
drivers/md/multipath.c | 30 -
drivers/md/raid0.c | 21
drivers/md/raid1.c | 31 -
drivers/md/raid10.c | 31 -
drivers/md/raid5.c | 31 -
drivers/message/fusion/mptscsih.c | 6
drivers/message/i2o/i2o_block.c | 24 -
drivers/s390/block/dasd_diag.c | 37 -
drivers/s390/block/dasd_eckd.c | 28 -
drivers/s390/block/dasd_fba.c | 28 -
drivers/s390/char/tape_34xx.c | 32 -
drivers/s390/char/tape_3590.c | 37 -
drivers/s390/scsi/zfcp_def.h | 1
drivers/s390/scsi/zfcp_qdio.c | 6
drivers/scsi/3w-9xxx.c | 1
drivers/scsi/3w-xxxx.c | 1
drivers/scsi/BusLogic.c | 1
drivers/scsi/NCR53c406a.c | 3
drivers/scsi/a100u2w.c | 1
drivers/scsi/aacraid/linit.c | 1
drivers/scsi/advansys.c | 12
drivers/scsi/aha1542.c | 32 -
drivers/scsi/aha1740.c | 1
drivers/scsi/aic7xxx/aic79xx_osm.c | 1
drivers/scsi/aic7xxx/aic7xxx_osm.c | 1
drivers/scsi/aic7xxx_old.c | 1
drivers/scsi/aic94xx/aic94xx_task.c | 6
drivers/scsi/arcmsr/arcmsr_hba.c | 1
drivers/scsi/dc395x.c | 1
drivers/scsi/dpt_i2o.c | 1
drivers/scsi/eata.c | 3
drivers/scsi/gdth.c | 45 -
drivers/scsi/hosts.c | 1
drivers/scsi/hptiop.c | 1
drivers/scsi/ibmmca.c | 1
drivers/scsi/ibmvscsi/ibmvscsi.c | 1
drivers/scsi/ide-scsi.c | 31 -
drivers/scsi/initio.c | 1
drivers/scsi/ips.c | 14
drivers/scsi/lpfc/lpfc_scsi.c | 2
drivers/scsi/mac53c94.c | 1
drivers/scsi/megaraid.c | 1
drivers/scsi/megaraid/megaraid_mbox.c | 1
drivers/scsi/megaraid/megaraid_sas.c | 1
drivers/scsi/mesh.c | 1
drivers/scsi/nsp32.c | 1
drivers/scsi/pcmcia/sym53c500_cs.c | 1
drivers/scsi/qla1280.c | 70 +-
drivers/scsi/qla2xxx/qla_os.c | 2
drivers/scsi/qla4xxx/ql4_os.c | 1
drivers/scsi/qlogicfas.c | 1
drivers/scsi/qlogicpti.c | 15
drivers/scsi/scsi_debug.c | 30 -
drivers/scsi/scsi_lib.c | 248 +++++++---
drivers/scsi/scsi_tgt_lib.c | 4
drivers/scsi/sd.c | 23
drivers/scsi/sg.c | 16
drivers/scsi/stex.c | 1
drivers/scsi/sym53c416.c | 1
drivers/scsi/sym53c8xx_2/sym_glue.c | 1
drivers/scsi/u14-34f.c | 3
drivers/scsi/ultrastor.c | 1
drivers/scsi/wd7000.c | 1
drivers/usb/storage/alauda.c | 16
drivers/usb/storage/datafab.c | 10
drivers/usb/storage/jumpshot.c | 10
drivers/usb/storage/protocol.c | 20
drivers/usb/storage/protocol.h | 2
drivers/usb/storage/sddr09.c | 16
drivers/usb/storage/sddr55.c | 16
drivers/usb/storage/shuttle_usbat.c | 17
fs/bio.c | 23
fs/fs-writeback.c | 1
include/asm-i386/dma-mapping.h | 13
include/asm-i386/scatterlist.h | 2
include/asm-ia64/dma-mapping.h | 1
include/asm-ia64/scatterlist.h | 2
include/asm-powerpc/dma-mapping.h | 17
include/asm-powerpc/scatterlist.h | 2
include/asm-sparc/scatterlist.h | 2
include/asm-sparc64/scatterlist.h | 2
include/asm-x86_64/dma-mapping.h | 3
include/asm-x86_64/scatterlist.h | 2
include/linux/bio.h | 19
include/linux/blkdev.h | 32 -
include/linux/i2o.h | 3
include/linux/ide.h | 7
include/linux/libata.h | 16
include/linux/scatterlist.h | 84 +++
include/linux/writeback.h | 1
include/scsi/scsi.h | 7
include/scsi/scsi_cmnd.h | 7
include/scsi/scsi_host.h | 13
kernel/sched.c | 1
lib/swiotlb.c | 19
mm/bounce.c | 6
mm/readahead.c | 1
149 files changed, 1515 insertions(+), 1348 deletions(-)

--
Jens Axboe


2007-09-21 09:15:48

by Andrew Morton

Subject: Re: What's in linux-2.6-block.git for 2.6.24

On Fri, 21 Sep 2007 10:57:11 +0200 Jens Axboe <[email protected]> wrote:

> Hi,
>
> This details the items in the block git repo that are bound
> for a 2.6.24 merge. The SCSI data buffer accessor patch from Tomo will
> also be going in through the block tree, but it's not merged up yet.
> That's mainly due to my laziness, not because the code isn't ready. That
> will happen sometime today.
>
> Misc bits:
> - Various bug fixes from Neil Brown, part of his larger patchset for
> allowing arbitrarily sized bios.
> - Various little bug fixes and documentation updates.
>
> Barriers:
> - The empty bio barrier patches from me. These allow sending down a bio
> with no data attached, for inserting a barrier into a request queue.
> They are useful for dm and md, but they also clean up the sync
> blkdev_issue_flush() interface: it's no longer an add-on hack, but
> just a natural use of empty bio barriers.

That sounds useful.

> SG chaining bits:
> - This is the bulk of the patchset. It consists of three major
> components:
>
> - sglist-core, which adds helpers for iterating sg lists and
> switches the block layer and SCSI to use those. Should not
> have any functional changes.
> - sglist-drivers, which converts drivers to use the sg list
> helpers. Again, should not contain functional changes.
> - sglist-arch, which adds support to most architectures and
> actually enables sg chaining.
>
> The goal of sg chaining is to allow support for very large sgtables,
> without requiring that they be allocated from one contiguous piece of
> memory.

Presumably sg chaining means more overhead on the IO submission paths? If
so, has this been quantified?

2007-09-21 09:23:43

by Jens Axboe

Subject: Re: What's in linux-2.6-block.git for 2.6.24

On Fri, Sep 21 2007, Andrew Morton wrote:
> > SG chaining bits:
> > - This is the bulk of the patchset. It consists of three major
> > components:
> >
> > - sglist-core, which adds helpers for iterating sg lists and
> > switches the block layer and SCSI to use those. Should not
> > have any functional changes.
> > - sglist-drivers, which converts drivers to use the sg list
> > helpers. Again, should not contain functional changes.
> > - sglist-arch, which adds support to most architectures and
> > actually enables sg chaining.
> >
> > The goal of sg chaining is to allow support for very large sgtables,
> > without requiring that they be allocated from one contiguous piece of
> > memory.
>
> Presumably sg chaining means more overhead on the IO submission paths? If
> so, has this been quantified?

Depends on how you look at it. For sizes that are small enough to not
use sg chaining (like we do now), there are no changes, just cleanups to
drivers to use sg_next() and for_each_sg() and so on. Well, there is one
snag, and that is sg_last(), since that needs to iterate the list. But
that should not be used in performance critical sections. And we can get
rid of that completely as well should we want to, if we define a
per-arch chain limit so that sg_last() can just index the last segment
even if ARCH_HAS_SG_CHAIN is set but nents <= ARCH_SG_CHAIN_SIZE (or
whatever that define would be).
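Here is a userspace sketch of that trade-off. It assumes a simplified entry layout. MODEL_SG_CHAIN_SIZE stands in for the hypothetical per-arch limit mentioned above, and the model_* names are not kernel functions. Without such a limit, finding the last entry means walking every entry. With one, short lists are known to be flat and can be indexed directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical entry layout (not the real struct scatterlist). */
struct model_sg {
    unsigned int length;
    bool is_chain;
    bool is_last;
    struct model_sg *chain;
};

/* Hypothetical stand-in for a per-arch chain limit: lists this short
 * are never chained, so they are one flat array. */
#define MODEL_SG_CHAIN_SIZE 4

static struct model_sg *model_sg_next(struct model_sg *sg)
{
    if (sg->is_last)
        return NULL;
    sg++;
    return sg->is_chain ? sg->chain : sg;
}

/* Generic version: once chaining is possible, the only way to reach
 * entry nents-1 is to step through the list, so the cost is O(nents). */
static struct model_sg *model_sg_last_walk(struct model_sg *sgl,
                                           unsigned int nents)
{
    struct model_sg *sg = sgl;

    for (unsigned int i = 1; i < nents; i++)
        sg = model_sg_next(sg);
    return sg;
}

/* Shortcut: at or below the limit the list is known to be flat, so the
 * last entry is a direct O(1) index; longer lists fall back to walking. */
static struct model_sg *model_sg_last(struct model_sg *sgl, unsigned int nents)
{
    assert(nents > 0);
    if (nents <= MODEL_SG_CHAIN_SIZE)
        return &sgl[nents - 1];
    return model_sg_last_walk(sgl, nents);
}
```

For a flat 3-entry list both versions return the third entry. A 5-entry list spread over two 4-slot chunks still needs the walk.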

For actually using the sg chaining, there's some overhead of course. Say
we support 256 entries without chaining, or 1MB with 4KB pages. A
request with 1000 entries would require 4 trips to the allocator to
allocate the chainable lists and 4 trips when freeing that list again.
We don't loop over the sg list on setup or freeing, just jump to the correct
locations.
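The allocator arithmetic can be checked with a small helper. As a simplification, it assumes every chunk has the same number of slots and that each chunk except the last gives up one slot to the chain link. The function is hypothetical, not a SCSI or block layer routine.

```c
#include <assert.h>

/* Number of chunk allocations needed for nents entries when each chunk
 * holds chunk_size slots and every chunk but the last spends one slot on
 * the link to the next. Freeing visits the same chunks, so it costs the
 * same number of calls. */
static unsigned int model_sg_chunk_allocs(unsigned int nents,
                                          unsigned int chunk_size)
{
    unsigned int allocs = 1;

    assert(chunk_size >= 2);  /* one data slot plus the link, at least */
    while (nents > chunk_size) {
        nents -= chunk_size - 1;  /* chunk_size - 1 data entries + 1 link */
        allocs++;
    }
    return allocs;
}
```

With 256-slot chunks, 1000 entries split as 255 + 255 + 255 + 235, giving 4 allocations and 4 frees. Anything up to 256 entries stays at exactly one, the same as without chaining.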

So even for chaining, the cost isn't that big. It enables us to support
much larger IO commands and potentially speed up some devices quite a
lot, so CPU cost is less of a concern. And for small sglists, there
isn't a noticeable overhead.

--
Jens Axboe

2007-09-21 09:35:43

by Jens Axboe

Subject: Re: What's in linux-2.6-block.git for 2.6.24

On Fri, Sep 21 2007, Jens Axboe wrote:
> On Fri, Sep 21 2007, Andrew Morton wrote:
> > > SG chaining bits:
> > > - This is the bulk of the patchset. It consists of three major
> > > components:
> > >
> > > - sglist-core, which adds helpers for iterating sg lists and
> > > switches the block layer and SCSI to use those. Should not
> > > have any functional changes.
> > > - sglist-drivers, which converts drivers to use the sg list
> > > helpers. Again, should not contain functional changes.
> > > - sglist-arch, which adds support to most architectures and
> > > actually enables sg chaining.
> > >
> > > The goal of sg chaining is to allow support for very large sgtables,
> > > without requiring that they be allocated from one contiguous piece of
> > > memory.
> >
> > Presumably sg chaining means more overhead on the IO submission paths? If
> > so, has this been quantified?
>
> Depends on how you look at it. For sizes that are small enough to not
> use sg chaining (like we do now), there are no changes, just cleanups to
> drivers to use sg_next() and for_each_sg() and so on. Well, there is one
> snag, and that is sg_last(), since that needs to iterate the list. But
> that should not be used in performance critical sections. And we can get
> rid of that completely as well should we want to, if we define a
> per-arch chain limit so that sg_last() can just index the last segment
> even if ARCH_HAS_SG_CHAIN is set but nents <= ARCH_SG_CHAIN_SIZE (or
> whatever that define would be).
>
> For actually using the sg chaining, there's some overhead of course. Say
> we support 256 entries without chaining, or 1MB with 4KB pages. A
> request with 1000 entries would require 4 trips to the allocator to
> allocate the chainable lists and 4 trips when freeing that list again.
> We don't loop over the sg list on setup or freeing, just jump to the correct
> locations.
>
> So even for chaining, the cost isn't that big. It enables us to support
> much larger IO commands and potentially speed up some devices quite a
> lot, so CPU cost is less of a concern. And for small sglists, there
> isn't a noticeable overhead.

Forgot one more thing... For embedded systems where RAM is precious, sg
chaining also allows us to optimize for that by trading a little CPU to
free up some memory. SCSI currently has slabs and mempools for 8, 16,
32, 64, and 128-entry scatterlists. With chaining we can get away with
supporting just one sg table size, since we can just chain to reach the
desired size. Some care needs to be taken on the mempool side, but it's
doable.

--
Jens Axboe

2007-09-23 13:19:27

by Torsten Kaiser

Subject: Re: What's in linux-2.6-block.git for 2.6.24

On 9/21/07, Jens Axboe <[email protected]> wrote:
> SG chaining bits:
> - This is the bulk of the patchset. It consists of three major
> components:
>
> - sglist-core, which adds helpers for iterating sg lists and
> switches the block layer and SCSI to use those. Should not
> have any functional changes.
> - sglist-drivers, which converts drivers to use the sg list
> helpers. Again, should not contain functional changes.
> - sglist-arch, which adds support to most architectures and
> actually enables sg chaining.

Adding linux-ide and linux-scsi as CC like Andrew did with my last report.

I still have the trouble with my Silicon Image, Inc. SiI 3132 Serial ATA
Raid II Controller that I reported on 2.6.23-rc4-mm1, now on the new
2.6.23-rc6-mm1.

I'm not 100% sure if this is caused by the sg chaining, but the patch
from http://lkml.org/lkml/2007/9/10/251, which touches that chaining,
makes a difference, so it might be related.

First report: http://lkml.org/lkml/2007/9/1/92
With patch it fails fewer times: http://lkml.org/lkml/2007/9/14/107

To update the statistics:
prior to 2.6.23-rc4-mm1: no trouble with any drives on the SiI 3132.
2.6.23-rc4-mm1 without patch: 2 out of 2 bad.
back to 2.6.23-rc3-mm1: 18x good.
2.6.23-rc4-mm1 with patch: 2 out of 8 bad
after that second mail:
2.6.23-rc4-mm1 with patch: 1 out of 5 bad
2.6.23-rc6-mm1: 1 out of 2 bad
switching back to 2.6.23-rc3-mm1 to rule out the hardware:
2.6.23-rc3-mm1: 6x good

The error messages from the failed 2.6.23-rc6-mm1:
Sep 18 18:50:01 treogen [ 33.340000] md1: bitmap initialized from disk: read 10/10 pages, set 0 bits
Sep 18 18:50:01 treogen [ 33.340000] created bitmap (145 pages) for device md1
Sep 18 18:50:01 treogen [ 63.440000] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 18 18:50:01 treogen [ 63.440000] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
Sep 18 18:50:01 treogen [ 63.440000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 18 18:50:01 treogen [ 63.440000] ata1.00: status: {DRDY }
Sep 18 18:50:01 treogen [ 63.440000] ata1: hard resetting link
Sep 18 18:50:01 treogen [ 65.740000] ata1: softreset failed (port not ready)
Sep 18 18:50:01 treogen [ 65.740000] ata1: reset failed (errno=-5), retrying in 8 secs
Sep 18 18:50:01 treogen [ 73.440000] ata1: hard resetting link
Sep 18 18:50:01 treogen [ 75.740000] ata1: softreset failed (port not ready)
Sep 18 18:50:01 treogen [ 75.740000] ata1: reset failed (errno=-5), retrying in 8 secs
Sep 18 18:50:01 treogen [ 83.440000] ata1: hard resetting link
Sep 18 18:50:01 treogen [ 85.740000] ata1: softreset failed (port not ready)
Sep 18 18:50:01 treogen [ 85.740000] ata1: reset failed (errno=-5), retrying in 33 secs
Sep 18 18:50:01 treogen [ 118.440000] ata1: limiting SATA link speed to 1.5 Gbps
Sep 18 18:50:01 treogen [ 118.440000] ata1: hard resetting link
Sep 18 18:50:01 treogen [ 120.740000] ata1: softreset failed (port not ready)
Sep 18 18:50:01 treogen [ 120.740000] ata1: reset failed, giving up
Sep 18 18:50:01 treogen [ 120.740000] ata1.00: disabled
Sep 18 18:50:01 treogen [ 120.740000] ata1: EH complete
Sep 18 18:50:01 treogen [ 120.740000] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 18 18:50:01 treogen [ 120.740000] end_request: I/O error, dev sda, sector 625137161
Sep 18 18:50:01 treogen [ 120.740000] md: super_written gets error=-5, uptodate=0
Sep 18 18:50:01 treogen [ 120.740000] raid5: Disk failure on sda2, disabling device. Operation continuing on 2 devices

After that, many more errors like this, differing only in the sector number:
Sep 18 18:50:01 treogen [ 120.810000] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 18 18:50:01 treogen [ 120.810000] end_request: I/O error, dev sda, sector 19550919

Any more info needed?

Torsten

2007-09-23 13:55:44

by FUJITA Tomonori

Subject: Re: What's in linux-2.6-block.git for 2.6.24

On Sun, 23 Sep 2007 15:19:13 +0200
"Torsten Kaiser" <[email protected]> wrote:

> On 9/21/07, Jens Axboe <[email protected]> wrote:
> > SG chaining bits:
> > - This is the bulk of the patchset. It consists of three major
> > components:
> >
> > - sglist-core, which adds helpers for iterating sg lists and
> > switches the block layer and SCSI to use those. Should not
> > have any functional changes.
> > - sglist-drivers, which converts drivers to use the sg list
> > helpers. Again, should not contain functional changes.
> > - sglist-arch, which adds support to most architectures and
> > actually enables sg chaining.
>
> Adding linux-ide and linux-scsi as CC like Andrew did with my last report.
>
> I still have the trouble with my Silicon Image, Inc. SiI 3132 Serial ATA
> Raid II Controller that I reported on 2.6.23-rc4-mm1, now on the new
> 2.6.23-rc6-mm1.
>
> I'm not 100% sure if this is caused by the sg chaining, but the patch
> from http://lkml.org/lkml/2007/9/10/251, which touches that chaining,
> makes a difference, so it might be related.
>
> First report: http://lkml.org/lkml/2007/9/1/92
> With patch it fails fewer times: http://lkml.org/lkml/2007/9/14/107
>
> To update the statistics:
> prior to 2.6.23-rc4-mm1: no trouble with any drives on the SiI 3132.
> 2.6.23-rc4-mm1 without patch: 2 out of 2 bad.
> back to 2.6.23-rc3-mm1: 18x good.
> 2.6.23-rc4-mm1 with patch: 2 out of 8 bad
> after that second mail:
> 2.6.23-rc4-mm1 with patch: 1 out of 5 bad
> 2.6.23-rc6-mm1: 1 out of 2 bad

git-block.patch in 2.6.23-rc6-mm1 includes my patch that disables sg
chaining for libata, but it still includes libata's sg chaining
changes. So either these changes break libata, or libata was broken
after 2.6.23-rc3-mm1.

Can you try Jens's sglist-arch branch? If it works, libata in -mm
probably has bugs.

For your convenience, I put up a sglist-arch branch patch against v2.6.23-rc7:

http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2


> switching back to 2.6.23-rc3-mm1 to rule out the hardware:
> 2.6.23-rc3-mm1: 6x good
>
> The error messages from the failed 2.6.23-rc6-mm1:
> Sep 18 18:50:01 treogen [ 33.340000] md1: bitmap initialized from
> disk: read 10/10 pages, set 0 bits
> Sep 18 18:50:01 treogen [ 33.340000] created bitmap (145 pages) for device md1
> Sep 18 18:50:01 treogen [ 63.440000] ata1.00: exception Emask 0x0
> SAct 0x1 SErr 0x0 action 0x6 frozen
> Sep 18 18:50:01 treogen [ 63.440000] ata1.00: cmd
> 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
> Sep 18 18:50:01 treogen [ 63.440000] res
> 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Sep 18 18:50:01 treogen [ 63.440000] ata1.00: status: {DRDY }
> Sep 18 18:50:01 treogen [ 63.440000] ata1: hard resetting link
> Sep 18 18:50:01 treogen [ 65.740000] ata1: softreset failed (port not ready)
> Sep 18 18:50:01 treogen [ 65.740000] ata1: reset failed (errno=-5),
> retrying in 8 secs
> Sep 18 18:50:01 treogen [ 73.440000] ata1: hard resetting link
> Sep 18 18:50:01 treogen [ 75.740000] ata1: softreset failed (port not ready)
> Sep 18 18:50:01 treogen [ 75.740000] ata1: reset failed (errno=-5),
> retrying in 8 secs
> Sep 18 18:50:01 treogen [ 83.440000] ata1: hard resetting link
> Sep 18 18:50:01 treogen [ 85.740000] ata1: softreset failed (port not ready)
> Sep 18 18:50:01 treogen [ 85.740000] ata1: reset failed (errno=-5),
> retrying in 33 secs
> Sep 18 18:50:01 treogen [ 118.440000] ata1: limiting SATA link speed
> to 1.5 Gbps
> Sep 18 18:50:01 treogen [ 118.440000] ata1: hard resetting link
> Sep 18 18:50:01 treogen [ 120.740000] ata1: softreset failed (port not ready)
> Sep 18 18:50:01 treogen [ 120.740000] ata1: reset failed, giving up
> Sep 18 18:50:01 treogen [ 120.740000] ata1.00: disabled
> Sep 18 18:50:01 treogen [ 120.740000] ata1: EH complete
> Sep 18 18:50:01 treogen [ 120.740000] sd 0:0:0:0: [sda] Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
> Sep 18 18:50:01 treogen [ 120.740000] end_request: I/O error, dev
> sda, sector 625137161
> Sep 18 18:50:01 treogen [ 120.740000] md: super_written gets
> error=-5, uptodate=0
> Sep 18 18:50:01 treogen [ 120.740000] raid5: Disk failure on sda2,
> disabling device. Operation continuing on 2 devices
>
> After that, many more errors like this, differing only in the sector number:
> Sep 18 18:50:01 treogen [ 120.810000] sd 0:0:0:0: [sda] Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
> Sep 18 18:50:01 treogen [ 120.810000] end_request: I/O error, dev
> sda, sector 19550919
>
> Any more info needed?
>
> Torsten

2007-09-23 14:04:50

by Alan

Subject: Re: What's in linux-2.6-block.git for 2.6.24

> Sep 18 18:50:01 treogen [ 63.440000] ata1.00: status: {DRDY }
> Sep 18 18:50:01 treogen [ 63.440000] ata1: hard resetting link

Timed out waiting for data transfers that never completed. Does sound
like the device got told the wrong transfer size.


It then falls off the bus because Jeff hasn't merged Mark Lord's DRQ
draining patch.
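For reference, the "cmd" line in these logs can be decoded to see what the drive was actually told. This is a minimal sketch under several assumptions. It assumes libata's usual print order for the taskfile (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device). It also assumes the NCQ convention of carrying the sector count in the feature registers and the tag in bits 7:3 of nsect. The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Taskfile registers in the order libata prints them on "cmd" lines. */
struct tf_dump {
    unsigned int command, feature, nsect, lbal, lbam, lbah;
    unsigned int hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device;
};

/* Parse e.g. "61/08:00:09:d6:42/00:00:25:00:00/40"; 0 on success. */
static int parse_tf(const char *s, struct tf_dump *tf)
{
    int n = sscanf(s, "%x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
                   &tf->command, &tf->feature, &tf->nsect, &tf->lbal,
                   &tf->lbam, &tf->lbah, &tf->hob_feature, &tf->hob_nsect,
                   &tf->hob_lbal, &tf->hob_lbam, &tf->hob_lbah, &tf->device);
    return n == 12 ? 0 : -1;
}

/* 48-bit LBA: the HOB ("high order byte") registers hold bits 47:24. */
static uint64_t tf_lba48(const struct tf_dump *tf)
{
    return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
           ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
           ((uint64_t)tf->lbam << 8) | (uint64_t)tf->lbal;
}

/* NCQ (0x60 READ / 0x61 WRITE FPDMA QUEUED): sector count lives in the
 * feature registers. The zero-count special case is not handled here. */
static unsigned int tf_ncq_sectors(const struct tf_dump *tf)
{
    assert(tf->command == 0x60 || tf->command == 0x61);
    return (tf->hob_feature << 8) | tf->feature;
}

/* NCQ tag is carried in bits 7:3 of the sector count register. */
static unsigned int tf_ncq_tag(const struct tf_dump *tf)
{
    return (tf->nsect >> 3) & 0x1f;
}
```

The line from the failing boot, 61/08:00:09:d6:42/00:00:25:00:00/40, decodes as follows:

- command 0x61 (WRITE FPDMA QUEUED)
- tag 0
- 8 sectors, i.e. 4096 bytes, matching "data 4096 out"
- LBA 625137161, the same sector end_request reports afterwards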

Alan

2007-09-23 15:32:19

by Torsten Kaiser

Subject: Re: What's in linux-2.6-block.git for 2.6.24

On 9/23/07, FUJITA Tomonori <[email protected]> wrote:
> On Sun, 23 Sep 2007 15:19:13 +0200
> "Torsten Kaiser" <[email protected]> wrote:
> > To update the statistics:
> > prior to 2.6.23-rc4-mm1: no trouble with any drives on the SiI 3132.
> > 2.6.23-rc4-mm1 without patch: 2 out of 2 bad.
> > back to 2.6.23-rc3-mm1: 18x good.
> > 2.6.23-rc4-mm1 with patch: 2 out of 8 bad
> > after that second mail:
> > 2.6.23-rc4-mm1 with patch: 1 out of 5 bad
> > 2.6.23-rc6-mm1: 1 out of 2 bad
>
> git-block.patch in 2.6.23-rc6-mm1 includes my patch that disables sg
> chaining for libata, but it still includes libata's sg chaining
> changes. So either these changes break libata, or libata was broken
> after 2.6.23-rc3-mm1.
>
> Can you try Jens's sglist-arch branch? If it works, libata in -mm
> probably has bugs.
>
> For your convenience, I put up a sglist-arch branch patch against v2.6.23-rc7:
>
> http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2

Thanks for the patch.
I tried it and 3 out of 3 boot attempts worked without problems.
But I can't rule out that the bug is still there, as I have no way to
trigger it on demand.

Torsten

2007-09-23 15:41:01

by Torsten Kaiser

Subject: Re: What's in linux-2.6-block.git for 2.6.24

On 9/23/07, Alan Cox <[email protected]> wrote:
> > Sep 18 18:50:01 treogen [ 63.440000] ata1.00: status: {DRDY }
> > Sep 18 18:50:01 treogen [ 63.440000] ata1: hard resetting link
>
> Timed out waiting for data transfers that never completed. Does sound
> like the device got told the wrong transfer size.
>
>
> It then falls off the bus because Jeff hasn't merged Mark Lord's DRQ
> draining patch.

One time the error was different:
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: exception Emask 0x20 SAct 0x1 SErr 0x0 action 0x2
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: irq_stat 0x00020002, PCI master abort while fetching SGT
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
Sep 11 19:19:24 treogen [ 33.340000] res 50/00:00:af:ea:42/00:00:25:00:00/e0 Emask 0x20 (host bus error)
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: status: {DRDY }
Sep 11 19:19:24 treogen [ 33.670000] ata1: soft resetting link
Sep 11 19:19:24 treogen [ 33.710000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 11 19:19:24 treogen [ 33.800000] ata1.00: configured for UDMA/100
Sep 11 19:19:24 treogen [ 33.800000] ata1: EH complete

This was repeated 12 times.
(Diff between a good boot and one with that error is here:
http://lkml.org/lkml/2007/9/14/107 )

Torsten

2007-09-24 18:48:21

by Torsten Kaiser

Subject: Re: What's in linux-2.6-block.git for 2.6.24

On 9/23/07, Torsten Kaiser <[email protected]> wrote:
> On 9/23/07, FUJITA Tomonori <[email protected]> wrote:
> > Can you try Jens's sglist-arch branch? If it works, libata in -mm
> > probably has bugs.
> >
> > For your convenience, I put up a sglist-arch branch patch against v2.6.23-rc7:
> >
> > http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2
>
> Thanks for the patch.
> I tried it and 3 out of 3 boot attempts worked without problems.
> But I can't rule out that the bug is still there, as I have no way to
> trigger it on demand.

Short update:
Two more boots with that kernel also worked.
I have just installed 2.6.23-rc7-mm1 and booted it three times.
No problems with that version either.

I will keep using 2.6.23-rc7-mm1 and post again if the error shows up again.

Torsten

2007-09-25 05:53:14

by Torsten Kaiser

Subject: Re: What's in linux-2.6-block.git for 2.6.24

On 9/24/07, Torsten Kaiser <[email protected]> wrote:
> I will keep using 2.6.23-rc7-mm1 and post again if the error shows up again.

On the next boot it did show up again, so 2.6.23-rc7-mm1 still has the bug.

[ 33.810000] md1: bitmap initialized from disk: read 10/10 pages, set 0 bits
[ 33.810000] created bitmap (145 pages) for device md1
[ 63.910000] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 63.910000] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
[ 63.910000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 63.910000] ata1.00: status: {DRDY }
[ 63.910000] ata1: hard resetting link
[ 66.210000] ata1: softreset failed (port not ready)
[ 66.210000] ata1: reset failed (errno=-5), retrying in 8 secs
[ 73.910000] ata1: hard resetting link
[ 76.210000] ata1: softreset failed (port not ready)
[ 76.210000] ata1: reset failed (errno=-5), retrying in 8 secs
[ 83.910000] ata1: hard resetting link
[ 86.210000] ata1: softreset failed (port not ready)
[ 86.210000] ata1: reset failed (errno=-5), retrying in 33 secs
[ 118.910000] ata1: limiting SATA link speed to 1.5 Gbps
[ 118.910000] ata1: hard resetting link
[ 121.210000] ata1: softreset failed (port not ready)
[ 121.210000] ata1: reset failed, giving up
[ 121.210000] ata1.00: disabled
[ 121.210000] ata1: EH complete
[ 121.210000] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 121.210000] end_request: I/O error, dev sda, sector 625137161
[ 121.210000] md: super_written gets error=-5, uptodate=0
[ 121.210000] raid5: Disk failure on sda2, disabling device. Operation continuing on 2 devices

After that there are many more errors like this in the log:
[ 135.760000] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 135.760000] end_request: I/O error, dev sda, sector 19551113
[ 135.760000] Buffer I/O error on device sda2, logical block 1
or:
[ 135.760000] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 135.760000] end_request: I/O error, dev sda, sector 19551105

Torsten