2006-05-15 17:00:11

by Jeff Garzik

Subject: [RFT] major libata update


After much development and review, I merged a massive pile of libata
patches from Tejun Heo and Albert Lee. This update contains the
following major libata

CHANGES:
* Rewritten error handling. This is a major piece of work, even
though it will be rarely seen. The new libata EH provides the
foundation for not only improved error handling, but also new features
such as device hotplug or command queueing. (Tejun Heo)

* PIO-based I/O is now IRQ-driven by default, rather than polled
in a kernel thread. The polling path will continue to exist for
controllers that need it, and other special cases. (Albert Lee)

* Core support for command queueing (Jens Axboe, Tejun Heo)

* Support for NCQ-style command queueing (Jens Axboe, Tejun Heo)

* Increase max-sectors dramatically, for LBA48 devices (Tejun Heo?)

* Other minor changes, from myself and others.
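
To make the max-sectors item concrete: an LBA28 read/write command carries an 8-bit sector count, so one command moves at most 256 sectors, while LBA48 widens the count field to 16 bits. The sketch below is a standalone model, not libata code. It picks the per-command limit in the way the shortlog entry "increase LBA48 max sectors to 65535" suggests. All names and the 512-byte sector size are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; names are hypothetical, not the libata headers. */
enum {
    SKETCH_SECTOR_SIZE       = 512,   /* bytes per sector (assumed) */
    SKETCH_MAX_SECTORS_LBA28 = 256,   /* 8-bit count field, 0 encodes 256 */
    SKETCH_MAX_SECTORS_LBA48 = 65535  /* value named in the shortlog */
};

/* Largest number of sectors one read/write command may transfer. */
static uint32_t sketch_max_sectors(bool dev_supports_lba48)
{
    return dev_supports_lba48 ? SKETCH_MAX_SECTORS_LBA48
                              : SKETCH_MAX_SECTORS_LBA28;
}

/* The same limit expressed in bytes. */
static uint64_t sketch_max_bytes(bool dev_supports_lba48)
{
    return (uint64_t)sketch_max_sectors(dev_supports_lba48) *
           SKETCH_SECTOR_SIZE;
}
```

Under these assumptions a single LBA48 command can cover just under 32 MiB, versus 128 KiB for LBA28.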

IMPACT:
* If all goes well, this update should improve error handling,
solve several outstanding, difficult-to-solve bugs, and provide a good
foundation for adding some nifty features in the future.

TESTING:
* Although most drivers by count received few operational changes, the
common probe path was updated, so all drivers need fresh "yes, it sees
all my disks" regression testing.

* ahci and sata_sil24 were touched a lot, and so need additional
testing.

* sata_sil and ata_piix also need healthy re-testing of all basic
functionality.

FEEDBACK:
* Please CC [email protected] on all emails and bug reports.

MERGE STATUS:
* Barring major problems in testing, will submit during 2.6.18 merge window.


Patch:
http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/2.6.17-rc4-git2-libata1.patch.bz2
(diff'd against 2.6.17-rc4-git2, but should apply to most recent
2.6.17-rcX[-gitY] kernels)

The 'upstream' branch of

git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git

contains the following updates:

drivers/scsi/Makefile | 2
drivers/scsi/ahci.c | 436 ++++---
drivers/scsi/ata_piix.c | 16
drivers/scsi/libata-bmdma.c | 143 ++
drivers/scsi/libata-core.c | 2437 +++++++++++++++++++++++++++++---------------
drivers/scsi/libata-eh.c | 1558 ++++++++++++++++++++++++++++
drivers/scsi/libata-scsi.c | 423 ++++---
drivers/scsi/libata.h | 24
drivers/scsi/pdc_adma.c | 10
drivers/scsi/sata_mv.c | 30
drivers/scsi/sata_nv.c | 6
drivers/scsi/sata_promise.c | 18
drivers/scsi/sata_qstor.c | 13
drivers/scsi/sata_sil.c | 65 -
drivers/scsi/sata_sil24.c | 615 ++++++-----
drivers/scsi/sata_sis.c | 2
drivers/scsi/sata_svw.c | 4
drivers/scsi/sata_sx4.c | 19
drivers/scsi/sata_uli.c | 2
drivers/scsi/sata_via.c | 2
drivers/scsi/sata_vsc.c | 15
drivers/scsi/scsi.c | 18
drivers/scsi/scsi_error.c | 3
drivers/scsi/scsi_lib.c | 2
drivers/scsi/scsi_priv.h | 1
include/linux/ata.h | 34
include/linux/libata.h | 379 ++++--
include/scsi/scsi_cmnd.h | 1
include/scsi/scsi_eh.h | 1
include/scsi/scsi_host.h | 1
30 files changed, 4634 insertions(+), 1646 deletions(-)

Albert Lee:
libata: interrupt driven pio for libata-core
libata: interrupt driven pio for LLD
libata irq-pio: add comments and cleanup
libata irq-pio: rename atapi_packet_task() and comments
libata irq-pio: simplify if condition in ata_dataout_task()
libata irq-pio: cleanup ata_qc_issue_prot()
libata: move atapi_send_cdb() and ata_dataout_task()
[libata irq-pio] reorganize ata_pio_sector() and __atapi_pio_bytes()
[libata irq-pio] reorganize "buf + offset" in ata_pio_sector()
[libata irq-pio] use PageHighMem() to optimize the kmap_atomic() usage
libata irq-pio: misc fixes
libata irq-pio: merge the ata_dataout_task workqueue with ata_pio_task workqueue
libata irq-pio: eliminate unnecessary queuing in ata_pio_first_block()
libata irq-pio: add read/write multiple support
libata-dev: determine err_mask when error is found
libata-dev: filter out noisy ATAPI error messages
libata-dev: Fix array index value in ata_rwcmd_protocol()
libata-dev: Use new ata_queue_pio_task() for PIO polling task
libata-dev: Use new AC_ERR_* flags
libata-dev: Minor comment fix
libata-dev: recognize WRITE_MULTI_FUA_EXT for r/w multiple
libata-dev: Remove trailing whitespaces
libata-dev: Fix merge problem with upstream
libata-dev: Remove atapi_packet_task()
libata-dev: Move out the HSM code from ata_host_intr()
libata-dev: Minor fix for ata_hsm_move() to work with ata_host_intr()
libata-dev: Let ata_hsm_move() work with both irq-pio and polling pio
libata-dev: Convert ata_pio_task() to use the new ata_hsm_move()
libata-dev: Cleanup unused enums/functions
libata-dev: ata_check_atapi_dma() fix for ATA_FLAG_PIO_POLLING LLDDs
libata-dev: Make the the in_wq check as an inline function
libata-dev: irq-pio minor fixes (respin)
libata-dev: fix the device err check sequence (respin)
libata-dev: wait idle after reading the last data block
libata-dev: print out information for ATAPI devices with CDB interrupts
libata-dev: handle DRQ=1 ERR=1 (revised)
libata-dev: irq-pio minor fix
libata-dev: irq-pio minor fix 2
libata: convert ATAPI_ENABLE_DMADIR to module parameter

Bastiaan Jacques:
ahci: add support for VIA VT8251

Jeff Garzik:
[libata irq-pio] build fix
[libata pdc_adma] update for removal of ATA_FLAG_NOINTR
[libata pdc_adma] fix for new irq-driven PIO code
[libata sata_mv] IRQ PIO build fix
[libata] irq-pio: fix breakage related to err_mask merge
[libata sata_promise] irq_pio: fix merge bug
[libata] build fix after merging some pre-packet_task-removal code
[libata irq-pio] s/assert/WARN_ON/
[libata] build fix after cdb_len move
sata_vsc build fix
libata: irq-pio build fixes
[libata] irq-pio: fix build breakage
[libata] irq-pio: Fix merge mistake
[libata] kill bogus cut-n-pasted comments in three drivers
[libata] bump versions
libata: Fix EH merge difference between this branch and upstream.
libata: Add helper ata_shost_to_port()

Luben Tuikov:
SCSI: Introduce scsi_req_abort_cmd (REPOST)

Tejun Heo:
libata: increase LBA48 max sectors to 65535
libata: fix ata_set_mode() return value
libata: make ata_bus_probe() return negative errno on failure
libata: separate out ata_spd_string()
libata: convert do_probe_reset() to ata_do_reset()
libata: implement ata_dev_enabled and disabled()
libata: make ata_set_mode() handle no-device case properly
libata: reorganize ata_set_mode()
libata: don't disable devices from ata_set_mode()
libata: preserve SATA SPD setting over hard resets
libata: implement ata_dev_absent()
libata: implement ap->sata_spd_limit and helpers
libata: use SATA speed down in ata_drive_probe_reset()
libata: add 5s sleep between resets
libata: implement ata_down_xfermask_limit()
libata: improve ata_bus_probe()
libata: consider disabled devices in ata_dev_xfermask()
libata: report device number when PIO fails
libata: ata_dev_revalidate() printk update
libata: ATA_FLAG_IN_EH is not used, kill it
libata: clean up constants
libata: rename ATA_FLAG_PORT_DISABLED to ATA_FLAG_DISABLED
libata: clear only affected flags during ata_dev_configure()
libata: clear ATA_DFLAG_PIO before setting it
libata: add ATA_QCFLAG_IO
libata: pass qc around intead of ap during PIO
libata: always generate sense if qc->err_mask is non-zero
libata: don't read TF directly from sense generation functions
libata: add @cdb to ata_exec_internal()
libata: dec scmd->retries for qcs with zero err_mask
libata: separate out libata-eh.c
libata: make some libata-core routines extern
libata: print SControl in SATA link status info message
ahci: do not fail softreset if PHY reports no device
libata: set default cbl in probeinit
libata: kill @verbose from ata_reset_fn_t
libata: make reset methods complain when they fail
sata_sil24: fix timeout calculation in sil24_softreset
sata_sil24: better error message from softreset
libata: implement ata_wait_register()
ahci: use ata_wait_register()
sata_sil24: use ata_wait_register()
libata: disable failed devices only once in ata_bus_probe()
libata: cosmetic update to ata_bus_probe()
libata: export ata_set_sata_spd()
sata_sil24: typo fix
sata_sil24: rename PORT_IRQ_SDB_FIS to PORT_IRQ_SDB_NOTIFY
sata_sil24: add more constants
sata_sil24: consolidate host flags into SIL24_COMMON_FLAGS
sata_sil24: implement loss of completion interrupt on PCI-X errta fix
sata_sil24: implement sil24_init_port()
sata_sil24: put port into known state before softresetting
sata_sil24: kill 10ms sleep in softreset
sata_sil24: reimplement hardreset
sata_sil24: don't do hardreset during driver initialization
sata_sil24: fix on-memory structure byteorder
sata_sil24: enable 64bit
SCSI: implement shost->host_eh_scheduled
libata: silly fix in ata_scsi_start_stop_xlat()
libata: rename ata_down_sata_spd_limit() and friends
ahci: hardreset classification fix
libata: unexport ata_scsi_error()
libata: kill duplicate prototypes
libata: fix ->phy_reset class code handling in ata_bus_probe()
libata: clear ap->active_tag atomically w.r.t. command completion
libata: hold host_set lock while finishing internal qc
libata: use preallocated buffers
libata: move ->set_mode() handling into ata_set_mode()
libata: remove postreset handling from ata_do_reset()
libata: implement qc->result_tf
sata_sil24: update TF image only when necessary
libata: init ap->cbl to ATA_CBL_SATA early
libata: implement new SCR handling and port on/offline functions
libata: use new SCR and on/offline functions
libata: kill old SCR functions and sata_dev_present()
libata: add dev->ap
libata: use dev->ap
libata: implement ATA printk helpers
libata: use ATA printk helpers
libata-eh-fw: add flags and operations for new EH
libata-eh-fw: clear SError in ata_std_postreset()
libata-eh-fw: use special reserved tag and qc for internal commands
libata-eh-fw: update ata_qc_from_tag() to enforce normal/EH qc ownership
libata-eh-fw: implement new EH scheduling via error completion
libata-eh-fw: implement ata_port_schedule_eh() and ata_port_abort()
libata-eh-fw: implement freeze/thaw
libata-eh-fw: implement new EH scheduling from PIO
libata-eh-fw: update ata_scsi_error() for new EH
libata-eh-fw: update ata_exec_internal() for new EH
libata-eh-fw: update SCSI command completion path for new EH
libata-eh: add ATA and libata flags for new EH
libata-eh: implement dev->ering
libata-eh: implement ata_eh_info and ata_eh_context
libata-eh: implement new EH
libata-eh: implement BMDMA EH
ata_piix: convert to new EH
sata_sil: convert to new EH
ahci: convert to new EH
ahci: add PIOS interim interrupt handling
sata_sil24: convert to new EH
libata: fix irq-pio merge
libata-ncq: add NCQ related ATA/libata constants and macros
libata-ncq: pass ata_scsi_translate() return value to SCSI midlayer
libata-ncq: rename ap->qactive to ap->qc_allocated
libata-ncq: implement ap->qc_active, ap->sactive and complete helper
libata-ncq: implement NCQ command translation and exclusion
libata-ncq: update EH to handle NCQ
libata-ncq: implement NCQ device configuration
ahci: clean up AHCI constants in preparation for NCQ
ahci: add HOST_CAP_NCQ constant
ahci: kill pp->cmd_tbl_sg
ahci: implement NCQ suppport
sata_sil24: implement NCQ support
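
The NCQ entries above track outstanding commands as bitmaps of tags. A minimal sketch of the idea behind a completion helper, assuming one bit per tag and assuming the controller reports which tags are still in flight: the finished tags are the ones set in the issued mask but clear in the reported mask. The function names are hypothetical, and this is a userspace model rather than the libata implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tags that were issued but are no longer reported as active. */
static uint32_t sketch_done_mask(uint32_t issued, uint32_t still_active)
{
    return issued ^ still_active;
}

/*
 * Sanity check: the controller must not report a tag as active
 * that the driver never issued. A real driver would treat that
 * as a protocol violation and hand the port to error handling.
 */
static bool sketch_mask_is_legal(uint32_t issued, uint32_t still_active)
{
    return (still_active & ~issued) == 0;
}

/* Number of tags set in a mask (clears the lowest set bit per step). */
static unsigned int sketch_count_tags(uint32_t mask)
{
    unsigned int n = 0;

    while (mask) {
        mask &= mask - 1;
        n++;
    }
    return n;
}
```

For example, with tags 0-3 issued (0xF) and tags 0 and 2 still active (0x5), the done mask is 0xA, so tags 1 and 3 can be completed.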


2006-05-15 17:07:16

by Alan

Subject: Re: [RFT] major libata update

On Mon, 2006-05-15 at 13:00 -0400, Jeff Garzik wrote:
> * PIO-based I/O is now IRQ-driven by default, rather than polled
> in a kernel thread. The polling path will continue to exist for
> controllers that need it, and other special cases. (Albert Lee)

How will this be selected? Passing ->irq = 0?

For ata_piix, given you've destabilized it a bit, would now be a good time
to submit the patches to fix the timing, register scribble and incorrect
ATAPI caching?

2006-05-15 17:13:34

by Jeff Garzik

Subject: Re: [RFT] major libata update

Alan Cox wrote:
> On Mon, 2006-05-15 at 13:00 -0400, Jeff Garzik wrote:
>> * PIO-based I/O is now IRQ-driven by default, rather than polled
>> in a kernel thread. The polling path will continue to exist for
>> controllers that need it, and other special cases. (Albert Lee)
>
> How will this be selected ? Passing ->irq = 0 ?

It is selected at runtime by passing a polling flag to ata_taskfile.

That flag, in turn, can be set by anything -- driver flags (for
controllers that always require polling), a user variable set at
runtime, whatever.
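
A rough sketch of that selection, as a standalone model rather than the actual libata code: a per-port flag (for controllers that always need polling) or a runtime request sets a polling bit on the command's taskfile, and the issue path then chooses IRQ-driven or polled PIO from that bit. Every name and flag value here is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real libata names and values may differ. */
enum {
    SKETCH_PORT_PIO_POLLING = 1 << 0, /* controller always needs polled PIO */
    SKETCH_TF_POLLING       = 1 << 0  /* this command should be polled */
};

struct sketch_port     { unsigned int flags; };
struct sketch_taskfile { unsigned int flags; };

/* Mark the taskfile as polled if the driver or the user asks for it. */
static void sketch_prepare_tf(const struct sketch_port *ap,
                              struct sketch_taskfile *tf,
                              bool user_wants_polling)
{
    if ((ap->flags & SKETCH_PORT_PIO_POLLING) || user_wants_polling)
        tf->flags |= SKETCH_TF_POLLING;
}

/* Convenience wrapper: true when the command would be IRQ-driven. */
static bool sketch_cmd_is_irq_driven(unsigned int port_flags,
                                     bool user_wants_polling)
{
    struct sketch_port ap = { port_flags };
    struct sketch_taskfile tf = { 0 };

    sketch_prepare_tf(&ap, &tf, user_wants_polling);
    return !(tf.flags & SKETCH_TF_POLLING);
}
```

With no flags set the command stays IRQ-driven, which matches the "IRQ-driven by default" behaviour described in the announcement.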


> For ata_piix given you've destabilized it a bit would now be a good time
> to submit the patches to fix the timing, register scribble and incorrect
> ATAPI caching ?

Sure.

Jeff


2006-05-15 17:21:50

by Andrew Morton

Subject: Re: [RFT] major libata update

Jeff Garzik <[email protected]> wrote:
>
>
> After much development and review, I merged a massive pile of libata
> patches from Tejun Heo and Albert Lee. This update contains the
> following major libata
>
> CHANGES:
> * Rewritten error handling. This is a major piece of work, even
> though it will be rarely seen. The new libata EH provides the
> foundation for not only improved error handling, but also new features
> such as device hotplug or command queueing. (Tejun Heo)
>
> * PIO-based I/O is now IRQ-driven by default, rather than polled
> in a kernel thread. The polling path will continue to exist for
> controllers that need it, and other special cases. (Albert Lee)
>
> * Core support for command queueing (Jens Axboe, Tejun Heo)
>
> * Support for NCQ-style command queueing (Jens Axboe, Tejun Heo)
>
> * Increase max-sectors dramatically, for LBA48 devices (Tejun Heo?)
>
> * Other minor changes, from myself and others.
>
> IMPACT:
> * If all goes well, this update should improve error handling,
> solve several outstanding, difficult-to-solve bugs, and provide a good
> foundation for adding some nifty features in the future.
>
> TESTING:
> * Although most drivers by count received few operational changes, the
> common probe path was updated, so all drivers need fresh "yes, it sees
> all my disks" regression testing.
>
> * ahci and sata_sil24 were touched a lot, and so need additional
> testing.
>
> * sata_sil and ata_piix also need healthy re-testing of all basic
> functionality.

Lots of goodies.

> FEEDBACK:
> * Please CC [email protected] on all emails and bug reports.
>
> MERGE STATUS:
> * Barring major problems in testing, will submit during 2.6.18 merge window.

I'd be a little concerned with that merge plan at this time - we have a lot
of sata bug reports banked up and afaict a pretty low fixup rate. Then
again, these patches might fix some of those bugs...

I guess if we can get it all in early (which is only a couple of weeks
away!) and you and Tejun will have time set aside to work on problems then
OK. But....

http://bugzilla.kernel.org/show_bug.cgi?id=4920
http://bugzilla.kernel.org/show_bug.cgi?id=5533
http://bugzilla.kernel.org/show_bug.cgi?id=5586
http://bugzilla.kernel.org/show_bug.cgi?id=5589
http://bugzilla.kernel.org/show_bug.cgi?id=5798
http://bugzilla.kernel.org/show_bug.cgi?id=5863
http://bugzilla.kernel.org/show_bug.cgi?id=4968
http://bugzilla.kernel.org/show_bug.cgi?id=5047
http://bugzilla.kernel.org/show_bug.cgi?id=5905
http://bugzilla.kernel.org/show_bug.cgi?id=5596
http://bugzilla.kernel.org/show_bug.cgi?id=5654
http://bugzilla.kernel.org/show_bug.cgi?id=5664
http://bugzilla.kernel.org/show_bug.cgi?id=5700
http://bugzilla.kernel.org/show_bug.cgi?id=5709
http://bugzilla.kernel.org/show_bug.cgi?id=5721
http://bugzilla.kernel.org/show_bug.cgi?id=5722
http://bugzilla.kernel.org/show_bug.cgi?id=5922
http://bugzilla.kernel.org/show_bug.cgi?id=5789
http://bugzilla.kernel.org/show_bug.cgi?id=5931
http://bugzilla.kernel.org/show_bug.cgi?id=5969
http://bugzilla.kernel.org/show_bug.cgi?id=5948
http://bugzilla.kernel.org/show_bug.cgi?id=5987
http://bugzilla.kernel.org/show_bug.cgi?id=5995
http://bugzilla.kernel.org/show_bug.cgi?id=6173
http://bugzilla.kernel.org/show_bug.cgi?id=6207
http://bugzilla.kernel.org/show_bug.cgi?id=6240
http://bugzilla.kernel.org/show_bug.cgi?id=6253
http://bugzilla.kernel.org/show_bug.cgi?id=6260
http://bugzilla.kernel.org/show_bug.cgi?id=6272
http://bugzilla.kernel.org/show_bug.cgi?id=6283
http://bugzilla.kernel.org/show_bug.cgi?id=6311
http://bugzilla.kernel.org/show_bug.cgi?id=6317
http://bugzilla.kernel.org/show_bug.cgi?id=6346
http://bugzilla.kernel.org/show_bug.cgi?id=6470
http://bugzilla.kernel.org/show_bug.cgi?id=6056
http://bugzilla.kernel.org/show_bug.cgi?id=6494
http://bugzilla.kernel.org/show_bug.cgi?id=6516
http://bugzilla.kernel.org/show_bug.cgi?id=6521

2006-05-15 18:07:03

by Jeff Garzik

Subject: Re: [RFT] major libata update

Andrew Morton wrote:

We watch the bug reports, but try to get an overall picture rather than
being good about updating each bugzilla tracker promptly :/ The
tracking is quite useful though.


> http://bugzilla.kernel.org/show_bug.cgi?id=4920

not us

> http://bugzilla.kernel.org/show_bug.cgi?id=5533

addressed here

> http://bugzilla.kernel.org/show_bug.cgi?id=5586

sata_mv still considered highly experimental, as noted in Kconfig. Bugs
deferred to Mark Lord.

> http://bugzilla.kernel.org/show_bug.cgi?id=5589

funny looking backtrace, according to you

> http://bugzilla.kernel.org/show_bug.cgi?id=5798

worth looking into

> http://bugzilla.kernel.org/show_bug.cgi?id=5863

worth looking into, also worth testing with big update.

probable cause, the most recent ata_piix map value stuff.


> http://bugzilla.kernel.org/show_bug.cgi?id=4968

big update should help diagnose. might need 'nv-adma' branch to fix.

> http://bugzilla.kernel.org/show_bug.cgi?id=5047

big update should help.

> http://bugzilla.kernel.org/show_bug.cgi?id=5905

big update should help, or at least help diagnose.

> http://bugzilla.kernel.org/show_bug.cgi?id=5596

ditto sata_mv info above.

> http://bugzilla.kernel.org/show_bug.cgi?id=5654

big update should help.

> http://bugzilla.kernel.org/show_bug.cgi?id=5664

should forward to NVIDIA for help debugging. nv-adma branch may help
diagnose.

> http://bugzilla.kernel.org/show_bug.cgi?id=5700

big update should help

> http://bugzilla.kernel.org/show_bug.cgi?id=5709

trivial patch submission

> http://bugzilla.kernel.org/show_bug.cgi?id=5721

don't care

> http://bugzilla.kernel.org/show_bug.cgi?id=5722

maybe a bug, probably weird drive or media, worth looking into

> http://bugzilla.kernel.org/show_bug.cgi?id=5922

should be using ahci driver

> http://bugzilla.kernel.org/show_bug.cgi?id=5789

include in the "atapi + via problems" bucket; bucket should be looked
into... for some people it works great, for others not. may need
motherboard-specific (southbridge) tuning.

> http://bugzilla.kernel.org/show_bug.cgi?id=5931

big update will help

> http://bugzilla.kernel.org/show_bug.cgi?id=5969

probably just needs PCI IDs

> http://bugzilla.kernel.org/show_bug.cgi?id=5948

not libata?

> http://bugzilla.kernel.org/show_bug.cgi?id=5987

not libata

> http://bugzilla.kernel.org/show_bug.cgi?id=5995

big update + upcoming hotplug will fix. until then, don't expect
unplugging a drive to DTRT.

> http://bugzilla.kernel.org/show_bug.cgi?id=6173

need NVIDIA help. maybe nv-adma branch will help diagnose.

> http://bugzilla.kernel.org/show_bug.cgi?id=6207

ditto

> http://bugzilla.kernel.org/show_bug.cgi?id=6240

investigate. big update will help diagnose.

> http://bugzilla.kernel.org/show_bug.cgi?id=6253

big update should fix.

> http://bugzilla.kernel.org/show_bug.cgi?id=6260

waiting on SATA ACPI merge.

> http://bugzilla.kernel.org/show_bug.cgi?id=6272

big update should fix.

> http://bugzilla.kernel.org/show_bug.cgi?id=6283

investigate. big update will help.

> http://bugzilla.kernel.org/show_bug.cgi?id=6311

big update should fix.

> http://bugzilla.kernel.org/show_bug.cgi?id=6317

big update may fix.

> http://bugzilla.kernel.org/show_bug.cgi?id=6346

not libata?

> http://bugzilla.kernel.org/show_bug.cgi?id=6470

new hardware, needs driver.

> http://bugzilla.kernel.org/show_bug.cgi?id=6056

bounce to NV

> http://bugzilla.kernel.org/show_bug.cgi?id=6494

waiting on SATA ACPI

> http://bugzilla.kernel.org/show_bug.cgi?id=6516

investigate, low priority

> http://bugzilla.kernel.org/show_bug.cgi?id=6521

big update should fix

2006-05-15 18:15:18

by Jeff Garzik

Subject: Re: [RFT] major libata update

Andrew Morton wrote:
> I'd be a little concerned with that merge plan at this time - we have a lot
> of sata bug reports banked up and afaict a pretty low fixup rate. Then
> again, these patches might fix some of those bugs...
>
> I guess if we can get it all in early (which is only a couple of weeks
> away!) and you and Tejun will have time set aside to work on problems then
> OK. But....


As you can see from the list just sent, the improved error handling will
give libata much greater ability to diagnose "controller is being weird"
type situations, which is a lot of what the relevant bug reports need.

After reviewing those bug reports, I see a couple oopses -- caused by
BUG()-style code, and fixed in this update -- and one data corruption
which persists on Sil 311x on rare motherboards. The rest are either
addressed with the improved error handling, or are ATAPI + VIA AFAICS.

Jeff


2006-05-15 18:25:32

by Andrew Morton

Subject: Re: [RFT] major libata update

Jeff Garzik <[email protected]> wrote:
>
> Andrew Morton wrote:
> > I'd be a little concerned with that merge plan at this time - we have a lot
> > of sata bug reports banked up and afaict a pretty low fixup rate. Then
> > again, these patches might fix some of those bugs...
> >
> > I guess if we can get it all in early (which is only a couple of weeks
> > away!) and you and Tejun will have time set aside to work on problems then
> > OK. But....
>
>
> As you can see from the list just sent, the improved error handling will
> give libata much greater ability to diagnose "controller is being weird"
> type situations, which is a lot of what the relevant bug reports need.
>
> After reviewing those bug reports, I see a couple oopses -- caused by
> BUG()-style code, and fixed in this update -- and one data corruption
> which persists on Sil 311x on rare motherboards. The rest are either
> addressed with the improved error handling, or are ATAPI + VIA AFAICS.
>

ok, thanks.

Next -mm I'll suck up the libata changes, drop a pile of the hairier stuff
and I'll ask each originator to test that patchset.

2006-05-15 18:25:41

by Alan

Subject: Re: [RFT] major libata update

On Mon, 2006-05-15 at 14:15 -0400, Jeff Garzik wrote:
> which persists on Sil 311x on rare motherboards. The rest are either
> addressed with the improved error handling, or are ATAPI + VIA AFAICS.

The ATAPI + VIA pattern is also showing up in pata_via cases, but only
on VIA so far. It's as if there is a case where the IRQ of the first
command is sometimes lost.

2006-05-15 18:29:30

by Tomasz Torcz

Subject: Re: [RFT] major libata update

On Mon, May 15, 2006 at 01:00:06PM -0400, Jeff Garzik wrote:
>
> After much development and review, I merged a massive pile of libata
> patches from Tejun Heo and Albert Lee. This update contains the
> following major libata

Any plans to merge http://home-tj.org/wiki/index.php/Sil_m15w ? Or
maybe it's merged already?
Seagate firmware update seems to be available only for OEMs, so this
quirk is pretty helpful for end users.

--
Tomasz Torcz To co nierealne - tutaj jest normalne.
[email protected] Ziomale na życie mają tu patenty specjalne.



2006-05-15 18:43:24

by Jeff Garzik

Subject: Re: [RFT] major libata update

Tomasz Torcz wrote:
> On Mon, May 15, 2006 at 01:00:06PM -0400, Jeff Garzik wrote:
>> After much development and review, I merged a massive pile of libata
>> patches from Tejun Heo and Albert Lee. This update contains the
>> following major libata
>
> Any plans to merge http://home-tj.org/wiki/index.php/Sil_m15w ? Or
> maybe it's merged already?
> Seagate firmware update seems to be available only for OEMs, so this
> quirk is pretty helpful for end users.

It's a question of staging. This still lives in the 'sii-m15w' branch of
libata-dev.git, but if we throw too many _classes_ of changes into the
same big lump, then it becomes much more difficult to discern which
changes caused which failures.

Since sata_sil has seen several changes, and since the sii-m15w problems
are so difficult to diagnose properly, it's easier to separate that out.

Jeff



2006-05-15 18:44:12

by Jeff Garzik

Subject: Re: [RFT] major libata update

Andrew Morton wrote:
> ok, thanks.
>
> Next -mm I'll suck up the libata changes, drop a pile of the hairier stuff
> and I'll ask each originator to test that patchset.

Cool. FWIW this stuff can be found in libata-dev.git#ALL as usual.

Jeff


2006-05-15 19:06:27

by Arkadiusz Miśkiewicz

Subject: Re: [RFT] major libata update

On Monday 15 May 2006 20:06, Jeff Garzik wrote:

> > http://bugzilla.kernel.org/show_bug.cgi?id=6260
>
> waiting on SATA ACPI merge.
Is this really the case?

The one patch (layering-breaking; discussed already) cures the problem, and
nothing SATA ACPI related is needed, so I guess something else is the problem
here.

--- 2.6.17-rc2/drivers/scsi/libata-core.c	2006-04-19 09:14:11.000000000 +0100
+++ linux/drivers/scsi/libata-core.c	2006-04-21 20:55:48.000000000 +0100
@@ -4288,6 +4288,7 @@ int ata_device_resume(struct ata_port *a
 {
 	if (ap->flags & ATA_FLAG_SUSPENDED) {
 		ap->flags &= ~ATA_FLAG_SUSPENDED;
+		ata_busy_sleep(ap, ATA_TMOUT_BOOT_QUICK, ATA_TMOUT_BOOT);
 		ata_set_mode(ap);
 	}
 	if (!ata_dev_present(dev))
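
The one-line change above waits for the device to stop reporting busy after resume before the transfer mode is reprogrammed. Below is a rough userspace model of that kind of two-stage wait: a short "quick" limit, then a longer overall limit. It assumes a simulated device and counts polls instead of sleeping. BSY (0x80) and DRDY (0x40) are the standard ATA status bits; every other name is hypothetical, and this is not how ata_busy_sleep() itself is written.

```c
#include <stdint.h>

#define SKETCH_ATA_BUSY 0x80u /* BSY bit of the ATA status register */
#define SKETCH_ATA_DRDY 0x40u /* DRDY bit of the ATA status register */

enum sketch_wait_result {
    SKETCH_READY_QUICK, /* BSY cleared within the quick limit */
    SKETCH_READY_SLOW,  /* BSY cleared, but only after the quick limit */
    SKETCH_TIMED_OUT    /* BSY still set when the overall limit expired */
};

/* Simulated device: reports BSY for a fixed number of status reads. */
struct sketch_dev { unsigned int busy_reads_left; };

static uint8_t sketch_read_status(struct sketch_dev *dev)
{
    if (dev->busy_reads_left > 0) {
        dev->busy_reads_left--;
        return SKETCH_ATA_BUSY;
    }
    return SKETCH_ATA_DRDY;
}

/* Poll until BSY clears or max_polls reads have been made. */
static enum sketch_wait_result
sketch_busy_wait(struct sketch_dev *dev, unsigned int quick_polls,
                 unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (!(sketch_read_status(dev) & SKETCH_ATA_BUSY))
            return i < quick_polls ? SKETCH_READY_QUICK
                                   : SKETCH_READY_SLOW;
    }
    return SKETCH_TIMED_OUT;
}

/* Convenience wrapper: device stays busy for busy_for status reads. */
static enum sketch_wait_result
sketch_wait_after_resume(unsigned int busy_for, unsigned int quick_polls,
                         unsigned int max_polls)
{
    struct sketch_dev dev = { busy_for };

    return sketch_busy_wait(&dev, quick_polls, max_polls);
}
```

A caller in this model would only go on to set the transfer mode when the result is not SKETCH_TIMED_OUT.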

--
Arkadiusz Miśkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/

2006-05-15 19:15:14

by Jeff Garzik

Subject: Re: [RFT] major libata update

Also, a re-reminder:

At some convenient point, I'm going to move libata core and drivers to
new directory drivers/ata.

The other noticeable change coming down the pipe is iomap support, which
will kill those annoying warnings you see on every build (in addition to
shrinking the driver a bit).

Speak up now if there are complaints...

Jeff



2006-05-15 19:33:14

by Mark Lord

Subject: Re: [RFT] major libata update

Jeff Garzik wrote:
> Andrew Morton wrote:
..
>> http://bugzilla.kernel.org/show_bug.cgi?id=5586
>
> sata_mv still considered highly experimental, as noted in Kconfig. Bugs
> deferred to Mark Lord.
..
I think that particular bug has gone away with my internal sata_mv.c version.
I'm updating it for release on top of Jeff/Tejun's patch set, and will likely
backport the bugfixes to 2.6.16.xx as well. Timeline, this week or next.

Cheers

2006-05-15 20:45:46

by Jeff Garzik

Subject: Re: [RFT] major libata update

Arkadiusz Miskiewicz wrote:
> On Monday 15 May 2006 20:06, Jeff Garzik wrote:
>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=6260
>> waiting on SATA ACPI merge.
> Is this really a case?
>
> The one (layering breaking; discussed already) patch cures the problem and
> nothing sata acpi related is needed, so something else is problematic here I
> guess.
>
> --- 2.6.17-rc2/drivers/scsi/libata-core.c	2006-04-19 09:14:11.000000000 +0100
> +++ linux/drivers/scsi/libata-core.c	2006-04-21 20:55:48.000000000 +0100
> @@ -4288,6 +4288,7 @@ int ata_device_resume(struct ata_port *a
>  {
>  	if (ap->flags & ATA_FLAG_SUSPENDED) {
>  		ap->flags &= ~ATA_FLAG_SUSPENDED;
> +		ata_busy_sleep(ap, ATA_TMOUT_BOOT_QUICK, ATA_TMOUT_BOOT);
>  		ata_set_mode(ap);

This libata update should address issues that the above patch also
addresses. It will be interesting to hear feedback in the coming days
on what issues remain after this big lump.

Jeff



2006-05-15 22:52:54

by Tejun Heo

Subject: Re: [RFT] major libata update

Mark Lord wrote:
> Jeff Garzik wrote:
>> Andrew Morton wrote:
> ..
>>> http://bugzilla.kernel.org/show_bug.cgi?id=5586
>>
>> sata_mv still considered highly experimental, as noted in Kconfig.
>> Bugs deferred to Mark Lord.
> ..
> I think that particular bug has gone away with my internal sata_mv.c
> version.
> I'm updating it for release on top of Jeff/Tejun's patch set, and will
> likely
> backport the bugfixes to 2.6.16.xx as well. Timeline, this week or next.

The hotplug patches will change probing once more. So, I recommend
staying with legacy ->phy_reset mechanism for the time being unless you
are using ->probe_reset() already. However, converting from
->probe_reset() to hotplug should be very easy.

--
tejun

2006-05-15 22:57:34

by Wakko Warner

Subject: Re: [RFT] major libata update

Jeff Garzik wrote:
>
> After much development and review, I merged a massive pile of libata
> patches from Tejun Heo and Albert Lee. This update contains the
> following major libata
>
> CHANGES:
> * Rewritten error handling. This is a major piece of work, even
> though it will be rarely seen. The new libata EH provides the
> foundation for not only improved error handling, but also new features
> such as device hotplug or command queueing. (Tejun Heo)
>
> * PIO-based I/O is now IRQ-driven by default, rather than polled
> in a kernel thread. The polling path will continue to exist for
> controllers that need it, and other special cases. (Albert Lee)
>
> * Core support for command queueing (Jens Axboe, Tejun Heo)
>
> * Support for NCQ-style command queueing (Jens Axboe, Tejun Heo)
>
> * Increase max-sectors dramatically, for LBA48 devices (Tejun Heo?)
>
> * Other minor changes, from myself and others.

How about PATA? Specifically Intel's IDE chip. I have a machine that I can
blow the hard drive away on if I want to.

--
Lab tests show that use of micro$oft causes cancer in lab animals
Got Gas???

2006-05-15 23:00:50

by Jeff Garzik

Subject: Re: [RFT] major libata update

Wakko Warner wrote:
> Jeff Garzik wrote:
>> After much development and review, I merged a massive pile of libata
>> patches from Tejun Heo and Albert Lee. This update contains the
>> following major libata
>>
>> CHANGES:
>> * Rewritten error handling. This is a major piece of work, even
>> though it will be rarely seen. The new libata EH provides the
>> foundation for not only improved error handling, but also new features
>> such as device hotplug or command queueing. (Tejun Heo)
>>
>> * PIO-based I/O is now IRQ-driven by default, rather than polled
>> in a kernel thread. The polling path will continue to exist for
>> controllers that need it, and other special cases. (Albert Lee)
>>
>> * Core support for command queueing (Jens Axboe, Tejun Heo)
>>
>> * Support for NCQ-style command queueing (Jens Axboe, Tejun Heo)
>>
>> * Increase max-sectors dramatically, for LBA48 devices (Tejun Heo?)
>>
>> * Other minor changes, from myself and others.
>
> How about PATA? Specifically intel's IDE chip. I have a machine that I can
> blow the hard drive away if I want to.

Always helpful. ata_piix should support Intel PATA controllers, modulo
some bugs that Alan is fixing / has fixed. If your PCI ID isn't listed,
you will have to add it, and an associated info entry. Again, take a
look at Alan's libata PATA patches for guidance.
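
As a rough illustration of what "add the PCI ID and an associated info entry" means, here is a standalone sketch of the usual pattern: a table of vendor/device pairs, each pointing at an entry that describes the controller's capabilities. It is not the ata_piix source. The struct and function names are hypothetical, and the device IDs and UDMA masks are example values that should be checked against the driver and the hardware documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Index into the info table; one entry per controller family. */
enum sketch_controller {
    SKETCH_PIIX4_PATA,
    SKETCH_ICH5_PATA
};

/* Per-family capabilities (hypothetical, heavily simplified). */
struct sketch_port_info {
    const char *name;
    unsigned int udma_mask; /* bit n set means UDMA mode n is allowed */
};

static const struct sketch_port_info sketch_info[] = {
    [SKETCH_PIIX4_PATA] = { "piix4-pata", 0x07 }, /* UDMA modes 0-2 */
    [SKETCH_ICH5_PATA]  = { "ich5-pata",  0x3f }, /* UDMA modes 0-5 */
};

/* PCI ID table: each ID names the info entry that applies to it. */
struct sketch_pci_id {
    uint16_t vendor;
    uint16_t device;
    enum sketch_controller info_idx;
};

static const struct sketch_pci_id sketch_ids[] = {
    { 0x8086, 0x7111, SKETCH_PIIX4_PATA }, /* example ID */
    { 0x8086, 0x24db, SKETCH_ICH5_PATA },  /* example ID */
};

/* Return the info entry for a device, or NULL if the ID is not listed. */
static const struct sketch_port_info *
sketch_lookup(uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < sizeof(sketch_ids) / sizeof(sketch_ids[0]); i++) {
        if (sketch_ids[i].vendor == vendor &&
            sketch_ids[i].device == device)
            return &sketch_info[sketch_ids[i].info_idx];
    }
    return NULL;
}
```

Supporting an unlisted controller in this model means adding one row to sketch_ids and, if no existing entry fits, one row to sketch_info.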

Jeff



2006-05-15 23:07:44

by Wakko Warner

Subject: Re: [RFT] major libata update

Jeff Garzik wrote:
> Wakko Warner wrote:
> >How about PATA? Specifically intel's IDE chip. I have a machine that I
> >can
> >blow the hard drive away if I want to.
>
> Always helpful. ata_piix should support Intel PATA controllers, modulo
> some bugs that Alan is fixing / has fixed. If your PCI ID isn't listed,
> you will have to add it, and an associated info entry. Again, take a
> look at Alan's libata PATA patches for guidance.

Do I need his patches as well? If so, where do I retrieve them? I lost the
url for it.

--
Lab tests show that use of micro$oft causes cancer in lab animals
Got Gas???

2006-05-15 23:19:29

by Jeff Garzik

Subject: Re: [RFT] major libata update

Wakko Warner wrote:
> Jeff Garzik wrote:
>> Wakko Warner wrote:
>>> How about PATA? Specifically intel's IDE chip. I have a machine that I
>>> can
>>> blow the hard drive away if I want to.
>> Always helpful. ata_piix should support Intel PATA controllers, modulo
>> some bugs that Alan is fixing / has fixed. If your PCI ID isn't listed,
>> you will have to add it, and an associated info entry. Again, take a
>> look at Alan's libata PATA patches for guidance.
>
> Do I need his patches as well? If so, where do I retrieve them? I lost the
> url for it.

Me too; hopefully he'll chime in. In any case, it's highly likely that
things will work out of the box.

Jeff



2006-05-15 23:26:40

by Alan

Subject: Re: [RFT] major libata update

On Llu, 2006-05-15 at 19:02 -0400, Wakko Warner wrote:
> How about PATA? Specifically intel's IDE chip. I have a machine that I can
> blow the hard drive away if I want to.

Give the patch on zeniv.linux.org.uk/~alan/IDE a go in that case and let
me know how it behaves.

Alan

2006-05-15 23:28:04

by Alan

Subject: Re: [RFT] major libata update

On Llu, 2006-05-15 at 19:00 -0400, Jeff Garzik wrote:
> Always helpful. ata_piix should support Intel PATA controllers, modulo
> some bugs that Alan is fixing / has fixed. If your PCI ID isn't listed,
> you will have to add it, and an associated info entry. Again, take a
> look at Alan's libata PATA patches for guidance.

Without the patches I've got, everything non-ATAPI should work (ATAPI
will, I think, 99% work), and anything that is ICH or later (UDMA66 or
higher) should behave correctly.

PIIX/MPIIX won't work with it, and UDMA33 chips may work providing the
scribbles to the wrong register happen to be harmless.

Alan

2006-05-15 23:30:29

by Avuton Olrich

Subject: Re: [RFT] major libata update

On 5/15/06, Jeff Garzik wrote:
> * sata_sil and ata_piix also need healthy re-testing of all basic
> functionality.

I'm testing it right now, but with 2.6.17-rc4-git2 I was getting:

May 15 15:42:57 shapeshifter ata2: command 0x25 timeout, stat 0x58 host_stat 0x1
May 15 15:42:57 shapeshifter ata2: translated ATA stat/err 0x58/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 15 15:42:57 shapeshifter ata2: status=0x58 { DriveReady SeekComplete DataRequest }
May 15 15:42:57 shapeshifter sd 1:0:0:0: SCSI error: return code = 0x8000002
May 15 15:42:57 shapeshifter sda: Current: sense key=0xb
May 15 15:42:57 shapeshifter ASC=0x47 ASCQ=0x0
May 15 15:42:57 shapeshifter end_request: I/O error, dev sda, sector 974708575

(sector varies)

This happens after large ssh transfers. I moved to 2.6.17-rc4-git2
because 2.6.16.16 was doing the same. This is a new 500GB SATA2 drive
on sata_sil, so I guess this could be hardware, but I wanted to make
sure before I go returning this thing. After this I obviously have to
sysrq sync, remount read-only, and reboot. This also causes(?) a
NETDEV WATCHDOG: eth2: transmit timed out; sometimes the ata timeout
doesn't occur and I just get the netdev watchdog. This has not yet
happened with the new patch, though I'm only 1 hr into testing with it.
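
For readers following the log above: the line "status=0x58 { DriveReady SeekComplete DataRequest }" is a bit-by-bit decode of the ATA status register, and command 0x25 is READ DMA EXT in the ATA command set. The following is a minimal Python sketch of that decode, not code from libata. The function name decode_ata_status is made up for illustration, and the labels for bits that do not appear in the log are assumed spellings.

```python
# ATA status register bit masks (per the ATA/ATAPI specification).
# Labels for 0x40, 0x10 and 0x08 are copied from the log above; the
# remaining labels are assumptions chosen in the same style.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the labels of all bits set in an 8-bit ATA status value."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


if __name__ == "__main__":
    # 0x58 = 0x40 | 0x10 | 0x08
    print(decode_ata_status(0x58))
```

Read this way, the timeout fired while the drive was ready and still asserting a data request, with the error bit clear.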


If you want to take a peek at my config:
http://olricha.homelinux.net:8080/ss.config
--
avuton
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2006-05-15 23:32:40

by Tejun Heo

Subject: Re: [RFT] major libata update

Jeff Garzik wrote:
> Tomasz Torcz wrote:
>> On Mon, May 15, 2006 at 01:00:06PM -0400, Jeff Garzik wrote:
>>> After much development and review, I merged a massive pile of libata
>>> patches from Tejun Heo and Albert Lee. This update contains the
>>> following major libata
>>
>> Any plans to merge http://home-tj.org/wiki/index.php/Sil_m15w ? Or
>> maybe it's merged already?
>> Seagate firmware update seems to be available only for OEMs, so this
>> quirk is pretty helpful for end users.
>
> It's a question of staging. This still lives in the 'sii-m15w' branch of
> libata-dev.git, but if we throw too many _classes_ of changes into the
> same big lump, then it becomes much more difficult to discern which
> changes caused which failures.
>
> Since sata_sil has seen several changes, and since the sii-m15w problems
> are so difficult to diagnose properly, it's easier to separate that out.

Are you planning on merging sil_m15w workaround?

FYI, from the first time it was submitted (last summer) till 2.6.16, it
took very little effort to maintain it. The current big update would
necessitate some changes to it but I don't think it will be too much
work. My experience says m15w doesn't add too much maintenance overhead.

Also, what's the merge plan for hotplug/PM? Together into 2.6.18? Or
are we looking further down?

--
tejun

2006-05-15 23:36:52

by Tejun Heo

Subject: Re: [RFT] major libata update

Avuton Olrich wrote:
> On 5/15/06, Jeff Garzik wrote:
>> * sata_sil and ata_piix also need healthy re-testing of all basic
>> functionality.
>
> I'm testing it right now, but with 2.6.17-rc4-git2 I was getting:
>
> May 15 15:42:57 shapeshifter ata2: command 0x25 timeout, stat 0x58
> host_stat 0x1
> May 15 15:42:57 shapeshifter ata2: translated ATA stat/err 0x58/00 to
> SCSI SK/ASC/ASCQ 0xb/47/00
> May 15 15:42:57 shapeshifter ata2: status=0x58 { DriveReady
> SeekComplete DataRequest }
> May 15 15:42:57 shapeshifter sd 1:0:0:0: SCSI error: return code =
> 0x8000002
> May 15 15:42:57 shapeshifter sda: Current: sense key=0xb
> May 15 15:42:57 shapeshifter ASC=0x47 ASCQ=0x0
> May 15 15:42:57 shapeshifter end_request: I/O error, dev sda, sector
> 974708575

2.6.17-rc4-git2 doesn't contain the changes. You're still using the old EH.
To test the updates, pull #upstream from the libata-dev git tree, which can
be found on http://kernel.org/git. Otherwise, you can try the
libata-tj-stable patch over 2.6.16.16, located at

http://home-tj.org/wiki/index.php/Libata-tj-stable.

--
tejun

2006-05-15 23:42:29

by Wakko Warner

Subject: Re: [RFT] major libata update

Alan Cox wrote:
> On Llu, 2006-05-15 at 19:02 -0400, Wakko Warner wrote:
> > How about PATA? Specifically intel's IDE chip. I have a machine that I can
> > blow the hard drive away if I want to.
>
> Give the patch on zeniv.linux.org.uk/~alan/IDE a go in that case and let
> me know how it behaves.

I noticed one hunk failed with 2.6.17-rc4 when using
patch-2.6.17-rc3-ide2.gz

It was only the version string so I should be ok. As I said, if it blows up
on me, that's ok.

I attempted to patch Jeff's libata1 over top of this; it failed miserably.

When I patched Jeff's libata1 over 2.6.17-rc4, it was ok, except for 2
files:
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n]
Skipping patch.
32 out of 32 hunks ignored -- saving rejects to file drivers/scsi/ahci.c.rej
patching file drivers/scsi/ata_piix.c
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n]
Skipping patch.
5 out of 5 hunks ignored -- saving rejects to file drivers/scsi/ata_piix.c.rej




If you're curious, Alan's patch + Jeff's patch:
patching file drivers/scsi/Makefile
Hunk #1 succeeded at 201 (offset 37 lines).
patching file drivers/scsi/ahci.c
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n]
Skipping patch.
32 out of 32 hunks ignored -- saving rejects to file drivers/scsi/ahci.c.rej
patching file drivers/scsi/ata_piix.c
Hunk #1 FAILED at 93.
Hunk #2 succeeded at 299 with fuzz 2 (offset 56 lines).
Hunk #3 succeeded at 335 with fuzz 2 (offset 61 lines).
Hunk #4 succeeded at 700 (offset 210 lines).
Hunk #5 succeeded at 805 (offset 234 lines).
1 out of 5 hunks FAILED -- saving rejects to file drivers/scsi/ata_piix.c.rej
patching file drivers/scsi/libata-bmdma.c
patching file drivers/scsi/libata-core.c
Hunk #23 succeeded at 1503 (offset 6 lines).
Hunk #24 succeeded at 1573 (offset 6 lines).
Hunk #25 succeeded at 1587 (offset 6 lines).
Hunk #26 succeeded at 1624 (offset 6 lines).
Hunk #27 succeeded at 1644 (offset 6 lines).
Hunk #28 succeeded at 1674 (offset 6 lines).
Hunk #29 succeeded at 1713 (offset 6 lines).
Hunk #30 succeeded at 1979 (offset 6 lines).
Hunk #31 succeeded at 2218 (offset 6 lines).
Hunk #32 succeeded at 2228 (offset 6 lines).
Hunk #33 succeeded at 2321 (offset 6 lines).
Hunk #34 succeeded at 2348 (offset 6 lines).
Hunk #35 succeeded at 2416 (offset 6 lines).
Hunk #36 succeeded at 2425 (offset 6 lines).
Hunk #37 succeeded at 2463 (offset 6 lines).
Hunk #38 succeeded at 2493 (offset 6 lines).
Hunk #39 succeeded at 2501 (offset 6 lines).
Hunk #40 succeeded at 2519 (offset 6 lines).
Hunk #41 succeeded at 2537 (offset 6 lines).
Hunk #42 succeeded at 2549 (offset 6 lines).
Hunk #43 succeeded at 2628 (offset 6 lines).
Hunk #44 succeeded at 2687 (offset 6 lines).
Hunk #45 succeeded at 2695 (offset 6 lines).
Hunk #46 succeeded at 2720 (offset 6 lines).
Hunk #47 succeeded at 2744 (offset 6 lines).
Hunk #48 succeeded at 2759 (offset 6 lines).
Hunk #49 succeeded at 2830 (offset 6 lines).
Hunk #50 succeeded at 2850 (offset 6 lines).
Hunk #51 succeeded at 2874 (offset 6 lines).
Hunk #52 succeeded at 2886 (offset 6 lines).
Hunk #53 succeeded at 2992 (offset 6 lines).
Hunk #54 FAILED at 3006.
Hunk #55 succeeded at 3068 (offset 1 line).
Hunk #56 succeeded at 3076 (offset 1 line).
Hunk #57 succeeded at 3091 (offset 1 line).
Hunk #58 succeeded at 3101 (offset 1 line).
Hunk #59 succeeded at 3114 (offset 1 line).
Hunk #60 succeeded at 3259 (offset 1 line).
Hunk #61 succeeded at 3496 (offset 1 line).
Hunk #62 succeeded at 3634 (offset -28 lines).
Hunk #63 FAILED at 3659.
Hunk #64 succeeded at 3761 with fuzz 1 (offset -28 lines).
Hunk #65 succeeded at 3786 (offset -28 lines).
Hunk #66 FAILED at 3813.
Hunk #67 FAILED at 3849.
Hunk #68 succeeded at 4207 (offset -27 lines).
Hunk #69 succeeded at 4226 (offset -27 lines).
Hunk #70 succeeded at 4269 (offset -27 lines).
Hunk #71 succeeded at 4450 (offset -27 lines).
Hunk #72 succeeded at 4515 (offset -27 lines).
Hunk #73 succeeded at 4643 (offset -27 lines).
Hunk #74 succeeded at 4711 (offset -27 lines).
Hunk #75 succeeded at 4749 (offset -27 lines).
Hunk #76 succeeded at 4764 (offset -27 lines).
Hunk #77 succeeded at 4937 (offset -27 lines).
Hunk #78 succeeded at 4959 (offset -27 lines).
Hunk #79 succeeded at 5112 (offset -23 lines).
Hunk #80 succeeded at 5126 (offset -23 lines).
Hunk #81 succeeded at 5193 (offset -23 lines).
Hunk #82 succeeded at 5266 (offset -21 lines).
Hunk #83 succeeded at 5318 (offset -15 lines).
Hunk #84 succeeded at 5413 (offset -15 lines).
Hunk #85 succeeded at 5577 (offset -15 lines).
Hunk #86 succeeded at 5636 (offset -15 lines).
Hunk #87 succeeded at 5661 (offset -13 lines).
Hunk #88 succeeded at 5682 (offset -13 lines).
Hunk #89 succeeded at 5721 (offset -13 lines).
4 out of 89 hunks FAILED -- saving rejects to file drivers/scsi/libata-core.c.rej
patching file drivers/scsi/libata-eh.c
patching file drivers/scsi/libata-scsi.c
patching file drivers/scsi/libata.h
patching file drivers/scsi/pdc_adma.c
patching file drivers/scsi/sata_mv.c
Hunk #2 succeeded at 682 (offset 2 lines).
Hunk #3 succeeded at 1311 (offset 2 lines).
Hunk #4 succeeded at 1398 (offset 2 lines).
Hunk #5 succeeded at 1419 (offset 2 lines).
Hunk #6 succeeded at 1936 (offset 2 lines).
Hunk #7 succeeded at 1962 (offset 2 lines).
Hunk #8 succeeded at 1994 (offset 2 lines).
Hunk #9 succeeded at 2025 (offset 2 lines).
patching file drivers/scsi/sata_nv.c
Hunk #2 succeeded at 280 (offset 1 line).
patching file drivers/scsi/sata_promise.c
Hunk #2 succeeded at 438 (offset 2 lines).
Hunk #3 succeeded at 446 (offset 2 lines).
Hunk #4 succeeded at 537 (offset 2 lines).
Hunk #5 succeeded at 680 (offset 2 lines).
patching file drivers/scsi/sata_qstor.c
Hunk #2 succeeded at 176 (offset 1 line).
Hunk #3 succeeded at 395 (offset 1 line).
Hunk #4 succeeded at 428 (offset 1 line).
patching file drivers/scsi/sata_sil.c
Hunk #3 FAILED at 176.
Hunk #4 succeeded at 269 (offset 1 line).
Hunk #5 succeeded at 320 (offset 1 line).
Hunk #6 succeeded at 393 (offset 1 line).
Hunk #7 succeeded at 417 (offset 1 line).
Hunk #8 succeeded at 507 (offset 1 line).
1 out of 8 hunks FAILED -- saving rejects to file drivers/scsi/sata_sil.c.rej
patching file drivers/scsi/sata_sil24.c
Hunk #9 succeeded at 388 with fuzz 2 (offset 1 line).
Hunk #10 succeeded at 415 (offset 1 line).
Hunk #11 succeeded at 425 (offset 1 line).
Hunk #12 succeeded at 434 (offset 1 line).
Hunk #13 succeeded at 442 (offset 1 line).
Hunk #14 succeeded at 510 (offset 1 line).
Hunk #15 succeeded at 580 (offset 1 line).
Hunk #16 succeeded at 661 (offset 1 line).
Hunk #17 succeeded at 687 (offset 1 line).
Hunk #18 succeeded at 699 (offset 1 line).
Hunk #19 succeeded at 709 (offset 1 line).
Hunk #20 succeeded at 729 (offset 1 line).
Hunk #21 succeeded at 890 (offset 1 line).
Hunk #22 succeeded at 903 (offset 1 line).
Hunk #23 succeeded at 940 (offset 1 line).
Hunk #24 succeeded at 1004 (offset 1 line).
Hunk #25 succeeded at 1057 (offset 1 line).
Hunk #26 succeeded at 1115 (offset 1 line).
Hunk #27 succeeded at 1137 (offset 1 line).
patching file drivers/scsi/sata_sis.c
patching file drivers/scsi/sata_svw.c
patching file drivers/scsi/sata_sx4.c
Hunk #2 succeeded at 219 (offset 1 line).
Hunk #3 succeeded at 834 (offset 1 line).
Hunk #4 succeeded at 869 (offset 1 line).
Hunk #5 succeeded at 1377 (offset 1 line).
patching file drivers/scsi/sata_uli.c
patching file drivers/scsi/sata_via.c
patching file drivers/scsi/sata_vsc.c
patching file drivers/scsi/scsi.c
patching file drivers/scsi/scsi_error.c
patching file drivers/scsi/scsi_lib.c
patching file drivers/scsi/scsi_priv.h
patching file include/linux/ata.h
patching file include/linux/libata.h
Hunk #2 succeeded at 45 with fuzz 2.
Hunk #4 FAILED at 123.
Hunk #5 succeeded at 140 (offset 1 line).
Hunk #6 succeeded at 223 (offset 1 line).
Hunk #7 succeeded at 284 (offset 1 line).
Hunk #8 succeeded at 375 (offset 1 line).
Hunk #9 succeeded at 387 (offset 1 line).
Hunk #10 succeeded at 424 (offset 1 line).
Hunk #11 succeeded at 466 (offset 1 line).
Hunk #12 succeeded at 489 (offset 1 line).
Hunk #13 succeeded at 531 (offset 3 lines).
Hunk #14 succeeded at 585 (offset 3 lines).
Hunk #15 succeeded at 610 (offset 3 lines).
Hunk #16 FAILED at 666.
Hunk #17 succeeded at 735 (offset 7 lines).
Hunk #18 succeeded at 821 (offset 7 lines).
Hunk #19 succeeded at 934 (offset 7 lines).
Hunk #20 succeeded at 977 (offset 7 lines).
Hunk #21 succeeded at 1060 (offset 7 lines).
Hunk #22 succeeded at 1097 (offset 7 lines).
2 out of 22 hunks FAILED -- saving rejects to file include/linux/libata.h.rej
patching file include/scsi/scsi_cmnd.h
patching file include/scsi/scsi_eh.h
patching file include/scsi/scsi_host.h
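
Only a few lines of the output above matter: files that were skipped entirely and hunks reported as FAILED. The following is a minimal Python sketch for pulling those out. It is not something used in this thread. The function name summarize_patch_log is made up for illustration, and the parsing assumes the message wording shown in the log above.

```python
import re

PATCHING = re.compile(r"patching file (\S+)")
FAILED = re.compile(r"Hunk #(\d+) FAILED at (\d+)\.")


def summarize_patch_log(text):
    """Map each patched file to its failed hunk numbers and a skipped flag."""
    summary = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = PATCHING.match(line)
        if match:
            current = match.group(1)
            summary[current] = {"failed": [], "skipped": False}
            continue
        if current is None:
            continue  # output seen before the first "patching file" line
        match = FAILED.match(line)
        if match:
            summary[current]["failed"].append(int(match.group(1)))
        elif line == "Skipping patch.":
            summary[current]["skipped"] = True
    return summary


# A short excerpt of the log above, used as sample input.
SAMPLE = """\
patching file drivers/scsi/ahci.c
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n]
Skipping patch.
32 out of 32 hunks ignored -- saving rejects to file drivers/scsi/ahci.c.rej
patching file drivers/scsi/ata_piix.c
Hunk #1 FAILED at 93.
Hunk #2 succeeded at 299 with fuzz 2 (offset 56 lines).
1 out of 5 hunks FAILED -- saving rejects to file drivers/scsi/ata_piix.c.rej
patching file drivers/scsi/libata-bmdma.c
"""

if __name__ == "__main__":
    for path, info in summarize_patch_log(SAMPLE).items():
        if info["skipped"] or info["failed"]:
            print(path, info)
```

Hunks that applied with an offset or fuzz are deliberately ignored here; they still deserve a manual look in the resulting tree.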


--
Lab tests show that use of micro$oft causes cancer in lab animals
Got Gas???

2006-05-15 23:45:09

by Jeff Garzik

Subject: Re: [RFT] major libata update

Wakko Warner wrote:
> Alan Cox wrote:
>> On Llu, 2006-05-15 at 19:02 -0400, Wakko Warner wrote:
>>> How about PATA? Specifically intel's IDE chip. I have a machine that I can
>>> blow the hard drive away if I want to.
>> Give the patch on zeniv.linux.org.uk/~alan/IDE a go in that case and let
>> me know how it behaves.
>
> I noticed one hunk failed with 2.6.17-rc4 when using
> patch-2.6.17-rc3-ide2.gz
>
> It was only the version string so I should be ok. As I said, if it blows up
> on me, that's ok.
>
> I attempted to patch Jeff's libata1 over top of this, it failed miserably.
>
> When I patched Jeff's libata1 over 2.6.17-rc4, it was ok, except for 2
> files:

Oh, it's almost certain the patches will conflict. My suggestion was
more of a reference-<here>-if-things-break.

Jeff



2006-05-15 23:45:26

by Wakko Warner

Subject: Re: [RFT] major libata update

Alan Cox wrote:
> On Llu, 2006-05-15 at 19:00 -0400, Jeff Garzik wrote:
> > Always helpful. ata_piix should support Intel PATA controllers, modulo
> > some bugs that Alan is fixing / has fixed. If your PCI ID isn't listed,
> > you will have to add it, and an associated info entry. Again, take a
> > look at Alan's libata PATA patches for guidance.
>
> Without the patches I've got, everything non-ATAPI should work (ATAPI
> will, I think, 99% work), and anything that is ICH or later (UDMA66 or
> higher) should behave correctly.
>
> PIIX/MPIIX won't work with it, and UDMA33 chips may work providing the
> scribbles to the wrong register happen to be harmless.

The test machine I have is this:
# lspci -v -s 7.1
0000:00:07.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01) (prog-if 80 [Master])
        Flags: bus master, medium devsel, latency 64
        I/O ports at 10a0 [size=16]

# lspci -v -s 7.1 -n
0000:00:07.1 0101: 8086:7111 (rev 01) (prog-if 80 [Master])
        Flags: bus master, medium devsel, latency 64
        I/O ports at 10a0 [size=16]

#

If this won't work, let me know.
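
As an aside on the "is my PCI ID listed" question: the numeric lspci line above carries the class code (0101, IDE interface), the vendor ID (8086, Intel) and the device ID (7111). The following is a minimal Python sketch of extracting that pair and checking it against a table. EXAMPLE_ID_TABLE and parse_lspci_n are hypothetical names, and the table is a stand-in that says nothing about which IDs ata_piix actually lists.

```python
import re

# Hypothetical stand-in for a driver's PCI ID table: (vendor, device) pairs.
# The single entry is simply the ID from the lspci output above.
EXAMPLE_ID_TABLE = {(0x8086, 0x7111)}

# Matches the start of an "lspci -n" line: slot, class code, vendor:device.
LSPCI_N = re.compile(
    r"^(?P<slot>\S+)\s+(?P<cls>[0-9a-fA-F]{4}):\s+"
    r"(?P<vendor>[0-9a-fA-F]{4}):(?P<device>[0-9a-fA-F]{4})"
)


def parse_lspci_n(line):
    """Return (class_code, vendor_id, device_id) from one 'lspci -n' line."""
    match = LSPCI_N.match(line.strip())
    if match is None:
        raise ValueError("not an 'lspci -n' device line: %r" % line)
    return (
        int(match.group("cls"), 16),
        int(match.group("vendor"), 16),
        int(match.group("device"), 16),
    )


if __name__ == "__main__":
    line = "0000:00:07.1 0101: 8086:7111 (rev 01) (prog-if 80 [Master])"
    cls, vendor, device = parse_lspci_n(line)
    listed = (vendor, device) in EXAMPLE_ID_TABLE
    print("class %04x vendor %04x device %04x listed=%s" % (cls, vendor, device, listed))
```

Whether a matching ID is enough for this chip to work is a separate question; Alan's note about PIIX/MPIIX and UDMA33 parts still applies.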

--
Lab tests show that use of micro$oft causes cancer in lab animals
Got Gas???

2006-05-15 23:49:42

by Jeff Garzik

Subject: Re: [RFT] major libata update

Tejun Heo wrote:
> Jeff Garzik wrote:
>> Tomasz Torcz wrote:
>>> On Mon, May 15, 2006 at 01:00:06PM -0400, Jeff Garzik wrote:
>>>> After much development and review, I merged a massive pile of libata
>>>> patches from Tejun Heo and Albert Lee. This update contains the
>>>> following major libata
>>>
>>> Any plans to merge http://home-tj.org/wiki/index.php/Sil_m15w ? Or
>>> maybe it's merged already?
>>> Seagate firmware update seems to be available only for OEMs, so this
>>> quirk is pretty helpful for end users.
>>
>> It's a question of staging. This still lives in the 'sii-m15w' branch
>> of libata-dev.git, but if we throw too many _classes_ of changes into
>> the same big lump, then it becomes much more difficult to discern
>> which changes caused which failures.
>>
>> Since sata_sil has seen several changes, and since the sii-m15w
>> problems are so difficult to diagnose properly, it's easier to separate
>> that out.
>
> Are you planning on merging sil_m15w workaround?

Yes, but after 2.6.18.


> FYI, from the first time it was submitted (last summer) till 2.6.16, it
> took very little effort to maintain it. The current big update would
> necessitate some changes to it but I don't think it will be too much
> work. My experience says m15w doesn't add too much maintenance overhead.

It's actively maintained in the 'sii-m15w' branch of libata-dev.git.


> Also, what's the merge plan for hotplug/PM? Together into 2.6.18? Or
> are we looking further down?

Hotplug is reasonable for 2.6.18, but after that it's getting a bit much.
We need to have some reasonable testing points in the midst of all
this development :) I'm happy to maintain an upstream-2.6.19 branch for
such things, though. I use tiered branches anyway.

Jeff



2006-05-15 23:54:51

by Jeff Garzik

Subject: Re: [RFT] major libata update

Avuton Olrich wrote:
> On 5/15/06, Jeff Garzik wrote:
>> * sata_sil and ata_piix also need healthy re-testing of all basic
>> functionality.
>
> I'm testing it right now, but with 2.6.17-rc4-git2 I was getting:

Testing what? sata_sil? Please provide full dmesg, there's a lot of
missing information.


> May 15 15:42:57 shapeshifter ata2: command 0x25 timeout, stat 0x58
> host_stat 0x1
> May 15 15:42:57 shapeshifter ata2: translated ATA stat/err 0x58/00 to
> SCSI SK/ASC/ASCQ 0xb/47/00
> May 15 15:42:57 shapeshifter ata2: status=0x58 { DriveReady
> SeekComplete DataRequest }
> May 15 15:42:57 shapeshifter sd 1:0:0:0: SCSI error: return code =
> 0x8000002
> May 15 15:42:57 shapeshifter sda: Current: sense key=0xb
> May 15 15:42:57 shapeshifter ASC=0x47 ASCQ=0x0
> May 15 15:42:57 shapeshifter end_request: I/O error, dev sda, sector
> 974708575
>
> (sector varies)
>
> After large ssh transfers. I moved to 2.6.17-rc4-git2 because
> 2.6.16.16 was doing the same. This is a new 500gb sata2 drive on
> sata_sil so I guess this could be hardware, but I wanted to make sure
> before I go returning this thing. After this obviously I have to sysrq
> sync, ro and reboot. This also causes(?) a NETDEV WATCHDOG: eth2:
> transmit timed out, sometimes this ata timeout doesn't yet occur and I
> just get the netdev watchdog. This has not yet happened with the new
> patch, though I'm only 1 hr into testing with it.

Yes, it's entirely possible that the new patch will address this. Please
do keep us posted.

Thanks,

Jeff



2006-05-16 00:04:22

by Tejun Heo

Subject: Re: [RFT] major libata update

Jeff Garzik wrote:
> Tejun Heo wrote:
>> Jeff Garzik wrote:
>>> Tomasz Torcz wrote:
>>>> On Mon, May 15, 2006 at 01:00:06PM -0400, Jeff Garzik wrote:
>>>>> After much development and review, I merged a massive pile of libata
>>>>> patches from Tejun Heo and Albert Lee. This update contains the
>>>>> following major libata
>>>>
>>>> Any plans to merge http://home-tj.org/wiki/index.php/Sil_m15w ? Or
>>>> maybe it's merged already?
>>>> Seagate firmware update seems to be available only for OEMs, so this
>>>> quirk is pretty helpful for end users.
>>>
>>> It's a question of staging. This still lives in the 'sii-m15w' branch
>>> of libata-dev.git, but if we throw too many _classes_ of changes into
>>> the same big lump, then it becomes much more difficult to discern
>>> which changes caused which failures.
>>>
>>> Since sata_sil has seen several changes, and since the sii-m15w
>>> problems are so difficult to diagnose properly, it's easier to
>>> separate that out.
>>
>> Are you planning on merging sil_m15w workaround?
>
> Yes, but after 2.6.18.

Cool.

>> FYI, from the first time it was submitted (last summer) till 2.6.16,
>> it took very little effort to maintain it. The current big update
>> would necessitate some changes to it but I don't think it will be too
>> much work. My experience says m15w doesn't add too much maintenance
>> overhead.
>
> It's actively maintained in the 'sii-m15w' branch of libata-dev.git.
>

I have been maintaining my own. :) BTW, with 2.6.16, m15_cxt has to
move from qc->private_data to ap->private_data.

>
>> Also, what's the merge plan for hotplug/PM? Together into 2.6.18? Or
>> are we looking further down?
>
> Hotplug is reasonable for 2.6.18, but after that it's getting a bit much.
> We need to have some reasonable testing points in the midst of all this
> development :) I'm happy to maintain an upstream-2.6.19 branch for such
> things, though. I use tiered branches anyway.

Good enough for me. I want to see hotplug in 2.6.18 but link/PM stuff
can definitely wait for 2.6.19.

--
tejun

2006-05-16 00:08:23

by Avuton Olrich

Subject: Re: [RFT] major libata update

On 5/15/06, Jeff Garzik wrote:
> Avuton Olrich wrote:
> > On 5/15/06, Jeff Garzik wrote:
> >> * sata_sil and ata_piix also need healthy re-testing of all basic
> >> functionality.
> >
> > I'm testing it right now, but with 2.6.17-rc4-git2 I was getting:
>
> Testing what? sata_sil? Please provide full dmesg, there's a lot of
> missing information.

sata_sil. Sorry, I thought I provided a good subset of the timeout message:

May 15 15:41:27 shapeshifter ata2: command 0x25 timeout, stat 0x58 host_stat 0x1
May 15 15:41:27 shapeshifter ata2: translated ATA stat/err 0x58/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 15 15:41:27 shapeshifter ata2: status=0x58 { DriveReady SeekComplete DataRequest }
May 15 15:41:27 shapeshifter sd 1:0:0:0: SCSI error: return code = 0x8000002
May 15 15:41:27 shapeshifter sda: Current: sense key=0xb
May 15 15:41:27 shapeshifter ASC=0x47 ASCQ=0x0
May 15 15:41:27 shapeshifter end_request: I/O error, dev sda, sector 974708551
May 15 15:41:57 shapeshifter ata2: command 0x25 timeout, stat 0x58 host_stat 0x1
May 15 15:41:57 shapeshifter ata2: translated ATA stat/err 0x58/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 15 15:41:57 shapeshifter ata2: status=0x58 { DriveReady SeekComplete DataRequest }
May 15 15:41:57 shapeshifter sd 1:0:0:0: SCSI error: return code = 0x8000002
May 15 15:41:57 shapeshifter sda: Current: sense key=0xb
May 15 15:41:57 shapeshifter ASC=0x47 ASCQ=0x0
May 15 15:41:57 shapeshifter end_request: I/O error, dev sda, sector 974708559
May 15 15:42:27 shapeshifter ata2: command 0x25 timeout, stat 0x58 host_stat 0x1
May 15 15:42:27 shapeshifter ata2: translated ATA stat/err 0x58/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 15 15:42:27 shapeshifter ata2: status=0x58 { DriveReady SeekComplete DataRequest }
May 15 15:42:27 shapeshifter sd 1:0:0:0: SCSI error: return code = 0x8000002
May 15 15:42:27 shapeshifter sda: Current: sense key=0xb
May 15 15:42:27 shapeshifter ASC=0x47 ASCQ=0x0
May 15 15:42:27 shapeshifter end_request: I/O error, dev sda, sector 974708567
May 15 15:42:57 shapeshifter ata2: command 0x25 timeout, stat 0x58 host_stat 0x1
May 15 15:42:57 shapeshifter ata2: translated ATA stat/err 0x58/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 15 15:42:57 shapeshifter ata2: status=0x58 { DriveReady SeekComplete DataRequest }
May 15 15:42:57 shapeshifter sd 1:0:0:0: SCSI error: return code = 0x8000002
May 15 15:42:57 shapeshifter sda: Current: sense key=0xb
May 15 15:42:57 shapeshifter ASC=0x47 ASCQ=0x0
May 15 15:42:57 shapeshifter end_request: I/O error, dev sda, sector 974708575
May 15 15:43:27 shapeshifter ata2: command 0x25 timeout, stat 0x58 host_stat 0x1
May 15 15:43:27 shapeshifter ata2: translated ATA stat/err 0x58/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 15 15:43:27 shapeshifter ata2: status=0x58 { DriveReady SeekComplete DataRequest }
May 15 15:43:27 shapeshifter sd 1:0:0:0: SCSI error: return code = 0x8000002
May 15 15:43:27 shapeshifter sda: Current: sense key=0xb
May 15 15:43:27 shapeshifter ASC=0x47 ASCQ=0x0
May 15 15:43:27 shapeshifter end_request: I/O error, dev sda, sector 974708583
May 15 15:43:57 shapeshifter ata2: command 0x25 timeout, stat 0x58 host_stat 0x1
May 15 15:43:57 shapeshifter ata2: translated ATA stat/err 0x58/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 15 15:43:57 shapeshifter ata2: status=0x58 { DriveReady SeekComplete DataRequest }
May 15 15:43:57 shapeshifter sd 1:0:0:0: SCSI error: return code = 0x8000002
May 15 15:43:57 shapeshifter sda: Current: sense key=0xb
May 15 15:43:57 shapeshifter ASC=0x47 ASCQ=0x0
May 15 15:43:57 shapeshifter end_request: I/O error, dev sda, sector 974708591
May 15 15:44:02 shapeshifter SysRq : Emergency Sync
May 15 15:44:02 shapeshifter Emergency Sync complete
May 15 15:44:27 shapeshifter ata2: command 0x25 timeout, stat 0x58 host_stat 0x1
May 15 15:44:27 shapeshifter ata2: translated ATA stat/err 0x58/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 15 15:44:27 shapeshifter ata2: status=0x58 { DriveReady SeekComplete DataRequest }
May 15 15:44:27 shapeshifter sd 1:0:0:0: SCSI error: return code = 0x8000002
May 15 15:44:27 shapeshifter sda: Current: sense key=0xb
May 15 15:44:27 shapeshifter ASC=0x47 ASCQ=0x0
May 15 15:44:27 shapeshifter end_request: I/O error, dev sda, sector 974708599
May 15 15:44:35 shapeshifter NETDEV WATCHDOG: eth2: transmit timed out

> > After large ssh transfers. I moved to 2.6.17-rc4-git2 because
> > 2.6.16.16 was doing the same. This is a new 500gb sata2 drive on
> > sata_sil so I guess this could be hardware, but I wanted to make sure
> > before I go returning this thing. After this obviously I have to sysrq
> > sync, ro and reboot. This also causes(?) a NETDEV WATCHDOG: eth2:
> > transmit timed out, sometimes this ata timeout doesn't yet occur and I
> > just get the netdev watchdog. This has not yet happened with the new
> > patch, though I'm only 1 hr into testing with it.
>
> Yes, it's entirely possible that the new patch will address this. Please
> do keep us posted.
>
> Thanks,
>
> Jeff

OK, upon further testing I believe the patch helps out tremendously: I
don't get the timeout message (yet), though I still get a netdev
watchdog. I've gotten this with two different ethernet ports/drivers,
so I don't believe it is due to the ethernet driver.

Sample dmesg output:
NETDEV WATCHDOG: eth2: transmit timed out

The time that it actually takes to happen is variable, though it
hasn't happened to me in under 20 minutes yet.

--
avuton
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2006-05-16 02:15:20

by Tejun Heo

Subject: Re: [RFT] major libata update

Tejun Heo wrote:
> Jeff Garzik wrote:
>> Tejun Heo wrote:
>>> Jeff Garzik wrote:
>>>> Tomasz Torcz wrote:
>>>>> On Mon, May 15, 2006 at 01:00:06PM -0400, Jeff Garzik wrote:
>>>>>> After much development and review, I merged a massive pile of libata
>>>>>> patches from Tejun Heo and Albert Lee. This update contains the
>>>>>> following major libata
>>>>>
>>>>> Any plans to merge http://home-tj.org/wiki/index.php/Sil_m15w ? Or
>>>>> maybe it's merged already?
>>>>> Seagate firmware update seems to be available only for OEMs, so this
>>>>> quirk is pretty helpful for end users.
>>>>
>>>> It's a question of staging. This still lives in the 'sii-m15w'
>>>> branch of libata-dev.git, but if we throw too many _classes_ of
>>>> changes into the same big lump, then it becomes much more difficult
>>>> to discern which changes caused which failures.
>>>>
>>>> Since sata_sil has seen several changes, and since the sii-m15w
>>>> problems are so difficult to diagnose properly, it's easier to
>>>> separate that out.
>>>
>>> Are you planning on merging sil_m15w workaround?
>>
>> Yes, but after 2.6.18.
>
> Cool.
>
>>> FYI, from the first time it was submitted (last summer) till 2.6.16,
>>> it took very little effort to maintain it. The current big update
>>> would necessitate some changes to it but I don't think it will be too
>>> much work. My experience says m15w doesn't add too much maintenance
>>> overhead.
>>
>> It's actively maintained in the 'sii-m15w' branch of libata-dev.git.
>>
>
> I have been maintaining my own. :) BTW, with 2.6.16, m15_cxt has to
> move from qc->private_data to ap->private_data.

Okay, we've been talking about different things. You're talking about
excluding non-affected drives from the m15w blacklist, while I'm talking
about the handle-large-writes-by-qc-rewrite m15w workaround. The URL
Tomasz Torcz posted contains the workaround.

--
tejun

2006-05-16 03:36:53

by Avuton Olrich

Subject: Re: [RFT] major libata update

On 5/15/06, Jeff Garzik wrote:
> Avuton Olrich wrote:
> > On 5/15/06, Jeff Garzik wrote:
> >> * sata_sil and ata_piix also need healthy re-testing of all basic
> >> functionality.
> >
> > I'm testing it right now, but with 2.6.17-rc4-git2 I was getting:
>
> Testing what? sata_sil? Please provide full dmesg, there's a lot of
> missing information.

More followup, it did finally error out on me:

Not sure if it helps any, but this is a sata2 disk with a sata
interface. This is rc4-git2 with the libata patch from the beginning
of this thread, using sata_sil.

dmesg:
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x3 frozen
ata1.00: (BMDMA stat 0x1)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x3 frozen
ata1.00: (BMDMA stat 0x1)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x3 frozen
ata1.00: (BMDMA stat 0x1)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for UDMA/100
ata1: EH complete
ata1.00: limiting speed to UDMA/66
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x3 frozen
ata1.00: (BMDMA stat 0x1)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for UDMA/66
ata1: EH complete
ata1.00: limiting speed to UDMA/44
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x3 frozen
ata1.00: (BMDMA stat 0x1)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for UDMA/44
ata1: EH complete
ata1.00: limiting speed to UDMA/33
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x3 frozen
ata1.00: (BMDMA stat 0x1)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for UDMA/33
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key=0xb
ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sda, sector 703661647
ata1: EH complete
ata1.00: limiting speed to UDMA/25
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x3 frozen
ata1.00: (BMDMA stat 0x1)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for UDMA/25
ata1: EH complete
NETDEV WATCHDOG: eth2: transmit timed out
ata1.00: limiting speed to UDMA/16
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x3 frozen
ata1.00: (BMDMA stat 0x1)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for UDMA/16
ata1: EH complete
ata1.00: limiting speed to PIO4
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x3 frozen
ata1.00: (BMDMA stat 0x1)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for PIO4
ata1: EH complete
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
diagnostics: net 0cc0 media 8080 dma 000000a0 fifo 8800
Flags; bus-master 1, dirty 18790(6) current 18806(6)
Transmit list 37e3c5c0 vs. f7e3c5c0.
0: @f7e3c200 length 8000002a status 0000002a
1: @f7e3c2a0 length 8000002a status 0000002a
2: @f7e3c340 length 8000002a status 0000002a
3: @f7e3c3e0 length 8000002a status 0000002a
4: @f7e3c480 length 8000002a status 8000002a
5: @f7e3c520 length 8000002a status 8000002a
6: @f7e3c5c0 length 8000005f status 0000005f
7: @f7e3c660 length 8000005f status 0000005f
8: @f7e3c700 length 8000002a status 0000002a
9: @f7e3c7a0 length 8000002a status 0000002a
10: @f7e3c840 length 8000002a status 0000002a
11: @f7e3c8e0 length 8000002a status 0000002a
12: @f7e3c980 length 8000002a status 0000002a
13: @f7e3ca20 length 8000002a status 0000002a
14: @f7e3cac0 length 8000002a status 0000002a
15: @f7e3cb60 length 8000002a status 0000002a
eth0: Resetting the Tx ring pointer.
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
diagnostics: net 0cc0 media 8080 dma 000000a0 fifo 8800
Flags; bus-master 1, dirty 18790(6) current 18806(6)
Transmit list 37e3c5c0 vs. f7e3c5c0.
0: @f7e3c200 length 8000002a status 0000002a
1: @f7e3c2a0 length 8000002a status 0000002a
2: @f7e3c340 length 8000002a status 0000002a
3: @f7e3c3e0 length 8000002a status 0000002a
4: @f7e3c480 length 8000002a status 8000002a
5: @f7e3c520 length 8000002a status 8000002a
6: @f7e3c5c0 length 8000005f status 0000005f
7: @f7e3c660 length 8000005f status 0000005f
8: @f7e3c700 length 8000002a status 0000002a
9: @f7e3c7a0 length 8000002a status 0000002a
10: @f7e3c840 length 8000002a status 0000002a
11: @f7e3c8e0 length 8000002a status 0000002a
12: @f7e3c980 length 8000002a status 0000002a
13: @f7e3ca20 length 8000002a status 0000002a
14: @f7e3cac0 length 8000002a status 0000002a
15: @f7e3cb60 length 8000002a status 0000002a
eth0: Resetting the Tx ring pointer.
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
diagnostics: net 0cc0 media 8080 dma 000000a0 fifo 8800
Flags; bus-master 1, dirty 18790(6) current 18806(6)
Transmit list 37e3c5c0 vs. f7e3c5c0.
0: @f7e3c200 length 8000002a status 0000002a
1: @f7e3c2a0 length 8000002a status 0000002a
2: @f7e3c340 length 8000002a status 0000002a
3: @f7e3c3e0 length 8000002a status 0000002a
4: @f7e3c480 length 8000002a status 8000002a
5: @f7e3c520 length 8000002a status 8000002a
6: @f7e3c5c0 length 8000005f status 0000005f
7: @f7e3c660 length 8000005f status 0000005f
8: @f7e3c700 length 8000002a status 0000002a
9: @f7e3c7a0 length 8000002a status 0000002a
10: @f7e3c840 length 8000002a status 0000002a
11: @f7e3c8e0 length 8000002a status 0000002a
12: @f7e3c980 length 8000002a status 0000002a
13: @f7e3ca20 length 8000002a status 0000002a
14: @f7e3cac0 length 8000002a status 0000002a
15: @f7e3cb60 length 8000002a status 0000002a
eth0: Resetting the Tx ring pointer.
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
diagnostics: net 0cc0 media 8080 dma 000000a0 fifo 8800
Flags; bus-master 1, dirty 18790(6) current 18806(6)
Transmit list 37e3c5c0 vs. f7e3c5c0.
0: @f7e3c200 length 8000002a status 0000002a
1: @f7e3c2a0 length 8000002a status 0000002a
2: @f7e3c340 length 8000002a status 0000002a
3: @f7e3c3e0 length 8000002a status 0000002a
4: @f7e3c480 length 8000002a status 8000002a
5: @f7e3c520 length 8000002a status 8000002a
6: @f7e3c5c0 length 8000005f status 0000005f
7: @f7e3c660 length 8000005f status 0000005f
8: @f7e3c700 length 8000002a status 0000002a
9: @f7e3c7a0 length 8000002a status 0000002a
10: @f7e3c840 length 8000002a status 0000002a
11: @f7e3c8e0 length 8000002a status 0000002a
12: @f7e3c980 length 8000002a status 0000002a
13: @f7e3ca20 length 8000002a status 0000002a
14: @f7e3cac0 length 8000002a status 0000002a
15: @f7e3cb60 length 8000002a status 0000002a
eth0: Resetting the Tx ring pointer.
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
diagnostics: net 0cc0 media 8080 dma 000000a0 fifo 8800
Flags; bus-master 1, dirty 18790(6) current 18806(6)
Transmit list 37e3c5c0 vs. f7e3c5c0.
0: @f7e3c200 length 8000002a status 0000002a
1: @f7e3c2a0 length 8000002a status 0000002a
2: @f7e3c340 length 8000002a status 0000002a
3: @f7e3c3e0 length 8000002a status 0000002a
4: @f7e3c480 length 8000002a status 8000002a
5: @f7e3c520 length 8000002a status 8000002a
6: @f7e3c5c0 length 8000005f status 0000005f
7: @f7e3c660 length 8000005f status 0000005f
8: @f7e3c700 length 8000002a status 0000002a
9: @f7e3c7a0 length 8000002a status 0000002a
10: @f7e3c840 length 8000002a status 0000002a
11: @f7e3c8e0 length 8000002a status 0000002a
12: @f7e3c980 length 8000002a status 0000002a
13: @f7e3ca20 length 8000002a status 0000002a
14: @f7e3cac0 length 8000002a status 0000002a
15: @f7e3cb60 length 8000002a status 0000002a
eth0: Resetting the Tx ring pointer.
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
diagnostics: net 0cc0 media 8080 dma 000000a0 fifo 8800
Flags; bus-master 1, dirty 18790(6) current 18806(6)
Transmit list 37e3c5c0 vs. f7e3c5c0.
0: @f7e3c200 length 8000002a status 0000002a
1: @f7e3c2a0 length 8000002a status 0000002a
2: @f7e3c340 length 8000002a status 0000002a
3: @f7e3c3e0 length 8000002a status 0000002a
4: @f7e3c480 length 8000002a status 8000002a
5: @f7e3c520 length 8000002a status 8000002a
6: @f7e3c5c0 length 8000005f status 0000005f
7: @f7e3c660 length 8000005f status 0000005f
8: @f7e3c700 length 8000002a status 0000002a
9: @f7e3c7a0 length 8000002a status 0000002a
10: @f7e3c840 length 8000002a status 0000002a
11: @f7e3c8e0 length 8000002a status 0000002a
12: @f7e3c980 length 8000002a status 0000002a
13: @f7e3ca20 length 8000002a status 0000002a
14: @f7e3cac0 length 8000002a status 0000002a
15: @f7e3cb60 length 8000002a status 0000002a
eth0: Resetting the Tx ring pointer.

lspci -vvv:
01:0b.0 RAID bus controller: Silicon Image, Inc. SiI 3112
[SATALink/SATARaid] Serial ATA Controller (rev 01)
Subsystem: Silicon Image, Inc. SiI 3112 SATARaid Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32, Cache Line Size 08
Interrupt: pin A routed to IRQ 19
Region 0: I/O ports at 9c00 [size=8]
Region 1: I/O ports at a000 [size=4]
Region 2: I/O ports at a400 [size=8]
Region 3: I/O ports at a800 [size=4]
Region 4: I/O ports at ac00 [size=16]
Region 5: Memory at e5006000 (32-bit, non-prefetchable)
[size=512]
[virtual] Expansion ROM at e6800000 [disabled] [size=512K]
Capabilities: [60] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-

--
avuton
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2006-05-16 03:51:48

by Jeff Garzik

Subject: Re: [RFT] major libata update

Avuton Olrich wrote:
> On 5/15/06, Jeff Garzik <[email protected]> wrote:
>> Avuton Olrich wrote:
>> > On 5/15/06, Jeff Garzik <[email protected]> wrote:
>> >> * sata_sil and ata_piix also need healthy re-testing of all basic
>> >> functionality.
>> >
>> > I'm testing it right now, but with 2.6.17-rc4-git2 I was getting:
>>
>> Testing what? sata_sil? Please provide full dmesg, there's a lot of
>> missing information.
>
> More followup, it did finally error out on me:
>
> Not sure if it helps any, but this is a sata2 disk with a sata
> interface. This is rc4-git2 with the libata patch from the beginning
> of this thread, using sata_sil.

Can you configure your interrupts so that ethernet and SATA are not on
the same irq?

Also, please provide _full_ dmesg and _full_ lspci, not just the
SATA-related stuff. This looks motherboard- or hardware-related.

Jeff



2006-05-16 03:55:58

by Tejun Heo

Subject: Re: [RFT] major libata update

Avuton Olrich wrote:
[--snip--]
> ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
> ata1: soft resetting port
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata1.00: configured for UDMA/25
> ata1: EH complete
> NETDEV WATCHDOG: eth2: transmit timed out
> ata1.00: limiting speed to UDMA/16
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x3 frozen
> ata1.00: (BMDMA stat 0x1)
> ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
> ata1: soft resetting port
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata1.00: configured for UDMA/16
> ata1: EH complete
> ata1.00: limiting speed to PIO4
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x3 frozen
> ata1.00: (BMDMA stat 0x1)
> ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
> ata1: soft resetting port
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata1.00: configured for PIO4
> ata1: EH complete

Are those timeouts back-to-back? Can you post dmesg w/ timestamp
(either turn on kernel message timestamping or simply post relevant part
from /var/log/kern.log). The drive thinks the command is complete. You
might be losing interrupts (you might want to diddle with acpi/irq
routing stuff) or it could be some other hardware problem.
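
To spell out the "drive thinks the command is complete" reading: the
timed-out command (cmd 0x25, READ DMA EXT) reports stat 0x40 and err 0x0,
i.e. DRDY is set while BSY, DRQ and ERR are all clear. A minimal userspace
sketch of that decoding follows. It is not libata code; the bit values are
the standard ATA status bits, and the helper names are made up for
illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Standard ATA status register bits. */
#define ST_BSY  0x80	/* device busy */
#define ST_DRDY 0x40	/* device ready */
#define ST_DF   0x20	/* device fault */
#define ST_DRQ  0x08	/* data request */
#define ST_ERR  0x01	/* error, details in the error register */

/* Hypothetical helper: 1 if the status byte looks like a device that
 * is ready and has nothing pending, 0 otherwise. */
static int status_looks_complete(unsigned char stat)
{
	return (stat & ST_DRDY) &&
	       !(stat & (ST_BSY | ST_DF | ST_DRQ | ST_ERR));
}

/* Hypothetical helper: write the names of the set bits into buf. */
static const char *decode_status(unsigned char stat, char *buf, size_t len)
{
	snprintf(buf, len, "%s%s%s%s%s",
		 (stat & ST_BSY)  ? "BSY "  : "",
		 (stat & ST_DRDY) ? "DRDY " : "",
		 (stat & ST_DF)   ? "DF "   : "",
		 (stat & ST_DRQ)  ? "DRQ "  : "",
		 (stat & ST_ERR)  ? "ERR "  : "");
	return buf;
}
```

With stat 0x40 only DRDY is reported, which is consistent with a command
that finished on the device side while its completion interrupt never
reached the driver.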

Does the drive + controller work okay on Windows? I know people don't
like this question so much but it's a great way to isolate hardware
problems as they use completely different driver stack.

And, as shown above, the currently implemented speed-down is way too
simplistic. We need a better speed-down sequence, but I guess that can
wait for a bit.
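
For illustration, the full log steps down exactly one transfer mode per
timeout: UDMA/100, 66, 44, 33, 25, 16 and finally PIO4. A rough userspace
sketch of that "drop the fastest remaining mode" behaviour is below. This
is not the actual libata code; the mask layout, table and function names
are assumptions made up for the example.

```c
#include <stddef.h>

/* Hypothetical mode table, slowest first; bit i of a mask allows modes[i]. */
static const char *const modes[] = {
	"PIO4", "UDMA/16", "UDMA/25", "UDMA/33",
	"UDMA/44", "UDMA/66", "UDMA/100",
};
#define NR_MODES ((int)(sizeof(modes) / sizeof(modes[0])))

/* Index of the fastest mode still allowed, or -1 if the mask is empty. */
static int fastest_mode(unsigned int mask)
{
	int i;

	for (i = NR_MODES - 1; i >= 0; i--)
		if (mask & (1u << i))
			return i;
	return -1;
}

/* Simplistic speed-down: forbid the fastest allowed mode.
 * The slowest mode is never removed. */
static unsigned int speed_down(unsigned int mask)
{
	int top = fastest_mode(mask);

	if (top <= 0)
		return mask;
	return mask & ~(1u << top);
}

/* Name of the mode that would be configured for this mask. */
static const char *mode_name(unsigned int mask)
{
	int top = fastest_mode(mask);

	return top < 0 ? "none" : modes[top];
}
```

Starting from a mask of 0x7f, six calls to speed_down() walk through the
same sequence as the log, and in the log each of those steps is only taken
after another command timeout.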

> NETDEV WATCHDOG: eth0: transmit timed out
> eth0: transmit timed out, tx_status 00 status e000.
> diagnostics: net 0cc0 media 8080 dma 000000a0 fifo 8800
> Flags; bus-master 1, dirty 18790(6) current 18806(6)
> Transmit list 37e3c5c0 vs. f7e3c5c0.
> 0: @f7e3c200 length 8000002a status 0000002a
> 1: @f7e3c2a0 length 8000002a status 0000002a
> 2: @f7e3c340 length 8000002a status 0000002a
> 3: @f7e3c3e0 length 8000002a status 0000002a
> 4: @f7e3c480 length 8000002a status 8000002a
> 5: @f7e3c520 length 8000002a status 8000002a
> 6: @f7e3c5c0 length 8000005f status 0000005f
> 7: @f7e3c660 length 8000005f status 0000005f
> 8: @f7e3c700 length 8000002a status 0000002a
> 9: @f7e3c7a0 length 8000002a status 0000002a
> 10: @f7e3c840 length 8000002a status 0000002a
> 11: @f7e3c8e0 length 8000002a status 0000002a
> 12: @f7e3c980 length 8000002a status 0000002a
> 13: @f7e3ca20 length 8000002a status 0000002a
> 14: @f7e3cac0 length 8000002a status 0000002a
> 15: @f7e3cb60 length 8000002a status 0000002a
> eth0: Resetting the Tx ring pointer.
> NETDEV WATCHDOG: eth0: transmit timed out

The increased transmit timeouts are probably because the CPU is locked up
performing PIOs. I worry about this. With irq-pio, the system stutters
much more. It might be better to perform the actual PIO part from a
workqueue. But then there are controllers which can't stand it when the
CPU leaves them unattended while PIO is in progress...

--
tejun

2006-05-16 04:33:05

by Avuton Olrich

Subject: Re: [RFT] major libata update

On 5/15/06, Jeff Garzik <[email protected]> wrote:


> Can you configure your interrupts so that ethernet and SATA are not on
> the same irq?

Sorry, need a little hand holding here. I'm unsure how to do such a
thing, and can't really google that.

> Also, please provide _full_ dmesg and _full_ lspci, not just the
> SATA-related stuff. This looks motherboard- or hardware-related.

kern.log is my last 24 hrs, contains everything that came into the kern.log
http://olricha.homelinux.net:8080/dump/kern.log
http://olricha.homelinux.net:8080/dump/lspci-vvv

If there's anything else I can do please let me know
--
avuton
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2006-05-16 04:37:54

by Avuton Olrich

Subject: Re: [RFT] major libata update

On 5/15/06, Tejun Heo <[email protected]> wrote:
> Avuton Olrich wrote:

> Are those timeouts back-to-back? Can you post dmesg w/ timestamp
> (either turn on kernel message timestamping or simply post relevant part
> from /var/log/kern.log). The drive thinks the command is complete. You
> might be losing interrupts (you might want to diddle with acpi/irq
> routing stuff) or it could be some other hardware problem.

here's the kern log for the last 24 hours or so:
http://olricha.homelinux.net:8080/dump/kern.log

As I told Jeff, I'm not sure how to diddle with the irq stuff, pointers?

> Does the drive + controller work okay on Windows? I know people don't
> like this question so much but it's a great way to isolate hardware
> problems as they use completely different driver stack.

Sorry, I haven't owned a copy of Windows in >5 yrs; I would be willing
to try otherwise. This computer worked with two 160 GB SATA drives.
When I swapped those for two 500 GB SATA2 drives and started
making heavy use of them, this happened, although I haven't had any
issue with the other hard drive yet (I don't think so; I need to look
over the logs again to make sure I'm not saying that in error).

...snipped more stuff way above my head...
--
avuton
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2006-05-16 05:34:19

by Robert Hancock

Subject: Re: [RFT] major libata update

Avuton Olrich wrote:
> On 5/15/06, Jeff Garzik <[email protected]> wrote:
>
>> Can you configure your interrupts so that ethernet and SATA are not on
>> the same irq?
>
> Sorry, need a little hand holding here. I'm unsure how to do such a
> thing, and can't really google that.

If they are both onboard devices, there is probably no way to do this.
However, from the lspci output it doesn't appear they are on the same
IRQ anyway.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-05-16 14:57:46

by Linus Torvalds

Subject: Re: [RFT] major libata update



On Mon, 15 May 2006, Avuton Olrich wrote:
> On 5/15/06, Jeff Garzik <[email protected]> wrote:
>
> > Can you configure your interrupts so that ethernet and SATA are not on
> > the same irq?
>
> Sorry, need a little hand holding here. I'm unsure how to do such a
> thing, and can't really google that.

Before you do that, try this patch (that I suggested to Neil Brown in a
totally unrelated thread) just for fun.

Linus

----
diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c
index 06dab00..49b9fea 100644
--- a/arch/i386/pci/irq.c
+++ b/arch/i386/pci/irq.c
@@ -880,6 +880,7 @@ static int pcibios_lookup_irq(struct pci
 	((!(pci_probe & PCI_USE_PIRQ_MASK)) || ((1 << irq) & mask)) ) {
 		DBG(" -> got IRQ %d\n", irq);
 		msg = "Found";
+		eisa_set_level_irq(irq);
 	} else if (newirq && r->set && (dev->class >> 8) != PCI_CLASS_DISPLAY_VGA) {
 		DBG(" -> assigning IRQ %d", newirq);
 		if (r->set(pirq_router_dev, dev, pirq, newirq)) {

2006-05-16 15:02:52

by Jeff Garzik

Subject: Re: [RFT] major libata update

Avuton Olrich wrote:
> On 5/15/06, Jeff Garzik <[email protected]> wrote:
>
>
>> Can you configure your interrupts so that ethernet and SATA are not on
>> the same irq?
>
> Sorry, need a little hand holding here. I'm unsure how to do such a
> thing, and can't really google that.

It will be in your BIOS setup somewhere. Hopefully.

Jeff



2006-05-17 02:08:44

by Andrew Morton

Subject: Re: [RFT] major libata update

Jeff Garzik <[email protected]> wrote:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git

This adds a minute and a half to my bootup times :(


[ 43.275044] SCSI subsystem initialized
[ 43.335092] libata version 1.30 loaded.
[ 43.336142] ata_piix 0000:00:1f.2: version 1.10
[ 43.336146] ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
[ 43.426691] ACPI (acpi_bus-0191): Device is not power manageable [20060310]
[ 43.559872] ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 19 (level, low) -> IRQ 19
[ 43.681530] PCI: Setting latency timer of device 0000:00:1f.2 to 64
[ 43.681564] ata1: SATA max UDMA/133 cmd 0x2148 ctl 0x217E bmdma 0x2110 irq 19
[ 43.777123] ata2: SATA max UDMA/133 cmd 0x2140 ctl 0x217A bmdma 0x2118 irq 19
[ 44.123919] ata1.00: cfg 49:2f00 82:346b 83:7fe9 84:4773 85:3469 86:3c01 87:4763 88:407f
[ 44.123924] ata1.00: ATA-7, max UDMA/133, 321672960 sectors: LBA48 NCQ (depth 0/32)
[ 44.216325] ata1.00: configured for UDMA/133
[ 44.267296] scsi0 : ata_piix
[ 44.301660] PM: Adding info for No Bus:host0
[ 44.719422] ata2.00: cfg 49:0f00 82:0000 83:0000 84:0000 85:0000 86:0000 87:0000 88:101f
[ 44.719425] ata2.00: ATAPI, max UDMA/66
[ 44.765263] ata2.00: applying bridge limits
[ 74.928836] ata2.01: qc timeout (cmd 0xa1)
[ 74.977811] ata2.01: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 75.468853] ata2.00: cfg 49:0f00 82:0000 83:0000 84:0000 85:0000 86:0000 87:0000 88:101f
[ 75.468856] ata2.00: ATAPI, max UDMA/66
[ 75.514678] ata2.00: applying bridge limits
[ 105.674130] ata2.01: qc timeout (cmd 0xa1)
[ 105.723044] ata2.01: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 106.210113] ata2.00: cfg 49:0f00 82:0000 83:0000 84:0000 85:0000 86:0000 87:0000 88:101f
[ 106.210117] ata2.00: ATAPI, max UDMA/66
[ 106.255906] ata2.00: applying bridge limits
[ 136.415532] ata2.01: qc timeout (cmd 0xa1)
[ 136.464452] ata2.01: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 136.537214] ata2.01: limiting speed to PIO0
[ 136.587141] ata2.01: disabled
[ 137.039723] ata2.00: cfg 49:0f00 82:0000 83:0000 84:0000 85:0000 86:0000 87:0000 88:101f
[ 137.039726] ata2.00: ATAPI, max UDMA/66
[ 137.085492] ata2.00: applying bridge limits
[ 137.292543] ata2.00: configured for UDMA/66
[ 137.342488] scsi1 : ata_piix
[ 137.376847] PM: Adding info for No Bus:host1
[ 137.376892] PM: Adding info for No Bus:target0:0:0
[ 137.376948] Vendor: ATA Model: HDT722516DLA380 Rev: V43O
[ 137.453962] Type: Direct-Access ANSI SCSI revision: 05
[ 137.541468] PM: Adding info for scsi:0:0:0:0
[ 137.541547] SCSI device sda: 321672960 512-byte hdwr sectors (164697 MB)
[ 137.621598] sda: Write Protect is off
[ 137.665289] sda: Mode Sense: 00 3a 00 00
[ 137.665306] SCSI device sda: drive cache: write back
[ 137.724592] SCSI device sda: 321672960 512-byte hdwr sectors (164697 MB)
[ 137.804697] sda: Write Protect is off
[ 137.848386] sda: Mode Sense: 00 3a 00 00
[ 137.848398] SCSI device sda: drive cache: write back
[ 137.907659] sda: sda1 sda2 sda3
[ 137.957184] sd 0:0:0:0: Attached scsi disk sda
[ 138.010252] PM: Adding info for No Bus:target1:0:0
[ 138.010869] Vendor: PLEXTOR Model: DVDR PX-716A Rev: 1.09
[ 138.087949] Type: CD-ROM ANSI SCSI revision: 05
[ 138.175466] PM: Adding info for scsi:1:0:0:0
[ 138.175531] ata_piix 0000:00:1f.5: MAP [ P0 P2 P1 P3 ]
[ 138.237446] ACPI (acpi_bus-0191): Device is not power manageable [20060310]
[ 138.320799] ACPI: PCI Interrupt 0000:00:1f.5[A] -> GSI 19 (level, low) -> IRQ 19
[ 138.409254] PCI: Setting latency timer of device 0000:00:1f.5 to 64
[ 138.409276] ata3: SATA max UDMA/133 cmd 0x2138 ctl 0x2176 bmdma 0x20F0 irq 19
[ 138.494586] ata4: SATA max UDMA/133 cmd 0x2130 ctl 0x2172 bmdma 0x20F8 irq 19
[ 139.899612] scsi2 : ata_piix
[ 139.933979] PM: Adding info for No Bus:host2
[ 141.251424] scsi3 : ata_piix
[ 141.285781] PM: Adding info for No Bus:host3
[ 141.366280] EXT3-fs: INFO: recovery required on readonly filesystem.

2006-05-17 04:49:09

by Tejun Heo

Subject: Re: [RFT] major libata update

Hello, Andrew.

Andrew Morton wrote:
[--snip--]
> [ 44.719422] ata2.00: cfg 49:0f00 82:0000 83:0000 84:0000 85:0000 86:0000 87:0000 88:101f
> [ 44.719425] ata2.00: ATAPI, max UDMA/66
> [ 44.765263] ata2.00: applying bridge limits
> [ 74.928836] ata2.01: qc timeout (cmd 0xa1)
> [ 74.977811] ata2.01: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 75.468853] ata2.00: cfg 49:0f00 82:0000 83:0000 84:0000 85:0000 86:0000 87:0000 88:101f
> [ 75.468856] ata2.00: ATAPI, max UDMA/66
> [ 75.514678] ata2.00: applying bridge limits
> [ 105.674130] ata2.01: qc timeout (cmd 0xa1)

Did this device work with previous versions of kernel?

libata used to give up on the first failure during probe, so the boot
time would have been shorter in failure cases. I think controlled
retries during boot probe is a good thing, but the timeout of 30s for
IDENTIFY commands can be shortened, I guess.
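
For a rough sense of scale, using the timestamps from Andrew's log: each
failed IDENTIFY on ata2.01 waits about 30 seconds before "qc timeout", and
it happens three times. A small sketch of that arithmetic follows; the
timestamps are copied from the dmesg output, and the struct and function
names are only illustrative.

```c
#include <stddef.h>

/* One probe attempt: time of "applying bridge limits" and of the
 * following "qc timeout", both in seconds since boot. */
struct probe_try {
	double start;
	double timeout;
};

/* The three attempts on ata2.01 from the quoted dmesg. */
static const struct probe_try ata2_01_tries[] = {
	{  44.765263,  74.928836 },
	{  75.514678, 105.674130 },
	{ 106.255906, 136.415532 },
};

/* Sum of the time spent waiting for timeouts across all attempts. */
static double total_wait(const struct probe_try *t, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += t[i].timeout - t[i].start;
	return sum;
}
```

That comes to roughly 90 seconds, which matches the reported minute and a
half. Shortening the IDENTIFY timeout would scale the total down in
proportion.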

--
tejun

2006-05-17 04:59:47

by Andrew Morton

Subject: Re: [RFT] major libata update

Tejun Heo <[email protected]> wrote:
>
> Hello, Andrew.
>
> Andrew Morton wrote:
> [--snip--]
> > [ 44.719422] ata2.00: cfg 49:0f00 82:0000 83:0000 84:0000 85:0000 86:0000 87:0000 88:101f
> > [ 44.719425] ata2.00: ATAPI, max UDMA/66
> > [ 44.765263] ata2.00: applying bridge limits
> > [ 74.928836] ata2.01: qc timeout (cmd 0xa1)
> > [ 74.977811] ata2.01: failed to IDENTIFY (I/O error, err_mask=0x4)
> > [ 75.468853] ata2.00: cfg 49:0f00 82:0000 83:0000 84:0000 85:0000 86:0000 87:0000 88:101f
> > [ 75.468856] ata2.00: ATAPI, max UDMA/66
> > [ 75.514678] ata2.00: applying bridge limits
> > [ 105.674130] ata2.01: qc timeout (cmd 0xa1)
>
> Did this device work with previous versions of kernel?

No. In fact, it doesn't even work with the 2.6.17-rc4-mm1 lineup plus the
latest git-libata-all. It needs this tweak:

--- devel/drivers/scsi/ata_piix.c~2.6.17-rc4-mm1-ich8-fix 2006-05-16 18:36:12.000000000 -0700
+++ devel-akpm/drivers/scsi/ata_piix.c 2006-05-16 18:36:12.000000000 -0700
@@ -542,6 +542,14 @@ static unsigned int piix_sata_probe (str
 		port = map[base + i];
 		if (port < 0)
 			continue;
+		if (ap->flags & PIIX_FLAG_AHCI) {
+			/* FIXME: Port status of AHCI controllers
+			 * should be accessed in AHCI memory space. */
+			if (pcs & 1 << port)
+				present_mask |= 1 << i;
+			else
+				pcs &= ~(1 << port);
+		}
 		if (ap->flags & PIIX_FLAG_IGNORE_PCS || pcs & 1 << (4 + port))
 			present_mask |= 1 << i;
 		else
_


> libata used to give up on the first failure during probe, so the boot
> time would have been shorter in failure cases.

I don't recall anyone complaining?

> I think controlled
> retries during boot probe is a good thing, but the timeout of 30s for
> IDENTIFY commands can be shortened, I guess.

We should do something, please. It'll hurt kernel developers the most.

2006-05-17 05:14:28

by Tejun Heo

Subject: Re: [RFT] major libata update

Andrew Morton wrote:
> Tejun Heo <[email protected]> wrote:
>> Hello, Andrew.
>>
>> Andrew Morton wrote:
>> [--snip--]
>>> [ 44.719422] ata2.00: cfg 49:0f00 82:0000 83:0000 84:0000 85:0000 86:0000 87:0000 88:101f
>>> [ 44.719425] ata2.00: ATAPI, max UDMA/66
>>> [ 44.765263] ata2.00: applying bridge limits
>>> [ 74.928836] ata2.01: qc timeout (cmd 0xa1)
>>> [ 74.977811] ata2.01: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [ 75.468853] ata2.00: cfg 49:0f00 82:0000 83:0000 84:0000 85:0000 86:0000 87:0000 88:101f
>>> [ 75.468856] ata2.00: ATAPI, max UDMA/66
>>> [ 75.514678] ata2.00: applying bridge limits
>>> [ 105.674130] ata2.01: qc timeout (cmd 0xa1)
>> Did this device work with previous versions of kernel?
>
> No. In fact, it doesn't even work with the 2.6.17-rc4-mm1 lineup plus the
> latest git-libata-all. It needs this tweak:
>
> --- devel/drivers/scsi/ata_piix.c~2.6.17-rc4-mm1-ich8-fix 2006-05-16 18:36:12.000000000 -0700
> +++ devel-akpm/drivers/scsi/ata_piix.c 2006-05-16 18:36:12.000000000 -0700
> @@ -542,6 +542,14 @@ static unsigned int piix_sata_probe (str
> port = map[base + i];
> if (port < 0)
> continue;
> + if (ap->flags & PIIX_FLAG_AHCI) {
> + /* FIXME: Port status of AHCI controllers
> + * should be accessed in AHCI memory space. */
> + if (pcs & 1 << port)
> + present_mask |= 1 << i;
> + else
> + pcs &= ~(1 << port);
> + }
> if (ap->flags & PIIX_FLAG_IGNORE_PCS || pcs & 1 << (4 + port))
> present_mask |= 1 << i;
> else
> _

Ah.. I see. This is the ata_piix ghosting problem, where the signature of
the first device is duplicated on the second device, causing libata to
probe the second, non-existent device.

>> libata used to give up on the first failure during probe, so the boot
>> time would have been shorter in failure cases.
>
> I don't recall anyone complaining?

One of the sata_via + ATAPI probing problems might have been fixed by this.
It still needs to be investigated further though.

>> I think controlled
>> retries during boot probe is a good thing, but the timeout of 30s for
>> IDENTIFY commands can be shortened, I guess.
>
> We should do something, please. It'll hurt kernel developers the most.

I think the correct solution would be fixing the ghosting problem of the
controller. I'll look into it.

--
tejun

2006-05-17 06:35:15

by Tejun Heo

Subject: Re: [RFT] major libata update

Tejun Heo wrote:
> Andrew Morton wrote:
>> No. In fact, it doesn't even work with the 2.6.17-rc4-mm1 lineup plus
>> the
>> latest git-libata-all. It needs this tweak:
>>
>> --- devel/drivers/scsi/ata_piix.c~2.6.17-rc4-mm1-ich8-fix
>> 2006-05-16 18:36:12.000000000 -0700
>> +++ devel-akpm/drivers/scsi/ata_piix.c 2006-05-16
>> 18:36:12.000000000 -0700
>> @@ -542,6 +542,14 @@ static unsigned int piix_sata_probe (str
>> port = map[base + i];
>> if (port < 0)
>> continue;
>> + if (ap->flags & PIIX_FLAG_AHCI) {
>> + /* FIXME: Port status of AHCI controllers
>> + * should be accessed in AHCI memory space. */
>> + if (pcs & 1 << port)
>> + present_mask |= 1 << i;
>> + else
>> + pcs &= ~(1 << port);
>> + }
>> if (ap->flags & PIIX_FLAG_IGNORE_PCS || pcs & 1 << (4 + port))
>> present_mask |= 1 << i;
>> else

The above patch doesn't do anything. The only effect it has is setting
present_mask according to enabled bits instead of present bits. I think
this patch might have helped with probing before the MAP tables for
ICH6/7 were fixed.
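
Reading the tweak next to the existing check: the added code tests bit
`port` of the PCS value (the enabled bit), while the existing check tests
bit `4 + port` (the present bit). A minimal standalone sketch of the two
tests, with helper names invented for illustration rather than taken from
ata_piix:

```c
/* Bit `port` of the PCS value: the port is enabled.
 * This is what the tweak above looks at. */
static int pcs_port_enabled(unsigned int pcs, int port)
{
	return (pcs >> port) & 1;
}

/* Bit `4 + port` of the PCS value: a device is present on the port.
 * This is what the existing check looks at. */
static int pcs_port_present(unsigned int pcs, int port)
{
	return (pcs >> (4 + port)) & 1;
}
```

A port that is enabled but has nothing attached passes the first test and
fails the second, which is why keying present_mask off the enabled bits
does not help with an empty slot.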

I've done further testing.

* Symptom

ata_piix tries to probe a non-existent slave device, resulting in timeouts
during boot probing. The new probing updates aggravate this problem,
as probing is now retried two more times before giving up.

* Test results

PATA never has any problem with device detection via signature. Only
SATA is affected and, interestingly, only with ATAPI devices. The following
are the test results on my machine (ICH7R + PX716SA).

1. combined mode : MAP [IDE IDE P1 P3]

   P1          P3          Result
   --------------------------------------------------
   PX716-SA    empty       P3 ghosted as ATAPI device
   empty       PX716-SA    okay
   PX716-SA    HDD         okay
   HDD         PX716-SA    okay

2. SATA-only mode : MAP [P0 P2 P1 P3]

   P0          P2          Result
   --------------------------------------------------
   PX716-SA    empty       P2 ghosted as ATAPI device
   empty       PX716-SA    okay
   PX716-SA    HDD         okay
   HDD         PX716-SA    okay

   P1          P3          Result
   --------------------------------------------------
   Identical to #1.

To sum up, it happens when the master slot is occupied by an ATAPI
device and the corresponding slave slot is empty. The slave slot
reports an ATAPI signature (probably duplicated from the master) and passes
all the legacy presence tests, thus resulting in a timeout on IDENTIFY.
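
The test matrix can be condensed into a small predicate. This is only a
restatement of the observations above for this one setup (ICH7R +
PX716SA), not driver code, and the enum and function names are made up.

```c
/* What is physically plugged into a slot. */
enum slot {
	SLOT_EMPTY,
	SLOT_HDD,
	SLOT_ATAPI,
};

/* 1 if, per the results above, the slave slot shows up as a ghost
 * ATAPI device: master holds an ATAPI device and the slave is empty. */
static int slave_is_ghosted(enum slot master, enum slot slave)
{
	return master == SLOT_ATAPI && slave == SLOT_EMPTY;
}
```

Of the four tested combinations, only the first (ATAPI master, empty
slave) returns 1.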

In all of the above cases, the PCS register reported correct presence masks.

* Proposed solution

It seems that the only solution is to make use of the PCS presence bits
somehow. It is known that the 6300ESB family of controllers has flaky
presence bits (ata_piix marks them with PIIX_FLAG_IGNORE_PCS), but I
couldn't find any document/errata about PCS bits for any other
controllers. So, we can use PCS for all !PIIX_FLAG_IGNORE_PCS
controllers, or take a conservative approach and make use of it only in
cases where the ghosting problem is reported (ICH7 and 8, I guess. Can
anyone test 6?).
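
As a sketch of the first option (trust the PCS presence bit unless the
controller is flagged as having unreliable ones), something along these
lines could work. It is a standalone illustration under stated
assumptions: the flag value is an arbitrary stand-in for
PIIX_FLAG_IGNORE_PCS, and the function name and its sig_present argument
are hypothetical.

```c
/* Stand-in for PIIX_FLAG_IGNORE_PCS; the value is arbitrary here. */
#define FLAG_IGNORE_PCS (1u << 0)

/* Decide whether a port should be probed.
 * sig_present: signature-based detection claims a device is there.
 * Returns 1 to probe, 0 to skip. */
static int should_probe(unsigned int flags, unsigned int pcs,
			int port, int sig_present)
{
	if (!sig_present)
		return 0;
	if (flags & FLAG_IGNORE_PCS)
		return 1;	/* presence bits are known to be flaky */
	/* Veto a ghost: the signature says yes but PCS says nothing is there. */
	return (pcs >> (4 + port)) & 1;
}
```

The conservative variant would apply the same veto only on controllers
where ghosting has actually been reported.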

Please note that we already make some use of the PCS value when probing
SATA ports. If its value is zero, we skip the port. It's done this way
mainly for historical reasons: until recently, ata_piix didn't have
MAP tables to map PM/PS/SM/SS to specific ports and thus used the PCS
values in a rougher form.

Jeff, what do you think?

--
tejun

2006-05-17 07:35:45

by Matthieu CASTET

Subject: Re: [RFT] major libata update

On Mon, 2006-05-15 at 14:15 -0400, Jeff Garzik wrote:
> > which persists on Sil 311x on rare motherboards. The rest are either
> > addressed with the improved error handling, or are ATAPI + VIA AFAICS.

> ATAPI + VIA to that pattern is also showing up on pata_via cases as
> well, but only on via so far. Its as if there is a case where the IRQ of
> the first command is lost sometimes.
I investigated the problem further. In my PATA case the interrupt goes
through idle_irq in ata_host_intr.
If I put a printk before ata_host_intr in ata_interrupt, there is no problem.

I bet "altstatus" or "main status" returns ATA_BUSY and is not cleared
fast enough by the hardware.

I will try other tests this evening.


Matthieu


2006-05-17 15:25:52

by OGAWA Hirofumi

Subject: Re: [RFT] major libata update

Linus Torvalds <[email protected]> writes:

> On Mon, 15 May 2006, Avuton Olrich wrote:
>
> diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c
> index 06dab00..49b9fea 100644
> --- a/arch/i386/pci/irq.c
> +++ b/arch/i386/pci/irq.c
> @@ -880,6 +880,7 @@ static int pcibios_lookup_irq(struct pci
> ((!(pci_probe & PCI_USE_PIRQ_MASK)) || ((1 << irq) & mask)) ) {
> DBG(" -> got IRQ %d\n", irq);
> msg = "Found";
> + eisa_set_level_irq(irq);
> } else if (newirq && r->set && (dev->class >> 8) != PCI_CLASS_DISPLAY_VGA) {
> DBG(" -> assigning IRQ %d", newirq);
> if (r->set(pirq_router_dev, dev, pirq, newirq)) {

I like it. I'd like to put this type stuff (fixes setting of 8259,
APIC, chipset, etc.) into pci...
--
OGAWA Hirofumi <[email protected]>

2006-05-17 23:40:36

by Linus Torvalds

Subject: Re: [RFT] major libata update



On Thu, 18 May 2006, OGAWA Hirofumi wrote:

> Linus Torvalds <[email protected]> writes:
>
> > On Mon, 15 May 2006, Avuton Olrich wrote:
> >
> > diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c
> > index 06dab00..49b9fea 100644
> > --- a/arch/i386/pci/irq.c
> > +++ b/arch/i386/pci/irq.c
> > @@ -880,6 +880,7 @@ static int pcibios_lookup_irq(struct pci
> > ((!(pci_probe & PCI_USE_PIRQ_MASK)) || ((1 << irq) & mask)) ) {
> > DBG(" -> got IRQ %d\n", irq);
> > msg = "Found";
> > + eisa_set_level_irq(irq);
> > } else if (newirq && r->set && (dev->class >> 8) != PCI_CLASS_DISPLAY_VGA) {
> > DBG(" -> assigning IRQ %d", newirq);
> > if (r->set(pirq_router_dev, dev, pirq, newirq)) {
>
> I like it. I'd like to put this type of stuff (fixes setting of 8259,
> APIC, chipset, etc.) into pci...

Andrew, can you put the one-liner into -mm and see if it gathers any
reports?

I think Neil already reported that it fixed a lost interrupt problem for
him, but I'm worried that it might result in interrupt storms for others.

In particular, I have this pretty strong memory that we tried to do
something like this a long time ago, and it caused problems at least
with the legacy ISA/ATA interrupts (irq 14/15).

On the other hand, my memory is pretty damn bad at times, and besides, I
hope that that "hardcoded" case just above it is the one that takes care
of the old ATA interrupts. This is one of those times when the only
guaranteed right thing to do would be to be bug-for-bug compatible with
whatever crud MS-Win does..

Linus

2006-05-17 23:48:38

by Jeff Garzik

Subject: Re: [RFT] major libata update

Linus Torvalds wrote:
> In particular, I have this pretty strong memory that we tried to do
> something like this a long time ago, and it caused problems at least
> with the legacy ISA/ATA interrupts (irq 14/15).
>
> On the other hand, my memory is pretty damn bad at times, and besides, I
> hope that that "hardcoded" case just above it is the one that takes care
> of the old ATA interrupts. This is one of those times when the only
> guaranteed right thing to do would be to be bug-for-bug compatible with
> whatever crud MS-Win does..

Many BIOS ACPI tables from years ago simply _assumed_ that you have
hardcoded irq 14/15, even... Their irq descriptors for 14/15 would be
absent or completely non-functional.

Or maybe it's the $pirq table I'm recalling. One of the two, anyway.

Jeff



2006-05-17 23:50:21

by Linus Torvalds

Subject: Re: [RFT] major libata update



On Wed, 17 May 2006, Linus Torvalds wrote:
>
> I think Neil already reported that it fixed a lost interrupt problem for
> him, but I'm worried that it might result in interrupt storms for others.

Of course, we could just decide to try to switch from level to
edge-triggered if the irq storm thing ever triggers. Right now we disable
the irq entirely, which means that _if_ it was just due to a polarity
error, we're screwed even if it should have been easy to fix by just
turning it into edge-high.

The code to do that should be trivial: make __report_bad_irq() try to
switch to edge mode if it's not edge already. Hmm?
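
A minimal user-space sketch of that fallback, under stated assumptions: this is not the real kernel code, and `fake_irq` and `fake_report_bad_irq` are hypothetical names used only for illustration. On a detected storm a level-triggered line gets one chance as edge-triggered; only a line that is already edge-triggered is disabled.

```c
#include <stdbool.h>

enum trigger { TRIGGER_EDGE, TRIGGER_LEVEL };

/* Hypothetical, simplified view of one interrupt line. */
struct fake_irq {
    enum trigger trigger;
    bool disabled;
};

/* Called when an interrupt storm has been detected on `irq`. */
static void fake_report_bad_irq(struct fake_irq *irq)
{
    if (irq->trigger == TRIGGER_LEVEL) {
        /* Possibly just a polarity/trigger error: retry as edge. */
        irq->trigger = TRIGGER_EDGE;
        return;
    }
    /* Already edge-triggered and still storming: give up on the line. */
    irq->disabled = true;
}
```

A first storm on a level-triggered line only flips the mode; a second storm on the same line disables it, which is the current behaviour.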

Linus

2006-05-18 00:36:35

by Brown, Len

Subject: RE: [RFT] major libata update


>Many BIOS ACPI tables from years ago simply _assumed_ that you have
>hardcoded irq 14/15, even... Their irq descriptors for 14/15 would be
>absent or completely non-functional.
>
>Or maybe it's the $pirq table I'm recalling. One of the two, anyway.

For x86, the ACPI interrupt configuration process is to identity-map
the IOAPIC entries below 16 1:1 PIC:IOAPIC,
unless there are interrupt source overrides
(such as commonly done to swizzle IRQ0 from a different pin)

This makes legacy-mode ATA happy. Hard code ATA to 14/15 and
off you go.
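
As a rough illustration of that mapping rule (a sketch only; `irq_override` and `isa_irq_to_gsi` are hypothetical names, not the ACPI code): legacy IRQs below 16 map to the same-numbered IOAPIC input unless an interrupt source override says otherwise.

```c
/* Hypothetical representation of one interrupt source override entry. */
struct irq_override {
    int isa_irq; /* legacy (PIC) IRQ number */
    int gsi;     /* IOAPIC input it is really wired to */
};

/*
 * Map a legacy IRQ (0..15) to its IOAPIC input: use an override if one
 * exists for this IRQ, otherwise fall back to the 1:1 identity mapping.
 */
static int isa_irq_to_gsi(int isa_irq, const struct irq_override *ovr, int n)
{
    for (int i = 0; i < n; i++) {
        if (ovr[i].isa_irq == isa_irq)
            return ovr[i].gsi;
    }
    return isa_irq;
}
```

With a single override moving IRQ0 to input 2, IRQ 14 and 15 still map to 14 and 15, which is why hard-coding them works for legacy-mode ATA.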

But there is a gray area where the ATA controller registers
as a PCI device, but Linux goes off and looks in the ACPI PRT
for that PCI-dev and finds no entry. So if you didn't
have the hard-coded 14/15, you'd be dead.

Then there are cases where the PRT specifies something
_other_ than 14/15 for ATA, and in that case the hard-coded
default is the wrong thing to do; and the workaround is
to use BIOS SETUP options to be sure that ATA is set up
in legacy mode.

I suspect Linux could be smarter here. The 14/15 should be
the backup default for when the tables
don't give us anything else; not the only option.
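
A sketch of that policy, assuming a hypothetical helper `ata_pick_irq` and a hypothetical `PRT_NO_ENTRY` marker for "the PRT has no entry for this device" (neither exists in the kernel under these names):

```c
#define PRT_NO_ENTRY (-1) /* hypothetical: PRT lookup found nothing */

/*
 * Pick the IRQ for an ATA channel (0 = primary, 1 = secondary).
 * Prefer what the ACPI PRT reports; use legacy 14/15 only as a fallback.
 */
static int ata_pick_irq(int prt_irq, int channel)
{
    if (prt_irq != PRT_NO_ENTRY)
        return prt_irq;
    return channel == 0 ? 14 : 15;
}
```

This covers both cases described above: a missing PRT entry still yields 14/15, while a PRT entry that names some other IRQ is honoured instead of being overridden by the hard-coded default.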

-Len

2006-05-18 01:36:35

by Alan

Subject: Re: [RFT] major libata update

On Wed, 2006-05-17 at 19:48 -0400, Jeff Garzik wrote:
> Many BIOS ACPI tables from years ago simply _assumed_ that you have
> hardcoded irq 14/15, even... Their irq descriptors for 14/15 would be
> absent or completely non-functional.

For $PIR this is correct: the IRQ 14/15 from the IDE controllers in
legacy mode is an ISA IRQ, not a PCI one. Welcome to the happy fun
compatibility factory.

2006-05-18 11:24:14

by Albert Lee

Subject: Re: [RFT] major libata update

Tejun Heo wrote:
> Tejun Heo wrote:
>
>> Andrew Morton wrote:
>>
>>> No. In fact, it doesn't even work with the 2.6.17-rc4-mm1 lineup
>>> plus the
>>> latest git-libata-all. It needs this tweak:
>>>
>>> --- devel/drivers/scsi/ata_piix.c~2.6.17-rc4-mm1-ich8-fix
>>> 2006-05-16 18:36:12.000000000 -0700
>>> +++ devel-akpm/drivers/scsi/ata_piix.c 2006-05-16
>>> 18:36:12.000000000 -0700
>>> @@ -542,6 +542,14 @@ static unsigned int piix_sata_probe (str
>>> port = map[base + i];
>>> if (port < 0)
>>> continue;
>>> + if (ap->flags & PIIX_FLAG_AHCI) {
>>> + /* FIXME: Port status of AHCI controllers
>>> + * should be accessed in AHCI memory space. */
>>> + if (pcs & 1 << port)
>>> + present_mask |= 1 << i;
>>> + else
>>> + pcs &= ~(1 << port);
>>> + }
>>> if (ap->flags & PIIX_FLAG_IGNORE_PCS || pcs & 1 << (4 + port))
>>> present_mask |= 1 << i;
>>> else
>
>
> The above patch doesn't do anything. The only effect it has is setting
> present_mask according to enabled bits instead of present bits. I think
> this patch might have helped with probing before the MAP tables for
> ICH6/7 are fixed.
>
> I've done further testing.
>
> * Symptom
>
> ata_piix tries to probe non-existing slave device resulting in timeouts
> during boot probing. This problem is aggravated by new probing updates
> as it retries two more times before giving up.
>
> * Test results
>
> PATA never has any problem with device detection via signature. Only
> SATA is affected and interestingly only ATAPI device. The following is
> the test result on my machine (ICH7R + PX716SA).
>
> 1. combined mode : MAP [IDE IDE P1 P3]
>
>    P1        P3        result
>    ----------------------------------------------
>    PX716-SA  empty     P3 ghosted as ATAPI device
>    empty     PX716-SA  okay
>    PX716-SA  HDD       okay
>    HDD       PX716-SA  okay
>
> 2. SATA-only mode : MAP [P0 P2 P1 P3]
>
>    P0        P2        result
>    ----------------------------------------------
>    PX716-SA  empty     P2 ghosted as ATAPI device
>    empty     PX716-SA  okay
>    PX716-SA  HDD       okay
>    HDD       PX716-SA  okay
>
>    P1        P3        result
>    ----------------------------------------------
>    Identical to #1.
>
> To sum up, it happens when the master slot is occupied by an ATAPI
> device and the corresponding slave slot is empty. The slave slot
> reports ATAPI signature (probably duplicated from the master) and passes
> all legacy presence test thus resulting in timeout on IDENTIFY.
>

This problem was seen with PATA Promise 20275 adapter + IBM DVD-RAM drive.
Single master device configuration, no slave device.
The master device acts as slave and creates a phantom slave device.
(http://marc.theaimsgroup.com/?l=linux-ide&m=113151315602979&w=2)

The problem was later fixed by Tejun's ata_exec_internal() patch:
(http://marc.theaimsgroup.com/?l=linux-ide&m=113455450809405&w=2)
After the patch, the phantom device is finally detected by ata_dev_identify().

Libata uses polling PIO for IDENTIFY DEVICE before this major update.
The polling PIO finds something wrong when it reads a 0x00 device status.
So, the phantom device is detected quite quickly.

With irq-driven PIO, maybe the phantom device is only detected after time-out.
So it takes longer (30 secs) to detect the phantom device.

No good idea how to fix this. Maybe read more registers to see whether the
phantom device can be detected early before the IDENTIFY DEVICE.
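
The difference can be sketched as a check on the status byte read back after issuing IDENTIFY. This is an illustration only, not the libata state machine: `classify_identify_status` and the `ID_*` names are hypothetical, while the bit values are the standard ATA status bits. A polling loop can apply such a check right away, whereas an irq-driven path has to wait for an interrupt that a phantom device never raises.

```c
#include <stdint.h>

#define ATA_BUSY 0x80 /* BSY */
#define ATA_DRQ  0x08 /* data request */
#define ATA_ERR  0x01 /* error */

enum identify_result {
    ID_NO_DEVICE,   /* status 0x00: nothing is really there */
    ID_STILL_BUSY,  /* keep polling */
    ID_DEVICE_ERROR,
    ID_DATA_READY,  /* IDENTIFY data can be read */
    ID_UNEXPECTED   /* no BSY, no DRQ, no ERR */
};

/* Classify the status byte polled after issuing IDENTIFY (hypothetical). */
static enum identify_result classify_identify_status(uint8_t status)
{
    if (status == 0x00)
        return ID_NO_DEVICE;
    if (status & ATA_BUSY)
        return ID_STILL_BUSY;
    if (status & ATA_ERR)
        return ID_DEVICE_ERROR;
    if (status & ATA_DRQ)
        return ID_DATA_READY;
    return ID_UNEXPECTED;
}
```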

--
albert


> In all above cases, the PCS register reported correct presence masks.
>
> * Proposed solution
>
> It seems that the only solution is to make use of the PCS presence bits
> somehow. It is known that 6300ESB family of controllers have flaky
> presence bits (ata_piix marks them with PIIX_FLAG_IGNORE_PCS), but I
> couldn't find any document/errata for PCS bits for any other
> controllers. So, we can use PCS for all !PIIX_FLAG_IGNORE_PCS
> controllers or take a conservative approach and make use of it only on
> cases where ghosting problem is reported (ICH7 and 8, I guess. Can
> anyone test 6?).
>
> Please note that we already make some use of the PCS value when probing
> SATA port. If its value is zero, we skip the port. It's done this way
> mainly due to historical reasons - until recently ata_piix didn't have
> MAP tables to map PM/PS/SM/SS to specific ports thus used the PCS values
> in rougher form.
>
> Jeff, what do you think?
>


2006-05-18 11:33:31

by Tejun Heo

Subject: Re: [RFT] major libata update

Albert Lee wrote:
>> To sum up, it happens when the master slot is occupied by an ATAPI
>> device and the corresponding slave slot is empty. The slave slot
>> reports ATAPI signature (probably duplicated from the master) and passes
>> all legacy presence test thus resulting in timeout on IDENTIFY.
>>
>
> This problem was seen with PATA Promise 20275 adapter + IBM DVD-RAM drive.
> Single master device configuration, no slave device.
> The master device acts as slave and creates a phantom slave device.
> (http://marc.theaimsgroup.com/?l=linux-ide&m=113151315602979&w=2)
>
> The problem was later fixed by Tejun's ata_exec_internal() patch:
> (http://marc.theaimsgroup.com/?l=linux-ide&m=113455450809405&w=2)
> After the patch, the phantom device is finally detected by ata_dev_identify().
>
> Libata uses polling PIO for IDENTIFY DEVICE before this major update.
> The polling PIO finds something wrong when it reads a 0x00 device status.
> So, the phantom device is detected quite quickly.
>
> With irq-driven PIO, maybe the phantom device is only detected after time-out.
> So it takes longer (30 secs) to detect the phantom device.
>
> No good idea how to fix this. Maybe read more registers to see whether the
> phantom device can be detected early before the IDENTIFY DEVICE.
>

Does the Promise controller show the ghosting problem again with the
recent updates? ata_piix can be fixed by using PCS present bits. I
don't know about Promise though.

--
tejun

2006-05-18 23:08:21

by Andrew Morton

Subject: Re: [RFT] major libata update

Tejun Heo <[email protected]> wrote:
>
> Tejun Heo wrote:
> > Andrew Morton wrote:
> >> No. In fact, it doesn't even work with the 2.6.17-rc4-mm1 lineup plus
> >> the
> >> latest git-libata-all. It needs this tweak:
> >>
> >> --- devel/drivers/scsi/ata_piix.c~2.6.17-rc4-mm1-ich8-fix
> >> 2006-05-16 18:36:12.000000000 -0700
> >> +++ devel-akpm/drivers/scsi/ata_piix.c 2006-05-16
> >> 18:36:12.000000000 -0700
> >> @@ -542,6 +542,14 @@ static unsigned int piix_sata_probe (str
> >> port = map[base + i];
> >> if (port < 0)
> >> continue;
> >> + if (ap->flags & PIIX_FLAG_AHCI) {
> >> + /* FIXME: Port status of AHCI controllers
> >> + * should be accessed in AHCI memory space. */
> >> + if (pcs & 1 << port)
> >> + present_mask |= 1 << i;
> >> + else
> >> + pcs &= ~(1 << port);
> >> + }
> >> if (ap->flags & PIIX_FLAG_IGNORE_PCS || pcs & 1 << (4 + port))
> >> present_mask |= 1 << i;
> >> else
>
> The above patch doesn't do anything.

Yes it does. I dropped it and got

SCSI subsystem initialized
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ACPI (acpi_bus-0191): Device is not power manageable [20060310]
ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 19 (level, low) -> IRQ 19
ata1: SATA max UDMA/133 cmd 0x2148 ctl 0x217E bmdma 0x2110 irq 19
ata2: SATA max UDMA/133 cmd 0x2140 ctl 0x217A bmdma 0x2118 irq 19
ata1: SATA port has no device.

Then I undropped it and got

SCSI subsystem initialized
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ACPI (acpi_bus-0191): Device is not power manageable [20060310]
ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 19 (level, low) -> IRQ 19
ata1: SATA max UDMA/133 cmd 0x2148 ctl 0x217E bmdma 0x2110 irq 19
ata2: SATA max UDMA/133 cmd 0x2140 ctl 0x217A bmdma 0x2118 irq 19
ata1.00: ATA-7, max UDMA/133, 321672960 sectors: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
scsi0 : ata_piix

and a computer which boots.

Look closer, please ;)

2006-05-19 01:14:10

by Tejun Heo

Subject: Re: [RFT] major libata update

On Thu, May 18, 2006 at 04:07:58PM -0700, Andrew Morton wrote:
>
> Yes it does. I dropped it and got
>
> SCSI subsystem initialized
> ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
> ACPI (acpi_bus-0191): Device is not power manageable [20060310]
> ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 19 (level, low) -> IRQ 19
> ata1: SATA max UDMA/133 cmd 0x2148 ctl 0x217E bmdma 0x2110 irq 19
> ata2: SATA max UDMA/133 cmd 0x2140 ctl 0x217A bmdma 0x2118 irq 19
> ata1: SATA port has no device.
>
> Then I undropped it and got
>
> SCSI subsystem initialized
> ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
> ACPI (acpi_bus-0191): Device is not power manageable [20060310]
> ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 19 (level, low) -> IRQ 19
> ata1: SATA max UDMA/133 cmd 0x2148 ctl 0x217E bmdma 0x2110 irq 19
> ata2: SATA max UDMA/133 cmd 0x2140 ctl 0x217A bmdma 0x2118 irq 19
> ata1.00: ATA-7, max UDMA/133, 321672960 sectors: LBA48 NCQ (depth 0/32)
> ata1.00: configured for UDMA/133
> scsi0 : ata_piix
>
> and a computer which boots.
>
> Look closer, please ;)

Hello, Andrew.

I see. It seems that you're reporting two separate problems - your
PCS register doesn't report presence properly && the TF registers
report ghost device if the first device is ATAPI. I can reproduce the
second here, but AFAIK the only controller which had a problem with PCS
presence bits was ESB6300 until now.

Can you post the result of 'lspci -n' and ata_piix boot probing
messages with the following patch applied? It would be helpful if you
tell us how devices are actually connected. Also, where did the patch
come from? With what comment?

Thanks.

diff --git a/drivers/scsi/ata_piix.c b/drivers/scsi/ata_piix.c
index e3184a7..4ba943e 100644
--- a/drivers/scsi/ata_piix.c
+++ b/drivers/scsi/ata_piix.c
@@ -523,7 +523,7 @@ static unsigned int piix_sata_probe (str
u8 pcs;

pci_read_config_byte(pdev, ICH5_PCS, &pcs);
- DPRINTK("ata%u: ENTER, pcs=0x%x base=%d\n", ap->id, pcs, base);
+ printk("ata%u: ENTER, pcs=0x%x base=%d\n", ap->id, pcs, base);

/* enable all ports on this ap and wait for them to settle */
for (i = 0; i < 2; i++) {
@@ -552,7 +552,7 @@ static unsigned int piix_sata_probe (str
if (!(ap->flags & PIIX_FLAG_AHCI))
pci_write_config_byte(pdev, ICH5_PCS, pcs);

- DPRINTK("ata%u: LEAVE, pcs=0x%x present_mask=0x%x\n",
+ printk("ata%u: LEAVE, pcs=0x%x present_mask=0x%x\n",
ap->id, pcs, present_mask);

return present_mask;

2006-05-19 02:06:24

by Jeff Garzik

Subject: Re: [RFT] major libata update

Tejun Heo wrote:
> On Thu, May 18, 2006 at 04:07:58PM -0700, Andrew Morton wrote:
>> Yes it does. I dropped it and got
>>
>> SCSI subsystem initialized
>> ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
>> ACPI (acpi_bus-0191): Device is not power manageable [20060310]
>> ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 19 (level, low) -> IRQ 19
>> ata1: SATA max UDMA/133 cmd 0x2148 ctl 0x217E bmdma 0x2110 irq 19
>> ata2: SATA max UDMA/133 cmd 0x2140 ctl 0x217A bmdma 0x2118 irq 19
>> ata1: SATA port has no device.
>>
>> Then I undropped it and got
>>
>> SCSI subsystem initialized
>> ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
>> ACPI (acpi_bus-0191): Device is not power manageable [20060310]
>> ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 19 (level, low) -> IRQ 19
>> ata1: SATA max UDMA/133 cmd 0x2148 ctl 0x217E bmdma 0x2110 irq 19
>> ata2: SATA max UDMA/133 cmd 0x2140 ctl 0x217A bmdma 0x2118 irq 19
>> ata1.00: ATA-7, max UDMA/133, 321672960 sectors: LBA48 NCQ (depth 0/32)
>> ata1.00: configured for UDMA/133
>> scsi0 : ata_piix
>>
>> and a computer which boots.
>>
>> Look closer, please ;)
>
> Hello, Andrew.
>
> I see. It seems that you're reporting two separate problems - your
> PCS register doesn't report presence properly && the TF registers
> report ghost device if the first device is ATAPI. I can reproduce the
> second here, but AFAIK the only controller which had a problem with PCS
> presence bits was ESB6300 until now.
>
> Can you post the result of 'lspci -n' and ata_piix boot probing
> messages with the following patch applied? It would be helpful if you
> tell us how devices are actually connected. Also, where did the patch
> come from? With what comment?

At this point it may be relevant to note that Intel tells me that PCS
has changed on -every- chip. So, ICH8 PCS register behaves differently
from ICH7 and prior.

Jeff



2006-05-19 02:16:44

by Tejun Heo

Subject: Re: [RFT] major libata update

Jeff Garzik wrote:
> Tejun Heo wrote:
>> On Thu, May 18, 2006 at 04:07:58PM -0700, Andrew Morton wrote:
>>> Yes it does. I dropped it and got
>>>
>>> SCSI subsystem initialized
>>> ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
>>> ACPI (acpi_bus-0191): Device is not power manageable [20060310]
>>> ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 19 (level, low) -> IRQ 19
>>> ata1: SATA max UDMA/133 cmd 0x2148 ctl 0x217E bmdma 0x2110 irq 19
>>> ata2: SATA max UDMA/133 cmd 0x2140 ctl 0x217A bmdma 0x2118 irq 19
>>> ata1: SATA port has no device.
>>>
>>> Then I undropped it and got
>>>
>>> SCSI subsystem initialized
>>> ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
>>> ACPI (acpi_bus-0191): Device is not power manageable [20060310]
>>> ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 19 (level, low) -> IRQ 19
>>> ata1: SATA max UDMA/133 cmd 0x2148 ctl 0x217E bmdma 0x2110 irq 19
>>> ata2: SATA max UDMA/133 cmd 0x2140 ctl 0x217A bmdma 0x2118 irq 19
>>> ata1.00: ATA-7, max UDMA/133, 321672960 sectors: LBA48 NCQ (depth 0/32)
>>> ata1.00: configured for UDMA/133
>>> scsi0 : ata_piix
>>>
>>> and a computer which boots.
>>>
>>> Look closer, please ;)
>>
>> Hello, Andrew.
>>
>> I see. It seems that you're reporting two separate problems - your
>> PCS register doesn't report presence properly && the TF registers
>> report ghost device if the first device is ATAPI. I can reproduce the
>> second here, but AFAIK the only controller which had a problem with PCS
>> presence bits was ESB6300 until now.
>>
>> Can you post the result of 'lspci -n' and ata_piix boot probing
>> messages with the following patch applied? It would be helpful if you
>> tell us how devices are actually connected. Also, where did the patch
>> come from? With what comment?
>
> At this point it may be relevant to note that Intel tells me that PCS
> has changed on -every- chip. So, ICH8 PCS register behaves differently
> from ICH7 and prior.

Yeah, the PCS bit is sad to look at. From ICH6, the docs say that the
AHCI SStatus should be used for presence detection but that cannot be
done without AHCI BAR mapped.

ICH7 (or 6 was it?) added a window register into AHCI area, which
weirdly cannot be used without actually enabling AHCI BAR - what's the
point? My suspicion is that they designed it to work without AHCI BAR
enabled but maybe some revisions screwed up and ICH7 still had PCS
presence bits working, so the weird result.

I don't have ICH8 docs but, again, my guess is that they've got the
window register correct this time and determined to screw PCS presence
bits. All these suspicions and guesses need verification, but if my
guesses are right, the solution would be...

* leave anything older than ICH6 as it is now
* trust PCS presence bits for ICH6/7
* use AHCI window register to access SStatus on ICH8
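
Spelled out as code, the three cases above could look like the sketch below. It is only a restatement of the guess, which still needs verification; `pick_presence_method` and the `PRESENCE_*` names are hypothetical, and controllers not mentioned above are simply left unchanged here.

```c
enum presence_method {
    PRESENCE_UNCHANGED,    /* keep whatever the driver does today */
    PRESENCE_PCS_BITS,     /* trust the PCS presence bits */
    PRESENCE_AHCI_SSTATUS  /* read SStatus via the AHCI window register */
};

/* ich_gen is the ICH generation number, e.g. 5 for ICH5, 7 for ICH7. */
static enum presence_method pick_presence_method(int ich_gen)
{
    switch (ich_gen) {
    case 6:
    case 7:
        return PRESENCE_PCS_BITS;
    case 8:
        return PRESENCE_AHCI_SSTATUS;
    default:
        /* Older than ICH6, or not covered by the list above. */
        return PRESENCE_UNCHANGED;
    }
}
```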

--
tejun

2006-05-19 10:37:11

by Albert Lee

Subject: Re: [RFT] major libata update

Tejun Heo wrote:
> Albert Lee wrote:
>
>>> To sum up, it happens when the master slot is occupied by an ATAPI
>>> device and the corresponding slave slot is empty. The slave slot
>>> reports ATAPI signature (probably duplicated from the master) and passes
>>> all legacy presence test thus resulting in timeout on IDENTIFY.
>>>
>>
>> This problem was seen with PATA Promise 20275 adapter + IBM DVD-RAM
>> drive.
>> Single master device configuration, no slave device.
>> The master device acts as slave and creates a phantom slave device.
>> (http://marc.theaimsgroup.com/?l=linux-ide&m=113151315602979&w=2)
>>
>> The problem was later fixed by Tejun's ata_exec_internal() patch:
>> (http://marc.theaimsgroup.com/?l=linux-ide&m=113455450809405&w=2)
>> After the patch, the phantom device is finally detected by
>> ata_dev_identify().
>>
>> Libata uses polling PIO for IDENTIFY DEVICE before this major update.
>> The polling PIO finds something wrong when it reads a 0x00 device status.
>> So, the phantom device is detected quite quickly.
>>
>> With irq-driven PIO, maybe the phantom device is only detected after
>> time-out.
>> So it takes longer (30 secs) to detect the phantom device.
>>
>> No good idea how to fix this. Maybe read more registers to see whether
>> the
>> phantom device can be detected early before the IDENTIFY DEVICE.
>>
>
> Does the Promise controller show the ghosting problem again with the
> recent updates? ata_piix can be fixed by using PCS present bits. I
> don't know about Promise though.
>

Checked the Promise 20275 manual, no device present bits.

It seems we still need IDENTIFY DEVICE to identify the phantom slave.
The IDE code uses polling for IDENTIFY DEVICE. (libata did the same.)
Maybe we can also use polling for IDENTIFY DEVICE?

Could you try the attached patch to see if polling helps
to reduce the boot time? Thanks.

--
albert
(Need some time to find the specific IBM DVD-RAM drive for bug verification...)

--- upstream0/drivers/scsi/libata-core.c 2006-05-16 11:08:49.000000000 +0800
+++ 300_phantom_device/drivers/scsi/libata-core.c 2006-05-19 17:37:23.000000000 +0800
@@ -1194,6 +1194,9 @@ static int ata_dev_read_id(struct ata_de

tf.protocol = ATA_PROT_PIO;

+ /* Use polling for early detection of phantom device 1 */
+ tf.flags |= ATA_TFLAG_POLLING;
+
err_mask = ata_exec_internal(dev, &tf, NULL, DMA_FROM_DEVICE,
id, sizeof(id[0]) * ATA_ID_WORDS);
if (err_mask) {

2006-05-19 11:03:41

by Tejun Heo

Subject: Re: [RFT] major libata update

Albert Lee wrote:
> Checked the Promise 20275 manual, no device present bits.
>
> It seems we still need IDENTIFY DEVICE to identify the phantom slave.
> The IDE code uses polling for IDENTIFY DEVICE. (libata did the same.)
> Maybe we can also use polling for IDENTIFY DEVICE?
>
> Could you try the attached patch to see if polling helps
> to reduce the boot time? Thanks.
>
> --
> albert
> (Need some time to find the specific IBM DVD-RAM drive for bug verification...)
>
> --- upstream0/drivers/scsi/libata-core.c 2006-05-16 11:08:49.000000000 +0800
> +++ 300_phantom_device/drivers/scsi/libata-core.c 2006-05-19 17:37:23.000000000 +0800
> @@ -1194,6 +1194,9 @@ static int ata_dev_read_id(struct ata_de
>
> tf.protocol = ATA_PROT_PIO;
>
> + /* Use polling for early detection of phantom device 1 */
> + tf.flags |= ATA_TFLAG_POLLING;
> +
> err_mask = ata_exec_internal(dev, &tf, NULL, DMA_FROM_DEVICE,
> id, sizeof(id[0]) * ATA_ID_WORDS);
> if (err_mask) {
>

Great, it worked. Here's the relevant part of the log with both ATA_DEBUG
and ATA_VERBOSE_DEBUG on. It still retries several times, but all of those
together take slightly over 15 secs, so it's quite usable.

[ata_eh_recover ] ENTER
[__ata_port_freeze ] ata1 port frozen
[piix_sata_prereset ] ata1: ENTER, pcs=0x1f base=0
[piix_sata_prereset ] ata1: LEAVE, pcs=0x1b present_mask=0x1
[ata_std_softreset ] ENTER
[ata_std_softreset ] about to softreset, devmask=3
[ata_bus_softreset ] ata1: bus reset via SRST
[ata_dev_classify ] found ATAPI device by sig
[ata_dev_classify ] found ATAPI device by sig
[ata_std_softreset ] EXIT, classes[0]=3 [1]=3
[ata_std_postreset ] ENTER
[ata_std_postreset ] EXIT
[ata_eh_thaw_port ] ata1 port thawed
[ata_eh_revalidate_and_attach] ENTER
[ata_dev_read_id ] ENTER, host 1, dev 0
[ata_dev_select ] ENTER, ata1: device 0, wait 1
[ata_dev_select ] ENTER, ata1: device 0, wait 1
[ata_exec_command_pio] ata1: cmd 0xA1
[ata_hsm_move ] ata1: protocol 2 task_state 2 (dev_stat 0x5A)
[ata_pio_sector ] data read
[ata_hsm_move ] ata1: protocol 2 task_state 3 (dev_stat 0x50)
[ata_hsm_move ] ata1: dev 0 command complete, drv_stat 0x50
[ata_port_flush_task ] ENTER
[ata_port_flush_task ] flush #1
[ata_port_flush_task ] flush #2
[ata_port_flush_task ] EXIT
[ata_dev_configure ] ENTER, host 1, dev 0
[ata_dump_id ] 49==0x0f00 53==0x0006 63==0x0007 64==0x0003 75==0x0000
[ata_dump_id ] 80==0x0078 81==0x0000 82==0x0000 83==0x0000 84==0x0000
[ata_dump_id ] 88==0x101f 93==0x4101
ata1.00: ATAPI, max UDMA/66
ata1.00: applying bridge limits
[ata_dev_configure ] EXIT, drv_stat = 0x50
[ata_dev_read_id ] ENTER, host 1, dev 1
[ata_dev_select ] ENTER, ata1: device 1, wait 1
[ata_dev_select ] ENTER, ata1: device 1, wait 1
[ata_exec_command_pio] ata1: cmd 0xA1
[ata_hsm_move ] ata1: protocol 2 task_state 2 (dev_stat 0x0)
[ata_hsm_move ] ata1: protocol 2 task_state 4 (dev_stat 0x0)
[__ata_port_freeze ] ata1 port frozen
[ata_port_flush_task ] ENTER
[ata_port_flush_task ] flush #1
[ata_port_flush_task ] flush #2
[ata_port_flush_task ] EXIT
ata1.01: failed to IDENTIFY (I/O error, err_mask=0x2)
[ata_eh_revalidate_and_attach] EXIT
ata1: failed to recover some devices, retrying in 5 secs
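
The two piix_sata_prereset lines at the top of the log (pcs=0x1f on entry, pcs=0x1b and present_mask=0x1 on exit) can be reproduced with a small stand-alone sketch of the PCS check. The assumptions here: the low four PCS bits are per-port enable bits, the high four are presence bits, the map is the SATA-only [P0 P2 P1 P3] layout, and a port without its presence bit gets its enable bit cleared. `pcs_probe` and `pcs_present_mask` are hypothetical names, not the driver's.

```c
#include <stdint.h>

/* Result of the sketch: updated PCS value plus the device presence mask. */
struct pcs_probe {
    uint8_t pcs;
    unsigned int present_mask;
};

/*
 * map[] gives the SATA port number for each of the four legacy slots;
 * a negative entry means the slot is not a SATA port (e.g. IDE).
 * base selects the first of the two slots belonging to this ATA port.
 */
static struct pcs_probe pcs_present_mask(uint8_t pcs, const int map[4],
                                         int base, int ignore_pcs)
{
    struct pcs_probe r = { pcs, 0 };

    for (int i = 0; i < 2; i++) {
        int port = map[base + i];

        if (port < 0)
            continue;
        if (ignore_pcs || (r.pcs & (1u << (4 + port))))
            r.present_mask |= 1u << i;       /* presence bit is set */
        else
            r.pcs &= (uint8_t)~(1u << port); /* clear the enable bit */
    }
    return r;
}
```

For pcs=0x1f and base=0, only port 0 has its presence bit set, so slot 0 is marked present and the enable bit of port 2 is cleared, giving 0x1b and 0x1 as in the log. The ghost on device 1 therefore comes from the later signature check, not from PCS.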

--
tejun

2006-05-22 07:22:08

by Jeff Garzik

Subject: Re: [RFT] major libata update

Tejun Heo wrote:
> * Proposed solution
>
> It seems that the only solution is to make use of the PCS presence bits
> somehow. It is known that 6300ESB family of controllers have flaky
> presence bits (ata_piix marks them with PIIX_FLAG_IGNORE_PCS), but I
> couldn't find any document/errata for PCS bits for any other
> controllers. So, we can use PCS for all !PIIX_FLAG_IGNORE_PCS
> controllers or take a conservative approach and make use of it only on
> cases where ghosting problem is reported (ICH7 and 8, I guess. Can
> anyone test 6?).
>
> Please note that we already make some use of the PCS value when probing
> SATA port. If its value is zero, we skip the port. It's done this way
> mainly due to historical reasons - until recently ata_piix didn't have
> MAP tables to map PM/PS/SM/SS to specific ports thus used the PCS values
> in rougher form.
>
> Jeff, what do you think?


Sounds sane...

Jeff