2008-02-06 16:32:39

by Harald Arnesen

Subject: Latest git oopses during boot

Photo of screen attached (no serial terminal, and alas, no decent
tripod either).

git-bisect gives me this:

6b7b651055221127304a4e373ee9b762398d54d7 is first bad commit
commit 6b7b651055221127304a4e373ee9b762398d54d7
Author: FUJITA Tomonori <[email protected]>
Date: Mon Feb 4 22:27:55 2008 -0800

iommu sg merging: add device_dma_parameters structure

IOMMUs merges scatter/gather segments without considering a low level
driver's restrictions. The problem is that IOMMUs can't access to the
limitations because they are in request_queue.

This patchset introduces a new structure, device_dma_parameters,
including dma information. A pointer to device_dma_parameters is added
to struct device. The bus specific structures (like pci_dev) includes
device_dma_parameters. Low level drivers can use dma_set_max_seg_size
to tell IOMMUs about the restrictions.

We can move more dma stuff in struct device (like dma_mask) to struct
device_dma_parameters later (needs some cleanups before that).

This includes patches for all the IOMMUs that could merge sg (x86_64,
ppc, IA64, alpha, sparc64, and parisc) though only the ppc patch was
tested. The patches for other IOMMUs are only compile tested.
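
The mechanism in the commit message above can be pictured with a small userspace mock-up. This is only a sketch under assumptions: the `demo_` names are hypothetical, the single `max_segment_size` field is inferred from the mention of dma_set_max_seg_size, and the kernel's real definitions may differ.

```c
#include <stddef.h>

/* Hypothetical userspace mock-ups; not the kernel's real definitions. */
struct demo_dma_parameters {
	unsigned int max_segment_size;	/* largest sg segment the driver accepts */
};

struct demo_device {
	struct demo_dma_parameters *dma_parms;	/* NULL if the bus never set it up */
};

/* Driver side: record the segment size limit. Returns 0 on success, -1 otherwise. */
static int demo_set_max_seg_size(struct demo_device *dev, unsigned int size)
{
	if (!dev->dma_parms)
		return -1;
	dev->dma_parms->max_segment_size = size;
	return 0;
}

/* IOMMU side: merge two adjacent segments only if the result fits the limit. */
static int demo_can_merge(const struct demo_device *dev,
			  unsigned int len_a, unsigned int len_b)
{
	if (!dev->dma_parms)
		return 0;	/* no limit known: this sketch refuses to merge */
	return len_a <= dev->dma_parms->max_segment_size &&
	       len_b <= dev->dma_parms->max_segment_size - len_a;
}
```

The point of the design is that the limit lives with the device rather than in the block layer's request_queue, so the IOMMU code can reach it.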

$ gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../configure --prefix=/opt/gcc
Thread model: posix
gcc version 4.2.3

Config:

CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
CONFIG_NTFS_FS=m
# CONFIG_NTFS_DEBUG is not set
# CONFIG_NTFS_RW is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_CONFIGFS_FS=m

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
CONFIG_AFFS_FS=m
CONFIG_ECRYPT_FS=m
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
CONFIG_BEFS_FS=m
# CONFIG_BEFS_DEBUG is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
CONFIG_HPFS_FS=m
CONFIG_QNX4FS_FS=m
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
CONFIG_NFS_V4=y
# CONFIG_NFS_DIRECTIO is not set
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
CONFIG_NFSD_V4=y
CONFIG_NFSD_TCP=y
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
# CONFIG_SUNRPC_BIND34 is not set
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS is not set
# CONFIG_CIFS_WEAK_PW_HASH is not set
# CONFIG_CIFS_XATTR is not set
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_EXPERIMENTAL is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
CONFIG_AMIGA_PARTITION=y
CONFIG_ATARI_PARTITION=y
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
# CONFIG_BSD_DISKLABEL is not set
CONFIG_MINIX_SUBPARTITION=y
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
# CONFIG_EFI_PARTITION is not set
# CONFIG_SYSV68_PARTITION is not set
CONFIG_NLS=m
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_737=m
# CONFIG_NLS_CODEPAGE_775 is not set
CONFIG_NLS_CODEPAGE_850=m
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
CONFIG_NLS_CODEPAGE_861=m
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
CONFIG_NLS_CODEPAGE_865=m
# CONFIG_NLS_CODEPAGE_866 is not set
CONFIG_NLS_CODEPAGE_869=m
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=m
# CONFIG_NLS_ISO8859_2 is not set
CONFIG_NLS_ISO8859_3=m
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
CONFIG_NLS_ISO8859_7=m
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
# CONFIG_DLM_DEBUG is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
# CONFIG_ENABLE_WARN_DEPRECATED is not set
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
# CONFIG_DEBUG_FS is not set
CONFIG_HEADERS_CHECK=y
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_SCHED_DEBUG is not set
# CONFIG_SCHEDSTATS is not set
# CONFIG_TIMER_STATS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_DEBUG_KOBJECT is not set
# CONFIG_DEBUG_HIGHMEM is not set
# CONFIG_DEBUG_BUGVERBOSE is not set
# CONFIG_DEBUG_INFO is not set
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_FRAME_POINTER is not set
# CONFIG_FORCED_INLINING is not set
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_SAMPLES is not set
# CONFIG_EARLY_PRINTK is not set
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_DEBUG_RODATA is not set
# CONFIG_DEBUG_NX_TEST is not set
CONFIG_4KSTACKS=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
# CONFIG_DOUBLEFAULT is not set
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_CPA_DEBUG is not set

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_DEBUG_PROC_KEYS is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
CONFIG_CRYPTO=y
CONFIG_CRYPTO_ALGAPI=m
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_SEQIV=m
CONFIG_CRYPTO_HASH=m
CONFIG_CRYPTO_MANAGER=m
CONFIG_CRYPTO_HMAC=m
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=m
CONFIG_CRYPTO_SHA1=m
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_GF128MUL=m
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_XTS=m
CONFIG_CRYPTO_CTR=m
CONFIG_CRYPTO_GCM=m
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
CONFIG_CRYPTO_TWOFISH_586=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_AES=m
CONFIG_CRYPTO_AES_586=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SALSA20_586=m
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_CRC32C=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_LZO=m
# CONFIG_CRYPTO_HW is not set
# CONFIG_VIRTUALIZATION is not set

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_ZLIB_INFLATE=m
CONFIG_ZLIB_DEFLATE=m
CONFIG_LZO_COMPRESS=m
CONFIG_LZO_DECOMPRESS=m
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_PLIST=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y


Attachments:
(No filename) (8.36 kB)
oops.jpg (102.79 kB)

2008-02-06 22:48:18

by Andrew Morton

Subject: Re: Latest git oopses during boot

On Wed, 6 Feb 2008 17:32:22 +0100
"Harald Arnesen" <[email protected]> wrote:

> Photo of screen attached (no serial terminal, and alas, no decent
> tripod either).

Thanks, but you've disabled so many debug options that the trace isn't
very useful.

> git-bisect gives me this:
>
> 6b7b651055221127304a4e373ee9b762398d54d7 is first bad commit
> commit 6b7b651055221127304a4e373ee9b762398d54d7
> Author: FUJITA Tomonori <[email protected]>
> Date: Mon Feb 4 22:27:55 2008 -0800
>
> iommu sg merging: add device_dma_parameters structure
>
> IOMMUs merges scatter/gather segments without considering a low level
> driver's restrictions. The problem is that IOMMUs can't access to the
> limitations because they are in request_queue.
>
> This patchset introduces a new structure, device_dma_parameters,
> including dma information. A pointer to device_dma_parameters is added
> to struct device. The bus specific structures (like pci_dev) includes
> device_dma_parameters. Low level drivers can use dma_set_max_seg_size
> to tell IOMMUs about the restrictions.
>
> We can move more dma stuff in struct device (like dma_mask) to struct
> device_dma_parameters later (needs some cleanups before that).
>
> This includes patches for all the IOMMUs that could merge sg (x86_64,
> ppc, IA64, alpha, sparc64, and parisc) though only the ppc patch was
> tested. The patches for other IOMMUs are only compile tested.
>
> $ gcc -v
> Using built-in specs.
> Target: i686-pc-linux-gnu
> Configured with: ../configure --prefix=/opt/gcc
> Thread model: posix
> gcc version 4.2.3
>
> Config:
>
> ...
>
> # CONFIG_DEBUG_BUGVERBOSE is not set
>

This one really should be enabled at all times, please.

Can you please set it and retry?

2008-02-06 23:11:36

by Linus Torvalds

Subject: Re: Latest git oopses during boot



On Wed, 6 Feb 2008, Harald Arnesen wrote:
>
> Photo of screen attached (no serial terminal, and alas, no decent
> tripod either).

To make that oops even remotely useful, you do need at least
CONFIG_DEBUG_BUGVERBOSE enabled, and really preferably also
CONFIG_KALLSYMS.

Otherwise there is no symbolic debug info, and it's impossible to even
really guess what is going on.

It seems to happen early on module load (which is good, that makes it
likely easier to debug), but without knowing the function and call chain,
it's a bit useless.

Linus
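
As a rough illustration of why that option matters: with CONFIG_DEBUG_BUGVERBOSE set, a BUG() report names the source file and line, while without it only raw addresses are left to work from. The userspace sketch below mimics the difference. `DEMO_BUG_VERBOSE` and `demo_bug_report` are hypothetical names, not kernel APIs, and the non-verbose message text is made up for the example.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical switch standing in for CONFIG_DEBUG_BUGVERBOSE. */
#define DEMO_BUG_VERBOSE 1

/*
 * Format a BUG-style report into buf. With the verbose switch on, the
 * report names the file and line; with it off, only a bare marker is left.
 * Returns whatever snprintf returns.
 */
static int demo_bug_report(char *buf, size_t size, const char *file, int line)
{
#if DEMO_BUG_VERBOSE
	return snprintf(buf, size, "kernel BUG at %s:%d!", file, line);
#else
	(void)file;
	(void)line;
	return snprintf(buf, size, "BUG (no location recorded)");
#endif
}
```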

2008-02-07 10:02:19

by Andrew Morton

Subject: Re: Latest git oopses during boot


(cc's restored, and expanded a bit)

On Thu, 7 Feb 2008 10:44:29 +0100 "Harald Arnesen" <[email protected]> wrote:

> On Feb 6, 2008 11:47 PM, Andrew Morton <[email protected]> wrote:
> > On Wed, 6 Feb 2008 17:32:22 +0100
> > "Harald Arnesen" <[email protected]> wrote:
> >
> > > Photo of screen attached (no serial terminal, and alas, no decent
> > > tripod either).
> >
> > Thanks, but you've disabled so many debug options that the trace isn't
> > very useful.
> >
> >
> > > git-bisect gives me this:
> > >
> > > 6b7b651055221127304a4e373ee9b762398d54d7 is first bad commit
> > > commit 6b7b651055221127304a4e373ee9b762398d54d7
> > > Author: FUJITA Tomonori <[email protected]>
> > > Date: Mon Feb 4 22:27:55 2008 -0800
> > >
> > > iommu sg merging: add device_dma_parameters structure
> > >
> > > IOMMUs merges scatter/gather segments without considering a low level
> > > driver's restrictions. The problem is that IOMMUs can't access to the
> > > limitations because they are in request_queue.
> > >
> > > This patchset introduces a new structure, device_dma_parameters,
> > > including dma information. A pointer to device_dma_parameters is added
> > > to struct device. The bus specific structures (like pci_dev) includes
> > > device_dma_parameters. Low level drivers can use dma_set_max_seg_size
> > > to tell IOMMUs about the restrictions.
> > >
> > > We can move more dma stuff in struct device (like dma_mask) to struct
> > > device_dma_parameters later (needs some cleanups before that).
> > >
> > > This includes patches for all the IOMMUs that could merge sg (x86_64,
> > > ppc, IA64, alpha, sparc64, and parisc) though only the ppc patch was
> > > tested. The patches for other IOMMUs are only compile tested.
> > >
> > > $ gcc -v
> > > Using built-in specs.
> > > Target: i686-pc-linux-gnu
> > > Configured with: ../configure --prefix=/opt/gcc
> > > Thread model: posix
> > > gcc version 4.2.3
> > >
> > > Config:
> > >
> > > ...
> > >
> > > # CONFIG_DEBUG_BUGVERBOSE is not set
> > >
> >
> > This one really should be enabled at all times, please.
> >
> > Can you please set it and retry?
> >
>
> I'm off to my day job now, but I did it earlier this morning. New
> screenshot attached.
>
> Seems to be the advansys driver, so I tried to remove it - and indeed,
> the kernel now boots. So I guess it's either that driver or my ancient
> Nikon Coolscan II that is the only thing attached to the board.

Thanks. I uploaded the oops picture to
http://userweb.kernel.org/~akpm/oops.jpg

> Cc to Matthew Wilcox added.

mm... looks like all Matthew's changes were in 2.6.23. And 2.6.23 worked
OK, yes?


The only recent changes to drivers/scsi/advansys.c are

commit b80ca4f7ee36c26d300c5a8f429e73372d153379
Author: FUJITA Tomonori <[email protected]>
Date: Sun Jan 13 15:46:13 2008 +0900

[SCSI] replace sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE

This replaces sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE in
several LLDs. It's a preparation for the future changes to remove
sense_buffer array in scsi_cmnd structure.

Signed-off-by: FUJITA Tomonori <[email protected]>
Signed-off-by: James Bottomley <[email protected]>

:100644 100644 9dd3952... 492702b... M drivers/scsi/advansys.c

commit 747d016e7e25e216b31022fe2b012508d99fb682
Author: Randy Dunlap <[email protected]>
Date: Mon Jan 14 00:55:18 2008 -0800

advansys: fix section mismatch warning

Fix section mismatch warning:

WARNING: vmlinux.o(.exit.text+0x152a): Section mismatch: reference to .init.

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: James Bottomley <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

which seem fairly benign.


gcc inlining is going to make it rather a lot of work to find out which
statement has actually oopsed there.

2008-02-07 11:15:16

by Harald Arnesen

Subject: Re: Latest git oopses during boot

On 2/7/08, Andrew Morton <[email protected]> wrote:
>
> (cc's restored, and expanded a bit)

Ah, sorry, not used to gmail's web interface. Clicked the wrong button.

> > Seems to be the advansys driver, so I tried to remove it - and indeed,
> > the kernel now boots. So I guess it's either that driver or my ancient
> > Nikon Coolscan II that is the only thing attached to the board.
>
> Thanks. I uploaded the oops picture to
> http://userweb.kernel.org/~akpm/oops.jpg
>
> > Cc to Matthew Wilcox added.
>
> mm... looks like all Matthew's changes were in 2.6.23. And 2.6.23 worked
> OK, yes?

Both 2.6.23 and 2.6.24 are ok.

> The only recent changes to drivers/scsi/advansys.c are
>
> commit b80ca4f7ee36c26d300c5a8f429e73372d153379
> Author: FUJITA Tomonori <[email protected]>
> Date: Sun Jan 13 15:46:13 2008 +0900
>
> [SCSI] replace sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE
>
> This replaces sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE in
> several LLDs. It's a preparation for the future changes to remove
> sense_buffer array in scsi_cmnd structure.
>
> Signed-off-by: FUJITA Tomonori <[email protected]>
> Signed-off-by: James Bottomley <[email protected]>
>
> :100644 100644 9dd3952... 492702b... M drivers/scsi/advansys.c
>
> commit 747d016e7e25e216b31022fe2b012508d99fb682
> Author: Randy Dunlap <[email protected]>
> Date: Mon Jan 14 00:55:18 2008 -0800
>
> advansys: fix section mismatch warning
>
> Fix section mismatch warning:
>
> WARNING: vmlinux.o(.exit.text+0x152a): Section mismatch: reference to .init.
>
> Signed-off-by: Randy Dunlap <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: James Bottomley <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
>
> which seem fairly benign.
>
>
> gcc inlining is going to make it rather a lot of work to find out which
> statement has actually oopsed there.
--
Hilsen Harald

2008-02-07 11:16:37

by Christoph Hellwig

Subject: Re: Latest git oopses during boot

On Thu, Feb 07, 2008 at 12:14:56PM +0100, Harald Arnesen wrote:
> > The only recent changes to drivers/scsi/advansys.c are
> >
> > commit b80ca4f7ee36c26d300c5a8f429e73372d153379
> > Author: FUJITA Tomonori <[email protected]>
> > Date: Sun Jan 13 15:46:13 2008 +0900
> >
> > [SCSI] replace sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE
> >
> > This replaces sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE in
> > several LLDs. It's a preparation for the future changes to remove
> > sense_buffer array in scsi_cmnd structure.
> >
> > Signed-off-by: FUJITA Tomonori <[email protected]>
> > Signed-off-by: James Bottomley <[email protected]>

The sense buffer changes have caused a fair amount of trouble, so I'd
look into this one for debugging the problem.

2008-02-07 14:33:29

by FUJITA Tomonori

Subject: Re: Latest git oopses during boot

On Thu, 7 Feb 2008 12:14:56 +0100
"Harald Arnesen" <[email protected]> wrote:

> On 2/7/08, Andrew Morton <[email protected]> wrote:
> >
> > (cc's restored, and expanded a bit)
>
> Ah, sorry, not used to gmail's web interface. Clicked the wrong button.
>
> > > Seems to be the advansys driver, so I tried to remove it - and indeed,
> > > the kernel now boots. So I guess it's either that driver or my ancient
> > > Nikon Coolscan II that is the only thing attached to the board.
> >
> > Thanks. I uploaded the oops picture to
> > http://userweb.kernel.org/~akpm/oops.jpg
> >
> > > Cc to Matthew Wilcox added.
> >
> > mm... looks like all Matthew's changes were in 2.6.23. And 2.6.23 worked
> > OK, yes?
>
> Both 2.6.23 and 2.6.24 are ok.
>
> > The only recent changes to drivers/scsi/advansys.c are
> >
> > commit b80ca4f7ee36c26d300c5a8f429e73372d153379
> > Author: FUJITA Tomonori <[email protected]>
> > Date: Sun Jan 13 15:46:13 2008 +0900
> >
> > [SCSI] replace sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE
> >
> > This replaces sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE in
> > several LLDs. It's a preparation for the future changes to remove
> > sense_buffer array in scsi_cmnd structure.
> >
> > Signed-off-by: FUJITA Tomonori <[email protected]>
> > Signed-off-by: James Bottomley <[email protected]>
> >
> > :100644 100644 9dd3952... 492702b... M drivers/scsi/advansys.c
> >
> > commit 747d016e7e25e216b31022fe2b012508d99fb682
> > Author: Randy Dunlap <[email protected]>
> > Date: Mon Jan 14 00:55:18 2008 -0800
> >
> > advansys: fix section mismatch warning
> >
> > Fix section mismatch warning:
> >
> > WARNING: vmlinux.o(.exit.text+0x152a): Section mismatch: reference to .init.
> >
> > Signed-off-by: Randy Dunlap <[email protected]>
> > Cc: Matthew Wilcox <[email protected]>
> > Cc: James Bottomley <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>
> > Signed-off-by: Linus Torvalds <[email protected]>
> >
> > which seem fairly benign.
> >
> >
> > gcc inlining is going to make it rather a lot of work to find out which
> > statement has actually oopsed there.
> --

Can you try this?

Thanks,

diff --git a/drivers/scsi/advansys.c b/drivers/scsi/advansys.c
index 374ed02..f5dde12 100644
--- a/drivers/scsi/advansys.c
+++ b/drivers/scsi/advansys.c
@@ -566,7 +566,7 @@ typedef struct asc_dvc_var {
 	ASC_SCSI_BIT_ID_TYPE unit_not_ready;
 	ASC_SCSI_BIT_ID_TYPE queue_full_or_busy;
 	ASC_SCSI_BIT_ID_TYPE start_motor;
-	uchar overrun_buf[ASC_OVERRUN_BSIZE] __aligned(8);
+	uchar *overrun_buf;
 	dma_addr_t overrun_dma;
 	uchar scsi_reset_wait;
 	uchar chip_no;
@@ -13833,6 +13833,12 @@ static int __devinit advansys_board_found(struct Scsi_Host *shost,
 	 */
 	if (ASC_NARROW_BOARD(boardp)) {
 		ASC_DBG(2, "AscInitAsc1000Driver()\n");
+
+		asc_dvc_varp->overrun_buf = kzalloc(ASC_OVERRUN_BSIZE, GFP_KERNEL);
+		if (!asc_dvc_varp->overrun_buf) {
+			ret = -ENOMEM;
+			goto err_free_wide_mem;
+		}
 		warn_code = AscInitAsc1000Driver(asc_dvc_varp);
 
 		if (warn_code || asc_dvc_varp->err_code) {
@@ -13840,8 +13846,10 @@ static int __devinit advansys_board_found(struct Scsi_Host *shost,
 					"warn 0x%x, error 0x%x\n",
 					asc_dvc_varp->init_state, warn_code,
 					asc_dvc_varp->err_code);
-			if (asc_dvc_varp->err_code)
+			if (asc_dvc_varp->err_code) {
 				ret = -ENODEV;
+				kfree(asc_dvc_varp->overrun_buf);
+			}
 		}
 	} else {
 		if (advansys_wide_init_chip(shost))
@@ -13891,9 +13899,11 @@ static int advansys_release(struct Scsi_Host *shost)
 		free_dma(shost->dma_channel);
 	}
 	if (ASC_NARROW_BOARD(board)) {
+		ASC_DVC_VAR *asc_dvc_varp = &board->dvc_var.asc_dvc_var;
 		dma_unmap_single(board->dev,
 					board->dvc_var.asc_dvc_var.overrun_dma,
 					ASC_OVERRUN_BSIZE, DMA_FROM_DEVICE);
+		kfree(asc_dvc_varp->overrun_buf);
 	} else {
 		iounmap(board->ioremap_addr);
 		advansys_wide_free_mem(board);

2008-02-07 18:19:04

by Linus Torvalds

Subject: Re: Latest git oopses during boot



On Thu, 7 Feb 2008, Harald Arnesen wrote:
>
> OK, tried it. Another screen shot attached
> (I must really get another box to use as a serial terminal).

This oops decodes to

8b 44 24 10 mov 0x10(%esp),%eax
8b 90 7c 02 00 00 mov 0x27c(%eax),%edx
83 ea 54 sub $0x54,%edx
24 18 and $0x18,%al
8b 4c 24 14 mov 0x14(%esp),%ecx
f6 41 04 04 testb $0x4,0x4(%ecx)
75 57 jne 0x70
ba d0 80 00 00 mov $0x80d0,%edx
b8 68 bf 30 c0 mov $0xc030bf68,%eax
e8 2f 8a 38 c7 call 0xc7388a57
** a3 14 00 00 00 mov %eax,0x14 **
85 c0 test %eax,%eax
0f 84 b3 14 00 00 je 0x14c0
8b 44 24 14 mov 0x14(%esp),%eax
83 c0 0c add $0xc,%eax

and the oopsing instruction is literally an insane "store to absolute
address 0x14" which will definitely oops unconditionally.

Quite frankly, that code makes no sense. It smells like code corruption,
especially as it is right at a return point of a function call (ie maybe
the function screwed up the stack accesses somehow).

Actually, it looks like the call address itself is screwed up too. I
don't think "0xc7388a57" is likely to be a valid address.

The code *looks* like the test

if (ASC_NARROW_BOARD(boardp)) {
	ASC_DBG(1, "narrow board\n");
	asc_dvc_varp = &boardp->dvc_var.asc_dvc_var;
	asc_dvc_varp->bus_type = bus_type;

but with strange corruption.

Can you do a

make drivers/scsi/advansys.lst

and see what it should be?

Linus
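
The marked bytes in the decode above can be checked by hand. In 32-bit code, opcode `a3` stores %eax to the absolute address given by the four little-endian bytes that follow it. A minimal decoder sketch, assuming 32-bit mode with no prefixes; the function name is hypothetical and it handles only this one opcode:

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Decode the 5-byte i386 instruction "a3 <addr32>" (mov %eax, moffs32).
 * On success, store the absolute destination address in *addr and return 1;
 * return 0 if the buffer is too short or the opcode is something else.
 */
static int demo_decode_mov_eax_abs(const uint8_t *code, size_t len,
				   uint32_t *addr)
{
	if (len < 5 || code[0] != 0xa3)
		return 0;
	/* The 32-bit address follows the opcode in little-endian byte order. */
	*addr = (uint32_t)code[1] |
		((uint32_t)code[2] << 8) |
		((uint32_t)code[3] << 16) |
		((uint32_t)code[4] << 24);
	return 1;
}
```

Feeding it the bytes `a3 14 00 00 00` yields the address 0x14, matching the "store to absolute address 0x14" reading.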

2008-02-07 21:22:33

by Linus Torvalds

Subject: Re: Latest git oopses during boot



On Thu, 7 Feb 2008, Harald Arnesen wrote:
> >
> > Can you do a
> >
> > make drivers/scsi/advansys.lst
> >
> > and see what it should be?
>
> Anyway, here it is, as an attachment.

Ok, I was wrong. The code really *does* compile to that insane

a3 14 00 00 00 mov %eax,0x14

by your compiler.

That's the

asc_dvc_varp->overrun_buf = kzalloc(ASC_OVERRUN_BSIZE, GFP_KERNEL);

thing, and gcc seems to have decided that it can statically prove that
asc_dvc_varp is NULL.

Quite frankly, I don't see that being true. But you have some patches in
your tree that I haven't followed, so.. Are you sure the patches applied
to the right spot? The patch I saw added that kzalloc() to the _end_ of
the function (long after asc_dvc_varp was initialized), maybe that one got
mis-applied?

Or maybe your compiler version is simply totally broken.

Linus
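
One way to read that instruction: if the compiler treats the structure pointer as NULL, a store to one of its members targets an absolute address equal to the member's offset. The sketch below shows the arithmetic with a hypothetical layout chosen so that the member lands at 0x14. The thread does not show the driver's real layout, and `demo_dvc_var` and `demo_store_target` are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the driver structure: five 32-bit words of
 * earlier fields, then the buffer pointer. uint32_t models a 32-bit
 * pointer so the layout is the same on any host.
 */
struct demo_dvc_var {
	uint32_t earlier_fields[5];
	uint32_t overrun_buf;
};

/*
 * Address that a store to p->overrun_buf would target when p == base.
 * For base == 0 this is simply the field offset.
 */
static uintptr_t demo_store_target(uintptr_t base)
{
	return base + offsetof(struct demo_dvc_var, overrun_buf);
}
```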

2008-02-07 21:41:54

by Harald Arnesen

Subject: Re: Latest git oopses during boot

Linus Torvalds <[email protected]> writes:

> On Thu, 7 Feb 2008, Harald Arnesen wrote:
>> >
>> > Can you do a
>> >
>> > make drivers/scsi/advansys.lst
>> >
>> > and see what it should be?
>>
>> Anyway, here it is, as an attachment.
>
> Ok, I was wrong. The code really *does* compile to that insane
>
> a3 14 00 00 00 mov %eax,0x14
>
> by your compiler.
>
> That's the
>
> asc_dvc_varp->overrun_buf = kzalloc(ASC_OVERRUN_BSIZE, GFP_KERNEL);
>
> thing, and gcc seems to have decided that it can statically prove that
> asc_dvc_varp is NULL.
>
> Quite frankly, I don't see that being true. But you have some patches in
> your tree that I haven't followed, so.. Are you sure the patches applied
> to the right spot? The patch I saw added that kzalloc() to the _end_ of
> the function (long after asc_dvc_varp was initialized), maybe that one got
> mis-applied?
>
> Or maybe your compiler version is simply totally broken.
>
> Linus

I'll try applying the patch to a freshly downloaded git-tree.

Shall I try another compiler? I have at least these two:

gcc version 3.4.6 (Ubuntu 3.4.6-6ubuntu2)
gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)

in addition to the self-compiled 4.2.3 I used for the tests.
--
Hilsen Harald.

2008-02-07 22:13:22

by Linus Torvalds

Subject: Re: Latest git oopses during boot



On Thu, 7 Feb 2008, Harald Arnesen wrote:
>
> I'll try applying the patch to a freshly downloaded git-tree.

Ok, good.

> Shall I try another compiler? I have at least these two:
>
> gcc version 3.4.6 (Ubuntu 3.4.6-6ubuntu2)
> gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)

I would suggest a patch mis-application problem first (or possibly even
the patch itself being broken - I simply didn't look very closely at the
patch, but it *looked* ok).

If it's a compiler bug, it's a pretty big one, and quite frankly, I doubt
it. Compiler bugs do happen, but they are pretty rare, and they tend to
have more subtle effects than the one you see.

However:

> in addition to the self-compiled 4.2.3 I used for the tests.

4.2.3? Really? That's pretty damn recent, and so almost totally untested.
That does make a compiler bug at least more likely.

So yes, if you already have other compilers installed, you should try
them. If it really is a compiler bug, it's a really bad one, and you would
want to let the gcc people know.

Still, I'd double-check that the

asc_dvc_varp->overrun_buf = kzalloc(ASC_OVERRUN_BSIZE, GFP_KERNEL);

line was added properly first. You should see it way after the point where
it did

asc_dvc_varp = &boardp->dvc_var.asc_dvc_var;

to initialize it (and both statements should be inside a

if (ASC_NARROW_BOARD(boardp)) {

conditional - please check that the source code looks sane too).

Linus

2008-02-07 22:24:32

by Harald Arnesen

Subject: Re: Latest git oopses during boot

On 2/7/08, Linus Torvalds <[email protected]> wrote:
>
>
> On Thu, 7 Feb 2008, Harald Arnesen wrote:
> >
> > I'll try applying the patch to a freshly downloaded git-tree.
>
> Ok, good.
>
> > Shall I try another compiler? I have at least these two:
> >
> > gcc version 3.4.6 (Ubuntu 3.4.6-6ubuntu2)
> > gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)
>
> I would suggest a patch mis-application problem first (or possibly even
> the patch itself being broken - I simply didn't look very closely at the
> patch, but it *looked* ok).
>
> If it's a compiler bug, it's a pretty big one, and quite frankly, I doubt
> it. Compiler bugs do happen, but they are pretty rare, and they tend to
> have more subtle effects than the one you see.
>
> However:
>
> > in addition to the self-compiled 4.2.3 I used for the tests.
>
> 4.2.3? Really? That's pretty damn recent, and so almost totally untested.
> That does make a compiler bug at least more likely.
>
> So yes, if you already have other compilers installed, you should try
> them. If it really is a compiler bug, it's a really bad one, and you would
> want to let the gcc people know.
>
> Still, I'd double-check that the
>
> asc_dvc_varp->overrun_buf = kzalloc(ASC_OVERRUN_BSIZE, GFP_KERNEL);
>
> line was added properly first. You should see it way after the point where
> it did
>
> asc_dvc_varp = &boardp->dvc_var.asc_dvc_var;
>
> to initialize it (and both statements should be inside a
>
> if (ASC_NARROW_BOARD(boardp)) {
>
> conditional - please check that the source code looks sane too).
>
> Linus

I just re-downloaded and re-patched and re-compiled (with gcc 4.2.3),
and now the kernel boots. I must have screwed up the previous
patching.

It now works, with Fujita's patch applied.
--
Hilsen Harald

2008-02-08 00:10:53

by FUJITA Tomonori

Subject: Re: Latest git oopses during boot

On Thu, 7 Feb 2008 23:24:00 +0100
"Harald Arnesen" <[email protected]> wrote:

> On 2/7/08, Linus Torvalds <[email protected]> wrote:
> >
> >
> > On Thu, 7 Feb 2008, Harald Arnesen wrote:
> > >
> > > I'll try applying the patch to a freshly downloaded git-tree.
> >
> > Ok, good.
> >
> > > Shall I try another compiler? I have at least these two:
> > >
> > > gcc version 3.4.6 (Ubuntu 3.4.6-6ubuntu2)
> > > gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)
> >
> > I would suggest a patch mis-application problem first (or possibly even
> > the patch itself being broken - I simply didn't look very closely at the
> > patch, but it *looked* ok).
> >
> > If it's a compiler bug, it's a pretty big one, and quite frankly, I doubt
> > it. Compiler bugs do happen, but they are pretty rare, and they tend to
> > have more subtle effects than the one you see.
> >
> > However:
> >
> > > in addition to the self-compiled 4.2.3 I used for the tests.
> >
> > 4.2.3? Really? That's pretty damn recent, and so almost totally untested.
> > That does make a compiler bug at least more likely.
> >
> > So yes, if you already have other compilers installed, you should try
> > them. If it really is a compiler bug, it's a really bad one, and you would
> > want to let the gcc people know.
> >
> > Still, I'd double-check that the
> >
> > asc_dvc_varp->overrun_buf = kzalloc(ASC_OVERRUN_BSIZE, GFP_KERNEL);
> >
> > line was added properly first. You should see it way after the point where
> > it did
> >
> > asc_dvc_varp = &boardp->dvc_var.asc_dvc_var;
> >
> > to initialize it (and both statements should be inside a
> >
> > if (ASC_NARROW_BOARD(boardp)) {
> >
> > conditional - please check that the source code looks sane too).
> >
> > Linus
>
> I just re-downloaded and re-patched and re-compiled (with gcc 4.2.3),
> and now the kernel boots. I must have screwed up the previous
> patching.
>
> It now works, with Fujita's patch applied.

Thanks Harald and Linus,

The bug has been in the advansys driver all along. 2.6.23 and 2.6.24 work
only because the size of the Scsi_Host structure was a multiple of 8. After
2.6.24, some patches changed the Scsi_Host structure, and its size is no
longer a multiple of 8, so we hit this bug.


I'll resend the patch with a proper description.
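
The explanation above can be restated as arithmetic. An `__aligned(8)` attribute on a member fixes its offset within the structure, but the member's final address also depends on where the structure itself begins. The sketch assumes the driver's private data starts directly after a header (standing in for Scsi_Host) inside one allocation; that placement is an assumption here, and the function name and the numbers in the usage note are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if a member at member_off inside private data that starts
 * header_size bytes into an allocation at alloc_base lands on an
 * align-byte boundary (align must be a power of two), else 0.
 */
static int demo_member_is_aligned(uintptr_t alloc_base, size_t header_size,
				  size_t member_off, size_t align)
{
	uintptr_t addr = alloc_base + header_size + member_off;

	return (addr & (align - 1)) == 0;
}
```

With a header of 0x200 bytes, a member at offset 0x18 is 8-byte aligned; grow the header to 0x204 and the same member no longer is. Allocating the buffer separately, as the patch does with kzalloc(), makes its alignment independent of the header size.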