To: bugsERASE@freebsd.org Cc: fabioERASE@cesar.unicamp.br, ftyERASE@mcnc.org, gcrutchrERASE@nightflight.com, jERASE@uriah.heep.sax.de, jcERASE@irbs.com, julianERASE@freebsd.org, kukuERASE@gilberto.physik.rwth-aachen.de, lehey.padERASE@sni.de, mrmERASE@Sceard.com, nikmERASE@ixa.net, tomppaERASE@fidata.fi, wilkoERASE@yedi.iaf.nl Subject: Re: Adaptec 1542A Users From: Julian H. Stacey Hi bugs@freebsd.org Cc Adaptec 1542A SCSI Adapter People, & Julian Elischer (SCSI Guru :-) Tomi Vainio Has confirmed he sees the same Adaptec 1542A SCSI adapter bug that I do. To quote his 2 mails to me: ====== > Date: Fri, 12 Apr 1996 23:37:12 +0300 (EET DST) > From: Tomi Vainio > To: "Julian H. Stacey" Subject: Re. Adaptec 1542A Users > > Julian H. Stacey writes: > > > > Since my posting, > > I have written & read a 190 meg image on sd3 using a system with a 1542B, > > then I took it to the 1542A system, as sd1 ... it blew within a few K. > > > > The test on a 1542A system is simple: > > make > > ./testblock -v -l 10000000 /usr2/tmp/rubbish > > (where /usr2/tmp must be on /dev/sd1 or 2 or 3, but not /dev/sd0 or /dev/wd* > > & wait to see if you get something like: > > data mismatch at byte 40961 (0xa001), after 0 (0x0) previously checked ok. > > I connected sd1 to my 1542A and here are results: > > 1. No problems if testblock is only one that generates disk activity. > 2. I launched couple find processes to sd0 and at same time I > run testblock. Testblock failed only 1/10 of test runs. > 3. I copied files with cp to sd1 when running testblock on > sd1. Testblock failed on every time. > > Tomppa > > ../testblock -v -l 10000000 /v/fish > ../testblock: Neither -w or -r specified, so will both write then read. > Using a block size of 61440, to a limit of 10000000. > ../testblock writing then reading /v/fish. > ../testblock: Started rewinding /v/fish. > ../testblock: Finished rewinding /v/fish. > ../testblock: In /v/fish, data mismatch at byte 49153 (0xc001), after 0 (0x0) previously checked ok. > Byte read 255, byte expected 0 > ../testblock: With /v/fish, only checked 0 bytes, 10,014,720 failed. > ../testblock: Finished. > }{ ====== > Date: Sat, 13 Apr 1996 02:39:33 +0300 (EET DST) > From: Tomi Vainio > To: "Julian H. Stacey" > Julian H. Stacey writes: > > > > Please tell me in what way the data is corrupted. > > Do you get a block of 8 0xFF bytes aligned at the beginning of random > > (not all) 0x1000 boundary aligned blocks ? > > > > Please run the enclosed 8f.c on some corrupt files, > > maybe with > > script > > find /usr.flaky.drive -exec 8f {} /dev/null \; > > I am very interested to know if you see the same bad data I do ! > > > > fish is file that testblock made and zip files are copied with cp > > Tomppa > > worm:/v(13)# find . -exec 8f/8f {} /dev/null \; > ../fish: byte (dec) 147457 (hex) 24001, line 577 character 247 > ../fish: byte (dec) 2203649 (hex) 21a001, line 8608 character 25 > ../fish: byte (dec) 2383873 (hex) 246001, line 9312 character 28 > ../herit201.zip: byte (dec) 147457 (hex) 24001, line 570 character 21 > ../herit209.zip: byte (dec) 212993 (hex) 34001, line 833 character 53 > ../herit209.zip: byte (dec) 1245185 (hex) 130001, line 5132 character 122 > ../herit208.zip: byte (dec) 53249 (hex) d001, line 233 character 482 > ../herit208.zip: byte (dec) 147457 (hex) 24001, line 619 character 355 > ../herit207.zip: byte (dec) 49153 (hex) c001, line 178 character 39 > ../herit204.zip: byte (dec) 999425 (hex) f4001, line 3874 character 275 > ../herit204.zip: byte (dec) 1458177 (hex) 164001, line 5863 character 11 > ../herit203.zip: byte (dec) 147457 (hex) 24001, line 406 character 56 > ../herit203.zip: byte (dec) 868353 (hex) d4001, line 3138 character 72 > ../herit202.zip: byte (dec) 49153 (hex) c001, line 194 character 239 > ../herit210.zip: byte (dec) 49153 (hex) c001, line 192 character 497 > ../herit210.zip: byte (dec) 147457 (hex) 24001, line 611 character 491 > ../8f/8f.c: byte (dec) 173 (hex) ad, line 10 character 32 > So it looks like a generic bug in FreeBSD code: With a 1542A (& not a 1542B, which seems OK), In simultaneous multiple task write mode to sd1 (or 2 or 3 or 4), At random multiples of 0x1000 bytes, The first 8 bytes of a block get forced to 0xFF. (Of course it may well be that FreeBSD code is not `in error' but merely doesnt allow for some wart in the 1542A, that's fixed in the 1542B, but whatever, we need a fix). Those who have not yet proven this on their system might like to try something like this: sync ; echo maybe even dump sd1 to tape # See below cd <<>>/tmp testblock -l 10000000 rubbish1 & testblock -l 10000000 rubbish2 & testblock -l 10000000 rubbish3 & & do some other sd0 to sd1 copying in parallel. Then run my 8f on all the data files youve run. Remember if you have a swap partition on sd1, & you swapped, the swap may be damaged so you might crash. If you'r really unlucky, while the system is creating new inodes for the rubbish files, & is manipulating the file system, 8 bytes (out of several 0x1000) bytes of file system structure data may get mangled. Here's hoping our SCSI Guru, Julian Elischer (or anyone else come to that) can suggest some code changes (ideally diffs) for me to compile here, & test on my machines, & that I can report back on. (Others too, naturally, if they want). I guess Im in an ideal test set up here: I have a 1542B current system with 3 discs, & a 1542A 2.1 system with 3 or 4 discs, (each also with scsi tape, & I think Ive seen this bug on tape too, on my 1542A) I have supplied CC readers with testblock.c & 8f.c, for others interested, I'll toss them in http://....../~jhs/src/ Julian -- Julian H. Stacey jhs@