<in the previous part>... doesn't find SCSI disks. Here it gets tricky: it may be a problem with the interrupt routing, or DMA, or the SCSI host emulation...
... or a bug in the AIX driver itself.
When AIX 4.2 tries to perform a SCSI inquiry, this is what shows up in the QEMU log:
    (qemu)
    lsi_scsi: Write reg ??? ac = e4
    lsi_scsi: Write reg ??? ad = 38
    lsi_scsi: Write reg ??? ae = c4
    lsi_scsi: Write reg ??? af = 81
The register at 0xac-0xaf is the DSA Relative Selector (DRS). QEMU knows about it, but it does not seem to be used in any operation.
The newer LSI53c1010-66 manual says:
"This register supplies AD[63:32] during Table Indirect
Fetches and Load/Store Data Structure Address (DSA)
relative operations"
So, maybe just add support for this register to QEMU and allow 64-bit DMA transfers, right?
Wrong. The write to this register is the last write, and it doesn't start any SCSI command. Let's look at where it happens:
    p8xx_start_chip:
    ...
    0x018ff854: stw r8,-4(r7)
    0x018ff858: li r4,44            ; 0x2c
    0x018ff85c: b 0x18fe348         ; p8xx_write_reg <= write happens here
The register r4 is 0x2c, but the procedure writes to 0xac. Weird.
Let's look at the other registers:
    0x018fe368 in ?? ()
    (gdb) info registers r3 r5 r4
    r3 0x18f5000 26169344
    r5 0x31000080 822083712
    r4 0x2c 44
What's that 80 at the end of r5? 0x80 + 0x2c is 0xac. Coincidence? Don't think so.
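The arithmetic can be checked with a short sketch. The values come from the gdb register dump above; how p8xx_write_reg actually combines r4 and r5 is not shown in the disassembly, so the helper `effective_offset` below is a hypothetical model that simply adds the low byte of r5 to the register offset.

```python
# Values taken from the gdb register dump above.
R4_REG_OFFSET = 0x2C   # register offset loaded by p8xx_start_chip (li r4,44)
R5_VALUE = 0x31000080  # r5 at the time of the write


def effective_offset(reg_offset: int, r5: int) -> int:
    """Hypothetical model: the low byte of r5 is added to the register offset."""
    return (reg_offset + (r5 & 0xFF)) & 0xFF


if __name__ == "__main__":
    # Prints the offset that the write would land on under this assumption.
    print(hex(effective_offset(R4_REG_OFFSET, R5_VALUE)))
```

Under that assumption, a write aimed at 0x2c lands on 0xac, which is the offset QEMU logged.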
So, what happens here is that the driver tries to write to 0x2c, but the bus is shifted by 0x80, so it hits 0xac. After some chasing I found where this shift comes from:
    p8xx_config:
    ...
    0x018fdb9c: bl 0x1909938
    0x018fdba0: lwz r2,20(r1)
    0x018fdba4: cmpwi cr1,r3,19
    0x018fdba8: beq cr1,0x18fdbc8
    0x018fdbac: li r8,1
    0x018fdbb0: stb r8,256(r28)
    0x018fdbb4: li r3,0
    0x018fdbb8: bl 0x1909938
    0x018fdbbc: lwz r2,20(r1)
    0x018fdbc0: lwz r8,160(r28)
    0x018fdbc4: b 0x18fdbd0
    0x018fdbc8: stb r29,256(r28)
    0x018fdbcc: lwz r8,160(r28)
    0x018fdbd0: lis r11,4096
    0x018fdbd4: addic r10,r8,128     ; this is the 0x80 I'm looking for
    0x018fdbd8: li r8,-1
    0x018fdbdc: rlwinm r31,r26,1,15,30
    0x018fdbe0: addic r23,r28,10580
    0x018fdbe4: stw r10,252(r28)
    0x018fdbe8: li r25,1
    0x018fdbec: addi r30,r28,0
    0x018fdbf0: stwu r25,10512(r30)
    0x018fdbf4: stw r8,10588(r28)
    0x018fdbf8: lwz r8,10500(r28)
    0x018fdbfc: stw r11,10520(r28)
    0x018fdc00: stw r10,10532(r28)   ; and here it is stored
    ...
It is added and stored unconditionally. If I drop this addic, something different happens:
    (qemu)
    lsi_scsi: Write reg DSP0 2c = e4
    lsi_scsi: Write reg DSP1 2d = 58
    lsi_scsi: Write reg DSP2 2e = c4
    lsi_scsi: Write reg DSP3 2f = 81
    lsi_scsi: SCRIPTS dsp=81c458e4 opcode 41000000 arg 81c45a44
    lsi_scsi: Selected target 0
    lsi_scsi: SCRIPTS dsp=81c458ec opcode 78370000 arg 00000000
    lsi_scsi: Read-Modify-Write reg 0x37 MOV data8=0x00 sfbr=0x00
    ...
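To see how the four byte-wide writes relate to the dsp value in the SCRIPTS trace, here is a small sketch that combines them as one little-endian 32-bit word. The byte values are copied from the log above; the helper name `assemble_le32` is made up for illustration and is not a QEMU function.

```python
def assemble_le32(byte_writes: dict) -> int:
    """Combine byte-wide writes {offset: value} into one little-endian 32-bit value."""
    base = min(byte_writes)  # lowest offset holds the least significant byte
    value = 0
    for offset, byte in byte_writes.items():
        value |= (byte & 0xFF) << (8 * (offset - base))
    return value


# Writes to DSP0..DSP3 (0x2c..0x2f), taken from the log above.
dsp_writes = {0x2C: 0xE4, 0x2D: 0x58, 0x2E: 0xC4, 0x2F: 0x81}

if __name__ == "__main__":
    print(f"{assemble_le32(dsp_writes):08x}")
```

The result, 81c458e4, is the address of the first SCRIPTS instruction in the trace: once the last byte reaches DSP, the chip starts fetching SCRIPTS from there.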
Why would it work on the physical hardware? I guess because the addresses are aliased. Pretty similar to the le bug in Solaris.
So, it's not that QEMU has some unimplemented registers. In this case it has too many implemented ones.
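The aliasing guess can be illustrated with a toy register file. This is only a sketch of incomplete address decoding, not a model of the real chip or of QEMU's device: the class `AliasedRegisterFile` is hypothetical, and the 7-bit decode width for the "hardware" case is an assumption chosen so that 0xac mirrors 0x2c.

```python
class AliasedRegisterFile:
    """Toy device that decodes only the low `addr_bits` bits of a register offset."""

    def __init__(self, addr_bits: int):
        self.mask = (1 << addr_bits) - 1
        self.regs = {}

    def write(self, offset: int, value: int) -> None:
        # Offsets that differ only above the decoded bits hit the same register.
        self.regs[offset & self.mask] = value & 0xFF

    def read(self, offset: int) -> int:
        return self.regs.get(offset & self.mask, 0)


hw = AliasedRegisterFile(addr_bits=7)   # assumption: 0x80..0xff mirror 0x00..0x7f
emu = AliasedRegisterFile(addr_bits=8)  # 0x2c and 0xac are distinct registers

for dev in (hw, emu):
    dev.write(0xAC, 0xE4)  # the shifted write the driver performs

if __name__ == "__main__":
    print(hex(hw.read(0x2C)), hex(emu.read(0x2C)))
```

With partial decoding the shifted write still reaches 0x2c; with every offset decoded separately it lands in a different register and nothing starts.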
On the other hand, it still doesn't detect the SCSI disk, so maybe it has not just too many features, but too few as well...
/Stay tuned
3 comments:
whoa, totally awesome!
IBM always seems to do stuff in a weird way with the hardware.. Back in the day they gave me a version of AIX with my equipment (I was at a bank) and naturally it didn't work, I had to sit on the phone with them for hours debugging their install, and having some crazy fix in maintenance mode to tweak their registry thing to get it to work. I found that systems booted far more reliably in maintenance mode too.. Although I doubt that helps you in the slightest.
Still, awesome progress!
Yeah, I've already realized that it's really, really hard to debug what's going on in AIX without phoning IBM.
In my case it's even more fun: I use the Motorola version of AIX. I guess the Motorola engineers had other ideas about the debug process than IBM. There are some additional boot options for instance, the "boot -s prompt" option, which I've shown in the previous post. I found them using 'strings disk-dump.img'. The prompt does look interesting, for instance it has a 'service mode' switch. Unfortunately I see no difference if I turn it on. So, at some point I'm going to need the AIX 4.2 install media for the Motorola machines.
Also, the Motorola guys left more debug info in their code. "boot -s verbose" is really verbose up to the point where the kernel starts. After that it's pure IBM silence. The NCR driver is built with no debug information except the function names. Some kernel functions are built even without the function names. I suspect two of them are mutex_lock and mutex_unlock, but it's really a shot in the dark.
So I've learned the PPC assembly language to get up to this point, but to solve the current problem I have to learn the NCR/LSI script language.
Somehow it's ironic: ~8 years ago I started my journey into the QEMU world by debugging the SCSI inquiry command of a SPARC NCR HBA. And here I am again. :-)
SCSI really is too complicated!!! Sometimes I wonder if mainframe channel attached storage is simpler.