Showing posts with label esp. Show all posts

Saturday, July 17, 2010

Bug in NetBSD 1.6 - 3.1 emulation

I was going to write that I couldn't imagine what the NetBSD guys were smoking during the 5 years between versions 1.6 and 3.1 inclusive. The NCR 53c9x SCSI chip (known as "esp" on SPARC and PPC machines) was not that uncommon back then. How could they introduce an instability and not notice it?!

The code from NetBSD 1.5.3:

NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
//...
NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
NCRDMA_GO(sc);

The code from NetBSD 1.6-3.1:

NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
NCRDMA_GO(sc);

The code from NetBSD 4:

NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
//...
NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
NCRDMA_GO(sc);

See the difference? In versions 1.6-3.1 the command is issued before DMA is set up, so the SCSI controller may fail to fetch the command bytes via DMA.
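The ordering problem can be illustrated with a toy model in C. The model assumes an emulated chip that fetches the command bytes through DMA the moment the SELATN|DMA command is written, with no latency at all. All names (`toy_esp`, `toy_dma_setup`, `toy_cmd_selatn_dma`, `toy_select`) are hypothetical. This is neither NetBSD nor qemu code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model, not NetBSD or qemu code. */
struct toy_dma {
    const uint8_t *addr;   /* source buffer; NULL until the driver sets it up */
    size_t count;          /* number of bytes to transfer */
};

struct toy_esp {
    struct toy_dma dma;
    uint8_t fifo[16];      /* command bytes the chip actually received */
    size_t fifo_len;
};

/* Driver side: program the DMA engine (the NCRDMA_SETUP step). */
static void toy_dma_setup(struct toy_esp *esp, const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    esp->dma.addr = buf;
    esp->dma.count = len;
}

/* Chip side: assumed to pull the command through DMA immediately when
   SELATN|DMA is written (the NCRCMD step). Unprogrammed DMA yields nothing. */
static void toy_cmd_selatn_dma(struct toy_esp *esp)
{
    size_t n = esp->dma.addr != NULL ? esp->dma.count : 0;

    if (n > sizeof esp->fifo)
        n = sizeof esp->fifo;
    if (n > 0)
        memcpy(esp->fifo, esp->dma.addr, n);
    esp->fifo_len = n;
}

/* Runs one selection and returns how many command bytes reached the chip. */
static size_t toy_select(int setup_first, const uint8_t *cmd, size_t len)
{
    struct toy_esp esp = {0};

    if (setup_first) {             /* NetBSD 1.5.3 and 4 order */
        toy_dma_setup(&esp, cmd, len);
        toy_cmd_selatn_dma(&esp);
    } else {                       /* NetBSD 1.6-3.1 order */
        toy_cmd_selatn_dma(&esp);
        toy_dma_setup(&esp, cmd, len);
    }
    return esp.fifo_len;
}
```

Consider a 7-byte command made of an IDENTIFY message plus a 6-byte CDB. `toy_select(1, ...)` delivers all 7 bytes, while `toy_select(0, ...)` delivers none. Real hardware may take some time before it starts the transfer, and that delay is where the latency explanation comes in.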

Then I googled for the bug reports I expected and found none. Why would it work on real hardware? Maybe latency: if the SCSI controller took longer to start the transfer than the driver took to set up the DMA controller, it might work. Maybe concurrency: if some other driver (for instance Ethernet) had set up DMA for itself, the SCSI driver could steal it. Or maybe DMA didn't work in these versions at all, and after the first failed attempt the driver fell back to PIO mode.

Monday, May 31, 2010

Another week, another SCSI bug

Fixed the Solaris 2.6+ boot, which I accidentally broke last week. My Solaris 2.3 DMA/IRQ fix was not wrong. Rather, the fix exposed a counterpart interrupt-handling bug in the esp controller emulation.

Too bad no one reported it earlier; then I wouldn't have had to hack until midnight. ;-) Thanks to VooDoo_UzH_ for reporting it.

Saturday, May 15, 2010

Trying to reach 1993

Trying to boot an even older Solaris/SunOS version: Solaris 2.3. According to Wikipedia, it's the first version that supported the SPARCstation 5. You may ask: "What about SunOS 4.1.4 (Solaris 1.1.2)? It can be booted, and it must be older." No, it's not. It looks like Solaris 2.x and SunOS 4.x were developed in parallel: 1.1.2 was released in November 1994, and 2.3 in November 1993. That is a bit misleading, but it explains why 2.3 has problems with the esp SCSI controller while 1.1.2 doesn't. Both systems are so old that the kadb debugger can't set deferred breakpoints. Anyway, here is the current status of Solaris 2.3:

ok boot disk1:d -vs
Boot device: /iommu/sbus/espdma@5,8400000/esp@5,8800000/sd@1,0:d  File and args: -vs
Size: 719688+166144+108728 Bytes
SunOS Release 5.3 Version Generic [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1993, Sun Microsystems, Inc.
vac: enabled in write through mode
cpu0: FMI,MB86907 (mid 0 impl 0x0 ver 0x4 clock 1075 MHz)
mem = 65536K (0x4000000)
avail mem = 58142720
Ethernet address = 52:54:0:12:34:56
root nexus = SUNW,SPARCstation-5
iommu0 at root: obio 0x10000000
sbus0 at iommu0: obio 0x10001000
espdma0 at sbus0: SBus slot 5 0x8400000
esp0 at espdma0: SBus slot 5 0x8800000 sparc ipl 4
        polled command timeout
esp:            State=CLEARING Last State=CLEARING
esp:            Latched stat=0x97 intr=0x8 fifo 0x0
esp:            last msg out: NO-OP; last msg in: COMMAND COMPLETE
esp:            DMA csr=0xa4240030
esp:            addr=fc00300a dmacnt=8000 last=fc003008 last_cnt=30
esp:            Cmd dump for Target 1 Lun 0:
esp:            cdblen=6, cdb=[ 0x12 0x0 0x0 0x0 0x30 0x0 ]; Status=0x0
esp:            pkt_state=0x1f pkt_flags=0xb pkt_statistics=0x0
esp:            cmd_flags=0x1422 cmd_timeout=60
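The command in the dump can be decoded by hand. Opcode 0x12 in a 6-byte CDB is INQUIRY. Byte 3 is zero, so the allocation length is the 0x30 in byte 4, i.e. 48 bytes. The command dumped for target 1 is therefore a plain INQUIRY asking for up to 48 bytes. Below is a minimal C decoder sketch. The helper `decode_inquiry` is hypothetical. The field layout follows SPC, where the allocation length spans bytes 3-4; SCSI-2 used only byte 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a 6-byte INQUIRY CDB (opcode 0x12). */
struct inquiry_cdb {
    int evpd;             /* byte 1, bit 0: request vital product data */
    unsigned page;        /* byte 2: VPD page code */
    unsigned alloc_len;   /* bytes 3-4 big-endian (SCSI-2: byte 4 only) */
};

/* Hypothetical helper: returns 0 on success, -1 if not an INQUIRY. */
static int decode_inquiry(const uint8_t cdb[6], struct inquiry_cdb *out)
{
    assert(cdb != NULL && out != NULL);
    if (cdb[0] != 0x12)
        return -1;
    out->evpd = cdb[1] & 0x01;
    out->page = cdb[2];
    out->alloc_len = ((unsigned)cdb[3] << 8) | cdb[4];
    return 0;
}
```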

Sunday, February 7, 2010

another week, another qemu bug

There are very few qemu/sparc modules out there that I haven't had to touch. Since I started, I have found/fixed bugs in: irq, esp, cpu, scsi-disk, fdd, tcx, mmu, slavio. Today the list grows with (sparc32_)dma.

Fixed a bug in dma that produced spurious interrupts and incomplete reads/writes. I will submit the patch later this week.

Monday, February 1, 2010

Spurious interrupts

Previously qemu dropped pending interrupts when they were disabled. Real hardware doesn't do that. As a result, lots of interrupts were lost, including the spurious ones. But real ones were lost too, which is why the system timer was ticking so slowly.
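The difference between the old and the fixed behavior can be shown with a toy interrupt controller in C. The sketch rests on simplifying assumptions:

- There is a single pending register.
- In the mask, a set bit means the source is disabled.
- The old behavior is modeled as "discard requests that are or become masked".

The names are hypothetical. This is not qemu's slavio interrupt controller code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy interrupt controller. */
struct toy_intctl {
    uint32_t pending;     /* latched interrupt requests */
    uint32_t mask;        /* bit set = source disabled */
    bool drop_on_mask;    /* true models the old, incorrect behavior */
};

static void toy_raise(struct toy_intctl *ic, unsigned irq)
{
    assert(irq < 32);
    if (ic->drop_on_mask && (ic->mask & (1u << irq)))
        return;                       /* old behavior: request lost */
    ic->pending |= 1u << irq;         /* real hardware: request stays latched */
}

static void toy_set_mask(struct toy_intctl *ic, uint32_t mask)
{
    ic->mask = mask;
    if (ic->drop_on_mask)
        ic->pending &= ~mask;         /* old behavior: disabling discards pending */
}

/* Requests the CPU would see now. */
static uint32_t toy_deliverable(const struct toy_intctl *ic)
{
    return ic->pending & ~ic->mask;
}
```

Suppose a timer tick arrives and is then masked for a moment, for example in a critical section. In the old model the tick disappears. In the fixed model it is delivered as soon as the mask is cleared. The same latching keeps spurious requests around as well, which fits the observation that they stopped being hidden.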

The question is where the spurious IRQs are coming from. It's not only the ESP that produces them: under Solaris 8 and 9 there are lots of complaints about spurious timer interrupts. Both systems also seem to crash due to a buffer overflow while processing a (possibly spurious) LE interrupt.

NetBSD 1.3.3 doesn't crash during boot with my incorrect IRQ patch, which makes qemu drop interrupts, including the spurious ones.

Until now, it all looked like one global problem with interrupt processing. Now it looks like there is not one source of spurious interrupts, but many:
  • esp doesn't seem to produce spurious interrupts under Solaris while reading.
  • I've found a bug in the slavio timer which produces spurious interrupts.
  • NetBSD may be crashing due to another issue, but I can't verify it because I don't have a disk that I could boot under OBP. It is possible that OpenBIOS is not compatible with older NetBSD versions. Update: this is indeed the reason NetBSD crashes; Michael Kostylev confirmed it.

Saturday, September 5, 2009

Third bug in SCSI layer (esp) fixed

Until now, "select without attention" was handled the same way as "select with attention". According to the NCR53C9X documentation, select without ATN sends the CDB (Command Descriptor Block) directly. Select with ATN instead sends one message phase byte followed by 6, 10, or 12 command phase bytes. Treating the former like the latter consumed the first CDB byte as a message byte, which shifted the CDB and produced invalid commands. After fixing this bug, the SCSI probe looks like this:

ok probe-scsi
Target 0
Unit 0 Disk
Target 2
Unit 0 Removable Read Only device
ok

It still doesn't show all the target properties, but it doesn't matter. The next stop is booting.
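The off-by-one in the "select without ATN" handling can be sketched in C. The hypothetical helper `toy_cdb_start` locates the CDB in the bytes received during selection. With ATN, the first byte is a message (normally IDENTIFY) and is skipped. Without ATN, the CDB starts at byte 0. Handling a select without ATN like a select with ATN (the old behavior) drops the opcode and shifts every following byte. This is an illustration, not the actual qemu esp code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns a pointer to the CDB inside the selection
   bytes and stores its length in *cdb_len (NULL and 0 if nothing is left). */
static const uint8_t *toy_cdb_start(const uint8_t *buf, size_t len,
                                    int with_atn, size_t *cdb_len)
{
    size_t skip = with_atn ? 1 : 0;   /* one message-out byte precedes the CDB */

    assert(buf != NULL && cdb_len != NULL);
    if (len <= skip) {
        *cdb_len = 0;
        return NULL;
    }
    *cdb_len = len - skip;
    return buf + skip;
}
```

Take a select without ATN carrying an INQUIRY (0x12 ...). With `with_atn = 0` the helper yields the 6-byte CDB starting with 0x12. The old handling (`with_atn = 1`) yields a 5-byte "CDB" starting with 0x00.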

Sunday, August 30, 2009

second (and third) bugs in SCSI (esp) emulation

It looks like "Message Accepted" shouldn't write a response. At least ESP_RFLAGS (the FIFO flags register) must definitely be 0.

After I fixed the bug, OBP got one step further. Now it sees the targets:

ok probe-scsi
Target 0
scsi-disk: Unsupported command length, command 60
scsi-disk: Unsupported command length, command 60
scsi-disk: Unsupported command length, command c0
scsi-disk: Unsupported command length, command c0
scsi-disk: Unsupported command length, command e0
scsi-disk: Unsupported command length, command e0
Target 2
scsi-disk: Unsupported command length, command 60
scsi-disk: Unsupported command length, command 60
scsi-disk: Unsupported command length, command c0
scsi-disk: Unsupported command length, command c0
scsi-disk: Unsupported command length, command e0
scsi-disk: Unsupported command length, command e0
ok

The next stop is inquiring the targets' parameters.
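The opcodes in these messages (0x60, 0xc0, 0xe0) have group codes 3, 6 and 7; the group code is the top three bits of the opcode. Group 3 is reserved and groups 6-7 are vendor specific, so there is no standard CDB length for them. That is presumably why scsi-disk rejects them with "Unsupported command length". The lookup can be sketched in C. The function names are hypothetical, and the table follows the SCSI group codes rather than copying qemu's scsi-disk code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CDB length implied by the group code (top three bits of the opcode):
   group 0 = 6 bytes, 1-2 = 10 bytes, 4 = 16 bytes, 5 = 12 bytes.
   Group 3 is reserved and groups 6-7 are vendor specific, so no
   generic length exists; -1 is returned for them. */
static int cdb_len_for_opcode(uint8_t opcode)
{
    switch (opcode >> 5) {
    case 0:
        return 6;
    case 1:
    case 2:
        return 10;
    case 4:
        return 16;
    case 5:
        return 12;
    default:
        return -1;
    }
}

/* Hypothetical helper: 1 if `avail` bytes hold a complete CDB of known
   length, 0 if more bytes are needed, -1 if the length is unknown. */
static int cdb_complete(const uint8_t *buf, size_t avail)
{
    int len;

    assert(buf != NULL);
    if (avail == 0)
        return 0;
    len = cdb_len_for_opcode(buf[0]);
    if (len < 0)
        return -1;
    return avail >= (size_t)len ? 1 : 0;
}
```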