I had a sudden urge to do something ludicrous with the Dragon, so I hooked up an FPGA to the cartridge port. It was an ideal opportunity to experiment with some simple DMA-based graphics acceleration.
I ran into the same floating address bus issue with the SYNC instruction that was behind the reliability problems seen with other add-ons. It would seem that using more modern CMOS logic encourages bad behaviour on the bus. (TTL-era logic does a reasonable job of holding a floating bus at a high level.)
Now I can get back to the thing that was distracting me from the other thing.
All the data is held in the Dragon's memory and the 6809 is animating the coordinates. Everything is happening at normal speed.
The blitter is copying data from one place in the Dragon's memory to another, pixel-shifting it on the fly.
The background is copied at a rate of two cycles per byte (one read, one write). The background is a giant sprite, slightly wider than the screen, and is clipped at the left and right screen edges.
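A minimal sketch of how a shifted background copy with clipping might work, assuming a 1 bit-per-pixel mode with 32 bytes per screen row and the leftmost pixel in bit 7 (assumptions: the post doesn't give the mode or the blitter's internals). The function `blit_background` and its parameters are hypothetical. A latch holds the previous source byte, so each output byte costs one new read and one write; a non-byte-aligned offset costs one extra read per row to prime the shifter. Clipping falls out of only producing the bytes that land on screen.

```python
SCREEN_STRIDE = 32  # assumption: 256 pixels per row at 1 bit per pixel


def blit_background(src, src_stride, rows, x_offset, dst, dst_stride=SCREEN_STRIDE):
    """Copy `rows` rows of a wide background into dst, scrolled left by x_offset pixels.

    Only the dst_stride bytes per row that land on screen are produced, so the
    left and right clipping comes from the loop bounds. Returns bus cycles used
    (one per read, one per write).
    """
    first, bit = divmod(x_offset, 8)  # starting source byte and bit shift
    extra = 1 if bit else 0           # unaligned copies need one more source byte per row
    if first + dst_stride + extra > src_stride:
        raise ValueError("source rows too narrow for this offset")
    reads = writes = 0
    for row in range(rows):
        s = row * src_stride + first
        d = row * dst_stride
        latch = src[s]  # prime the shifter with the first source byte
        reads += 1
        for j in range(dst_stride):
            need_next = bit != 0 or j + 1 < dst_stride
            nxt = src[s + j + 1] if need_next else 0
            reads += 1 if need_next else 0
            # Combine the low bits of the latched byte with the high bits of the next one.
            dst[d + j] = ((latch << bit) | (nxt >> (8 - bit))) & 0xFF
            writes += 1
            latch = nxt
    return reads + writes
```

With a byte-aligned offset this degenerates to a plain copy at exactly two cycles per byte, matching the figure quoted for the background.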
The bouncing sprites are copied at a rate of four cycles per byte (read mask, read image, read screen, write screen). Again, pixel-shifting and AND/OR operations are done on the fly.
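The masked sprite path can be sketched the same way. This assumes mask bits of 1 keep the background and 0 let the image through, a 1 bit-per-pixel layout, and a sprite that is fully on screen; `blit_masked` is a hypothetical name, and counting four cycles for the extra "spill" byte a shifted sprite produces is a simplification.

```python
def blit_masked(image, mask, width, rows, dst, dst_stride, x, y):
    """Draw a width-byte sprite at pixel (x, y): dst = (dst & mask) | image, shifted on the fly.

    Assumes mask bits of 1 keep the background and 0 let the image through,
    and that the sprite is fully on screen (no clipping). Returns bus cycles,
    counting 4 per destination byte (a simplification for the spill byte).
    """
    first, shift = divmod(x, 8)
    out_w = width + (1 if shift else 0)  # a shifted sprite spills into one more byte
    cycles = 0
    for r in range(rows):
        prev_img, prev_msk = 0x00, 0xFF  # shifted-in bits: empty image, transparent mask
        for j in range(out_w):
            if j < width:
                cur_img = image[r * width + j]
                cur_msk = mask[r * width + j]
            else:
                cur_img, cur_msk = 0x00, 0xFF  # flush what is left in the shifter
            img = ((prev_img << (8 - shift)) | (cur_img >> shift)) & 0xFF
            msk = ((prev_msk << (8 - shift)) | (cur_msk >> shift)) & 0xFF
            d = (y + r) * dst_stride + first + j
            dst[d] = (dst[d] & msk) | img  # AND out the background, OR in the image
            cycles += 4  # read mask, read image, read screen, write screen
            prev_img, prev_msk = cur_img, cur_msk
    return cycles
```

Shifting the mask in with 1s (rather than 0s) matters: otherwise the bits exposed at the edges of a shifted sprite would punch holes in the background.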
For each graphics object, the 6809 sets up the blitter registers and then triggers a copy operation. The blitter takes over by halting the 6809. When the operation is complete the 6809 starts running again.
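The trigger-then-halt sequence can be modelled as a small state machine. On the 6809, pulling HALT low stops the CPU at the end of its current instruction, after which it drives BA (and BS) high to show it has released the bus. The class, state names and `trigger` call below are hypothetical, and `bus_granted` stands in for however the design actually detects that the CPU has let go of the bus, which the post doesn't say.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAIT_GRANT = auto()  # HALT asserted, waiting for the 6809 to finish its instruction
    TRANSFER = auto()    # blitter owns the bus


class BlitterControl:
    """Hypothetical control logic for a halt-based blitter, advanced once per E cycle."""

    def __init__(self):
        self.state = State.IDLE
        self.halt_n = 1     # active-low HALT line to the 6809 (1 = CPU may run)
        self.remaining = 0  # bus cycles left in the current operation

    def trigger(self, n_bytes, cycles_per_byte):
        """Called when the CPU writes the (hypothetical) 'go' register."""
        self.remaining = n_bytes * cycles_per_byte
        if self.remaining > 0:
            self.halt_n = 0
            self.state = State.WAIT_GRANT

    def clock(self, bus_granted):
        """Advance one E cycle; bus_granted models the CPU reporting it has released the bus."""
        if self.state is State.WAIT_GRANT:
            if bus_granted:
                self.state = State.TRANSFER
        elif self.state is State.TRANSFER:
            self.remaining -= 1  # one read or write on the bus this cycle
            if self.remaining == 0:
                self.halt_n = 1  # release HALT; the 6809 resumes
                self.state = State.IDLE
```

For example, a 3-byte background-style copy at two cycles per byte keeps HALT asserted for however long the CPU takes to finish its current instruction, plus six transfer cycles.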
sixxie wrote (Sat Feb 20, 2021 11:06 pm):
So this is why you've been quiet!
Heh, I'm usually up to something.
The design would fit into an XC95288XL, using up about 85%. That's the only 5V tolerant part I can think of off the top of my head.
For an FPGA you're looking at 461 LUT4s & 150 flops.
At 3V3 there are quite a few choices under £10. Level conversion and power supply components would be necessary of course.
For example, the EPM570 or LCMXO2-640. ICE40-HX1K FPGAs look interesting too. These would leave plenty of room for a larger design with full clipping plus audio generation. I'm calling dibs on the name 'HardCore' (jk).
In this demo, is any proportion of the CPU time free (cycles not spent on setup and not halted)? Or have you filled the frame period to the brim?
My mind is always drawn to all the bus time available from inside the machine (AVMA and !E when outside the display area). An internal DMA implementation could move a ton of data for free!
That demo represents about the most that can be done in 20ms via the Dragon bus. A decent jump in performance would require something like building the whole screen image in external memory and copying it back. (Or not copying it back, and calling it a video card!)
I think I was able to get over 60 of those bouncing sprites going in one video frame when I disabled the background. Of course, that means they were leaving trails, so again it's not a practical application.
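A rough frame budget shows why the background dominates. The numbers here are assumptions, not from the post: a PAL Dragon E clock of the crystal frequency divided by 16 (about 0.89 MHz), a 6144-byte screen, and a hypothetical 16x16 1 bpp sprite that is 3 bytes wide once shifted. Only the two and four cycles per byte figures come from the thread.

```python
E_CLOCK_HZ = 14_218_000 // 16  # assumption: PAL Dragon crystal divided by 16 (~0.89 MHz)
FRAMES_PER_SECOND = 50         # PAL, giving the 20 ms frame

COPY_CYCLES_PER_BYTE = 2    # background: read, write
MASKED_CYCLES_PER_BYTE = 4  # sprite: read mask, read image, read screen, write screen


def frame_budget():
    """Bus cycles available in one frame."""
    return E_CLOCK_HZ // FRAMES_PER_SECOND


def sprites_per_frame(background_bytes, sprite_bytes, setup_cycles=0):
    """Upper bound on masked sprites per frame once the background has been drawn."""
    left = frame_budget() - background_bytes * COPY_CYCLES_PER_BYTE
    per_sprite = sprite_bytes * MASKED_CYCLES_PER_BYTE + setup_cycles
    return max(0, left // per_sprite)


# Hypothetical sizes: a 6144-byte screen (e.g. 256x192 at 1 bpp) and a
# 16x16 1 bpp sprite, which is 3 bytes wide once shifted: 3 * 16 = 48 bytes.
SCREEN_BYTES = 6144
SPRITE_BYTES = (16 // 8 + 1) * 16

print(frame_budget())                                 # 17772
print(sprites_per_frame(SCREEN_BYTES, SPRITE_BYTES))  # 28
print(sprites_per_frame(0, SPRITE_BYTES))             # 92
```

These are ceilings: the 6809's own animation and register setup come out of the same budget (the `setup_cycles` parameter is a crude stand-in), so real counts will be lower.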
DMA during NVMA cycles would be pretty cool. Dunno how much of an issue the resulting loss of atomicity would be, though: read/modify/write instructions would end up torn. I think the BUSY signal is intended to be used to maintain atomicity.
Actually, I suppose you could run another processor in the NVMA cycles, with the same issue of torn instructions on both processors. One for the concurrency experts to ponder.
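A toy model makes the torn read/modify/write concern concrete. It is heavily simplified: real 6809 RMW instructions such as INC take more cycles than the three slots modelled here, and `inc_with_dma` is a hypothetical name. A DMA write that lands in the idle slot between the CPU's read and write is silently overwritten; using BUSY to keep DMA out of such slots is the kind of fix the thread suggests.

```python
def inc_with_dma(mem, addr, dma_write=None):
    """Simplified model of a 6809 read-modify-write (e.g. INC) at addr.

    The instruction is reduced to three bus slots: read, an internal cycle
    where the CPU doesn't use the bus, and write. If dma_write is given, a
    hypothetical DMA engine uses that idle slot to store dma_write at addr.
    """
    value = mem[addr]               # slot 1: CPU reads the old value
    if dma_write is not None:       # slot 2: bus idle from the CPU's point of view,
        mem[addr] = dma_write       #         so DMA writes the same location
    mem[addr] = (value + 1) & 0xFF  # slot 3: CPU writes its result, based on the stale read
    return mem[addr]


mem = bytearray([5])
print(inc_with_dma(mem, 0))                  # 6: no interference
mem = bytearray([5])
print(inc_with_dma(mem, 0, dma_write=0x80))  # 6: the DMA's 0x80 is lost
```

The same hazard applies in reverse: a second processor reading in that idle slot would see the value from before the increment.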
Most of the Williams 6809-based arcade games had an on-board blitter to move pixels to the bitmap display. The very earliest games, such as Defender, didn't use one, but most of the later ones did. There were two variants of the blitter chip, the latter just a bugfix.
Many years ago I did an FPGA implementation of Williams games, including Robotron and Joust, which did have the blitter (Rev1). I'm going completely from memory here, but your description of how your blitter works sounds very similar to how the Williams one worked (I guess there are only so many ways to achieve this on a simple 8-bit micro system).