Stackblast copying

pser1
Posts: 1655
Joined: Sun Mar 25, 2012 7:32 pm
Location: Barcelona (SPAIN)

Stackblast copying

Post by pser1 »

I have been tinkering with the idea of copying a graphics screen while the RAM is not being accessed by the VDG (after the end of each frame),
and so I have coded this small routine, but the cycle count I get is far too big to do that ...
The cycle counts obtained are:
- intro part = 40 cycles
- loop part = 132 cycles
- final part = 29 cycles
The loop alone, for the 156 lines, needs 132x156=20,592 cycles

But (on PAL machines) we have 70+50 lines after the FS at the end of the frame, so working at double speed we have 120x57x2=13,680 cycles,
which is not enough to do the stackblast copy!
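
As a rough check of the budget above, here is a small Python sketch (a model on a PC, not Dragon code). It assumes 57 CPU cycles per scanline at normal speed, which is the value that yields the 13,680 figure, and it assumes double speed really delivers twice that for the whole interval. All names are illustrative.

```python
# Cycle budget for copying during the blanking interval, using the thread's figures.
# Assumption: 57 CPU cycles per scanline at normal speed on a PAL machine.
CYCLES_PER_LINE = 57      # assumed normal-speed cycles per scanline
BLANK_LINES = 70 + 50     # non-display lines after FS, as quoted in the post
SPEED_FACTOR = 2          # double speed (write to $FFD9), assumed to double the budget
LOOP_CYCLES = 132         # measured cost of copying one 32-byte screen line
IMAGE_LINES = 156         # height of the partial screen

available = BLANK_LINES * CYCLES_PER_LINE * SPEED_FACTOR
needed = LOOP_CYCLES * IMAGE_LINES
lines_that_fit = available // LOOP_CYCLES

print(f"available: {available} cycles")
print(f"needed:    {needed} cycles")
print(f"shortfall: {needed - available} cycles")
print(f"lines copied before the display restarts: {lines_that_fit} of {IMAGE_LINES}")
```

Under these assumptions roughly two thirds of the image fits into the blanking interval, which is why the copy cannot finish before the next frame begins.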
The code is shown at the end of the message

It seems that I have missed something, but I cannot spot what!
Does anybody have any hint on how to copy the whole screen before the next frame begins?
Any ideas would be much appreciated.
I know that by ordering the source data in a special way I could avoid repositioning the stack pointer three out of four times per line,
but even so the total cycle count would still be too big.
regards
pere

Code:

; --------------------------------------------------------------------------------------------------
; stack copier to move a screen 256x156 from $6700 to $c00
; --------------------------------------------------------------------------------------------------
stkBlst	sync
			opt	cc								; init cyclecounting
			sta	$ffd9							; double speed
			sts	stkF03+2						; save stack
			tfr	dp,a							; get DP
			tfr	cc,b							; and CC
			std	stkF04+1						; save both
			orcc	#$50							; disable interrupts
			lds	#$6700+8						; set origin pointer to get 8 bytes from source data 
			ldu	#$c00							; set destination to screen beginning
			opt	cc								; reset cyclecounting
stkF02	puls	d,x,y,cc,dp					; get 8 bytes from origin
			pshu	d,x,y,cc,dp					; put them into destination
			leas	16,s							; advance pointer to next 8 bytes block
			puls	d,x,y,cc,dp
			pshu	d,x,y,cc,dp
			leas	16,s							; eight bytes more
			puls	d,x,y,cc,dp
			pshu	d,x,y,cc,dp
			leas	16,s							; eight bytes more			
			puls	d,x,y,cc,dp
			pshu	d,x,y,cc,dp
			leas	16,s							; eight bytes more = 32 bytes -> one PM4 line		
			cmpu	#$2000						; end of screen to be filled?
			blo	stkF02						; no, next line
			opt	cc								; reset cyclecounting
			sta	$ffd8							; normal speed
stkF03	lds	#$0000						; restore stack pointer
stkF04	ldd	#$0000						; get DP, CC
			tfr	a,dp							; set DP
			tfr	b,cc							; and CC
			rts									; return
; --------------------------------------------------------------------------------------------------
Bosco
Posts: 330
Joined: Tue Mar 04, 2014 11:49 pm
Location: Nottingham, UK

Re: Stackblast copying

Post by Bosco »

Hi Pere.

Copying that much data during the vertical blank sounds ambitious?

Flagon Bird was double-buffered meaning I had the entire frame to update the (minimal) logic and redraw the display.

My current project gets away with a single buffer but still runs at a stable 60Hz. My code is much faster than before and I only draw the bare minimum each frame but even still, when the game starts getting busy, I'll be finishing drawing elements at the bottom of the display while the top has already begun refreshing.

I find it useful sometimes to switch VDG colour sets after drawing has finished and back again after updating has finished. In this example you can see drawing has over-run by three scanlines but the game is taking less than a third of a frame to update. Of course it changes from frame to frame but gives you a sense of performance and can be a handy tool for visualising performance spikes.

I haven't really answered your stack blast question but maybe provided some food for thought?

Image

Re: Stackblast copying

Post by pser1 »

Hi,
thanks a lot for these ideas, Steve.
In fact I have no problems updating the screen while moving the main character; I was just
trying to load a whole screen from disk and show it in the next frame.
Unfortunately I have only 6k of RAM free where I could load that partial screen (256x156), and I
thought that the stackblast copy could do the trick at double speed after a SYNC.
But clearly it won't. Too many cycles required!
I don't want to switch pages in the SAM to achieve that (for now).
I was hoping I had forgotten something when implementing the stackblast copy ...
I just wanted to try that principle. I have heard about it many times but had never used it.
In the rotator I used double buffer too and this solved the problem in a very clean way.
No problem at all, loading the image right to $c00 with Dragon32 and DDOS is fast enough
to be more than acceptable even when a fast load is desired.
cheers
pere

Ps I had thought of switching SAM, then do the stackblast at normal speed and switch back
again to work on the $c00 area. Maybe the delay this would add to the game could be
so small that it could not be noticed by the player ... dunno!
sorchard
Posts: 530
Joined: Sat Jun 07, 2014 9:43 pm
Location: Norwich UK

Re: Stackblast copying

Post by sorchard »

Hi Pere,

What you've done is already really fast. I don't know how to make it go faster without unrolling the loop and getting rid of all of the LEAS instructions.

I was going to agree with Bosco and say double buffering would be the best solution but now I've seen your reply, I think there's another way:

If you put in a delay so that when you start copying, the raster is already ahead and running away from you, it will be a long time before it catches up again. With your one LEAS per line, I think there's enough time to copy 160 lines.

Not related to performance but possibly of interest:

The interrupts probably won't stay disabled for long after the orcc #$50 because cc will be changed to 'random' values by the stack instructions. To completely disable the interrupts you will need to program the pias:

Code:

; disable interrupts in pia
  ldx #$ff00
  lda #$34
  sta 1,x     ; disable hsync irq
  sta 3,x     ; disable vsync irq
  sta $21,x   ; disable printer firq
  sta $23,x   ; disable cart firq

Vsync and cart are normally the only ones that are enabled, but I like to be sure. You'll have to choose what is really necessary. There's also the Dragon disk NMI: clr $ff48 will disable it and turn off the motor. I'm not sure what needs to be done for CoCo.
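
The remark above about CC being loaded with 'random' values can be illustrated with a short Python model (not Dragon code). It relies on two facts about the 6809: `orcc #$50` sets the F (bit 6) and I (bit 4) mask bits, and a pull that includes CC loads it from the lowest address of the block, so every eighth image byte ends up in CC. The function names and the sample data are made up for illustration.

```python
# Model of the two interrupt-mask bits in the 6809 condition-code register.
# CC layout, bit 7 down to bit 0: E F H I N Z V C.
F_MASK = 0x40  # when set, FIRQ is masked
I_MASK = 0x10  # when set, IRQ is masked

def unmasked_interrupts(cc_byte):
    """Return the maskable interrupts left enabled if cc_byte is loaded into CC."""
    enabled = []
    if not cc_byte & I_MASK:
        enabled.append("IRQ")
    if not cc_byte & F_MASK:
        enabled.append("FIRQ")
    return enabled

def blocks_that_unmask(image):
    """Count 8-byte blocks whose first byte (the one pulled into CC) clears I or F."""
    return sum(1 for i in range(0, len(image), 8) if unmasked_interrupts(image[i]))

if __name__ == "__main__":
    for cc in (0x50, 0x10, 0x40, 0x00):
        print(f"CC=${cc:02X}: unmasked = {unmasked_interrupts(cc)}")
    demo_line = bytes(range(32))  # made-up 32-byte screen line
    print(blocks_that_unmask(demo_line), "of 4 blocks would unmask an interrupt")
```

Only image bytes with both bits 4 and 6 set keep the masks in place, so for arbitrary picture data the masks cannot be relied on, which is the motivation for switching the interrupts off at the PIAs.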

An alternative to using sync to wait for vblank, is to use something like the following code. It avoids having to keep turning the interrupts on and off, unless you need interrupts for something else.

Code:

      lda $ff02    ; clear vblank flag
loop  lda $ff03
      bpl loop     ; not vblank

CC and DP can be saved more easily on the stack, and pulling PC along with them at the end gives you an RTS for 'free':

Code:

    pshs cc,dp
    ;
    ; save stack pointer
    ; do copy
    ; restore stack pointer
    ;
    puls cc,dp,pc

Stew

Re: Stackblast copying

Post by Bosco »

That's a great suggestion Stew that I hadn't considered. :D

Re: Stackblast copying

Post by pser1 »

Hi Stew and Steve,
Thanks a bunch for your ideas, they are really great!
As I have said, I have no problems moving the main character, which changes
location very often, and for now Dragon DOS seems to be fast enough at loading
every screen.
I was thinking that maybe the last screen should appear in one shot, so that it
doesn't look as if it has just been loaded from disk.
I prepare the disk so that the screens are saved in the fastest places. Sometimes
a file gets a non-optimal chunk of sectors and loads a bit slowly; then I rename it to 'waste'
that area and copy it again, and 100% of the time it loads very fast from the new place.

To sum up, I understand that if the blast copy is fast enough, it could be done and the user
probably will not notice that the bottom part is being updated while the VDG is displaying
the top part. So the stackblast could be done in two parts, one at double speed
while the VDG is not accessing RAM and the rest at normal speed.
The point about CC changing, which requires disabling the IRQs at the source (the PIAs), is really important!
I had not foreseen that! I will run some tests using these ideas.
Once more, many thanks for sharing such good ideas!
cheers
pere

Re: Stackblast copying

Post by pser1 »

Hello Stew,
- With your one LEAS per line, I think there's enough time to copy 160 lines.
In fact I am using 4 x LEAS for every screen line because I am moving that pointer
after every 8 bytes copy
Is there any way to reduce that without reworking the source data?
thanks in advance
pere

Re: Stackblast copying

Post by sorchard »

Hi Pere,

Removing the LEAS instructions would mean reworking the source data unfortunately, but I think you could do all of the copy in one part at normal speed with 1 x LEAS per line. You just need to begin 120x57 cycles after the start of vblank. This should give you more than 20000 cycles to do the copy before the vdg catches up, assuming a PAL machine, the copy is in a downward direction and the copy starts on the top line of the screen.

Thinking about it some more, it may be possible to do it with 4 x LEAS per line. If the copy routine starts as the VDG passes the top line of the display, there will be 17784 cycles before the VDG reaches the top of the new image. In that time you would have copied 17784/132 = 134 lines. It will now take the VDG another 134x57 = 7638 cycles to catch up with that point, so that's another 7638/132 = 57 lines you've got time to draw. And then you'll have time for another 24 lines, etc. That easily copies the whole screen without tearing. Sounds too good to be true :)
Stew
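
The catch-up argument above can be checked with a small Python model (not Dragon code). It is a sketch under simplifying assumptions: 57 cycles per scanline and 312 lines per PAL frame as used in the post, a copy that starts exactly as the raster passes the top image line, a constant 132 cycles per copied line, and no attention paid to timing within a scanline. The function names are illustrative.

```python
# Raster-race model: the copy starts just behind the raster and must finish each
# line before the raster comes round to it again on the following frame.
CYCLES_PER_LINE = 57                               # assumed CPU cycles per scanline
LINES_PER_FRAME = 312                              # PAL frame, as used in the thread
FRAME_CYCLES = CYCLES_PER_LINE * LINES_PER_FRAME   # 17,784
COPY_CYCLES_PER_LINE = 132                         # measured loop cost
IMAGE_LINES = 156

def first_torn_line(image_lines=IMAGE_LINES, copy_cycles=COPY_CYCLES_PER_LINE):
    """Return the first line the raster reaches before its copy is done, or None."""
    for line in range(image_lines):
        copy_done = (line + 1) * copy_cycles                     # cycles since copy start
        raster_arrives = FRAME_CYCLES + line * CYCLES_PER_LINE   # next frame's pass
        if copy_done > raster_arrives:
            return line
    return None

def head_start_steps():
    """Step-by-step version: lines copied while the raster catches up each time."""
    steps, total, budget = [], 0, FRAME_CYCLES
    while total < IMAGE_LINES:
        lines = budget // COPY_CYCLES_PER_LINE
        if lines == 0:
            break
        steps.append(lines)
        total += lines
        budget = lines * CYCLES_PER_LINE   # time for the raster to cover those lines
    return steps

if __name__ == "__main__":
    print("first torn line:", first_torn_line())
    print("lines per catch-up step:", head_start_steps())
```

In this model the tightest constraint is on the last image line, so the copy loop could be noticeably slower than 132 cycles per line before the raster would overtake it.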

Re: Stackblast copying

Post by pser1 »

Hi Stew,
this is an excellent lesson on, and solution to, the problem I was trying to solve!
As this should be almost the last screen of the game, it doesn't matter what values
the PIA control registers ($ff01, $ff03, $ff21, $ff23) held when the blastcopy routine was entered.
So maybe the code could be something like this ...
cheers
pere

Code:

stkBlst	lda 	$ff02							; clear vblank flag
stkF00  	lda 	$ff03				; (2)
      	bpl 	stkF00			; (5)		; if not vblank loop
			ldx	#847				; (8)		; 120x57=6,840 cycles -(8+56) cycles = 6,776 / 8 -> 847

			opt	cc								; init cyclecounting
lTime		leax	-1,x				; (5)
			bne	lTime				; (3)		; 847x8=6,776

			opt	cc								; init cyclecounting
			pshs	cc,dp				; ( 7)	; save registers onto stack
			sts	stkF02+2			; (14)	; save stack
			orcc	#$50				; (17)	; disable interrupts
			ldx 	#$ff00			; (20)	; point to PIAs
			lda 	#$34				; (22)	; to disable irq's
			sta 	1,x				; (27)	; disable hsync irq
			sta 	3,x				; (32)	; disable vsync irq
			sta 	$21,x				; (37)	; disable printer firq
			sta 	$23,x				; (42)	; disable cart firq
			clr	$ff48				; (49)	; disable disk NMI
			lds	#$6700+8			; (53)	; set origin pointer to get 8 bytes from source data 
			ldu	#$c00				; (56)	; set destination to screen beginning

										; arrives here 6,840 cycles after the FS as desired
										
			opt	cc								; init cyclecounting
stkF01	puls	d,x,y,cc,dp		; (13 ) 	; get 8 bytes from origin
			pshu	d,x,y,cc,dp		; (26 ) 	; put them into destination
			leas	16,s				; (31 ) 	; advance pointer to next 8 bytes block
			puls	d,x,y,cc,dp		; (44 )  
			pshu	d,x,y,cc,dp		; (57 )  
			leas	16,s				; (62 ) 	; eight bytes more
			puls	d,x,y,cc,dp		; (75 )  
			pshu	d,x,y,cc,dp		; (88 )  
			leas	16,s				; (93 ) 	; eight bytes more			
			puls	d,x,y,cc,dp		; (106)  
			pshu	d,x,y,cc,dp		; (119)  
			leas	16,s				; (124) 	; eight bytes more = 32 bytes -> one PM4 line		
			cmpu	#$1f80			; (129) 	; end of screen to be filled?
			blo	stkF01			; (132) 	; no, next line

										; total cycles for those lines: 156x132=20,592 cycles
										; whole FS turn = 312x57=17,784 cycles (to get to line 1 again)
										; now blastcopy must do 20,592-17,784=2,808 cycles
										; which means 2,808/132=21.27 screen lines (lines from 135 to 156)
 
stkF02	lds	#$0000						; restore stack pointer
			puls	cc,dp,pc						; return
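
The cycle annotations in the listing above can be tallied with a short Python sketch (not Dragon code). Every figure is taken as written in the listing comments, including the 8 cycles attributed to leaving the vblank wait, so this only checks that the sums are self-consistent; it does not verify the per-instruction timings. The variable names are illustrative.

```python
# Tally of the cycle counts annotated in the listing (values as given in its comments).
CYCLES_PER_LINE = 57
BLANK_LINES = 120
WAIT_EXIT = 8                     # cycles annotated up to and including "ldx #847"
SETUP = [7, 7, 3, 3, 2, 5, 5, 5, 5, 7, 4, 3]   # pshs ... ldu, in listing order
DELAY_LOOP = 5 + 3                # leax -1,x / bne
BLOCK = 13 + 13 + 5               # puls + pshu + leas for one 8-byte block
LOOP = 4 * BLOCK + 5 + 3          # four blocks, then cmpu / blo
IMAGE_LINES = 156
FRAME_CYCLES = 312 * CYCLES_PER_LINE

setup_total = sum(SETUP)
target = BLANK_LINES * CYCLES_PER_LINE
delay_iterations, leftover = divmod(target - WAIT_EXIT - setup_total, DELAY_LOOP)
copy_total = LOOP * IMAGE_LINES
tail_cycles = copy_total - FRAME_CYCLES

print(f"setup = {setup_total} cycles, loop = {LOOP} cycles per line")
print(f"delay loop count = {delay_iterations} (leftover {leftover} cycles)")
print(f"copy = {copy_total} cycles, {tail_cycles} of them after one full frame")
print(f"lines still to copy after one frame: {tail_cycles / LOOP:.2f}")
```

With the listing's own figures the delay constant of 847 divides out exactly, and the copy runs about 21 lines' worth of cycles beyond one full frame period.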

Re: Stackblast copying

Post by sorchard »

Hi Pere,

That looks good. I would change the direction so it copies downwards. Something like this:

Code:

    ldu #copy_start    ; first byte of image to copy
    lds #$c00 + 8      ; start address of screen + correction for stack
    ;
    ;
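cpy pulu d,x,y,cc,dp   ; read 8 bytes, U moves up by 8 (label name is illustrative)
    pshs d,x,y,cc,dp   ; write them to the 8 bytes below S
    leas 16,s          ; point S at the end of the next 8-byte block
    ;
    ;
    cmpu #copy_end     ; byte after last image byte
    blo cpy            ; not finished, copy the next block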

Stew
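
To see how the PULU / PSHS / LEAS pattern walks through memory, here is a Python model of that loop (a sketch on a PC, not Dragon code). It relies on the pull and push of the same register set using the same byte layout in memory, so each 8-byte block arrives in its original order. The function name `stack_blast_copy`, the `mem` array and the test pattern are hypothetical; the addresses $6700 and $C00 and the 256x156 image size come from the thread.

```python
def stack_blast_copy(mem, src, dst, length):
    """Model of the pulu/pshs/leas loop: copy `length` bytes (a multiple of 8) upwards."""
    if length <= 0 or length % 8:
        raise ValueError("length must be a positive multiple of 8")
    u = src            # ldu #copy_start
    s = dst + 8        # lds #dest + 8 (the push writes below S)
    end = src + length
    while True:
        block = bytes(mem[u:u + 8])   # pulu d,x,y,cc,dp: read 8 bytes ...
        u += 8                        # ... and U ends up 8 higher
        s -= 8                        # pshs d,x,y,cc,dp: S drops by 8 ...
        mem[s:s + 8] = block          # ... and the same 8 bytes land in the same order
        s += 16                       # leas 16,s: end of the next destination block
        if u >= end:                  # cmpu #copy_end / blo
            break
    return u, s

if __name__ == "__main__":
    mem = bytearray(0x8000)                                        # made-up 32K memory
    image = bytes((i * 7 + 3) & 0xFF for i in range(156 * 32))     # made-up test pattern
    mem[0x6700:0x6700 + len(image)] = image
    u, s = stack_blast_copy(mem, 0x6700, 0x0C00, len(image))
    print("copy matches source:", bytes(mem[0x0C00:0x0C00 + len(image)]) == image)
    print(f"final U=${u:04X}, S=${s:04X}")
```

Each pass moves both pointers up by a net 8 bytes, so the destination is filled from the top of the screen downwards on the display, in step with the raster.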