Scrolling Graphics Questions

pser1
Posts: 1655
Joined: Sun Mar 25, 2012 7:32 pm
Location: Barcelona (SPAIN)

Scrolling Graphics Questions

Post by pser1

Hello,
I am a complete newbie when it comes to graphics, so please don't shoot me!
I have been coding some routines to scroll a PMODE3 image up and down by one pixel, and others to scroll it left and right by two pixels...
Once they worked, I tried to make them a bit faster, so I used LWASM to count their cycles.
These are some results:

Code:

DRAGON at 0.89 MHz, 50 frames/sec -> 17,800 cycles per frame
ScrollType  1st cycles  Frames  Opt cycles  Frames  Opt/1st  Reduction
HSCROLL        118,638    6.66      97,470    5.48  82.15 %    17.85 %
HSCROLR        119,790    6.73      96,892    5.44  80.88 %    19.12 %
HSCROLLC       130,542    7.33     104,574    5.88  80.11 %    19.89 %
HSCROLRC       130,926    7.36     103,228    5.80  78.84 %    21.16 %
VSCROLD         50,922    2.86      41,680    2.34  81.85 %    18.15 %
VSCROLU         50,933    2.86      41,680    2.34  81.83 %    18.17 %
VSCROLDC        51,057    2.87      41,781    2.35  81.83 %    18.17 %
VSCROLUC        51,056    2.87      41,780    2.35  81.83 %    18.17 %
The second column holds the number of cycles needed to process a whole screen, so the number of frames needed (third column) is HIGH,
especially for the horizontal scrolls.
After applying some optimizations, the second batch of routines performed better; the figures in the 4th and 5th columns belong to the modified versions, and the last two columns give the optimized cycle count as a percentage of the original and the resulting reduction.
I prepare the screen in one page and, once it is finished, I switch the SAM bits that hold the start address of the displayed image.
Before doing so I poll for the FS interrupt, so I assume the fractional frame counts have to be rounded
up to get the number of frames the process really needs.
As an example, VSCROLD takes 2.86 frames, so 3 FS, and the modified (better) version takes 2.34, so 3 FS too.
According to this there should be no difference between them, yet the second one is clearly about 18% faster than the first.
What am I missing?
Is there any way (a formula or whatever) to calculate how many frames must pass before the screen gets updated?
It seems that VSCROLD should update 50/3 ≈ 16.7 times per second.
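The rounding-up reasoning can be written down as a small model. This is a Python sketch, not Dragon code: it assumes each update starts right after a field sync and then waits for a fresh one before flipping the SAM page, and it uses pere's 17,800 cycles per frame. The function names are illustrative only.

```python
import math

CYCLES_PER_FRAME = 17800   # pere's figure: 890,000 cycles/s / 50 frames/s
FIELDS_PER_SECOND = 50


def frames_per_update(work_cycles, cycles_per_frame=CYCLES_PER_FRAME):
    """Frames between page flips when every flip waits for a fresh field sync.

    Assumes the work starts just after a field sync, so the flip happens on
    the first field sync after the work has finished.
    """
    return math.ceil(work_cycles / cycles_per_frame)


def updates_per_second(work_cycles, cycles_per_frame=CYCLES_PER_FRAME):
    """Screen updates per second implied by frames_per_update()."""
    return FIELDS_PER_SECOND / frames_per_update(work_cycles, cycles_per_frame)


# Usage with two rows of the table (first version, optimized version).
for name, first, optimized in [("HSCROLL", 118638, 97470),
                               ("VSCROLD", 50922, 41680)]:
    print(name,
          frames_per_update(first), frames_per_update(optimized),
          round(updates_per_second(first), 1), round(updates_per_second(optimized), 1))
```

With these figures VSCROLD needs 3 frames per update both before and after optimization (about 16.7 updates per second), whereas HSCROLL drops from 7 frames to 6.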
Sorry for so many questions...
I would really appreciate any info or pointers on this subject.
Thanks in advance

regards
pere
sorchard
Posts: 530
Joined: Sat Jun 07, 2014 9:43 pm
Location: Norwich UK

Re: Scrolling Graphics Questions

Post by sorchard

Hi Pere,

Your calculations seem fine to me. All I can think of is perhaps the polling code is not waiting for vsync, and is falling through because the vsync flag is already set.

If that is the case then the solution is to clear the vsync flag (lda $ff02) just before the polling code instead of after.

If you would like a slightly more accurate calculation, the exact number of cycles per frame is 312 lines x 57 cycles = 17784 cycles.
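Stew's stale-flag theory can be checked with a simplified timing model (Python, not 6809 code). It assumes field-sync edges land exactly every 312 x 57 = 17,784 cycles, that every update does the same amount of work, and it ignores the cost of the wait itself; the function names are made up for the illustration.

```python
CYCLES_PER_FRAME = 312 * 57   # 17,784: 312 lines x 57 cycles


def next_fs_after(t):
    """Cycle of the first field-sync edge strictly after cycle t.

    Simplification: edges fall exactly on multiples of CYCLES_PER_FRAME.
    """
    return (t // CYCLES_PER_FRAME + 1) * CYCLES_PER_FRAME


def average_cycles_per_update(work_cycles, clear_before_wait, updates=100):
    """Average cycles between page flips for a fixed amount of work per update."""
    t = 0            # start exactly on a field-sync edge
    last_clear = 0   # cycle of the most recent 'lda $ff02' (flag cleared)
    for _ in range(updates):
        t += work_cycles                      # draw the next screen off-screen
        if clear_before_wait:
            # lda $ff02 first: any flag latched during the work is discarded,
            # so the poll/sync always waits for the next edge.
            t = next_fs_after(t)
        else:
            # The flag is already set if an edge occurred since the last clear.
            flag_already_set = next_fs_after(last_clear) <= t
            if not flag_already_set:
                t = next_fs_after(t)          # genuinely wait for an edge
            last_clear = t                    # lda $ff02 after the wait
        # ...the SAM page would be flipped here, at cycle t
    return t / updates
```

In this model, clearing the flag only after the wait lets any routine longer than one frame fall straight through, so the flips are no longer locked to vsync and the faster VSCROLD really does flip more often (every 41,680 cycles instead of every 50,922). Clearing it first locks both versions to 3 frames.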

I'm intrigued as to what it might be that you're scrolling :)
Stew
pser1

Re: Scrolling Graphics Questions

Post by pser1

Hi Stew,
I am attaching a VDK I compiled for Spanish friends at the RetroWiki site.
It contains full-screen PMODE3 scrolls of these types:
left, normal and cycling
right, both variants
up, both variants
down, both variants
With RUN"TESTALL" you can choose any of them.
Finally, there is another BAS program that moves the upper half of the screen at double the speed
of the lower half; just do RUN"TPAR".
In the second zip you will find all of the sources.
I am indeed reading $ff02 after the detection... and in two of them I even put a SYNC after that!

cheers
pere
SCROLLS PMODE3 v0.3.zip (5.89 KiB)
Scrolls PM3 - Source files.zip (11.48 KiB)
sorchard

Re: Scrolling Graphics Questions

Post by sorchard

Hi Pere,

Looks like you're making a nice slide show app.

Either of these vsync methods should work for you:

Code:

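        lda   $ff02     ; clear any pending FS flag (reading PIA0 port B)
wt_sync lda   $ff03     ; PIA0 control register B: bit 7 = FS flag
        bpl   wt_sync   ; bit 7 still clear (N = 0): keep waiting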
or

Code:

        lda   $ff02     ; clear any pending FS flag first
        sync            ; wait for an interrupt line (FS IRQ must be enabled in PIA0)
The idea behind putting lda $ff02 before sync is that it guarantees the code will wait until the next vsync.
Stew
pser1

Re: Scrolling Graphics Questions

Post by pser1

Hi Stew,
thanks a lot.
I will try both methods.
These are simple examples for a friend from RetroWiki who asked me to create some kind
of scrolling routine for the Dragon. He probably wants to create a game!
The vertical scrolls are about twice as fast as the horizontal ones, even though they move 192 lines
whilst the horizontal ones move 128 steps of 2 bits each. Clearly the code needed for the latter
operation is slower... at least in my implementation!
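To see why the horizontal case costs more, here is a Python model (an illustration, not pere's routine) of scrolling one PMODE3 row left by one pixel. It assumes the usual 6847 layout where the leftmost pixel of each byte sits in the two high bits; the function name is hypothetical.

```python
ROW_BYTES = 32   # PMODE3: 128 pixels x 2 bits per pixel = 32 bytes


def scroll_row_left_1px(row, cyclic=False):
    """Scroll one packed 2-bit-per-pixel row one pixel (2 bits) to the left.

    Each byte gives up its two high bits and takes the two high bits of its
    right-hand neighbour. The rightmost pixel is cleared, or receives the pixel
    that fell off the left edge when cyclic=True.
    """
    out = bytearray(len(row))
    wrap = (row[0] >> 6) if cyclic else 0
    for i in range(len(row)):
        incoming = (row[i + 1] >> 6) if i + 1 < len(row) else wrap
        out[i] = ((row[i] << 2) & 0xFF) | incoming
    return out


# Usage: a row whose first pixel has colour 1 and whose last pixel has colour 3.
row = bytearray(ROW_BYTES)
row[0] = 0b01101100
row[31] = 0b00000011
shifted = scroll_row_left_1px(row, cyclic=True)
```

Every one of the 32 bytes has to be merged with bits from its neighbour, whereas a vertical scroll just copies whole bytes; the 6809 has no multi-bit shift, so this bit shuffling costs one shift or rotate instruction per bit.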
I would like to make them go faster ...
I found that indexed n,R was better than ,R++, and that's why I unrolled
the normal loops so that they need fewer cycles at the cost of more bytes (the usual size/speed trade-off).
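A rough way to see why n,R beats ,R++ once the loop is unrolled is to add up the cycles for copying one 32-byte row with LDD/STD. The figures are the usual 6809 data-sheet values as assumed here (LDD/STD indexed 5 cycles, plus 3 for ,R++, 1 for a 5- or 8-bit offset, nothing for a zero offset; LEAX/LEAU 4 plus the same extras) and are worth cross-checking against LWASM's cycle counts. This is a sketch, not a model of pere's actual code.

```python
LDD_STD_BASE = 5     # LDD/STD indexed: 5 cycles before the addressing-mode extra
LEA_BASE = 4         # LEAX/LEAU indexed: 4 cycles before the extra
POSTINC2_EXTRA = 3   # extra cycles for ,R++
ROW_BYTES = 32       # one PMODE3 row
ROWS = 192


def offset_extra(offset):
    """Extra cycles for n,R: none for 0, +1 for 5/8-bit offsets, +4 for 16-bit."""
    if offset == 0:
        return 0
    if -128 <= offset <= 127:
        return 1
    return 4


def row_cost_autoinc():
    """Unrolled LDD ,X++ / STD ,U++ pairs; the pointers advance on their own."""
    pairs = ROW_BYTES // 2
    return pairs * 2 * (LDD_STD_BASE + POSTINC2_EXTRA)


def row_cost_offset():
    """Unrolled LDD n,X / STD n,U pairs, then LEAX 32,X and LEAU 32,U."""
    loads_and_stores = sum(2 * (LDD_STD_BASE + offset_extra(off))
                           for off in range(0, ROW_BYTES, 2))
    return loads_and_stores + 2 * (LEA_BASE + offset_extra(ROW_BYTES))
```

That comes to 256 versus 200 cycles per row, or 49,152 versus 38,400 cycles for 192 rows (roughly 2.8 versus 2.2 frames).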

cheers
pere
sixxie
Posts: 1346
Joined: Fri Jul 18, 2008 8:36 am
Location: Hertfordshire

Re: Scrolling Graphics Questions

Post by sixxie

Yeah horizontal will always be slower. Unless you maintain four buffers (shifted 0, 2, 4, 6 pixels), then you can use byte aligned operations. Lots of RAM used though!
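The four-buffer idea can be sketched in Python: keep copies of the source shifted left by 0, 2, 4 and 6 bits, and any even bit position then becomes a plain byte-aligned slice of one copy. The source is assumed to be one wide row held as bytes (leftmost pixel in the high bits), and the function names are illustrative.

```python
ROW_BYTES = 32   # visible PMODE3 row width in bytes


def shift_left_bits(data, bits):
    """Shift a packed big-endian bit string left by `bits`, zero-filling on the right."""
    n = len(data)
    value = (int.from_bytes(bytes(data), "big") << bits) & ((1 << (8 * n)) - 1)
    return value.to_bytes(n, "big")


def make_preshifted(source_row):
    """The four copies, shifted by 0, 2, 4 and 6 bits (one PMODE3 pixel = 2 bits)."""
    return [shift_left_bits(source_row, s) for s in (0, 2, 4, 6)]


def window(preshifted, x_bits):
    """Visible 32 bytes starting x_bits into the source (x_bits must be even).

    The bit offset picks the copy, the byte offset picks where to start copying:
    no shifting is needed at scroll time.
    """
    copy = preshifted[(x_bits % 8) // 2]
    start = x_bits // 8
    return copy[start:start + ROW_BYTES]


def window_direct(source_row, x_bits):
    """Reference: extract the same 256 bits straight from the unshifted source."""
    total = 8 * len(source_row)
    value = int.from_bytes(bytes(source_row), "big")
    bits = (value >> (total - x_bits - 8 * ROW_BYTES)) & ((1 << (8 * ROW_BYTES)) - 1)
    return bits.to_bytes(ROW_BYTES, "big")
```

On the Dragon the slice would be copied with ordinary byte (or stack) moves, so each step costs about the same as a vertical scroll; the price is four copies of the image.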

Are you using stack ops for your vertical scrolls? You can save a lot of cycles that way. With interrupts disabled on the PIAs, you can copy 8 bytes at a time - should be able to do a whole hi-res screen in less than 2 frames.
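sixxie's "less than 2 frames" estimate can be checked with some cycle arithmetic. The sketch assumes the data-sheet timing for PULS/PSHS/PULU/PSHU (5 cycles plus 1 per byte), LEAS n,S with an 8-bit offset at 5 cycles, a 6,144-byte screen and 17,784 cycles per frame; `leas_per_chunk` is a hypothetical knob for how many pointer adjustments each 8-byte chunk needs.

```python
CYCLES_PER_FRAME = 312 * 57   # 17,784 cycles per frame
SCREEN_BYTES = 32 * 192       # PMODE3 / PMODE4: 6,144 bytes
STACK_OP_BASE = 5             # PULS/PSHS/PULU/PSHU: 5 cycles + 1 per byte moved
LEAS_8BIT = 4 + 1             # LEAS n,S with an 8-bit offset


def screen_copy_cycles(leas_per_chunk, bytes_per_op=8):
    """Cycles to copy a whole screen in chunks of one pull plus one push."""
    chunks = SCREEN_BYTES // bytes_per_op
    per_chunk = 2 * (STACK_OP_BASE + bytes_per_op) + leas_per_chunk * LEAS_8BIT
    return chunks * per_chunk


for k in (2, 1, 0):
    cycles = screen_copy_cycles(k)
    print(k, cycles, round(cycles / CYCLES_PER_FRAME, 2))
```

With two LEAS per chunk (only S as pointer) the copy takes 27,648 cycles, about 1.55 frames; with one LEAS it is 23,808 (about 1.34) and with none 19,968 (about 1.12), all under the 35,568 cycles of two frames.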
sorchard
Posts: 530
Joined: Sat Jun 07, 2014 9:43 pm
Location: Norwich UK

Re: Scrolling Graphics Questions

Post by sorchard

Just to echo what sixxie is saying:

Stack ops do give a big speed up and "Four buffer copy horizontal scroll (exploding heart?) technique" gives amazing results but some sort of compromise is required to keep the memory use practical. e.g. drop the resolution to PMODE1, or have a smaller play area or scroll 4 bits at a time instead of 2. Or have a D64 ;)
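The compromises can be compared by the RAM their pre-shifted copies need. This sketch assumes a source image the same size as the display area, 2 bits per pixel, and one copy per possible bit offset (8 / step); the 128x128 area is only an example of a smaller play field.

```python
def screen_bytes(width_px, height_px, bits_per_pixel=2):
    """Bytes for a packed bitmap (PMODE1 and PMODE3 both use 2 bits per pixel)."""
    return width_px * height_px * bits_per_pixel // 8


def preshifted_copies(step_bits):
    """Copies needed so that every scroll position is byte-aligned in one of them."""
    return 8 // step_bits


options = {
    "PMODE3, 2-bit steps": screen_bytes(128, 192) * preshifted_copies(2),
    "PMODE1, 2-bit steps": screen_bytes(128, 96) * preshifted_copies(2),
    "PMODE3, 4-bit steps": screen_bytes(128, 192) * preshifted_copies(4),
    "PMODE3, 128x128 area": screen_bytes(128, 128) * preshifted_copies(2),
}
for name, size in options.items():
    print(f"{name}: {size} bytes")
```

Full-screen PMODE3 with 2-bit steps needs 24 KB of copies; dropping to PMODE1 or to 4-bit steps brings it to 12 KB, and the smaller example area to 16 KB.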
Stew
pser1
Posts: 1655
Joined: Sun Mar 25, 2012 7:32 pm
Location: Barcelona (SPAIN)

Re: Scrolling Graphics Questions

Post by pser1

sixxie wrote:Yeah horizontal will always be slower. Unless you maintain four buffers (shifted 0, 2, 4, 6 pixels), then you can use byte aligned operations. Lots of RAM used though!
Are you using stack ops for your vertical scrolls? You can save a lot of cycles that way. With interrupts disabled on the PIAs, you can copy 8 bytes at a time - should be able to do a whole hi-res screen in less than 2 frames.
Hi Ciaran,
Thanks a lot for that idea, I would never have thought of that possibility!
I had tested an early version using the stack... this is an unrolled version with macros:

Code:

			org	$4000					; where to place the program
start		orcc	#$50					; disable interrupts
			sts	<$e6					; save the stack pointer
			ldb	#192					; number of lines to scroll
			stb	<$76					; store it in a page-0 variable
L1			lds   #$23c0				; point to the first byte of the second-to-last pixel row - #ori points to the start of the data to move

c1lin		macro
			puls  d,x,y,u				; pull 8 bytes from the source
			leas  32,s					; point 8 bytes into the next line (destination)
			pshs  d,x,y,u				; push the 8 bytes to the destination
			leas  -24,s					; back to the source line, 8 bytes further on
			puls  d,x,y,u				; pull 8 bytes from the source
			leas  32,s					; add the offset to get the destination address
			pshs  d,x,y,u				; push the 8 bytes to the destination
			leas  -24,s					; subtract the offset and add 8 to keep copying
			puls  d,x,y,u				; pull 8 bytes from the source
			leas  32,s					; add the offset to get the destination address
			pshs  d,x,y,u				; push the 8 bytes to the destination
			leas  -24,s					; subtract the offset and add 8 to keep copying
			puls  d,x,y,u				; pull 8 bytes from the source
			leas  32,s					; add the offset to get the destination address
			pshs  d,x,y,u				; push the 8 bytes to the destination
			leas  -88,s					; point to the start of the row above the source row
			endm
			
c10lin	macro
			c1lin
			c1lin
			c1lin
			c1lin
			c1lin
			c1lin
			c1lin
			c1lin
			c1lin
			c1lin
			endm

			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c10lin
			c1lin

			dec	<$76					; decrement the line counter
			lbne	L1						; not finished, keep going
			lds	<$e6					; restore stack
			andcc	#$af					; re-enable interrupts
			rts							; return
			end 	start
cheers
pere
pser1

Re: Scrolling Graphics Questions

Post by pser1

sorchard wrote:Just to echo what sixxie is saying:
Stack ops do give a big speed up and "Four buffer copy horizontal scroll (exploding heart?) technique" gives amazing results but some sort of compromise is required to keep the memory use practical. e.g. drop the resolution to PMODE1, or have a smaller play area or scroll 4 bits at a time instead of 2. Or have a D64 ;)
Hi Stew,
I don't know exactly what my RetroWiki friend wants to do with the scroll...
That's why I prepared the eight variants; he has not yet decided which one he is going to use...
He even talked about parallax-style scrolling, so in the last one I tested two parts scrolling at different speeds.
Probably the best option will be to reduce the scrolling area as much as possible.
By the way, I have posted an old version using the stack and macros. I don't remember why I switched to the routines
that are in the latest versions (?)
Using a D64 will not help that much: as I am using disks, Basic is mandatory too, so I would only gain eight extra KB (one screen!)
when switching to MAP1, unless I accept detaching DOS from the program...
cheers
pere
sorchard

Re: Scrolling Graphics Questions

Post by sorchard

Hi Pere,

You can make it go even faster if you use both stacks. Sixxie's suggestion was to disable interrupts in the PIAs and make use of CC & DP as well, i.e. something like this to move the screen up one line:

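        ldu   #copy_from_address    ; U = source pointer
        lds   #copy_to_address+8    ; S = destination + 8, since PSHS pre-decrements

; repeat these 3 instructions many times
        pulu  cc,dp,d,x,y           ; load 8 bytes from the source, U += 8
        pshs  cc,dp,d,x,y           ; store them at the destination, S -= 8
        leas  16,s                  ; move S to the end of the next 8-byte block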

Even the leas 16,s can be removed if the source data is arranged in the right way. (Though not much use if the screen is being scrolled onto itself)
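The two-stack loop can be simulated in Python to confirm that it moves the screen up one line and keeps the byte order. The model follows the 6809 pull order (CC, A, B, DP, X, Y at increasing addresses) and pushes in the reverse order, pre-decrementing S; the tiny 4-row screen at address 0 and the function names are assumptions for the illustration, with U and S standing in for `copy_from_address` and `copy_to_address+8`.

```python
ROW = 32   # bytes per PMODE3 row
PULL_ORDER = [("cc", 1), ("a", 1), ("b", 1), ("dp", 1), ("x", 2), ("y", 2)]


def pulu(mem, u):
    """PULU CC,DP,D,X,Y: load registers from increasing addresses starting at U."""
    regs = {}
    for name, size in PULL_ORDER:
        regs[name] = bytes(mem[u:u + size])
        u += size
    return regs, u


def pshs(mem, s, regs):
    """PSHS CC,DP,D,X,Y: push in the reverse of pull order, pre-decrementing S."""
    for name, size in reversed(PULL_ORDER):
        s -= size
        mem[s:s + size] = regs[name]
    return s


def scroll_up_one_line(mem, screen, rows):
    """Copy rows 1..rows-1 onto rows 0..rows-2; the last row is left as it was."""
    u = screen + ROW                # ldu #copy_from_address (second row)
    s = screen + 8                  # lds #copy_to_address+8 (first row)
    for _ in range((rows - 1) * ROW // 8):
        regs, u = pulu(mem, u)      # pulu cc,dp,d,x,y
        s = pshs(mem, s, regs)      # pshs cc,dp,d,x,y
        s += 16                     # leas 16,s
```

Because PULU loads CC with screen data, the I and F mask bits can be cleared in the middle of the copy, which is why the interrupt sources have to be switched off in the PIAs rather than just masked with ORCC.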

If interrupts are disabled you can use your extra 32K in MAP1 for anything you like. You just need to switch back to MAP0 when you want to use disks & basic again.
Stew