PyDC converter (was: dragon 32 cassette format ?)

jedie
Posts: 655
Joined: Wed Aug 14, 2013 12:23 pm
Location: germany
Contact:

Re: dragon 32 cassette format ?

Post by jedie »

My original BASIC test code is:

10 FOR I = 1 TO 10
20 PRINT I;"HELLO WORLD!"
30 NEXT I
... too many ideas and too little time ... Related stuff written in Python:
Dragon 32 emulator / PyDC - Python Dragon 32 converter: https://github.com/jedie/DragonPy
DWLOAD server / Dragon-Lib and other stuff: https://github.com/6809

Re: dragon 32 cassette format ?

Post by jedie »

The current state is on GitHub. The last commit is: https://github.com/jedie/python-code-sn ... 9b258cea6d

I have added a test that checks whether "HELLO WORLD!" can be found at the bit level.
It is found in both WAV files (which can also be downloaded from GitHub).

So I think the WAVE2bits decoding must be OK.

I think I have a synchronisation problem... Just searching for "10101010" and slicing the rest into 8-bit blocks is not the right way...
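
A bit-level search like the one described here could look roughly as follows. This is only a sketch and not the code from the repository: the function names `text_to_bits` and `find_text` are made up for illustration, and the bit stream is assumed to be a string of "0" and "1" characters with every byte stored least significant bit first (as in the 'H' example later in this thread).

```python
def text_to_bits(text):
    """Encode ASCII text as a bit string, least significant bit first."""
    return "".join(format(ord(char), "08b")[::-1] for char in text)


def find_text(bit_string, text):
    """Return the bit offset of text inside bit_string, or -1 if it is absent."""
    return bit_string.find(text_to_bits(text))


# Three stray bits in front, so the text does not start on a byte boundary.
stream = "101" + text_to_bits("HELLO WORLD!")
print(find_text(stream, "HELLO WORLD!"))  # prints 3
```

Because the search runs over single bits rather than 8-bit slices, it still finds the text when the stream is shifted by a few bits, which is exactly the case where fixed slicing fails.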

Re: dragon 32 cassette format ?

Post by jedie »

Next step: I catch the sync byte with:

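import sys

# LEADER_BYTE, SYNC_BYTE and the two get_*_pos_* helpers are defined elsewhere in the script.
def goto_next_block(bit_list, debug=False):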
    """
    >>> bits = (
    ... "10101010" # 0x55 leader byte
    ... "00111100" # 0x3C sync byte
    ... "00010010" # 0x48 'H'
    ... )
    >>> bit_list = [int(i) for i in bits]
    >>> bit_list = goto_next_block(bit_list)
    >>> bit_list
    [0, 0, 0, 1, 0, 0, 1, 0]

    more bits inserted:
    >>> bits = ("1010" # inserted
    ... "101010100011110000010010")
    >>> goto_next_block([int(i) for i in bits])
    [0, 0, 0, 1, 0, 0, 1, 0]

    no complete leader byte
    >>> bits = ("1010" # incomplete
    ... "0011110000010010")
    >>> goto_next_block([int(i) for i in bits])
    [0, 0, 0, 1, 0, 0, 1, 0]
    """
    # Skip the run of leader bytes, if there is one
    end = get_last_pos_iter_steps(bit_list, LEADER_BYTE)
    if not end:
        if debug:
            print("INFO: leader byte block end not found.")
    else:
        if debug:
            print("leader byte block end found at: %s" % end)
        bit_list = bit_list[end:]

    # Search bit by bit for the sync byte behind the leader
    sync_pos = get_start_pos_iter_window(bit_list, SYNC_BYTE)
    if sync_pos is None:
        if debug:
            print("ERROR: Sync byte '%s' not found!" % SYNC_BYTE)
        sys.exit(-1)
    if debug:
        print("Sync byte '%s' found at position: %i" % (SYNC_BYTE, sync_pos))

    # Cut until sync byte
    cut_pos = sync_pos + len(SYNC_BYTE)
    bit_list = bit_list[cut_pos:]

    return bit_list
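
The two helper functions used above are not shown in this post. The following is a guess at how they might behave, written so that the three doctest cases come out the same way. It is a sketch under assumptions, not the repository code: bits are kept as strings instead of lists of integers, the helper bodies are invented, and `skip_to_block_data` is a hypothetical stand-in for `goto_next_block` that returns `None` instead of exiting.

```python
LEADER_BYTE = "10101010"  # 0x55, least significant bit first
SYNC_BYTE = "00111100"    # 0x3C


def get_start_pos_iter_window(bits, pattern):
    """Slide a window one bit at a time; return the first match position or None."""
    size = len(pattern)
    for pos in range(len(bits) - size + 1):
        if bits[pos:pos + size] == pattern:
            return pos
    return None


def get_last_pos_iter_steps(bits, pattern):
    """Return the position just behind the last consecutive repeat of pattern, or None."""
    pos = get_start_pos_iter_window(bits, pattern)
    if pos is None:
        return None
    size = len(pattern)
    while bits[pos:pos + size] == pattern:
        pos += size  # step over one whole pattern at a time
    return pos


def skip_to_block_data(bits):
    """Return the bits following the sync byte, or None if no sync byte is found."""
    end = get_last_pos_iter_steps(bits, LEADER_BYTE)
    if end is not None:
        bits = bits[end:]
    sync_pos = get_start_pos_iter_window(bits, SYNC_BYTE)
    if sync_pos is None:
        return None
    return bits[sync_pos + len(SYNC_BYTE):]


print(skip_to_block_data("1010" "10101010" "00111100" "00010010"))  # prints 00010010
```

The leader run is skipped in whole-byte steps, while the sync byte is searched bit by bit, so a few stray bits between leader and sync byte do no harm.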
tormod
Posts: 416
Joined: Sat Apr 27, 2013 12:06 pm
Location: Switzerland
Contact:

Re: dragon 32 cassette format ?

Post by tormod »

Great project! I use xroar for converting WAV to CAS for single files, but it is not easily scriptable for more automated tasks. For instance, I have some WAVs of whole cassettes that I want to extract all the files from (CAS would do). I tried DC.EXE for this (running in DOSEMU) with varying luck, and that program is dead anyway (no source), I believe.

A set of modular tools would be awesome: WAV2CAS, CAS2BAS, BAS2CAS.

I have thought about making CAS2WAV an option of makewav since it already has all the bits for it. However a CAS optimized for emulators (no long syncs and pauses) cannot always be easily converted to a WAV for real machines. This reminds me that sixxie was working on a new CAS format to better encode such things.

makewav also does ML2CAS or ML2WAV but does not deal with BASIC files. It would be easy to extend makewav to create CAS and WAV from raw BASIC files as well, but I guess (de)tokenizing rather belongs in another tool.
robcfg
Posts: 1532
Joined: Sat Apr 04, 2009 10:16 pm
Location: Stockholm, Sweden
Contact:

Re: dragon 32 cassette format ?

Post by robcfg »

In an ideal world the number of leader bits would be a multiple of 8, but reality is quite different.

What I do, and what I think you're doing now, is follow the 0x55 bytes until you find a different value. If the next byte is 0x3C then everything is ok and that's a block start. Otherwise, and knowing that the start of block is 0x3C, you'll have to skip bits until the value is 0x3C and continue reading bytes from that position.

For example, if your wav file is like this:

10101010 (0x55) 00111100 (0x3C) ... This is the ideal case

10101010 (0x55) 10001111 (0xF1) 00... There are two bits out of sequence.

Knowing that after all the 0x55's there has to be a 0x3C, you can make a table like this one (sorry, it's C++):

case 0x78:bitsToSkip = 1;break;
case 0xF1:bitsToSkip = 2;break;
case 0xE2:bitsToSkip = 3;break;
case 0xC5:bitsToSkip = 4;break;
case 0x8A:bitsToSkip = 5;break;
case 0x15:bitsToSkip = 6;break;
case 0x2A:bitsToSkip = 7;break;
So you know how many bits to discard to get the true block start marker.

In my example, you'll get a 2, meaning that you have to skip 2 bits:

10101010 (0x55) [10](skipped) 00111100 (0x3C) ...

I hope the example is good enough, otherwise, feel free to ask further.
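
The table does not have to be typed in by hand. The Python sketch below derives it under one assumption: the stray bits are the tail end of a leader byte, and bytes are read least significant bit first. The names `bits_to_byte` and `build_skip_table` are made up for this illustration.

```python
LEADER_BITS = "10101010"  # 0x55, least significant bit first
SYNC_BITS = "00111100"    # 0x3C, least significant bit first


def bits_to_byte(bits):
    """Convert eight bits, least significant bit first, into an integer."""
    return int(bits[::-1], 2)


def build_skip_table():
    """Map the first non-leader byte value to the number of bits to skip."""
    table = {}
    for skip in range(1, 8):
        stray = LEADER_BITS[-skip:]           # leftover tail of a leader byte
        misaligned = (stray + SYNC_BITS)[:8]  # what a byte-wise reader sees
        table[bits_to_byte(misaligned)] = skip
    return table


for value, skip in sorted(build_skip_table().items(), key=lambda item: item[1]):
    print("0x%02X -> skip %d bit(s)" % (value, skip))
```

With these assumptions the result is the same seven pairs as in the C++ table, from 0x78 (skip 1) to 0x2A (skip 7).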

Re: dragon 32 cassette format ?

Post by jedie »

robcfg wrote: What I do, and what I think you're doing now, is follow the 0x55 bytes until you find a different value. If the next byte is 0x3C then everything is ok and that's a block start.
That is what I am doing now and it seems to work well, see: https://github.com/jedie/python-code-sn ... 9a8ffffcca (many whitespace changes :( )

And the output looks like this:

TestResults(failed=0, attempted=44)
Read 'HelloWorld1 xroar.wav'...
Numer of audio frames: 75025
read...
Framerate: 22050
samplewidth: 1
channels: 1
struct_unpack_str: b
4760 bits decoded.

===============================================================================
Start leader '10101010' found at position: 1
bits before header: '1'
Found 257 leader bytes
Find sync byte after 0 Bits
*** block type: 0x0 (filename block)
*** block length: 15
   0 - 00000100 00000100 00000100 00000100 00000100 00000100 00000100 00000100 

00000100 0x20  32 ' '
00000100 0x20  32 ' '
00000100 0x20  32 ' '
00000100 0x20  32 ' '
00000100 0x20  32 ' '
00000100 0x20  32 ' '
00000100 0x20  32 ' '
00000100 0x20  32 ' '
00000000  0x0   0 '\x00'
00000000  0x0   0 '\x00'
00000000  0x0   0 '\x00'
00000000  0x0   0 '\x00'
00000000  0x0   0 '\x00'
00000000  0x0   0 '\x00'
00000000  0x0   0 '\x00'
===============================================================================
Start leader '10101010' found at position: 8
bits before header: '11110000'
Found 258 leader bytes
Find sync byte after 0 Bits
*** block type: 0x1 (data block)
*** block length: 50
   0 - 01111000 01001000 00000000 01010000 00000001 00000100 10010010 00000100 
   8 - 11010011 00000100 10001100 00000100 00111101 00000100 10001100 00001100 
  16 - 00000000 01111000 10010100 00000000 00101000 11100001 00000100 10010010 
  24 - 11011100 01000100 00010010 10100010 00110010 00110010 11110010 00000100 
  32 - 11101010 11110010 01001010 00110010 00100010 10000100 01000100 00000000 
  40 - 01111000 10001100 00000000 01111000 11010001 00000100 10010010 00000000 

-------------------------------------------------------------------------------
01111000 0x1e  30 '\x1e'
01001000 0x12  18 '\x12'
00000000  0x0   0 '\x00'
01010000  0xa  10 '\n'
00000001 0x80 128 ' FOR '
00000100 0x20  32 ' '
10010010 0x49  73 'I'
00000100 0x20  32 ' '
11010011 0xcb 203 '='
00000100 0x20  32 ' '
10001100 0x31  49 '1'
00000100 0x20  32 ' '
00111101 0xbc 188 ' TO '
00000100 0x20  32 ' '
10001100 0x31  49 '1'
00001100 0x30  48 '0'
00000000  0x0   0 '\x00'
01111000 0x1e  30 '\x1e'
10010100 0x29  41 ')'
00000000  0x0   0 '\x00'
00101000 0x14  20 '\x14'
11100001 0x87 135 ' PRINT '
00000100 0x20  32 ' '
10010010 0x49  73 'I'
11011100 0x3b  59 ';'
01000100 0x22  34 '"'
00010010 0x48  72 'H'
10100010 0x45  69 'E'
00110010 0x4c  76 'L'
00110010 0x4c  76 'L'
11110010 0x4f  79 'O'
00000100 0x20  32 ' '
11101010 0x57  87 'W'
11110010 0x4f  79 'O'
01001010 0x52  82 'R'
00110010 0x4c  76 'L'
00100010 0x44  68 'D'
10000100 0x21  33 '!'
01000100 0x22  34 '"'
00000000  0x0   0 '\x00'
01111000 0x1e  30 '\x1e'
10001100 0x31  49 '1'
00000000  0x0   0 '\x00'
01111000 0x1e  30 '\x1e'
11010001 0x8b 139 ' NEXT '
00000100 0x20  32 ' '
10010010 0x49  73 'I'
00000000  0x0   0 '\x00'
00000000  0x0   0 '\x00'
00000000  0x0   0 '\x00'
===============================================================================
===============================================================================
Start leader '10101010' found at position: 2
bits before header: '11'
Found 2 leader bytes
Find sync byte after 6 Bits
*** block type: 0xff (end-of-file block)
*** block length: 0
end of file.
Next I will look at converting this block into plain BASIC code. The tokens would have to be replaced.
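
The decoding step behind the dump above can be sketched like this. It is a simplified illustration, not the script that produced the output: it assumes that every byte arrives least significant bit first (as the bit columns in the dump suggest) and that the two bytes after the sync byte are the block type and the block length, in the order the log prints them. `bits_to_bytes` and `parse_block` are hypothetical names.

```python
BLOCK_TYPES = {0x00: "filename block", 0x01: "data block", 0xFF: "end-of-file block"}


def bits_to_bytes(bits):
    """Split a bit string into 8-bit groups (LSB first) and return the byte values."""
    usable = len(bits) - len(bits) % 8  # ignore an incomplete trailing group
    return [int(bits[pos:pos + 8][::-1], 2) for pos in range(0, usable, 8)]


def parse_block(bits):
    """Return (block type name, payload bytes) for the bits following a sync byte."""
    values = bits_to_bytes(bits)
    block_type, length = values[0], values[1]
    payload = values[2:2 + length]
    return BLOCK_TYPES.get(block_type, "unknown"), payload


# A made-up data block: type 0x01, length 2, payload "HI".
example = "10000000" "01000000" "00010010" "10010010"
print(parse_block(example))  # prints ('data block', [72, 73])
```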

Re: dragon 32 cassette format ?

Post by jedie »

The question now is: how is the BASIC code formatted?

The raw data is:

01111000 0x1e  30 '\x1e'
01001000 0x12  18 '\x12'
00000000  0x0   0 '\x00'
01010000  0xa  10 '\n'
00000001 0x80 128 ' FOR '
00000100 0x20  32 ' '
10010010 0x49  73 'I'
00000100 0x20  32 ' '
11010011 0xcb 203 '='
00000100 0x20  32 ' '
10001100 0x31  49 '1'
00000100 0x20  32 ' '
00111101 0xbc 188 ' TO '
00000100 0x20  32 ' '
10001100 0x31  49 '1'
00001100 0x30  48 '0'
00000000  0x0   0 '\x00'
01111000 0x1e  30 '\x1e'
10010100 0x29  41 ')'
00000000  0x0   0 '\x00'
00101000 0x14  20 '\x14'
11100001 0x87 135 ' PRINT '
00000100 0x20  32 ' '
10010010 0x49  73 'I'
11011100 0x3b  59 ';'
01000100 0x22  34 '"'
00010010 0x48  72 'H'
10100010 0x45  69 'E'
00110010 0x4c  76 'L'
00110010 0x4c  76 'L'
11110010 0x4f  79 'O'
00000100 0x20  32 ' '
11101010 0x57  87 'W'
11110010 0x4f  79 'O'
01001010 0x52  82 'R'
00110010 0x4c  76 'L'
00100010 0x44  68 'D'
10000100 0x21  33 '!'
01000100 0x22  34 '"'
00000000  0x0   0 '\x00'
01111000 0x1e  30 '\x1e'
10001100 0x31  49 '1'
00000000  0x0   0 '\x00'
01111000 0x1e  30 '\x1e'
11010001 0x8b 139 ' NEXT '
00000100 0x20  32 ' '
10010010 0x49  73 'I'
00000000  0x0   0 '\x00'
00000000  0x0   0 '\x00'
00000000  0x0   0 '\x00'
This comes from the following BASIC code:

10 FOR I = 1 TO 10
20 PRINT I;"HELLO WORLD!"
30 NEXT I
Raw data again, with inserted comments:
01111000 0x1e 30 '\x1e' << a marker before every code line?
01001000 0x12 18 '\x12' << what's this? A checksum?
00000000 0x0 0 '\x00' << start code

01010000 0xa 10 '\n' << Line Number 10
00000001 0x80 128 ' FOR '
00000100 0x20 32 ' '
10010010 0x49 73 'I'
00000100 0x20 32 ' '
11010011 0xcb 203 '='
00000100 0x20 32 ' '
10001100 0x31 49 '1'
00000100 0x20 32 ' '
00111101 0xbc 188 ' TO '
00000100 0x20 32 ' '
10001100 0x31 49 '1'
00001100 0x30 48 '0'
00000000 0x0 0 '\x00' << end of this code line
01111000 0x1e 30 '\x1e' << a marker before every code line?
10010100 0x29 41 ')' << what's this? A checksum?
00000000 0x0 0 '\x00' << start new line of code

00101000 0x14 20 '\x14' << Line Number 20
11100001 0x87 135 ' PRINT '
00000100 0x20 32 ' '
10010010 0x49 73 'I'
11011100 0x3b 59 ';'
01000100 0x22 34 '"'
00010010 0x48 72 'H'
10100010 0x45 69 'E'
00110010 0x4c 76 'L'
00110010 0x4c 76 'L'
11110010 0x4f 79 'O'
00000100 0x20 32 ' '
11101010 0x57 87 'W'
11110010 0x4f 79 'O'
01001010 0x52 82 'R'
00110010 0x4c 76 'L'
00100010 0x44 68 'D'
10000100 0x21 33 '!'
01000100 0x22 34 '"'
00000000 0x0 0 '\x00' << end of this code line
01111000 0x1e 30 '\x1e' << a marker before every code line?
10001100 0x31 49 '1' << what's this? A checksum?
00000000 0x0 0 '\x00' << start new line of code

01111000 0x1e 30 '\x1e' << Line number 30
11010001 0x8b 139 ' NEXT '
00000100 0x20 32 ' '
10010010 0x49 73 'I'
00000000 0x0 0 '\x00' << end of code
00000000 0x0 0 '\x00'
00000000 0x0 0 '\x00'
Please read the comments. Does anyone know what the marked bytes mean?

Re: dragon 32 cassette format ?

Post by jedie »

Hm. The "line number" can't be stored in only one byte. Otherwise the line number would be limited to < 256, wouldn't it?
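
One reading that fits the dump is that the 0x00 byte in front of each line number belongs to it as the high byte, so the line number is a 16-bit value stored high byte first. The small sketch below only shows that arithmetic; `line_number` is a made-up helper and the byte order is an assumption taken from the dump.

```python
def line_number(high, low):
    """Combine two bytes into one 16-bit value, high byte first."""
    return (high << 8) | low


print(line_number(0x00, 0x0A))  # prints 10
print(line_number(0x00, 0x14))  # prints 20
print(line_number(0x00, 0x1E))  # prints 30
```

With two bytes the value can go far beyond 255; for example, `line_number(0x01, 0x00)` is 256.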

Re: dragon 32 cassette format ?

Post by robcfg »

You're right, and I found the format of tokenized Basic here.

Now everything seems in place :)

Re: dragon 32 cassette format ?

Post by jedie »

robcfg wrote: I found the format of tokenized Basic here.
That helps. But I haven't put it all together yet.

Here is the same data from above:
0x1e
0x12 << memory address of the next line ?
0x0 << start new line of code
0xa == Line Number 10

...FOR I = 1 TO 10...

0x0 << End of line delimiter
0x1e
0x29 << memory address of the next line ?
0x0 << start new line of code
0x14 == Line Number 20

...PRINT I;"HELLO WORLD!"...

0x0 << End of line delimiter
0x1e
0x31 << memory address of the next line ?
0x0 << start new line of code
0x1e == Line Number 30

...NEXT I...

0x0 << End of line delimiter
0x0 0x0 << end of code
The red and green bytes are the open questions...

Do both of them together form the "memory address of the next line" value?
How should they be interpreted?
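
One interpretation that is consistent with the 50 bytes of the dump: each line starts with a two-byte address of the next line (high byte first), followed by a two-byte line number, the tokenized text and a 0x00 delimiter, and an address of 0x0000 ends the program. The sketch below tests that reading. It is not a complete detokenizer: `parse_program` is a hypothetical name, the token table holds only the five tokens visible in the dump, and tokens inside string literals are not treated specially.

```python
# Tokens seen in the dump above; a real table would hold many more.
TOKENS = {0x80: "FOR", 0x87: "PRINT", 0x8B: "NEXT", 0xBC: "TO", 0xCB: "="}

# The 50 bytes of the data block, copied from the dump.
DATA = bytes.fromhex(
    "1e 12 00 0a 80 20 49 20 cb 20 31 20 bc 20 31 30 00 "
    "1e 29 00 14 87 20 49 3b 22 48 45 4c 4c 4f 20 57 4f 52 4c 44 21 22 00 "
    "1e 31 00 1e 8b 20 49 00 "
    "00 00"
)


def parse_program(data):
    """Yield (next_line_address, line_number, text) for each BASIC line."""
    pos = 0
    while pos + 2 <= len(data):
        next_address = (data[pos] << 8) | data[pos + 1]
        if next_address == 0:  # 0x00 0x00 marks the end of the program
            break
        number = (data[pos + 2] << 8) | data[pos + 3]
        end = data.index(0, pos + 4)  # line text runs up to the 0x00 delimiter
        text = "".join(TOKENS.get(value, chr(value)) for value in data[pos + 4:end])
        yield next_address, number, text
        pos = end + 1


lines = list(parse_program(DATA))
for next_address, number, text in lines:
    print("0x%04x  %d %s" % (next_address, number, text))
```

Read this way, the three addresses are 0x1e12, 0x1e29 and 0x1e31. Their differences (23 and 8) are exactly the byte lengths of the second and third line, and the first line is 17 bytes long, which would put the program start at 0x1e01. That supports reading the two bytes together as one address.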