2.18 Parsing binary files

And what about binary files, like classic Forth blockfiles? Well, you could use 'REFILL' in that context too, but it would probably break up words, since it can't find an end-of-line marker and its buffer is smaller than the 1024 characters of a screen. Does that mean it can't be done? No! But you will have to do without 'REFILL', and 'REFILL' makes life easier for you because it handles a few tasks automatically.

First, it has its own buffer (TIB). When you're not using 'REFILL' you have to define one yourself. Second, it terminates the string for you. You don't want 'WORD' to wander into new territory, do you? Third, it sets '>IN' for you every time it receives new input. You have to take care of that one too.

Never heard of '>IN'? Well, the only way for 'WORD' to know at what position the previous scan ended is to store that information in a variable. This variable is called '>IN'.
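
To watch '>IN' at work, you can compare its value before and after a scan. The following is only a sketch: it assumes that input is already available (after 'REFILL', or after '>IN' has been pointed at a terminated buffer) and that '>IN' may be fetched with '@' in the same way it can be stored with '!'. The name 'scanned' is made up for this illustration.

     : scanned                       \ hypothetical: how far did WORD move?
           >in @                     \ position before the scan
           bl word drop              \ scan one word and discard it
           >in @ swap - .            \ position after, minus position before
     ;

The number printed is the distance 'WORD' moved '>IN' during that single scan, including any leading spaces it skipped.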

Not all internal 4tH variables are accessible, mostly because we can't imagine what use they could have to you. Some variables are just better left alone. But '>IN' is available for a very obvious reason: you can use it to point at your own input-buffer and make 'WORD' work for you.

The following program will read the first screen of a block-file for you and print out all the words. You will see that all spaces are eliminated and every word is printed on a new line, just the behaviour you would expect from 'WORD'.

     1025 constant size              \ screensize + terminator
     size 1- value c/scr             \ screensize

     size string WorkSpace           \ 1: our own buffer
     64 string filename              \ filename string

     : openfile                      \ open the block file
           c" romans.blk" filename copy
           input open                \ open block file
           if
                input file           \ read from file
           else
                ." Cannot open file"
                cr quit              \ message when error
           then
     ;

     : readfile                      \ fill the buffer
           WorkSpace c/scr over over \ address and count
           bl fill                   \ clear the buffer
           accept drop               \ fill the buffer
           input close               \ close the file
     ;

     : initparse                     \ configures parsing
           0 WorkSpace c/scr +  c!   \ 2: terminate screen
           WorkSpace >in !          \ 3: point >IN at WorkSpace
     ;

     : parseblock
           begin
                bl word              \ get word
                count dup 0<>        \ length nonzero?
           while
                type cr              \ if so, print it
           repeat

           drop drop                 \ else drop addr/cnt
           ." End of block" cr       \ signal "End of block"
     ;

     : parsefile                     \ do it all
           openfile                  \ open the file
           readfile                  \ read it
           initparse                 \ set up parsing
           parseblock                \ parse it
     ;

     parsefile

Note that there is no need to reset '>IN' afterwards. If you use 'REFILL', it will be reset automatically. If you want to parse the buffer again, or parse from another area, you will have to set '>IN' manually.
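
Setting '>IN' manually could look like the sketch below. It assumes that the definitions of the program above are loaded, that the buffer has been filled and terminated, and that scanning leaves the buffer itself untouched. It also assumes the classic screen layout of 16 lines of 64 characters. The names 'reparse' and 'parsefrom' are made up for this illustration.

     : reparse                       \ scan the whole screen once more
           WorkSpace >in !           \ point >IN at the start again
           parseblock                \ and parse it
     ;

     : parsefrom                     \ n --, start at screen line n
           64 * WorkSpace +          \ assumed: 64 characters per line
           >in !                     \ start scanning at that line
           parseblock                \ parse up to the terminator
     ;

With these definitions, '0 parsefrom' would behave like 'reparse', while '8 parsefrom' would only print the words in the lower half of the screen. Since the terminator at the end of the buffer is still in place, 'WORD' stops there in both cases.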

If you wonder where 'C"' comes from: in 4tH it is simply an alias for '"'. If you ever want to port your program to ANS-Forth, you'll have to use 'C"' inside colon-definitions and '"' outside. Note that 4tH doesn't care!