The next program in our project is EnTab, which replaces runs of blanks in a text file with tabs and blanks. Here is the specification, i.e., the "manual":
PROGRAM
EnTab -- convert runs of blanks into tabs
USAGE
EnTab
FUNCTION
EnTab copies its input to its output, replacing strings of blanks by
tabs so that the output is visually the same as the input but contains
fewer characters. Tab stops are assumed to be set every 3 columns
(i.e., 1, 4, 7, ...), so that each sequence of one to three blanks
ending on a tab stop is replaced by a tab character.
BUGS
1. EnTab is naive about backspaces, vertical motions and non-printing
characters.
2. EnTab will convert a single blank to a tab if it occurs at a tab
stop; thus EnTab is not an exact inverse of DeTab.
3. If any record in the input is longer than 255 chars, EnTab will
truncate that record to 255 chars.
We first address how tab stops are to be represented in the program. Since we will next write the inverse program, DeTab, we choose to write a unit, TabsUnit, to be used by both programs and thus ensure conformity between the two.
Here is the code for the unit TabsUnit:
Unit TabsUnit ;
{
  used by DeTab and EnTab
}
interface
Uses
  Tools ;
Type
  TabType = Set of Byte ;
Var
  TabSet : TabType ;
Procedure SetTabs ;
implementation
Procedure SetTabs ;
{
  SetTabs -- set initial tab stops
}
CONST
  TABSPACE = 3 ; { 3 spaces per tab }
Var
  i : Byte ;
begin { SetTabs }
  for i := 0 to MAXSTR do
    if i MOD TABSPACE = 1 then
      Include (TabSet, i)
    else
      Exclude (TabSet, i)
end ; { SetTabs }
end.
As you can see, we choose to represent tab stops with a set of Byte. Any number in the range 0 .. 255 that is a member of the set is a tab stop. The set is declared as a user-defined type, TabType. The unit also declares a variable, TabSet : TabType, and a procedure, SetTabs, which sets a tab stop every three columns.
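To make the set representation concrete, here is a minimal stand-alone sketch that builds the same kind of set and lists the tab stops among the first 12 columns. It does not use TabsUnit or Tools: the program name ShowTabs is made up for illustration, the declarations are local copies, and the literal 255 stands in for MAXSTR on the assumption that MAXSTR is 255.

```pascal
Program ShowTabs ;
{
  ShowTabs -- list the tab stops among the first 12 columns.
  Illustrative sketch only: TabType and TabSet are local copies of the
  declarations in TabsUnit, and 255 is assumed to be the value of MAXSTR.
}
Const
  TABSPACE = 3 ; { 3 spaces per tab }
Type
  TabType = Set of Byte ;
Var
  TabSet : TabType ;
  i      : Byte ;
begin { ShowTabs }
  TabSet := [] ;
  { columns 1, 4, 7, ... are tab stops }
  for i := 0 to 255 do
    if i MOD TABSPACE = 1 then
      Include (TabSet, i) ;
  { membership is tested with the IN operator }
  for i := 1 to 12 do
    if i in TabSet then
      Write (i, ' ') ;
  WriteLn
end. { ShowTabs }
```

Run as written, this prints the columns 1 4 7 10, matching the tab stops named in the manual.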
Our first cut of the program itself is (in pseudocode) :
Program EnTab
begin
  while not eof
    read a string from standard input
    entab that string
    write the string to standard output
  end
end
This approach reduces the entire problem to one of entabbing a string. By reading in an entire string before entabbing it, we are able to access any char in the string at any time and are not forced to deal with the data on-the-fly. The price we pay is that our program cannot handle records longer than the maximum length of a Turbo Pascal string, 255 chars.
Our experience is that approaching a programming problem from the viewpoint of how a human being would tackle it reduces its complexity.
The question becomes "How would a human decide where to put tabs and what blanks to eliminate?" This question already breaks the problem into two parts that we can consider separately:
(1) Where do we put tabs? Any blank that immediately precedes a tab stop is a candidate for replacement by a tab.
(2) What blanks do we eliminate? Having completed the first step, we simply delete any and all blanks that immediately precede a tab. Since it is possible, indeed quite likely, that more than one blank will precede a tab, this needs to be done in a loop.
Here is our code for EnTab.
Program EnTab ;
{
  EnTab -- replace blanks with tabs and blanks
}
Uses
  Tools,
  TabsUnit ;
Procedure EnTabStr (Var Str : String) ;
{
  EnTab a single line
}
Var
  i : Byte ;
begin { EnTabStr }
  {
    first scan of the string - where to put tabs
  }
  for i := 1 to Length (Str) do
    if (Str[i] = BLANK) AND (i + 1 in TabSet) then
      Str[i] := TAB ;
  {
    second scan of the string - what blanks to eliminate.
    Scanning from right to left lets us delete blanks without
    assigning to the loop variable, and Str[0] (the length byte)
    is never examined.
  }
  for i := Length (Str) downto 2 do
    if (Str[i] = TAB) AND (Str[i - 1] = BLANK) then
      Delete (Str, i - 1, 1)
end ; { EnTabStr }
Var
  Str : String ;
begin { EnTab }
  SetTabs ;
  while NOT eof do begin
    ReadLn (Str) ;
    EnTabStr (Str) ;
    WriteLn (Str)
  end
end. { EnTab }
As you can see, by working with a string we are able to scan the record and implant tabs, then back up and scan the record again to eliminate blanks. You cannot back up if dealing with the data stream on-the-fly.
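To watch the two scans at work, here is a minimal stand-alone sketch that entabs one sample line and prints the result with each tab shown as '>' and each remaining blank as '.'. The program name EnTabDemo, the variable Line and the sample text are made up for illustration. BLANK and TAB are defined locally rather than taken from Tools, the tab-stop test is written as arithmetic instead of a set lookup, and the second scan runs from right to left so that the loop variable is never modified.

```pascal
Program EnTabDemo ;
{
  EnTabDemo -- run both scans on one sample line and show the result.
  Illustrative sketch only: the constants below are assumed to match
  those in Tools, and tab stops are columns 1, 4, 7, ...
}
Const
  BLANK    = ' ' ;
  TAB      = #9 ;
  TABSPACE = 3 ;
Var
  Line : String ;
  i    : Integer ;
begin { EnTabDemo }
  Line := 'ab    cd e' ;
  { first scan: a blank just before a tab stop becomes a tab }
  for i := 1 to Length (Line) do
    if (Line[i] = BLANK) AND ((i + 1) MOD TABSPACE = 1) then
      Line[i] := TAB ;
  { second scan, right to left: delete each blank that precedes a tab }
  for i := Length (Line) downto 2 do
    if (Line[i] = TAB) AND (Line[i - 1] = BLANK) then
      Delete (Line, i - 1, 1) ;
  { make tabs and blanks visible }
  for i := 1 to Length (Line) do
    if Line[i] = TAB then
      Line[i] := '>'
    else if Line[i] = BLANK then
      Line[i] := '.' ;
  WriteLn (Line)
end. { EnTabDemo }
```

The sample line has blanks in columns 3 to 6 and column 9. The first scan turns columns 3, 6 and 9 into tabs, and the second removes the two blanks left in front of the tab at column 6, so the program prints ab>>cd>e: ten characters have become eight, and "cd" and "e" still start in columns 7 and 10.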
We have already acknowledged that the program contains a "bug" in not being able to handle records longer than 255 chars. We believe the bug is inconsequential and unlikely ever to manifest itself. If the bug is a problem, there are ways to cure it without resorting to on-the-fly processing. The first is to define a type LongString:
Type
  LongString = Record
    Len   : Word ;
    LongS : Array [1 .. 32767] of char
  end ;
This type allows a string of up to 32767 chars. The approach will require you to write several supporting routines: ReadLongString, WriteLongString, LongLength, LongDelete. You will also have to redefine how tabs are represented and detected, since a set can have only 256 members. The simplest choice would be to have every position after 255 automatically be a tab stop. This suggests that one might instead rewrite the main routine to entab the first 255 chars of each record and copy the remainder through unchanged:
begin { EnTab }
  { assumes Ch : Char is declared alongside Str }
  SetTabs ;
  while NOT eof do begin
    Read (Str) ;
    EnTabStr (Str) ;
    Write (Str) ;
    {
      if record is longer than 255 chars
    }
    while NOT eoln do begin
      Read (Ch) ;
      Write (Ch)
    end ;
    {
      move on to next record
    }
    ReadLn ;
    WriteLn
  end
end. { EnTab }
This leaves the tab representation and EnTabStr intact.
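For the LongString route, the supporting routines are mostly simple array manipulation. As a rough idea of what one of them might look like, here is an untested-in-production sketch of LongDelete together with a tiny driver. The parameter names Start and Count, the program name and the driver are all made up for illustration; the variable is global because a 32 KB record would not fit comfortably on a default-sized stack.

```pascal
Program LongDeleteDemo ;
{
  LongDeleteDemo -- sketch of a Delete-like routine for LongString.
  Illustrative only: the routine's interface is an assumption, not
  part of the original program.
}
Type
  LongString = Record
    Len   : Word ;
    LongS : Array [1 .. 32767] of Char
  end ;
Var
  S : LongString ; { global: too large to keep on the stack }
  i : Word ;
Procedure LongDelete (Var L : LongString ; Start, Count : Word) ;
{
  remove Count chars from L, beginning at position Start
}
Var
  k : Word ;
begin { LongDelete }
  if (Start < 1) OR (Start > L.Len) OR (Count = 0) then
    Exit ;
  { never delete past the end of the string }
  if Count > L.Len - Start + 1 then
    Count := L.Len - Start + 1 ;
  { slide the tail of the string to the left }
  for k := Start + Count to L.Len do
    L.LongS[k - Count] := L.LongS[k] ;
  L.Len := L.Len - Count
end ; { LongDelete }
begin { LongDeleteDemo }
  { build the string 'abcdef' }
  S.Len := 6 ;
  for i := 1 to 6 do
    S.LongS[i] := Chr (Ord ('a') + i - 1) ;
  LongDelete (S, 3, 2) ;
  for i := 1 to S.Len do
    Write (S.LongS[i]) ;
  WriteLn
end. { LongDeleteDemo }
```

The driver deletes the two chars starting at position 3 of 'abcdef' and prints abef. ReadLongString, WriteLongString and LongLength would follow the same pattern of working on Len and LongS directly.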
Finally, you could read each record into a typed file of char, effectively
giving a string representation that could be over two billion char long.
That should be enough for anybody.
I have not written, debugged or tested any code using this approach. I leave that to you.