
Entabbing

Posted on Friday, December 28, 2007 at 3:54 AM
The next program in our project is EnTab, which replaces runs of blanks in a text file with tabs and blanks. Here is the specification, i.e., the "manual":
PROGRAM
   EnTab -- convert runs of blanks into tabs
USAGE
   EnTab
FUNCTION
   EnTab copies its input to its output, replacing strings of blanks by
   tabs so that the output is visually the same as the input but contains
   fewer characters.  Tab stops are assumed to be set every 3 columns
   (i.e., 1, 4, 7, ...), so that each sequence of one to three blanks
   ending on a tab stop is replaced by a tab character.
BUGS
   1. EnTab is naive about backspaces, vertical motions and non-printing
      characters.
   2. EnTab will convert a single blank to a tab if it occurs at a tab
      stop, thus EnTab is not an exact inverse of DeTab.
   3. If any record in the input is longer than 255 chars, EnTab will
      truncate that record to 255 chars.


We first address how tab stops are to be represented in the program. Since we will next write the inverse program DeTab, we choose to write a unit, TabsUnit, to be used by both programs and thus ensure conformity between the two programs.

Here is the code for the unit TabsUnit:
Unit TabsUnit ;
{
   used by DeTab and EnTab
}
interface
   Uses
      Tools ;
   Type
      TabType  =  Set of Byte ;
   Var
      TabSet   :  TabType ;

   Procedure SetTabs ;
implementation
   Procedure SetTabs ;
   {
      SetTabs -- set initial tab stops
   }
   CONST
      TABSPACE = 3 ;    { 3 spaces per tab }
   Var
      i  :  Byte ;
   begin { SetTabs }
      for i := 0 to MAXSTR do
         if i MOD TABSPACE = 1 then
            Include (TabSet, i)
         else
            Exclude (TabSet, i)
   end ; { SetTabs }
end.


As you can see, we choose to represent tab stops with a set of Byte. Any number in the range 0 .. 255 that is a member of the set is a tab stop. The set is declared as a user-defined type, TabType. The unit also declares a variable TabSet : TabType and a procedure SetTabs, which sets a tab stop every three columns.
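To make the set representation concrete, here is a minimal stand-alone sketch that builds the same kind of set and lists its members. It does not use the Tools unit, so the bound LASTCOL is a hypothetical stand-in for MAXSTR, and it uses plain set union rather than Include so that it should compile under older compilers as well.

```pascal
Program TabSetDemo ;
{
   TabSetDemo -- list the tab stops held in a set of Byte.
   TABSPACE mirrors SetTabs; LASTCOL is a hypothetical bound
   used here in place of MAXSTR from the Tools unit.
}
Const
   TABSPACE = 3 ;
   LASTCOL  = 20 ;
Type
   TabType  =  Set of Byte ;
Var
   TabSet   :  TabType ;
   Col      :  Byte ;
begin { TabSetDemo }
   TabSet := [] ;
   { a column is a tab stop when it is 1 more than a multiple of TABSPACE }
   for Col := 0 to LASTCOL do
      if Col MOD TABSPACE = 1 then
         TabSet := TabSet + [Col] ;
   { membership test: this is all EnTab needs to ask of the set }
   for Col := 0 to LASTCOL do
      if Col in TabSet then
         Write (Col, ' ') ;
   WriteLn
end.  { TabSetDemo }
```

Run as written, this should print the columns 1 4 7 10 13 16 19, matching the tab stops named in the manual.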

Our first cut of the program itself is (in pseudocode) :
Program EnTab
begin
   while not eof
      read a string from standard input
      entab that string
      write the string to standard output
   end
end


This approach reduces the entire problem to one of entabbing a string. By reading in an entire string before entabbing it we are able to access any char in the string at any time and are not forced to deal with the data on-the-fly. The price we pay for this is that our program will not be able to handle records that are longer than the maximum length of a Turbo Pascal string, 255 chars.

Our experience is that approaching a programming problem from the viewpoint of how a human being would tackle it reduces its complexity. The question becomes "How would a human decide where to put tabs and what blanks to eliminate?" This question already breaks the problem into two parts that we can consider separately:

(1) Where do we put tabs? Any blank that immediately precedes a tab stop is a candidate for replacement by a tab.

(2) What blanks do we eliminate? Having completed the first step, we simply delete any and all blanks that immediately precede a tab. Since it is possible, indeed quite likely, that more than one blank will precede a tab, this needs to be done in a loop.

Here is our code for EnTab.
Program EnTab ;
{
   EnTab -- replace blanks with tabs and blanks
}
Uses
   Tools,
   TabsUnit ;
      Procedure EnTabStr (Var Str : String) ;
      {
         EnTab a single line
      }
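      Var
         i  : Integer ;
      begin { EnTabStr }
         {
            first scan of the string - where to put tabs
         }
         for i := 1 to Length (Str) do
            if (Str[i] = BLANK) AND (i + 1 in TabSet) then
               Str[i]   := TAB ;
         {
            second scan of the string - what blanks to eliminate.
            A while loop is used because Delete shortens the string
            and we must step the index back ourselves; a for loop
            would keep its original upper bound.
         }
         i := 1 ;
         while i <= Length (Str) do begin
            if Str[i] = TAB then
               while (i > 1) AND (Str[i - 1] = BLANK) do begin
                  Delete(Str, i - 1, 1) ;
                  i := i - 1
               end ;
            i := i + 1
         end
      end ; { EnTabStr }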
Var
   Str      :  String ;
begin { EnTab }
   SetTabs ;
   while NOT eof do begin
      ReadLn (Str) ;
      EnTabStr (Str) ;
      WriteLn (Str)
   end
end.  { EnTab }
As you can see, by working with a string we are able to scan the record and implant tabs, then back up and scan the record again to eliminate blanks. You cannot back up if dealing with the data stream on-the-fly.
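To watch the two scans at work, the following stand-alone sketch traces a single line. It is an illustration under stated assumptions, not part of the project code: BLANK, TAB and the tab-stop test are defined locally in place of the Tools and TabsUnit units, and Show is a hypothetical helper that prints a blank as a dot and a tab as \t. The second scan is written with a while loop so that the index and the string length stay in step after each Delete.

```pascal
Program EnTabTrace ;
{
   EnTabTrace -- show one line after each of the two scans.
   BLANK, TAB and TABSPACE are local stand-ins for the Tools and
   TabsUnit definitions; Show is a hypothetical display helper.
}
Const
   BLANK    = ' ' ;
   TAB      = #9 ;
   TABSPACE = 3 ;
Var
   Str   :  String ;
   i     :  Integer ;

   Procedure Show (Const Title, S : String) ;
   { print S with blanks shown as '.' and tabs shown as '\t' }
   Var
      k  :  Integer ;
   begin { Show }
      Write (Title) ;
      for k := 1 to Length (S) do
         if S[k] = TAB then
            Write ('\t')
         else if S[k] = BLANK then
            Write ('.')
         else
            Write (S[k]) ;
      WriteLn
   end ; { Show }

begin { EnTabTrace }
   { 'a' in column 1, five blanks, 'b' in column 7 }
   Str := 'a' + '     ' + 'b' ;
   Show ('input : ', Str) ;
   { first scan: a blank just before a tab stop becomes a tab }
   for i := 1 to Length (Str) do
      if (Str[i] = BLANK) AND ((i + 1) MOD TABSPACE = 1) then
         Str[i] := TAB ;
   Show ('scan 1: ', Str) ;
   { second scan: delete every blank that immediately precedes a tab }
   i := 1 ;
   while i <= Length (Str) do begin
      if Str[i] = TAB then
         while (i > 1) AND (Str[i - 1] = BLANK) do begin
            Delete (Str, i - 1, 1) ;
            i := i - 1
         end ;
      i := i + 1
   end ;
   Show ('scan 2: ', Str)
end.  { EnTabTrace }
```

The three lines printed should read a.....b, then a.\t..\tb, then a\t\tb. The seven-character input shrinks to four characters while 'b' still lands in column 7.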

We have already acknowledged that the program contains a "bug" in not being able to handle records longer than 255 chars. We believe the bug is inconsequential and unlikely ever to manifest itself. If the bug is a problem there are ways to cure it without resorting to on-the-fly processing. The first is to define a type LongString:
Type
   LongString = Record
      Len   :  Word ;
      LongS :  Array [1 .. 32767] of char
   end ;
This allows a string of up to 32767 chars. The approach will require you to write several supporting routines: ReadLongString, WriteLongString, LongLength, LongDelete. You will also have to redefine how tabs are represented and detected, since a set can have only 256 members. The simplest would be to have every position after 255 automatically be a tab stop. A tab at such a position advances exactly one column, just as a blank does, so nothing past position 255 needs converting. This suggests that one might be able to rewrite the main routine thus:
{ assumes Ch : Char is declared alongside Str }
begin { EnTab }
   SetTabs ;
   while NOT eof do begin
      Read (Str) ;
      EnTabStr (Str) ;
      Write (Str) ;
      {
         if record is longer than 255 chars
      }
      while NOT eoln do begin
         Read(Ch) ;
         Write(Ch)
      end ;
      {
         move on to next record
      }
      ReadLn ;
      WriteLn
   end
end.  { EnTab }
This leaves the tab representation and EnTabStr intact.
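For readers who prefer the LongString record, here is a rough sketch of what one of its supporting routines might look like. LongDelete is one of the names listed earlier, but its parameter list is an assumption modelled on the standard Delete; MAXLONG and LongAssign are hypothetical names added only so the example can run on its own. Reading, writing and tab detection are not covered.

```pascal
Program LongDemo ;
{
   LongDemo -- sketch of a Delete-like routine for LongString.
   MAXLONG and LongAssign are hypothetical; LongDelete's signature
   is an assumption modelled on the standard Delete procedure.
}
Const
   MAXLONG = 32767 ;
Type
   LongString = Record
      Len   :  Word ;
      LongS :  Array [1 .. MAXLONG] of char
   end ;
Var
   Line  :  LongString ;
   i     :  Word ;

   Procedure LongAssign (Var L : LongString ; Const S : String) ;
   { copy an ordinary string into a LongString }
   Var
      k  :  Word ;
   begin { LongAssign }
      L.Len := Length (S) ;
      for k := 1 to L.Len do
         L.LongS[k] := S[k]
   end ; { LongAssign }

   Procedure LongDelete (Var L : LongString ; Index, Count : Word) ;
   { remove Count chars starting at Index, closing the gap }
   Var
      k  :  Word ;
   begin { LongDelete }
      if (Count = 0) OR (Index < 1) OR (Index > L.Len) then
         Exit ;
      { never delete past the end of the string }
      if Count > L.Len - Index + 1 then
         Count := L.Len - Index + 1 ;
      for k := Index + Count to L.Len do
         L.LongS[k - Count] := L.LongS[k] ;
      L.Len := L.Len - Count
   end ; { LongDelete }

begin { LongDemo }
   LongAssign (Line, 'ab' + '  ' + 'cd') ;
   LongDelete (Line, 3, 2) ;     { drop the two blanks }
   for i := 1 to Line.Len do
      Write (Line.LongS[i]) ;
   WriteLn
end.  { LongDemo }
```

The program should print abcd. Because the record is passed by reference, the roughly 32 KB array is never copied, which matters on a 16-bit target such as Turbo Pascal.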

Finally, you could read each record into a typed file of char, effectively giving a string representation that could be over two billion char long. That should be enough for anybody.

I have not written, debugged or tested any code using this approach. I leave that to you.
Tags: String, Tabs
