<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>'Analysing text files to obtain statistics on their content.' Thread RSS Feed</title>
    <link>http://www.programmersheaven.com/</link>
    <description>Contains the latest posts from the thread 'Analysing text files to obtain statistics on their content.' posted on the 'Perl' forum at Programmer's Heaven.</description>
    <language>en</language>
    <copyright>Copyright 2013 Programmers Heaven</copyright>
    <pubDate>Sun, 19 May 2013 00:21:51 -0700</pubDate>
    <lastBuildDate>Sun, 19 May 2013 00:21:51 -0700</lastBuildDate>
    <generator>Argotic Syndication Framework 2007.3.0.1, http://www.codeplex.com/Argotic</generator>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <ttl>360</ttl>
    <image>
      <url>http://www.programmersheaven.com/images/ph.gif</url>
      <title>Programmers Heaven</title>
      <link>http://www.programmersheaven.com/</link>
      <width>88</width>
      <height>31</height>
    </image>
    <item>
      <title>Analysing text files to obtain statistics on their content.</title>
      <link>http://www.programmersheaven.com/mb/perl/372885/372885/analysing-text-files-to-obtain-statistics-on-their-content/</link>
      <description>Analysing text files to obtain statistics on their content &lt;br /&gt;
&lt;br /&gt;
You are to write a Perl program that analyses text files to obtain statistics on their content. The program should operate as follows: &lt;br /&gt;
&lt;br /&gt;
1) When run, the program should check if an argument has been provided. If not, the program should prompt for, and accept input of, a filename from the keyboard. &lt;br /&gt;
&lt;br /&gt;
2) The filename, either passed as an argument or input from the keyboard, should be checked to ensure it is in MS-DOS format. The filename part should be no longer than 8 characters and must begin with a letter or underscore character followed by up to 7 letters, digits or underscore characters. The file extension should be optional, but if given it should be ".TXT" (upper- or lowercase). &lt;br /&gt;
&lt;br /&gt;
If no extension is given, ".TXT" should be added to the end of the filename. So, for example, if "testfile" is input as the filename, this should become "testfile.TXT". If "input.txt" is entered, this should remain unchanged. &lt;br /&gt;
&lt;br /&gt;
3) If the filename provided is not of the correct format, the program should display a suitable error message and end at this point. &lt;br /&gt;
&lt;br /&gt;
4) The program should then check to see if the file exists using the filename provided. If the file does not exist, a suitable error message should be displayed and the program should end at this point. &lt;br /&gt;
&lt;br /&gt;
5) Next, if the file exists but the file is empty, again a suitable error message should be displayed and the program should end. &lt;br /&gt;
&lt;br /&gt;
6) The file should be read and checked to display crude statistics on the number of characters, words, lines, sentences and paragraphs that are within the file. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am very new to Perl and have put this code together using examples from various books. Could anyone review it and suggest how it could be improved? &lt;br /&gt;
&lt;br /&gt;
#!/usr/bin/perl &lt;br /&gt;
&lt;br /&gt;
use strict; &lt;br /&gt;
use warnings; &lt;br /&gt;
&lt;br /&gt;
my $filename; &lt;br /&gt;
&lt;br /&gt;
if ($#ARGV == -1) #no filename provided as a command line argument. &lt;br /&gt;
{ &lt;br /&gt;
print("Please enter a filename: "); &lt;br /&gt;
$filename = &amp;lt;STDIN&amp;gt;; &lt;br /&gt;
chomp($filename); &lt;br /&gt;
} &lt;br /&gt;
else #got a filename as an argument. &lt;br /&gt;
{ &lt;br /&gt;
$filename = $ARGV[0]; &lt;br /&gt;
} &lt;br /&gt;
&lt;br /&gt;
#perform the specified checks &lt;br /&gt;
#check if filename is valid (a letter or underscore, up to 7 more letters, digits or underscores, optional .TXT), exit if not &lt;br /&gt;
if ($filename !~ m/^[a-z_][a-z0-9_]{0,7}(\.TXT)?$/i) &lt;br /&gt;
{ &lt;br /&gt;
die("File format not valid\n"); &lt;br /&gt;
} &lt;br /&gt;
&lt;br /&gt;
#add the .TXT extension if none was given &lt;br /&gt;
if ($filename !~ m/\.TXT$/i) &lt;br /&gt;
{ &lt;br /&gt;
$filename .= ".TXT"; &lt;br /&gt;
} &lt;br /&gt;
&lt;br /&gt;
#check if file exists, exit if it does not. &lt;br /&gt;
if (! -e $filename) &lt;br /&gt;
{ &lt;br /&gt;
die("File does not exist\n"); &lt;br /&gt;
} &lt;br /&gt;
&lt;br /&gt;
#check if file is empty, exit if it is. &lt;br /&gt;
if (-z $filename) &lt;br /&gt;
{ &lt;br /&gt;
die("File is empty\n"); &lt;br /&gt;
} &lt;br /&gt;
&lt;br /&gt;
my $i = 0; #lines &lt;br /&gt;
my $p = 1; #paragraphs &lt;br /&gt;
my $words = 0; &lt;br /&gt;
my $chars = 0; &lt;br /&gt;
my $sentences = 0; &lt;br /&gt;
&lt;br /&gt;
open(READFILE, "&amp;lt;", $filename) or die "Can't open file '$filename': $!"; &lt;br /&gt;
&lt;br /&gt;
#read the file one line at a time &lt;br /&gt;
while (&amp;lt;READFILE&amp;gt;) { &lt;br /&gt;
chomp; #removes the input record separator &lt;br /&gt;
$i = $.; #"$." is the current input line number; $i++ will also work &lt;br /&gt;
$p++ if (m/^\s*$/); #count paragraphs: each blank line is taken as a paragraph break &lt;br /&gt;
my @t = split(' '); #split the line into "words", ignoring leading spaces &lt;br /&gt;
$words += @t; #add count to $words &lt;br /&gt;
$chars += tr/ //c; #tr/ //c counts all characters except spaces; add to $chars &lt;br /&gt;
$sentences += tr/.!?//; #count end-of-sentence markers &lt;br /&gt;
} &lt;br /&gt;
&lt;br /&gt;
#display results &lt;br /&gt;
print "There are $i lines in $filename\n"; &lt;br /&gt;
print "There are $p paragraphs in $filename\n"; &lt;br /&gt;
print "There are $sentences sentences in $filename\n"; &lt;br /&gt;
print "There are $words words in $filename\n"; &lt;br /&gt;
print "There are $chars characters (excluding spaces) in $filename\n"; &lt;br /&gt;
&lt;br /&gt;
close(READFILE); &lt;br /&gt;</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/mb/perl/372885/372885/analysing-text-files-to-obtain-statistics-on-their-content/</guid>
      <pubDate>Wed, 25 Jun 2008 03:50:46 -0700</pubDate>
      <category>Perl</category>
    </item>
    <item>
      <title>Re: Analysing text files to obtain statistics on their content.</title>
      <link>http://www.programmersheaven.com/mb/perl/372885/385210/re-analysing-text-files-to-obtain-statistics-on-their-content/#385210</link>
      <description>Hiya,&lt;br /&gt;
&lt;br /&gt;
I am doing the very same piece of course work. Could you tell me if your piece was correct and if you managed to complete it successfully?&lt;br /&gt;
&lt;br /&gt;
Thanks.&lt;br /&gt;</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/mb/perl/372885/385210/re-analysing-text-files-to-obtain-statistics-on-their-content/#385210</guid>
      <pubDate>Wed, 04 Feb 2009 05:45:05 -0700</pubDate>
      <category>Perl</category>
    </item>
    <item>
      <title>Re: Analysing text files to obtain statistics on their content.</title>
      <link>http://www.programmersheaven.com/mb/perl/372885/392147/re-analysing-text-files-to-obtain-statistics-on-their-content/#392147</link>
      <description>Hi&lt;br /&gt;
Were you able to complete your TMA10 successfully? I am currently doing the same assignment and I'm finding it very difficult. Could you give me some help and tips?&lt;br /&gt;</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/mb/perl/372885/392147/re-analysing-text-files-to-obtain-statistics-on-their-content/#392147</guid>
      <pubDate>Wed, 10 Jun 2009 11:36:22 -0700</pubDate>
      <category>Perl</category>
    </item>
    <item>
      <title>Re: Analysing text files to obtain statistics on their content.</title>
      <link>http://www.programmersheaven.com/mb/perl/372885/394793/re-analysing-text-files-to-obtain-statistics-on-their-content/#394793</link>
      <description>You are to write a Perl program that analyses text files to obtain statistics on their content.  The program should operate as follows:&lt;br /&gt;
&lt;br /&gt;
1) When run, the program should check if an argument has been provided.  If not, the program should prompt for, and accept input of, a filename from the keyboard.&lt;br /&gt;
&lt;br /&gt;
I will help you with point one to get the ball rolling.  You should write something similar to the following:&lt;br /&gt;
&lt;br /&gt;
###Page 8-4 - command line arguments###&lt;br /&gt;
&lt;br /&gt;
###point 1)###&lt;br /&gt;
#Provide a filename as a command line argument here, such as Program:myscript.pl#&lt;br /&gt;
my $filename;&lt;br /&gt;
if ($#ARGV == -1) # no filename provided as a command line argument#  &lt;br /&gt;
{&lt;br /&gt;
print("Enter filename: ");   #more can be added here&lt;br /&gt;
$filename = &amp;lt;STDIN&amp;gt;;&lt;br /&gt;
chomp($filename);&lt;br /&gt;
}&lt;br /&gt;
else # got a filename as an argument&lt;br /&gt;
{&lt;br /&gt;
 $filename = $ARGV[0];&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
###end code snippet###&lt;br /&gt;
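&lt;br /&gt;
One detail the point 1 snippet leaves open: if the user ends input (Ctrl-D on Unix, Ctrl-Z on Windows) instead of typing a name, reading STDIN returns undef and chomp then warns. A minimal sketch that handles this, assuming a helper named get_filename (a hypothetical name, not part of the assignment) that takes the input and output handles plus the argument list, so it can be tried out without a keyboard:&lt;br /&gt;
&lt;br /&gt;
```perl
use strict;
use warnings;

# Hypothetical helper: returns the first command line argument if there is
# one; otherwise prints a prompt on $out and reads one line from $in.
# Returns undef if input ends before a line is typed.
sub get_filename {
    my ($in, $out, @args) = @_;
    return $args[0] if @args;             # a filename was given as an argument
    print {$out} "Please enter a filename: ";
    my $line = readline($in);             # reads one line, like the angle-bracket form
    return unless defined $line;          # end of input: bare return is undef in scalar context
    chomp($line);
    return $line;
}

# Typical call from the main program:
#   my $filename = get_filename(\*STDIN, \*STDOUT, @ARGV);
#   defined $filename or die("No filename given\n");
```

Passing the handles in, rather than using STDIN directly, is only a design choice that makes the helper easy to exercise with in-memory files.&lt;br /&gt;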
&lt;br /&gt;
With this in mind, let's now attempt to address points two and three.&lt;br /&gt;
&lt;br /&gt;
2) The filename, either passed as an argument or input from the keyboard, should be checked to ensure it is in MS-DOS format.  The filename part should be no longer than 8 characters and must begin with a letter or underscore character followed by up to 7 letters, digits or underscore characters.  The file extension should be optional, but if given it should be ".TXT" (upper- or lowercase).&lt;br /&gt;
&lt;br /&gt;
If no extension is given, ".TXT" should be added to the end of the filename.  So, for example, if "testfile" is input as the filename, this should become "testfile.TXT".  If "input.txt" is entered, this should remain unchanged.&lt;br /&gt;
&lt;br /&gt;
3) If the filename provided is not of the correct format, the program should display a suitable error message and end at this point.&lt;br /&gt;
&lt;br /&gt;
###page 3-6/3-7 - character classes###&lt;br /&gt;
&lt;br /&gt;
### point 2 and 3)###&lt;br /&gt;
### then, perform the specified checks###&lt;br /&gt;
### check if filename is valid, exit if not###&lt;br /&gt;
if ($filename !~ m/  /i) #####Add a pattern here; \w matches an alphanumeric word character, equivalent to [a-zA-Z_0-9]&lt;br /&gt;
{&lt;br /&gt;
die("not valid.\n");    #more can be added here&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# does the filename end with .TXT?&lt;br /&gt;
if ($filename !~ m/    /i)&lt;br /&gt;
{&lt;br /&gt;
$filename .= ".TXT";&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
###end code snippet###&lt;br /&gt;
&lt;br /&gt;
Note how I have left blanks where patterns should be.  Refer to page 3-6/3-7.  Try this&lt;br /&gt;
first and then try to answer points 4 &amp;amp; 5.  &lt;br /&gt;
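&lt;br /&gt;
For readers who want to check their answer, one possible way to fill in the two blank patterns is sketched below. It assumes a helper named normalise_filename (a hypothetical name) that returns the name with ".TXT" added when needed, or undef when the name breaks the MS-DOS rule from point 2:&lt;br /&gt;
&lt;br /&gt;
```perl
use strict;
use warnings;

# Hypothetical helper for points 2 and 3: accepts a letter or underscore
# followed by up to 7 letters, digits or underscores, with an optional .TXT
# extension in any case, and adds ".TXT" when no extension was given.
# Returns undef for an invalid name.
sub normalise_filename {
    my ($name) = @_;
    # \A and \z anchor at the very start and end, so a stray newline fails
    return unless $name =~ m/\A[a-z_][a-z0-9_]{0,7}(?:\.txt)?\z/i;
    $name .= ".TXT" unless $name =~ m/\.txt\z/i;
    return $name;
}

for my $name ("testfile", "input.txt", "9lives", "toolongname.txt", "data.doc") {
    my $result = normalise_filename($name);
    if (defined $result) {
        print "$name becomes $result\n";
    }
    else {
        print "$name is not a valid MS-DOS filename\n";
    }
}
```

Run as a script, this prints "testfile becomes testfile.TXT" and "input.txt becomes input.txt" and rejects the other three names. The \A and \z anchors are used instead of ^ and $ because $ also matches just before a trailing newline.&lt;br /&gt;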
&lt;br /&gt;
4) The program should then check to see if the file exists using the filename provided.  If the file does not exist, a suitable error message should be displayed and the program should end at this point.&lt;br /&gt;
&lt;br /&gt;
5) Next, if the file exists but the file is empty, again a suitable error message should be displayed and the program should end.&lt;br /&gt;
&lt;br /&gt;
###Page 7-6 - Determining information about files###&lt;br /&gt;
&lt;br /&gt;
### point 4)###&lt;br /&gt;
### check if filename is actual file, exit if not###&lt;br /&gt;
if ( ! -e $filename )&lt;br /&gt;
{&lt;br /&gt;
die("error: $filename does not exist\n");   #more can be added here&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
### point 5)###&lt;br /&gt;
### check if filename is empty, exit if it is###&lt;br /&gt;
if ( -z $filename )&lt;br /&gt;
{&lt;br /&gt;
die("error: $filename is empty\n");   #more can be added here&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
###end code snippet###&lt;br /&gt;
&lt;br /&gt;
Refer to Page 7-6 for more information on the file test operators (-e, -z) used in these conditions.&lt;br /&gt;
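&lt;br /&gt;
A common slip with these operators is testing -s the wrong way round: -s returns the file size, so it is true for a non-empty file. A small sketch of what -e, -z and -s each return, assuming a hypothetical helper named describe_file:&lt;br /&gt;
&lt;br /&gt;
```perl
use strict;
use warnings;

# Hypothetical helper showing what the file test operators return:
#   -e  true if the file exists
#   -z  true if the file exists and has zero size
#   -s  the size in bytes if it is non-zero, otherwise false
sub describe_file {
    my ($filename) = @_;
    return "missing" unless -e $filename;
    return "empty" if -z $filename;
    my $size = -s $filename;     # a positive byte count at this point
    return "$size bytes";
}

# Either of these tests rejects an empty file:
#   die("File is empty\n") if -z $filename;
#   die("File is empty\n") unless -s $filename;
```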
&lt;br /&gt;
&lt;br /&gt;
6) The file should be read and checked to display crude statistics on the number of characters, words, lines, sentences and paragraphs that are within the file.&lt;br /&gt;
&lt;br /&gt;
I will leave the remainder of the assignment for you to complete, although here is an example of how to count sentences.&lt;br /&gt;
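&lt;br /&gt;
As a complement to the sentence example that follows, here is a sketch of all five crude statistics gathered in a single pass over the file. It reads whole lines rather than using getc, and the counting rules are assumptions, since the assignment does not define them: characters exclude the line ending, words are runs of non-whitespace, sentences are occurrences of ".", "!" or "?", and a paragraph is a block of non-blank lines. The helper name text_stats is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```perl
use strict;
use warnings;

# Hypothetical helper: reads an open filehandle line by line and returns a
# hash of crude counts. The counting rules are assumptions (see above).
sub text_stats {
    my ($fh) = @_;
    my %count;
    $count{$_} = 0 for qw(characters words lines sentences paragraphs);
    my $in_paragraph = 0;
    while (defined(my $line = readline($fh))) {
        chomp($line);
        $line =~ s/\r\z//;                 # also drop a DOS carriage return, if any
        $count{lines}++;
        $count{characters} += length($line);
        my @words = split(' ', $line);     # ' ' splits on whitespace, ignoring leading blanks
        $count{words} += @words;
        $count{sentences} += ($line =~ tr/.!?//);   # counts, does not change $line
        if ($line =~ m/\S/) {
            $count{paragraphs}++ unless $in_paragraph;   # first line of a new paragraph
            $in_paragraph = 1;
        }
        else {
            $in_paragraph = 0;             # a blank line ends the current paragraph
        }
    }
    return %count;
}

# Example call, with READFILE opened for reading as in the snippet below:
#   my %stats = text_stats(\*READFILE);
#   print("Words:        $stats{words}\n");
```

Because a blank line only closes a paragraph, several blank lines in a row still count as a single break, unlike a simple count of blank lines.&lt;br /&gt;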
&lt;br /&gt;
###page 7-4 - Opening files for Reading###&lt;br /&gt;
###Page 7-5 - The getc function###&lt;br /&gt;
###Page 2-3 - String Boolean expressions###&lt;br /&gt;
###Page 2-6 - The While Statement###&lt;br /&gt;
&lt;br /&gt;
open(READFILE, "&amp;lt;$filename")&lt;br /&gt;
 or die "Could not open file \"$filename\":$!";&lt;br /&gt;
&lt;br /&gt;
my $sentences = 0;&lt;br /&gt;
&lt;br /&gt;
###you would need to declare a variable such as ###&lt;br /&gt;
&lt;br /&gt;
my($ch);&lt;br /&gt;
&lt;br /&gt;
###then use a while loop and series of if statements similar to the following###&lt;br /&gt;
&lt;br /&gt;
while (defined($ch = getc(READFILE)))   # defined() stops only at end of file, not at a "0" character&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
 # count sentences:&lt;br /&gt;
 if ($ch eq "?" || $ch eq "!" || $ch eq ".")&lt;br /&gt;
 # if character is one of the three end of sentence markers&lt;br /&gt;
 {&lt;br /&gt;
  $sentences++;&lt;br /&gt;
 }&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
close(READFILE);&lt;br /&gt;
&lt;br /&gt;
# display results&lt;br /&gt;
print("Sentences:    $sentences\n");&lt;br /&gt;
&lt;br /&gt;
###end code snippet###&lt;br /&gt;</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/mb/perl/372885/394793/re-analysing-text-files-to-obtain-statistics-on-their-content/#394793</guid>
      <pubDate>Sat, 08 Aug 2009 16:12:13 -0700</pubDate>
      <category>Perl</category>
    </item>
  </channel>
</rss>