Perl

Perl script to check doc_b against doc_a for inconsistence Posted by satimis on 9 Nov 2004 at 8:06 AM
Hi folks,

I want to write a script that checks two documents, say doc_a and doc_b, for inconsistencies, and I have no idea how to start.

doc_b is retyped from doc_a (the original document), not produced with copy and paste.

To keep it simple at first, take a one-line document, as in the following example:

1)
Original document "doc_a"
Check this link to sea what scannars are supported by SANE

It already contains 2 typing mistakes:
sea
scannars

2)
The reproduced document "doc_b" must preserve these 2 mistakes for consistency.
Check thes link to sea what scannars are suppurted by SeNE

Unfortunately, 3 further typing mistakes were introduced:
thes
suppurted
SeNE

What I expect in the printout is:
Original     Mistake      Line No.  Word No.
this         thes         1         2
supported    suppurted    1         9
SANE         SeNE         1         11

rather than just printing the contents of both files and saying "differ".

Kindly advise how to start. TIA

B.R.
satimis
Re: Perl script to check doc_b against doc_a for inconsistence Posted by Weirdofreak on 9 Nov 2004 at 10:24 AM
Well, you may want to check out diff first, if you're doing this because you need a tool rather than because you feel like it. It's more suited to large documents with big differences though, rather than trying to catch individual words.

First you'd want to split each line into separate words. If the words at the same position are different, print them out with the line/word number. You may want to use tab stops to align them, or pad with " " x (10 - length $word). If the array from the original file is longer than the other one, print out splice @arr1, scalar @arr2 with a message saying that that's what's missing, and if @arr2 is longer, do splice @arr2, scalar @arr1 and say that it shouldn't be there. It won't be very good if you add a word in the middle of a line ("foo bar baz quux" becoming "foo bar baz blech quux" will tell you that quux shouldn't be there, rather than blech), but it should suffice.
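
The word-by-word comparison described above could be sketched roughly like this. It is only an illustration: compare_words is a made-up helper name, the column widths are arbitrary, and the sample strings assume the first word is "Check" in both documents.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two lines word by word and return report rows.
sub compare_words {
    my ($text_a, $text_b, $line_no) = @_;
    my @words_a = split ' ', $text_a;   # split ' ' splits on runs of whitespace
    my @words_b = split ' ', $text_b;
    my @report;

    # Compare only the positions that exist in both lines.
    my $common = scalar(@words_a) < scalar(@words_b) ? scalar(@words_a) : scalar(@words_b);
    for my $j (0 .. $common - 1) {
        next if $words_a[$j] eq $words_b[$j];
        push @report, sprintf "%-12s %-12s %-8d %-8d",
            $words_a[$j], $words_b[$j], $line_no, $j + 1;
    }

    # Whatever is left over on the longer line is reported separately.
    if (@words_a > @words_b) {
        my @missing = @words_a[$common .. $#words_a];
        push @report, "missing from doc_b on line $line_no: @missing";
    }
    elsif (@words_b > @words_a) {
        my @extra = @words_b[$common .. $#words_b];
        push @report, "extra in doc_b on line $line_no: @extra";
    }
    return @report;
}

my $doc_a = "Check this link to sea what scannars are supported by SANE";
my $doc_b = "Check thes link to sea what scannars are suppurted by SeNE";

printf "%-12s %-12s %-8s %-8s\n", 'Original', 'Mistake', 'Line No.', 'Word No.';
print "$_\n" for compare_words($doc_a, $doc_b, 1);
```

As noted above, this compares by position only, so a word inserted in the middle of a line shifts everything after it and shows up as a run of mismatches plus one extra word at the end.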

You'll probably want to not store the file in an array to save memory. Instead, do while (<$file>) { ... } unless you need to keep it for some reason.
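
One way the while (<$file>) idea might look when two files are read in step is sketched below. The helper name differing_lines is invented for the example, and in-memory filehandles stand in for doc_a.txt and doc_b.txt so the snippet runs on its own. The // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: read two filehandles in step, one line at a time,
# and return [line number, line from a, line from b] for each differing pair.
sub differing_lines {
    my ($fh_a, $fh_b) = @_;
    my @diffs;
    my $line_no = 0;
    while (defined(my $line_a = <$fh_a>)) {
        my $line_b = <$fh_b>;
        $line_no++;
        chomp $line_a;
        if (!defined $line_b) {
            push @diffs, [$line_no, $line_a, undef];   # doc_b ran out first
            next;
        }
        chomp $line_b;
        push @diffs, [$line_no, $line_a, $line_b] if $line_a ne $line_b;
    }
    # Any lines still unread in doc_b have no counterpart in doc_a.
    while (defined(my $line_b = <$fh_b>)) {
        chomp $line_b;
        push @diffs, [++$line_no, undef, $line_b];
    }
    return @diffs;
}

# In-memory filehandles stand in for the real files here.
my $text_a = "first line\nsecond line\n";
my $text_b = "first line\nsecund line\n";
open(my $fh_a, '<', \$text_a) or die "Cannot open in-memory doc_a: $!";
open(my $fh_b, '<', \$text_b) or die "Cannot open in-memory doc_b: $!";

for my $d (differing_lines($fh_a, $fh_b)) {
    my ($n, $from_a, $from_b) = @$d;
    print "line $n: '", $from_a // '(missing)', "' vs '", $from_b // '(missing)', "'\n";
}
```

Only one line from each file is held in memory at a time; each differing pair could then be split into words and compared as described earlier.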
Re: Perl script to check doc_b against doc_a for inconsistence Posted by satimis on 9 Nov 2004 at 5:52 PM
Hi

Tks for your advice.

I'm only a newbie at Perl. I'll use the following as a starting point.

Script:-
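use strict;
use warnings;

# Read each document into an array of lines.
open(my $fh_a, '<', 'doc_a.txt') or die "Cannot open doc_a.txt: $!";
my @doc_a = <$fh_a>;
close $fh_a;

open(my $fh_b, '<', 'doc_b.txt') or die "Cannot open doc_b.txt: $!";
my @doc_b = <$fh_b>;
close $fh_b;

my $n_a = @doc_a;
my $n_b = @doc_b;

if ($n_a != $n_b) {
    print "Error: documents are not the same length\n";
    exit(1);
}
else {
    for my $i (0 .. $#doc_a) {
        # split ' ' splits on any whitespace and drops the trailing newline
        my @line_a = split(' ', $doc_a[$i]);
        my @line_b = split(' ', $doc_b[$i]);
        # pass references so the two arrays are not flattened into one list
        compare(\@line_a, \@line_b, $i);
    }
    print_results();
}

.....
.....
etc.
- End -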

I have not worked out how to print the mistakes (words in doc_b that are mistyped and inconsistent with doc_a) in a table with line number and word number, as shown in my first post. Could you please give me some suggestions? Thanks.


: You'll probably want to not store the file in an array to save memory. Instead, do while (<$file>) { ... } unless you need to keep it for some reason.
:

Could you please explain in more detail how to achieve this? TIA

B.R.
satimis
Re: Perl script to check doc_b against doc_a for inconsistence Posted by Weirdofreak on 10 Nov 2004 at 10:58 AM
Untested, but try this.

# Print this header once, before the loop over lines.
print "Original\tMistake\t\tLine\tWord\n";

# The rest goes inside the per-line loop, where @line_a, @line_b and $i are in scope.
my $line  = $i + 1;   # declared outside the inner loop so the messages below can use it
my $short = ($#line_a > $#line_b ? $#line_b : $#line_a);
for my $j (0 .. $short) {
  my ($orig, $typo, $word) = ($line_a[$j], $line_b[$j], $j + 1); # redundant, but looks better
  print "$orig\t\t$typo\t\t$line\t$word\n" if $orig ne $typo;
}
if ($#line_a > $#line_b) {
  my @missing = @line_a[scalar(@line_b) .. $#line_a];   # words past the end of @line_b
  print "File b is missing '@missing' on line $line\n";
} elsif ($#line_b > $#line_a) {
  my @extra = @line_b[scalar(@line_a) .. $#line_b];     # words past the end of @line_a
  print "File b should not have '@extra' on line $line\n";
}


The formatting will probably get messed up with that method, but it looks the nicest in code form. You may want to look at formats - this sort of thing is what I think they were made for, although they're slightly archaic. I think you'd want something like
format STDOUT =
@<<<<<<<<<<<< @<<<<<<<<<<<< @<<<<<< @<<<<<<
$orig,        $typo,        $line,  $word
.

But I don't know much about them at all, including how to actually use them, so you're on your own there.
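
For anyone who wants to try formats, here is a minimal sketch of how one might be wired up. It assumes the four values live in package variables (declared with our) and uses the three rows from the first post as sample data; the variable names are placeholders, not anything from the scripts above.

```perl
use strict;
use warnings;

# Package variables: the format reads them each time write is called,
# and they must be visible where the format is declared.
our ($orig, $typo, $line, $word);

# STDOUT_TOP is printed automatically before the first row.
format STDOUT_TOP =
Original      Mistake       Line No. Word No.
.

# Each @<<< field is a left-justified column filled from the variables below it.
format STDOUT =
@<<<<<<<<<<<< @<<<<<<<<<<<< @<<<<<<< @<<<<<<<
$orig,        $typo,        $line,   $word
.

# Sample rows taken from the table in the first post.
my @rows = (
    [ 'this',      'thes',      1, 2  ],
    [ 'supported', 'suppurted', 1, 9  ],
    [ 'SANE',      'SeNE',      1, 11 ],
);

for my $row (@rows) {
    ($orig, $typo, $line, $word) = @$row;
    write;    # prints one row through the STDOUT format
}
```

A picture line is printed as it stands, so it should not contain a literal \n. If formats feel too archaic, printf with widths such as %-13s gives similar alignment.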



 
