
quicker way to find matches in 7000+ entry arrays

mims1979 Member Posts: 4
I've got some code that basically opens a text file and reads each line into an array (@regexs). Next, it opens another text file, reads all of those lines into a second array (@lists), and then looks for a match against the array built from the first file. This all works just fine, and was pretty quick back when it was first created. However, the first text file is now 7000+ lines, and it takes forever to loop through that array looking for matches from the second file. My code is below. I'm not looking for someone to do my work for me, but just let me know if there is a more efficient/quicker way to find the matches than looping over the entire 7000+ entry array for each entry in @lists. At times, @lists can be 1000+ entries as well, so you can see how it takes forever.

foreach $lines (@lists) {
    foreach $regex (@regexs) {
        if ($lines =~ /$regex/gi) {
            # Do one thing
        } else {
            # Do something else
        }
    }
}

Comments

  • Jonathan Member Posts: 2,914
    : I've got some code that basically opens a text file and reads each line into an array (@regexs). Next, it opens another text file, reads all of those lines into a second array (@lists), and then looks for a match against the array built from the first file. This all works just fine, and was pretty quick back when it was first created. However, the first text file is now 7000+ lines, and it takes forever to loop through that array looking for matches from the second file. My code is below. I'm not looking for someone to do my work for me, but just let me know if there is a more efficient/quicker way to find the matches than looping over the entire 7000+ entry array for each entry in @lists. At times, @lists can be 1000+ entries as well, so you can see how it takes forever.
    :
    Whoa...that's a lot of computation. It's hard to give you much help without knowing a bit more about the situation, but here's one thing that may improve things somewhat...

    : if ($lines =~ /$regex/gi){
    Add the "o" modifier (so it's /gio at the end). When the script sees a regex that you hardcoded, like "/a+/", it compiles the regex just once. When you have a variable in the regex, it compiles it every time before doing the match, because that variable may have changed. If you add the "o" modifier, you are saying "compile it once". Of course, if you then modify the variable's value, you ain't gonna get what you'd hope for, as the match will still be using the regex compiled from the old value. But you probably aren't doing that, so this should give you something of a speed-up. :-)
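    [Editor's note: one caveat with /o in this particular loop is that $regex takes a new value on every pass of the inner loop, so /o would pin the match to the first pattern compiled. A way to get the same "compile each pattern only once" effect while keeping every pattern is to precompile with qr//. This is only a sketch; the sample data below is made up for illustration:]

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical stand-ins for the lines read from the two files.
    my @regexs = ('alph', 'delt');
    my @lists  = ('alpha line', 'beta line', 'gamma line');

    # Compile every pattern exactly once, up front.
    # The /i flag is baked into each compiled pattern.
    my @compiled = map { qr/$_/i } @regexs;

    my $matches = 0;
    foreach my $line (@lists) {
        foreach my $re (@compiled) {
            if ($line =~ $re) {
                $matches++;        # "Do one thing"
            } else {
                # "Do something else"
            }
        }
    }
    print "$matches\n";
    ```

    This avoids recompiling each pattern on every inner-loop iteration without freezing the loop to a single pattern the way /o would.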

    Any help?

    Jonathan

    ###
    for(74,117,115,116){$::a.=chr};(($_.='qwertyui')&&
    (tr/yuiqwert/her anot/))for($::b);for($::c){$_.=$^X;
    /(p.{2}l)/;$_=$1}$::b=~/(..)$/;print("$::a$::b $::c hack$1.");

  • hdingman Member Posts: 1
    7000+ × 1000+ comparisons? WOW! That WOULD take a while...

    Can you re-arrange (sort) the first flat text file?

    If you can sort it, you can use a binary search instead of an iterative loop, so it would only take (at most) 13 comparisons to search 7000 records. It's a standard classroom exercise in recursive subroutines. Short and sweet. Use an insertion sort to add records to the "master" file.
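    [Editor's note: the binary search described above can be sketched like this; the subroutine name and sample data are made up for illustration:]

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Binary search over a sorted array of strings; returns the index
    # of $target, or -1 if it is absent. Assumes plain string ordering
    # (cmp) and a list that is already sorted.
    sub binary_search {
        my ($sorted, $target) = @_;
        my ($lo, $hi) = (0, $#$sorted);
        while ($lo <= $hi) {
            my $mid = int(($lo + $hi) / 2);
            my $cmp = $sorted->[$mid] cmp $target;
            if    ($cmp == 0) { return $mid }      # found it
            elsif ($cmp < 0)  { $lo = $mid + 1 }   # look in upper half
            else              { $hi = $mid - 1 }   # look in lower half
        }
        return -1;
    }

    my @sorted = sort qw(delta alpha echo bravo charlie);
    print binary_search(\@sorted, 'charlie'), "\n";
    ```

    One caveat for the original problem: a binary search only handles exact-key (or sorted-prefix) lookups, so it applies here only if the 7000+ entries are literal strings to match rather than true regex patterns.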

    I made the same kind of mistake in college years ago - the end user said they'd "never" re-sort the data, so I used a bubble sort (yuk!) - and then they re-sorted the file hundreds of times in the first month! It was a billable account, billed in "kilo-core ticks", and the first month's bill was a whopper! My next project became "optimize their sorting - IMMEDIATELY!" (And if you've never heard of a kilo-core tick, you're not as old as I am...)

    Howard


    : I've got some code that basically opens a text file and reads each line into an array (@regexs). Next, it opens another text file, reads all of those lines into a second array (@lists), and then looks for a match against the array built from the first file. This all works just fine, and was pretty quick back when it was first created. However, the first text file is now 7000+ lines, and it takes forever to loop through that array looking for matches from the second file. My code is below. I'm not looking for someone to do my work for me, but just let me know if there is a more efficient/quicker way to find the matches than looping over the entire 7000+ entry array for each entry in @lists. At times, @lists can be 1000+ entries as well, so you can see how it takes forever.
    :
    : foreach $lines (@lists) {
    :     foreach $regex (@regexs) {
    :         if ($lines =~ /$regex/gi) {
    :             # Do one thing
    :         } else {
    :             # Do something else
    :         }
    :     }
    : }
    :