
Text file keyword counter - no STL or C++ streams

Kuroyume
[b][red]This message was edited by Kuroyume at 2006-7-27 0:35:42[/red][/b][hr]
Okay, I need an efficient algorithm in C++ for counting a set of keywords (note: more than one) in possibly very large text files, and for storing the file position of each occurrence (as a LONG, i.e. long). Everyone is going to show examples with the STL or a standard ifstream, but this is a plugin SDK and file access goes through its file class methods. So, sorry to say, the algorithm has to be efficient without those C++ library conveniences. Also, the SDK file reader is Unicode-compliant and works on both Windows and MacOS, so platform-dependent tricks are not possible either.

I'll show the source, but without knowing the SDK and all of the context, it may be confusing. Some notes to help:

* A 'word' here is text surrounded by whitespace (any character at or below UNICODE_SPACE counts as whitespace).

* There is a "FillBuffer" to expedite file parsing/keyword counting. This is a memory buffer that is filled with part or all of the file as needed.

* fbufptr and fbufend point to the current position in, and the end of, the FillBuffer.

* lbufptr and lbufend are pointers into a line buffer, lbuffer (delimited by '\n').

* bytesRead counts the bytes read since the beginning of the file.

* String is an SDK string class.

* The WordCounter isn't dynamically sized. It doesn't need to be, and keeping it fixed increases processing speed. This is not an all-purpose solution; it is very specific to what I'm doing.
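To make the buffer bookkeeping in these notes concrete, here is a minimal sketch that can be compiled and tested outside the SDK. It is not the SDK's API: `MemorySource` and its `Read()` are hypothetical stand-ins for the SDK file class, characters are assumed to be 16-bit units, and `BufferedReader` only mirrors the roles of `fbufptr`/`fbufend`/`FillBuffer()`/`bytesRead`. The buffer is deliberately tiny so that words straddle refills. Two details worth checking against the real code: the start offset is taken when the word's first character is read, and end of file terminates the last word instead of discarding it.

```cpp
#include <cassert>

// Hypothetical stand-in for the SDK file class: serves in-memory
// 16-bit text, at most `max` characters per Read() call.
struct MemorySource
{
	const unsigned short* text;
	long length;
	long offset;

	// Copies up to `max` characters into `dst`; returns 0 at end of input.
	long Read(unsigned short* dst, long max)
	{
		long n = length - offset;
		if (n > max) n = max;
		for (long k = 0; k < n; k++) dst[k] = text[offset + k];
		offset += n;
		return n;
	}
};

const long kBufSize = 4;  // deliberately tiny so words straddle refills

// Buffered character reader modeled on fbufptr/fbufend/FillBuffer().
struct BufferedReader
{
	MemorySource* src;
	unsigned short fbuf[kBufSize];
	unsigned short* fbufptr;
	unsigned short* fbufend;
	long charsRead;  // characters consumed so far (the role of bytesRead)

	explicit BufferedReader(MemorySource* s)
		: src(s), fbufptr(fbuf), fbufend(fbuf), charsRead(0) {}

	// Refills fbuf; returns false when the source is exhausted.
	bool FillBuffer()
	{
		long n = src->Read(fbuf, kBufSize);
		fbufptr = fbuf;
		fbufend = fbuf + n;
		return n > 0;
	}

	// Fetches the next character into *c; returns false at end of input.
	bool Next(unsigned short* c)
	{
		if (fbufptr == fbufend && !FillBuffer()) return false;
		*c = *fbufptr++;
		charsRead++;
		return true;
	}
};

// Records the 0-based start offset and length of each whitespace-delimited
// word (characters <= 0x20 count as whitespace). Only the first `maxWords`
// words are stored, but all are counted. Returns the word count.
long ScanWords(MemorySource* src, long* starts, long* lens, long maxWords)
{
	assert(maxWords >= 0);
	BufferedReader r(src);
	unsigned short c;
	long found = 0;
	for (;;)
	{
		// Step 1: skip whitespace; end of input here means no more words
		do {
			if (!r.Next(&c)) return found;
		} while (c <= 0x20);
		// c is the word's first character and charsRead already counts it
		long start = r.charsRead - 1;

		// Step 2: consume the word; end of input ends it but keeps it
		long len = 0;
		bool more;
		do {
			len++;
			more = r.Next(&c);
		} while (more && c > 0x20);

		if (found < maxWords)
		{
			starts[found] = start;
			lens[found] = len;
		}
		found++;
		if (!more) return found;
	}
}
```

For example, scanning `ab  cde\nfg` with this sketch reports words at offsets 0, 4 and 8 with lengths 2, 3 and 2, even though "cde" and "fg" each begin right after a refill.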

[code]// STRUCT: Word Counter for Prepass
// - up to 8 keywords
struct WordCounter
{
	// Word to check
	String word[8];
	// Length of word to check
	LONG len[8];
	// Number of each word found
	LONG count[8];
	// File position of each word (up to 2048 references)
	LONG pos[8][2048];
};

// Specialized token counter
// c, fbufptr/fbufend, lbuffer/lbufptr/lbufend and bytesRead are FileReader members.
//*---------------------------------------------------------------------------*
void FileReader::Prepass(WordCounter* wc, WORD number)
//*---------------------------------------------------------------------------*
{
	WORD i;
	LONG len;
	LONG start;
	bool eof = false;

	// Count words in file
	while (!eof)
	{
		// Step 1: Skip leading whitespace
		do {
			// Reached end of file buffer, read more
			if (fbufptr == fbufend)
			{
				if (!FillBuffer()) return;
			}

			c = *fbufptr;
			fbufptr++;
			bytesRead++;
		} while (c <= UNICODE_SPACE);
		// - c is the word's first character and bytesRead already counts it,
		//   so the word starts one position back
		start = bytesRead - 1;

		// Step 2: Read text surrounded by whitespace into lbuffer
		lbufptr = lbuffer;
		len = 0;
		do {
			// Store only while there is room for the terminator
			// (assumes lbufend points one past the end of lbuffer).
			// len keeps counting the full word, so a truncated word fails
			// the length test below (assuming every keyword fits in lbuffer).
			if (lbufptr + 1 < lbufend)
			{
				*lbufptr = c;
				lbufptr++;
			}
			len++;

			// Reached end of file buffer, read more
			if (fbufptr == fbufend)
			{
				// End of file also ends the word: still compare it below
				if (!FillBuffer()) { eof = true; break; }
			}

			c = *fbufptr;
			fbufptr++;
			bytesRead++;
		} while (c > UNICODE_SPACE);
		// - Null-terminate buffer
		*lbufptr = 0;
		// Step 3: Compare text to word(s) of same len
		for (i = 0; i < number; i++)
		{
			if ((wc->len[i] == len) && (wc->word[i] == lbuffer))
			{
				// Store file position of word start (pos holds at most 2048)
				if (wc->count[i] < 2048)
					wc->pos[i][wc->count[i]] = start;
				// Store reference count (may exceed 2048: clamp when reading pos)
				wc->count[i]++;
				break;
			}
		}
	}
}[/code]
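On the matching side, one possible refinement is to skip copying every word into lbuffer and comparing Strings, and instead filter keyword candidates character by character with a bitmask. This is a different technique from the one in the post, sketched here under assumptions: with at most 8 keywords, one unsigned char holds a bit per keyword; `KeywordSet` and `CandidateMatcher` are hypothetical names; and keywords are assumed to be available as plain 16-bit character arrays rather than SDK `String`s.

```cpp
#include <cassert>

const int kMaxKeywords = 8;

// Hypothetical keyword table: plain 16-bit arrays instead of SDK Strings.
struct KeywordSet
{
	const unsigned short* word[kMaxKeywords];
	long len[kMaxKeywords];
	int number;
};

// Incremental matcher: bit i of `mask` stays set while keyword i could
// still equal the word being read. Characters are fed one at a time,
// so no line buffer copy or String comparison is needed.
struct CandidateMatcher
{
	const KeywordSet* ks;
	unsigned char mask;  // bit i set => keyword i still possible
	long pos;            // index of the next character within the word

	// Call at the first character of each word.
	void Begin()
	{
		assert(ks->number >= 0 && ks->number <= kMaxKeywords);
		mask = (unsigned char)((1u << ks->number) - 1);
		pos = 0;
	}

	// Drops every keyword that is too short or differs at this position.
	void Feed(unsigned short c)
	{
		if (mask != 0)
		{
			for (int i = 0; i < ks->number; i++)
			{
				unsigned char bit = (unsigned char)(1u << i);
				if ((mask & bit) && (pos >= ks->len[i] || ks->word[i][pos] != c))
					mask &= (unsigned char)~bit;
			}
		}
		pos++;
	}

	// Returns the index of the keyword equal to the whole word, or -1.
	int End() const
	{
		for (int i = 0; i < ks->number; i++)
			if ((mask & (1u << i)) && ks->len[i] == pos) return i;
		return -1;
	}
};
```

In Prepass, Begin() would take the place of `lbufptr = lbuffer`, Feed(c) the place of `*lbufptr = c`, and End() the place of the Step 3 loop; a result i >= 0 means the word's start position goes into `wc->pos[i]`. For words that match no keyword the mask typically empties within the first character or two, after which Feed() only increments pos. Whether this beats copy-and-compare depends on the cost of the SDK's String comparison, so it is worth profiling both.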

Thanks for any crits or suggestions.

Robert