Text file keyword counter - no STL or C++ streams
Posted by Kuroyume on 27 Jul 2006 at 12:34 AM
This message was edited by Kuroyume at 2006-7-27 0:35:42

Okay, I need an efficient algorithm for counting a set of keywords (note: there is more than one keyword to count) and storing their file positions (as LONG, i.e. long) in possibly very large text files, using C++. Now, everyone is going to show examples with the STL or std::ifstream, but this is a plugin SDK and file access goes through its own file class methods. So the algorithm has to be efficient without the STL or C++ streams, sorry to say. Also, the SDK file reader is Unicode-compliant and works on both Windows and MacOS, so platform-dependent tricks are not possible either.

I'll show the source, but without knowing the SDK and all of the context, it may be confusing. Some notes to help:

* A 'word' here is text surrounded by whitespace.

* There is a "FillBuffer" to expedite file parsing/keyword counting. This is a memory buffer that is filled with part or all of the file as needed; the FillBuffer() call in the code refills it and returns false once there is nothing left to read.

* fbufptr and fbufend are pointers into the FillBuffer: the current read position and the end of the valid data.

* lbufptr and lbufend are pointers into a line buffer (lines are delimited by \n).

* bytesRead counts the bytes read from the file from the beginning.

* String is an SDK string class.

* The WordCounter uses fixed-size arrays rather than dynamic allocation. It doesn't need to be flexible, and this is done to increase processing speed. This is not an all-purpose solution; it is very specific to what I'm doing.
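
For anyone reading without the SDK, the buffer notes above can be sketched roughly as follows. This is only a stand-in under my own assumptions: ChunkReader, InitReader and NextChar are hypothetical names, the "file" is a plain in-memory C string, and the buffer is kept tiny on purpose so that refills happen often. The real SDK file class will look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the SDK file reader: serves an in-memory
// "file" in small fixed-size chunks so the refill logic can be tried alone.
struct ChunkReader
{
	const char*	data;		// whole "file" contents (stand-in for the SDK file)
	std::size_t	size;		// total length of data
	std::size_t	offset;		// how much of data has been handed out so far
	char		fbuffer[8];	// deliberately tiny so refills happen often
	const char*	fbufptr;	// next unread character in fbuffer
	const char*	fbufend;	// one past the last valid character in fbuffer
	long		bytesRead;	// characters consumed since the start of the file
};

void InitReader(ChunkReader* r, const char* text)
{
	r->data = text;
	r->size = std::strlen(text);
	r->offset = 0;
	r->fbufptr = r->fbufend = r->fbuffer;	// empty buffer forces a first fill
	r->bytesRead = 0;
}

// Refill fbuffer with the next chunk; returns false at end of file.
bool FillBuffer(ChunkReader* r)
{
	std::size_t n = r->size - r->offset;
	if (n == 0)	return false;
	if (n > sizeof(r->fbuffer))	n = sizeof(r->fbuffer);
	std::memcpy(r->fbuffer, r->data + r->offset, n);
	r->offset += n;
	r->fbufptr = r->fbuffer;
	r->fbufend = r->fbuffer + n;
	return true;
}

// Fetch one character, refilling on demand; returns false at end of file.
bool NextChar(ChunkReader* r, char* c)
{
	if (r->fbufptr == r->fbufend && !FillBuffer(r))	return false;
	*c = *r->fbufptr++;
	r->bytesRead++;
	return true;
}
```

Calling NextChar in a loop yields every character of the text in order, and bytesRead ends up equal to the text length, regardless of where the chunk boundaries fall.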

// STRUCT: Word Counter for Prepass
// - up to 8 keywords
struct WordCounter
{
	// Word to check
	String	word[8];
	// Length of word to check
	LONG	len[8];
	// Number of each word found
	LONG	count[8];
	// File position of each word (up to 2048 references)
	LONG	pos[8][2048];
};

// Specialized token counter
//*---------------------------------------------------------------------------*
void FileReader::Prepass(WordCounter* wc, WORD number)
//*---------------------------------------------------------------------------*
{
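	// c, lbuffer, lbufptr, fbufptr, fbufend and bytesRead are not declared here;
	// presumably they are FileReader members
	UCHAR	i;
	LONG	len;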

	// Count words in file
	for (;;)
	{
		// Step 1: Skip leading whitespace
		do {
			// Reached end of file buffer, read more
			if (fbufptr == fbufend)
			{
				if (!FillBuffer())	return;
			}

			c = *fbufptr;
			fbufptr++;
			bytesRead++;
		} while (c <= UNICODE_SPACE);
		// Step 2: Read text surrounded by whitespace into lbuffer
		lbufptr = lbuffer;
		len = 0;
		do {
			*lbufptr = c;
			lbufptr++;
			len++;

			// Reached end of file buffer, read more
			// - Note: returning here drops a final word that is not followed by whitespace
			if (fbufptr == fbufend)
			{
				if (!FillBuffer())	return;
			}

			c = *fbufptr;
			fbufptr++;
			bytesRead++;
		} while (c > UNICODE_SPACE);
		// - Null-terminate buffer
		*lbufptr = 0;
		// Step 3:	Compare text to word(s) of same len
		for (i = 0; i < number; i++)
		{
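			if ((wc->len[i] == len) && (wc->word[i] == lbuffer))
			{
				// Occurrences beyond the 2048-entry table are not recorded
				if (wc->count[i] < 2048)
				{
					// Store file position of word start (0-based)
					// - bytesRead has already counted the whitespace character that ended the word
					wc->pos[i][wc->count[i]] = bytesRead - len - 1;
					// Store reference count
					wc->count[i]++;
				}
				break;
			}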
		}
	}
}
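
To make the three steps easier to try outside the SDK, here is a minimal stand-in sketch. It is not the SDK version: SimpleCounter and MatchToken are hypothetical names, plain C strings replace the SDK String class, the whole text is assumed to already be in memory, whitespace is taken to be any character <= ' ' (assuming UNICODE_SPACE is 0x20), and positions are 0-based. Unlike the code above, it compares words in place instead of copying them into a line buffer, so there is no line buffer to overflow, and a final word that ends at end of file is still compared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

enum { MAX_WORDS = 8, MAX_REFS = 2048 };

// Stand-in for WordCounter using plain C strings instead of the SDK String
struct SimpleCounter
{
	const char*	word[MAX_WORDS];			// Word to check
	long		len[MAX_WORDS];				// Length of word to check
	long		count[MAX_WORDS];			// Number of each word recorded
	long		pos[MAX_WORDS][MAX_REFS];	// 0-based start offset of each hit
};

// Compare one token against the keywords of the same length
static void MatchToken(SimpleCounter* wc, int number,
					   const char* tok, long len, long start)
{
	for (int i = 0; i < number; i++)
	{
		if (wc->len[i] == len &&
			std::memcmp(wc->word[i], tok, (std::size_t)len) == 0)
		{
			// Occurrences beyond the table capacity are not recorded
			if (wc->count[i] < MAX_REFS)
			{
				wc->pos[i][wc->count[i]] = start;
				wc->count[i]++;
			}
			break;
		}
	}
}

// Scan text[0..size) for whitespace-delimited words
void Prepass(SimpleCounter* wc, int number, const char* text, long size)
{
	long i = 0;
	while (i < size)
	{
		// Step 1: Skip leading whitespace
		while (i < size && (unsigned char)text[i] <= ' ')	i++;
		// Step 2: Find the end of the word
		long start = i;
		while (i < size && (unsigned char)text[i] > ' ')	i++;
		// Step 3: Compare; this also runs for a last word that ends at EOF
		if (i > start)
			MatchToken(wc, number, text + start, i - start, start);
	}
}
```

With the keywords "foo" and "bar" and the text "foo bar  food\nbar foo", this records "foo" at offsets 0 and 18 and "bar" at offsets 4 and 14, while "food" is skipped by the length check. The counter is large (roughly 64 KB or more of positions), so it is better kept static or heap-allocated than placed on a small stack.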


Thanks for any crits or suggestions.

Robert



 
