C and C++

Binary file i/o problem? Posted by saw7988 on 28 Jun 2006 at 3:29 PM
I'm writing a primitive file compression program. The general idea is that it reads an input file, does some computations, and writes a compressed version to another file. I'm using the fstream.h header and the read and write functions.

My text files work fine, but various other types of files stop working at certain places. For example, in one file it's the 6th byte, which happens to have the value 6. The loops and computations still run in the "background", but the variable I'm storing into stops getting updated.

I was wondering whether there are any quirks specific to binary file I/O that you need to watch out for, such as certain bytes that can't be read. I can't think of anything else it could be, because it works with the text files I've tried.
Re: Binary file i/o problem? Posted by Lundin on 28 Jun 2006 at 11:50 PM


When saving/loading numbers you need to be aware of how the CPU stores them in memory. Here is an example:

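#include <stdio.h>
#include <stdint.h>

int main(void)
{
  uint32_t number = 0xAABBCCDD;  /* exactly 4 bytes, unlike unsigned long */
  const unsigned char *bytes = (const unsigned char *)&number;
  int i;

  /* Print the bytes in the order they are stored in memory. */
  for(i=0; i<4; i++)
    printf("%02X", bytes[i]);
  printf("\n");

  return 0;
}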


On a little endian CPU (like your PC), it will give the result: DDCCBBAA
On a big endian CPU it will give the result: AABBCCDD
Re: Binary file i/o problem? Posted by tsagld on 29 Jun 2006 at 12:11 AM

The most obvious cause would be opening a binary file in text mode. That causes trouble.
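
For reference, here is a minimal sketch of reading a whole file as raw bytes with the standard <fstream> header. The helper name readAllBytes is hypothetical and not from this thread. It assumes the file fits in memory, opens the stream with std::ios::binary, and uses gcount() to confirm that every byte arrived.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): reads a whole file as raw bytes.
// Returns false if the file cannot be opened or cannot be read completely.
bool readAllBytes(const std::string& path, std::vector<unsigned char>& out)
{
    // std::ios::binary switches off text-mode translation. On some platforms
    // (Windows in particular) text mode rewrites CR/LF pairs and can treat
    // the byte 0x1A as end of file.
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return false;

    // Determine the file size by seeking to the end.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0)
        return false;
    in.seekg(0, std::ios::beg);

    out.resize(static_cast<std::size_t>(size));
    if (size == 0)
        return true;  // empty file, nothing to read

    // read() takes a char*, so the unsigned char buffer needs a cast.
    in.read(reinterpret_cast<char*>(&out[0]), static_cast<std::streamsize>(size));

    // gcount() reports how many bytes the last read() actually delivered.
    return in.gcount() == static_cast<std::streamsize>(size);
}
```

The vector carries its own length, so later code can use out.size() and never needs to guess where the data ends.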



Greets,
Eric Goldstein
http://www.gvh-maatwerk.nl


Re: Binary file i/o problem? Posted by saw7988 on 29 Jun 2006 at 6:41 AM

Well, I'm opening all files with ios::binary mode, and the data isn't coming through backwards or wrong. When I'm reading it, something goes wrong at specific characters and it stops. I can paste my code and explain what I'm doing, but the program works only with text files, so I was thinking the problem might be specific to non-text files.
Re: Binary file i/o problem? Posted by IDK on 29 Jun 2006 at 10:29 AM
The problem may be in your program. Post your code.
Re: Binary file i/o problem? Posted by saw7988 on 29 Jun 2006 at 2:43 PM
OK, here's my program. I used to have chars instead of unsigned chars, but neither works. (A clarification of when to use char vs. unsigned char would also be greatly appreciated.)

#include <iostream.h>
#include <conio.h>
#include <fstream.h>
#include <string.h>

bool containsChar(unsigned char *string, char c);
int getReps(unsigned char* string, int start);
unsigned char* compress(unsigned char* data, long size);

int main(int argc, char* argv[])
{
	if (argc != 2)
	{
		cout<<"Incorrect Usage...";
		getch();
		return 0;
	}

	char* file;
	file = argv[1];

	cout<<"Opening " <<file <<"..." <<endl;
	ifstream in(file, ios::binary);

	if (!in)
	{
		cout<<"Couldn't read " <<argv[1] <<".";
		getch();
		return 0;
	}

	in.seekg(0, ios::end);
	long filesize = in.tellg();
	in.seekg(0);

	unsigned char* data;
	data = new unsigned char[filesize];
	in.read(data, filesize);
	in.close();
	cout<<"File read." <<endl;

	unsigned char* cdata;
	cout<<"Compressing..." <<endl;
	cdata = compress(data, filesize);
	cout<<"Compressed." <<endl;

	delete [] data;

	cout<<"Writing compressed file..." <<endl;
	ofstream out("output.cpr", ios::binary);
	if (!out)
	{
		cout<<"Couldn't open output.cpr.";
		getch();
		return 0;
	}
	out.write(cdata, strlen((char*)cdata));
	out.close();
	cout<<"Done.";

	delete [] cdata;

	getch();
	return 0;
}

bool containsChar(unsigned char *string, char c)
{
	int size = strlen((char*)string);

	for (int i=0; i<size; i++)
	{
		if (*(string+i) == c)
			return true;
	}

	return false;
}

int getReps(unsigned char* string, int start, long size)
{
	int reps = 1;
	int pos = start;

	while (*(string+pos) == *(string+pos+1))
	{
		pos++;
		reps++;
		if (pos == size)
			break;
	}

	return reps;
}

unsigned char* compress(unsigned char* data, long size)
{
	unsigned char* cdata;
	cdata = new unsigned char[size];
	char marker;

	int ascii = 33;
	while (containsChar(data, (char)ascii))
		ascii++;
	marker = (char)ascii;
	*cdata = marker;

	int pos1=0, pos2=1, reps; //first char in cdata is the marker

	while (pos1 < size)
	{
		reps = getReps(data, pos1, size);

		if (reps < 4)
		{
			for (int i=0; i<reps; i++)
			{
				*(cdata+pos2) = *(data+pos1);
				pos1++;
				pos2++;
			}
		}
		else
		{
			if (reps > 255)
				reps = 255;
			*(cdata+pos2) = marker;
			*(cdata+pos2+1) = (unsigned char)reps;
			*(cdata+pos2+2) = *(data+pos1);
			pos2 += 3;
			pos1 += reps;
		}
	}

	return cdata;
}

Re: Binary file i/o problem? Posted by IDK on 30 Jun 2006 at 2:42 AM


chars are unsigned chars by default, like int are unsigned ints as default.



I don't see anything wrong...

Re: Binary file i/o problem? Posted by Lundin on 30 Jun 2006 at 2:59 AM
: chars are unsigned chars by default, like int are unsigned ints as default.
:



No, it is compiler dependent. Most commonly plain char is signed (and plain int is always signed). This is why you should explicitly write signed or unsigned when declaring a char variable that holds numbers or raw bytes.
From ANSI C:

"The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char."

/--/

"CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either."
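
A small sketch of why this matters for the posted code, which compares an unsigned char from the buffer with a plain char. The function names below are hypothetical, and the cast of 0xFF to char in the usage note assumes the usual two's complement behaviour. Before the comparison both operands are promoted to int, so a signed char holding 0xFF becomes -1 while the unsigned byte stays 255.

```cpp
#include <climits>

// True when plain char behaves like signed char on this compiler,
// using the CHAR_MIN test described in the quotation above.
bool plainCharIsSigned()
{
    return CHAR_MIN < 0;
}

// Compares a raw byte with a plain char directly, as containsChar does.
// The result for bytes of 128 and above depends on the signedness of char.
bool naiveEquals(unsigned char byte, char c)
{
    return byte == c;
}

// Converting the char to unsigned char first makes the comparison
// independent of whether plain char is signed.
bool byteEquals(unsigned char byte, char c)
{
    return byte == static_cast<unsigned char>(c);
}
```

With char c = static_cast<char>(0xFF), byteEquals(0xFF, c) holds on either kind of compiler, while naiveEquals(0xFF, c) holds only where plain char is unsigned.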

Re: Binary file i/o problem? Posted by IDK on 30 Jun 2006 at 3:23 AM
Thanks, I'm always learning something new.
Re: Binary file i/o problem? Posted by tsagld on 30 Jun 2006 at 3:46 AM

You say that the code works fine on text files and doesn't work on binary files. That seems logical to me, since you use the strlen function on the data in several places.
If the data contains a zero byte, strlen() interprets it as the end of the string, which is probably not what you want.
Binary data may (and is even likely to) contain zero bytes; text data normally doesn't.
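
One way to avoid strlen entirely is to carry the length along with the data. The following is only a sketch under some assumptions, not a drop-in fix for the posted program: it uses std::vector instead of new[], the names Bytes, containsByte and runLength are made up for illustration, and the marker search covers all 256 byte values instead of starting at 33. It also stops the run counter at the end of the buffer and reports failure when no unused marker byte exists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Length-aware search: scans every byte, zero bytes included.
bool containsByte(const Bytes& data, unsigned char value)
{
    for (std::size_t i = 0; i < data.size(); ++i)
        if (data[i] == value)
            return true;
    return false;
}

// Counts how often data[start] repeats, capped at 255 so the count
// fits in one byte, and never reading past the end of the buffer.
std::size_t runLength(const Bytes& data, std::size_t start)
{
    assert(start < data.size());
    std::size_t reps = 1;
    while (start + reps < data.size() && reps < 255 &&
           data[start + reps] == data[start])
        ++reps;
    return reps;
}

// Same scheme as in the thread: out[0] is a marker byte that does not occur
// in the input, and runs of 4 or more become marker, count, value.
// Returns false if all 256 byte values occur, so no marker is available.
bool compress(const Bytes& data, Bytes& out)
{
    int marker = 0;
    while (marker < 256 && containsByte(data, static_cast<unsigned char>(marker)))
        ++marker;
    if (marker == 256)
        return false;

    out.clear();
    out.push_back(static_cast<unsigned char>(marker));

    std::size_t pos = 0;
    while (pos < data.size())
    {
        const std::size_t reps = runLength(data, pos);
        if (reps < 4)
        {
            // Short run: copy the bytes through unchanged.
            out.insert(out.end(), reps, data[pos]);
        }
        else
        {
            out.push_back(static_cast<unsigned char>(marker));
            out.push_back(static_cast<unsigned char>(reps));
            out.push_back(data[pos]);
        }
        pos += reps;
    }
    return true;
}
```

When writing the result, the length comes from out.size() and not from strlen, for example file.write(reinterpret_cast<const char*>(&out[0]), out.size()) on an ofstream opened with ios::binary. Because the vector grows as needed, the output can also be one byte longer than the input, which a buffer of exactly the input size could not hold.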


Greets,
Eric Goldstein
http://www.gvh-maatwerk.nl


Re: Binary file i/o problem? Posted by saw7988 on 30 Jun 2006 at 6:22 AM

Oh wow, it didn't even occur to me that strlen would stop at zero bytes. I'll change that and see if it helps. Thanks.



 
