I wish my assignment was easy this semester but it isn't; it's very hard and complicated. I've been working in it for weeks now and my program still has bugs and logical errors.
I was wondering if you could please help me with this assignment. I would be very grateful.
I've included below a copy of what is expected for this assignmne , and below that, I've copied and pasted my code.
I commented the codes which gave compilation errors. I used Borland to compile the program.
It has the following bugs:
-It doesn't append words to the left and right nodes
-It doesn't store the page numbers into the linked list
-It doesn't filter out noise words
-It stores duplicate words
-The query function has problems
I would kindly appreciate your help.
Thanks in advance,
Introduction
This assignment deals with the construction of an index for a text book. You are provided with an ilustration of an Abstract Data Type (ADT) - index, with a well-defined interface.
You will need to define structs to represent this ADT, and use typedefs to create the ADT Index. The interface of the ADT consists of the following function prototypes.
void IndexInit(Index*);
void IndexAppend(Index*, char*, int);
void IndexPrint(Index*);
int IndexQuery(Index*, char*, int**);
void IndexDestroy(Index*);
Task Description
Part A - Implementation of the Client Program
Write a program to build an index for a text book and allow a user to query the index, i.e. to find page numbers given a word.
In detail, the program must perform the following tasks.
Declare an instance of Index ADT.
Read a chapter of the book titled "Open Sources - Voices of Open Source Revolution" the chapter is titled A Brief History of Hackerdom <./book.txt> , tokenise the input into words using the strtok() function, and construct the index. the input text file is separated into pages using <END> tages. (Roughly there are 40 lines per page in the input file). So for example, page 3 begins immediately after the second <END> tag.
The following code fragment illustrates how strtok() function can be used to tokenise a string into words.
#define SEPARATORS " .,?\"\n"
....
....
char *line, *word;
....
....
gets(line);
if (strlen(line)<=1) break;
word = strtok(line,SEPARATORS);
while(word!=NULL)
{
printf("%s\n",word);
word = strtok(NULL,SEPARATORS);
}
....
Note that, not all the words in the book are inserted to the index. For each of the words tokenised from the book, check if it appears in the common-words file stopped-words.txt <./stopped-words.txt> and insert only if it does not appear in the common words file.
Display the entire index of the book on the standard output, each entry in a separate line.
Let the user to enter a word from the standard input and output the list of page numbers on the standard output. The output should contain only the input word and a comma separated list of page numbers. If the word appears in more than two adjacent pages, the list should be displayed as a range, as illustrated below. No other junk should be sent to the output. Let the user to continue querying, until he enters the word 'enough'.
Destroy the entire index.
Part B - Implementation of the Index ADT
Implement the above data structure using the binary tree-based representation and the associated functions to manipulate the ADT Index illustrated in Figure 01.
The Index ADT is a homogeneous collection of records, where each record comprises of a word and a list of page numbers that word appear in the text. The list of page numbers for each word is stored in an integer linked-list (as illustrated vertically in the diagram), and the list of words are themselves stored in the nodes of the binary tree.
The task of each function in the interface is given below.
IndexInit - Initialises an Index instance. It initialises the 'words' pointer (top-level pointer) to NULL, and the number of words to 0.
IndexAppend - appends an entry <word, page> into the index. It first checks if the word already exists in the words tree, and if it exists, adds the page number into the page number list of that word. However, if the entered page number already exists in the corresponding list, it simply discards the new entry. If the word doesn't exist in the words list, it appends a new node into the top-level list and appends a node into its (new node's) page number list.
In short, the data structure does not allow duplicates either in the top-level (words) tree or page numbers lists. Top-level tree is maintained in the alphabatical order of words, and page number lists are maintained in the ascending order of page numbers. Each insertion should adhere these rules.
IndexPrint - displays the entire index, as appeared at the end of a text book, i.e. each indexed word in a separate line, followed by a comma separated ordered list of page numbers. If more than two adjacent page numbers appear in a page list, they should be displayed as a range, i.e. if the term 'Abstract' appear in pages 10, 45, 46, 47 and 48, it should be displayed as;
Abstract 10, 45 - 48.
The output should be in the alphabatical order of terms.
IndexQuery - reads in an Index ADT instance and a word as the input and returns the list of page numbers on which the word appears. The thrid argument in the prototype is an array of integers, which is used to return the results set. The function also returns the number of page numbers n the results set as well. (The return value of the function is used for this purpose).
IndexDestroy - as its name explains, it destroys the entire ADT instance and deallocates memory used for the whole data structure.
You may define as many local functions as necessary to construct the above functions in a well structured form, however, the ADT interface should include only the above-listed functions. In other words, any client program should access the Index ADT only using these functions, not any other local functions.
You must use a Makefile to compile the ADT implementation and the client program together, and your submission must include the Makefile used to compile your program.
IN MY HEADER FILE, I TYPED THE FOLLOWING:
//asgn.h
struct pagelist {
struct pagelist *next;
int page;
};
struct node {
char *word;
struct pagelist *head;
struct node *left;
struct node *right;
int size;
};
typedef struct node NODE;
typedef struct node *Index;
void IndexInit(Index *);
void IndexAppend(Index*, char*,int);
void IndexPrint(Index *);
int IndexQuery(Index*,char*,int**);
void IndexDestroy(Index *);
My implementation is as follows:
//asgn.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include "asgn.h"
#define MAX 10000
static char buff[MAX] = {0};
void IndexInit(Index *);
void IndexAppend(Index *, char *,int);
void IndexPrint(Index *);
int IndexQuery(Index*,char*,int**);
int noise(char *s);
char *tokeniser(char **, char *);
char *char_in_string(char *, int );
int compareString(const char *, const char *);
int PageNumbers(struct pagelist *);
struct pagelist *addPageNo(int);
int findWord (Index *, char);
void printPageNumbers(struct pagelist *);
char *GetFile1(char *,int, FILE *);
char *GetFile2(char *,int, FILE *);
char *word;
char *s=NULL;
char *SEPARATORS = " ";
void IndexPrint(Index *);
void IndexInit(Index *ind) {
(*ind) -> word = NULL;
(*ind) -> size = 0;
printf("Init. success.");
}
void IndexAppend(Index *ind, char *word, int page) {
int diff;
//struct pagelist *pl = malloc(sizeof(*pl));
*ind = (Index)malloc(sizeof(NODE));
diff = compareString((*ind)->word, word);
if ((*ind) !=NULL) {
(*ind)->word = word;
printf("%s ",(*ind)->word);
printf("%d\n",page);
(*ind)->head = addPageNo(page);
//IndexPrint(&ind);
}
else
if (diff > 0) {
IndexAppend(&((*ind)->left), word, page);
}
else
IndexAppend(&((*ind)->right), word, page);
}
void IndexPrint(Index *ind) {
if ((*ind) != NULL) {
//printf("%s ", (ind)->word);
//printPageNumbers((*ind)->head);
}
}
int IndexQuery(Index *ind,char *word,int **results) {
//if (!findWord(&ind,*word)) {
// printf("Word not found.");
//}
//else *results = findWord(&ind,*word);
return **results;
}
void IndexDestroy(Index *ind) {
if((*ind) != NULL) {
free(*ind);
*ind=NULL;
}
}
int noise(char *c) {
int found = 0;
//while (GetFile2(buff, sizeof(buff),stdin) != NULL) {
//s=buff;
//while ((word=tokeniser(&s,SEPARATORS)) != NULL) {
if (c==s)
found=1;
//}
//}
return found;
}
char *tokeniser(char **s, char *delims) {
char *p = NULL;
char *q;
if ((s != NULL) && (*s != '\0') && (delims != NULL)) {
//pass over leading delimeters
while (NULL != char_in_string(delims, **s)) {
++*s;
}
if (**s != '\0') {
q = *s + 1;
p = *s;
while (*q != '\0' && NULL == char_in_string(delims,*q)) {
++q;
}
*s = q + (*q != '\0');
*q='\0';
}
}
return p;
}
char *char_in_string(char *s, int c) {
char *p = NULL;
if (s != NULL) {
if (c != '\0') {
while (*s != '\0' && *s != c) {
++s;
}
if (*s == c) {
p = s;
}
}
}
return p;
}
int compareString(const char *s, const char *t) {
int diff = 0;
char cs;
char ct;
while (diff==0 && *s !='\0' && *t != '\0') {
cs = tolower((unsigned char)*s);
ct = tolower((unsigned char) *t);
if (cs < ct) {
diff=-1;
}
else if (cs>ct) {
diff = 1;
}
++s;
++t;
}
if (diff==0 && *s != *t) {
if (*s == '\0') {
diff =-1;
}
else {
diff=1;
}
}
return diff;
}
struct pagelist* addPageNo(int page) {
struct pagelist *new = malloc(sizeof(*new));
if (new !=NULL) {
new->page = page;
}
return new;
}
void printPageNumbers(struct pagelist *pl){
if (pl != NULL) {
printPageNumbers(pl->next);
printf("%d ", pl->page);
}
}
int PageNumbers(struct pagelist *pl){
int **result;
if (pl != NULL) {
PageNumbers(pl->next);
**result = pl->page;
}
return **result;
}
char *GetFile1(char *s,int n, FILE *fp) {
int c;
char *p=s;
fp = fopen("book.txt","r");
while ( (--n > 0) && (c=getc(fp)) !=EOF) {
*p++ = c;
}
return p;
}
char *GetFile2(char *s,int n, FILE *fp) {
int c;
char *p=s;
fp = fopen("stopped-words.txt","r");
while ( (--n > 0) && (c=getc(fp)) !=EOF) {
*p++ = c;
}
return p;
}
//int findWord (Index *ind, char w) {
//int **results;
//if (ind != NULL) {
//if ((*ind)->word == w){
//*results = PageNumbers((*ind)->head);
//}
//}
//else if ((*ind)->word != w)
// findWord(&((*ind)->left), w);
//else findWord(&((*ind)->right), w);
//return *results;
//}
THE DRIVER IS AS FOLLOWS:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include "asgn.c"
void main(void) {
Index ind;
#define MAXLINE 40000
char *s=NULL;
static char buffer[MAXLINE] = {0};
char *word;
char *pg="END";
int pagenum = 1;
char answer ='y';
char *w=NULL;
int **results;
char *SEPARATORS = "{}. ,?\;";
//IndexInit(&ind);
while (GetFile1(buffer, sizeof(buffer),stdin) != NULL) {
s=buffer;
if (s==pg) {
++pagenum;
}
while ((word=tokeniser(&s,SEPARATORS)) != NULL) {
if (noise(word) == 0) {
IndexAppend(&ind,word,pagenum);
}
else {
continue;
}
}
}
while (answer != 'n') {
printf("Enter a word to be searched: ");
scanf("%s", &w);
**results = IndexQuery(&ind,w,results);
printf("Continue with Query? ");
scanf("%c", &answer);
}
IndexDestroy(&ind);
}