I have a task to write two PHP functions:
1. Strip all HTML tags from an HTML file and separate the remaining words with at most one blank (space).
2. Remove certain 'ignore words' from the result of the first function.
For example, if somebody calls:
$content = function1 ("test.html");
$content = function2 ($content, "ignore.dat");
and the content of test.html is:
<title>This is title</title>
Some code here
<!-- Some comment here--> This is text
then after function1 the result in $content has to be:
This is title
This is text
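To make the first requirement concrete, here is a rough sketch of what I imagine function1 could look like with preg_replace. It is only a draft under assumptions: the helper name html_to_text is made up for illustration, I am treating "Some code here" as standing for <script> or <style> blocks, and it joins all words onto a single line separated by one space. Regular expressions are not a full HTML parser, so unusual markup may still slip through.

```php
<?php
// Hypothetical helper (name invented for this sketch): reduce an HTML string to plain words.
function html_to_text(string $html): string
{
    // Remove comments first, so tags inside comments cannot confuse later steps.
    $html = preg_replace('/<!--.*?-->/s', ' ', $html) ?? '';
    // Assumption: "code" means <script>/<style> blocks; drop them with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html) ?? '';
    // Replace every remaining tag with a space so neighbouring words stay separated.
    $text = preg_replace('/<[^>]*>/', ' ', $html) ?? '';
    // Decode entities such as &amp; back into plain characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse any run of whitespace (spaces, tabs, newlines) into one space.
    return trim(preg_replace('/\s+/', ' ', $text) ?? '');
}

// Reads the file and returns its text content; returns '' if the file cannot be read.
function function1(string $path): string
{
    $html = @file_get_contents($path);
    return $html === false ? '' : html_to_text($html);
}
```

Keeping the string handling in its own helper means it can be tried on a literal string without creating test.html first.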
And if I define the ignore words in ignore.dat as "This" and "is", then after function2 the result in $content must have those words removed.
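A similar rough sketch for function2, again only under assumptions: the helper name remove_words is made up, I assume ignore.dat holds one word per line, and matching is case-sensitive and on whole words only (so "is" is not cut out of "this").

```php
<?php
// Hypothetical helper (name invented for this sketch): remove whole-word matches of $ignore from $text.
function remove_words(string $text, array $ignore): string
{
    // Trim each entry and drop empty ones.
    $ignore = array_filter(array_map('trim', $ignore), 'strlen');
    if (count($ignore) === 0) {
        return $text;
    }
    // Escape each word so it is matched literally, then build one alternation.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $ignore);
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/';
    // Replace matches with a space, then collapse the leftover whitespace.
    $text = preg_replace($pattern, ' ', $text) ?? '';
    return trim(preg_replace('/\s+/', ' ', $text) ?? '');
}

// Assumption: the ignore file lists one word per line.
function function2(string $content, string $ignoreFile): string
{
    $words = @file($ignoreFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return remove_words($content, $words === false ? array() : $words);
}
```

Applied to the string "This is title This is text" with the words "This" and "is", this sketch leaves "title text".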
I could solve the problem the classic way, splitting line by line on characters such as < and >, but I could always miss something, and it would take a lot of time. Can somebody show me how to solve this problem using regular expressions?
Thanks a lot