Hello,
I have a task to write two PHP functions:
1. Strip all HTML tags from an HTML file and separate the remaining words with at most one blank (space)
2. Remove certain 'ignore words' from the result of the first function
For example, if somebody calls:
$content = function1("test.html");
$content = function2($content, "ignore.dat");
and the content of test.html is:
<html>
<head>
<title>This is title</title>
</head>
<javascript>
Some code here
</javascript>
<body>
<table><tr>
<td>
<!-- Some comment here--> This is text
</td></tr>
</table>
</body>
then after function1 the result in $content has to be:
This is title
This is text
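A minimal sketch of how function1 could look with regular expressions. The helper name html_to_text is hypothetical, and treating <javascript> like <script> is an assumption made only because the sample file uses that tag. The sketch returns everything on one line with single spaces between words, and it will misbehave on markup that has a literal ">" inside an attribute value or a bare "<" in the text.

```php
<?php
// Hypothetical helper: reduce an HTML string to words separated by single spaces.
function html_to_text(string $html): string
{
    // Remove elements whose body is not visible text, together with that body.
    // "javascript" is listed only because the sample file uses that tag name.
    $html = preg_replace('#<(script|style|javascript)\b[^>]*>.*?</\1\s*>#is', ' ', $html);
    // Remove HTML comments, including multi-line ones.
    $html = preg_replace('/<!--.*?-->/s', ' ', $html);
    // Remove every remaining tag, leaving a space so adjacent words stay apart.
    $html = preg_replace('/<[^>]+>/', ' ', $html);
    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // Collapse every run of whitespace (spaces, tabs, newlines) into one space.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Wrapper with the signature used in the question: takes a file name.
function function1(string $file): string
{
    $html = file_get_contents($file);
    if ($html === false) {
        throw new RuntimeException("Cannot read $file");
    }
    return html_to_text($html);
}
```

The order of the steps matters: script-like blocks and comments have to go before the generic tag pattern runs, otherwise their contents would be left behind as plain text.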
And if I define the ignore words in ignore.dat as "This" and "is", the result in $content has to be:
title
text
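A possible sketch of function2, again with regular expressions. The helper name remove_ignore_words is hypothetical, and the sketch assumes ignore.dat holds one word per line, which the task does not actually specify. Matching is case-sensitive here; adding the i modifier to the pattern would make it ignore case.

```php
<?php
// Hypothetical helper: remove every word in $ignore from $text.
function remove_ignore_words(string $text, array $ignore): string
{
    // Trim each entry and drop empty ones.
    $ignore = array_filter(array_map('trim', $ignore), 'strlen');
    if (!$ignore) {
        return $text;
    }
    // Escape each word so characters like "." or "+" are matched literally.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $ignore);
    // \b anchors the match to whole words, so "is" is not cut out of "This" or "list".
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/';
    $text = preg_replace($pattern, ' ', $text);
    // Collapse the gaps left behind into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Wrapper with the signature used in the question.
// Assumption: the ignore file contains one word per line.
function function2(string $content, string $ignoreFile): string
{
    $words = file($ignoreFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($words === false) {
        throw new RuntimeException("Cannot read $ignoreFile");
    }
    return remove_ignore_words($content, $words);
}
```

Building one alternation from all ignore words means the text is scanned once rather than once per word. The \b anchors only behave as expected when each ignore word starts and ends with a letter, digit, or underscore.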
I can solve the problem the classic way, by going line by line and splitting on characters such as < and >, but then I can always miss something, and it can take a lot of time. Can somebody help me solve this problem using regular expressions?
Thanks a lot
LJUBA