Hi,
This example is in PHP, but the problem is about regular expressions (RE), and the 'thinking' is universal.
I have to 'take' paragraphs from an HTML file, where the word 'paragraph' means not only
...
paragraph is and so if file is:
test
result has to be:
test
test2 test3
test2
bla bla
That means everything between
and
, and and between
and start of new paragraph
or
. I know there are some 'holes' and bugs in this request, and I know this script can be 'broken' by an unfriendly HTML file, but I want to start with a solution for 'friendly' files
)))
Because I am not an expert in RE, I got the idea to replace all
,
, and with some unusual word and then take the parts between those words. I know it isn't a good solution, but I hope it will solve the problem.
I made something like this:
<?php /* Paragraph search */
// Read the source file into one string
$File = file_get_contents("page.html");
// Remove all styles (not necessary); non-greedy, case-insensitive, may span lines:
$File = preg_replace("#<style.*?</style>#is", "", $File);
// Remove all scripts (not necessary):
$File = eregi_replace ("", "", $File);
// Remove 'baggage' (not necessary):
$File = eregi_replace ("", "", $File);
// Replace 'targets' with some unusual word:
$File = eregi_replace ("", " ", $File);
$File = eregi_replace ("", " ", $File);
$File = eregi_replace ("
", " ", $File);
$File = eregi_replace ("
", " ", $File);
// Take everything between the marker words
if (preg_match_all("/(.*)/", $File, $matches)) {
for ($i=0; $i < count($matches[0]); $i++) {
echo "Paragraph" . $i . ": " . $matches[1][$i] . "
";
echo "--------------
";
}
}
?>
but I got no results
((
What could the problem be, what can I do, and what do you think about this solution? Does somebody have a better solution?
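To make the marker idea described above concrete, here is a minimal sketch of it. Everything specific in it is an assumption for illustration only: the function name extract_paragraphs, the marker string @@PARA@@, and the choice of <p>, </p> and <br> as the 'target' tags. It uses the preg_* functions throughout, and it is only meant for 'friendly' files, not as a general HTML parser.

```php
<?php
// Hypothetical helper: the name, the marker and the tag list are assumptions.
function extract_paragraphs(string $html): array
{
    // Marker word, assumed not to occur anywhere in the page itself.
    $marker = '@@PARA@@';

    // Drop <style> and <script> blocks (non-greedy, case-insensitive, multi-line).
    $html = preg_replace('#<(style|script)\b.*?</\1>#is', '', $html);

    // Replace each assumed boundary tag (<p ...>, </p>, <br>, <br />) with the marker.
    $html = preg_replace('#<\s*/?\s*(?:p|br)\b[^>]*>#i', $marker, $html);

    // Remove whatever tags are left, then cut the text at the markers.
    $parts = explode($marker, strip_tags($html));

    // Collapse whitespace and skip the empty pieces.
    $paragraphs = [];
    foreach ($parts as $part) {
        $part = trim(preg_replace('/\s+/', ' ', $part));
        if ($part !== '') {
            $paragraphs[] = $part;
        }
    }
    return $paragraphs;
}

// Usage on a small, made-up sample string:
$sample = '<p>test</p>test2<br>test3<p>bla bla</p>';
foreach (extract_paragraphs($sample) as $i => $paragraph) {
    echo "Paragraph $i: $paragraph\n";
}
```

Splitting with explode() avoids having to match "everything between two markers" with a single pattern, where a greedy (.*) and line breaks inside the text can easily lead to no matches or to one huge match.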
Thanks and regards,
Aleksandar Ljubojevic - LJUBA
ljubas@yahoo.com
http://ljubas.tripod.com