Paragraph search - Regular expressions

Hi,

This example is in PHP, but the problem is the regular expression, and the 'thinking' is universal.

I have to extract paragraphs from an HTML file, where a 'paragraph' is not only

...

paragraph is and so if file is:

test
test2 test3
test2

bla bla



the result has to be:

test
test2 test3
test2
bla bla

That means everything between

and

, and and between

and start of new paragraph

or

. I know there are some 'holes' and bugs in this request, and I know this script can be 'broken' by an unfriendly HTML file, but I want a starting solution for 'friendly' files :))))

Because I am not an expert in regular expressions, I got the idea to replace all

,

, and with some unusual word and then take the parts between those words. I know it isn't a good solution, but I hope it will solve the problem.

I made something like this:

<?php /* Paragraph search */
// Take a source
$File = implode ("", file("page.html"));
// Remove all styles (not necessary):
$File = eregi_replace ("<style(.*)</style>", "", $File);
// Remove all scripts (not necessary):
$File = eregi_replace ("", "", $File);
// Remove 'baggage' (not necessary):
$File = eregi_replace ("", "", $File);
// Replace 'targets' with some unusual word:
$File = eregi_replace ("", " ", $File);
$File = eregi_replace ("", " ", $File);
$File = eregi_replace ("

", " ", $File);
$File = eregi_replace ("

", " ", $File);
// Take everything between i
if (preg_match_all("/(.*)/", $File, $matches)) {
    for ($i = 0; $i < count($matches[0]); $i++) {
        echo "Paragraph" . $i . ": " . $matches[1][$i] . "\n";
        echo "--------------\n";
    }
}
?>

but I got no results :(((

What could the problem be, what can I do, and what do you think of this approach? Does anybody have a better solution?

Thanks and regards,

Aleksandar Ljubojevic - LJUBA
ljubas@yahoo.com
http://ljubas.tripod.com
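
For comparison, here is a minimal sketch of one possible way to get the pieces without an intermediate marker word. It is not a definitive fix: it assumes the paragraph separators are `<p>`, `</p>` and `<br>`, and the function name `extractParagraphs` and the sample string are made up for illustration. Two things it does differently from the script above: it uses the `preg_*` functions (the `ereg*` family was deprecated in PHP 5.3 and removed in PHP 7.0), and it adds the `s` modifier with a non-greedy `.*?`, because by default `.` does not match a newline and a greedy `.*` runs to the last closing tag in the file.

```php
<?php
// Sketch only: the separator tags (<p>, </p>, <br>) and the function name
// are assumptions for illustration, not taken from the original post.
function extractParagraphs(string $html): array
{
    // Drop <style> and <script> blocks. The "s" modifier lets "." match
    // newlines, and ".*?" is non-greedy so each block is removed separately.
    $html = preg_replace('#<(style|script)\b.*?</\1>#is', '', $html);

    // Split directly on the assumed separators instead of replacing them
    // with an unusual word first.
    $parts = preg_split('#</?p\b[^>]*>|<br\s*/?>#i', $html);

    $paragraphs = [];
    foreach ($parts as $part) {
        // Remove any remaining tags, collapse whitespace, skip empty pieces.
        $text = trim(preg_replace('/\s+/', ' ', strip_tags($part)));
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}

// Hypothetical input modelled on the example in the post.
$sample = "<p>test</p>\ntest2 test3<br>\ntest2\n<p>bla bla</p>";
print_r(extractParagraphs($sample));
```

On the sample string this should give the four pieces "test", "test2 test3", "test2" and "bla bla". Like the original idea, it is only meant for 'friendly' files; for arbitrary HTML a real parser such as PHP's DOM extension is likely a safer choice.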