I need help parsing Unicode web pages and downloading JPEG image files with Perl scripts.
I read http://www.cs.utk.edu/cs594ipm/perl/crawltut.html about using LWP, HTTP, and the get($url) function to fetch pages, but the content returned for Unicode pages is always garbled. When I use get($url) on a non-Unicode web page, the content comes back as perfect ASCII.
Now I want to parse http://www.tom365.com/movie_2004/html/5507.html, and the page I get back is garbled. I have read about the Encode module but don't know how to use it.
I need a Perl script that parses the page above and extracts the image URL from markup matching this pattern:
<div class="movie"><img src="http://pic.tom365.com/imgs/tongjifan.jpg" class="mp" />
If anyone knows how to parse Unicode web pages like this, I'd be very grateful.
Thank you
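
For reference, one possible approach is sketched below. It is only a minimal sketch under some assumptions: the page is assumed to be served as GB2312 rather than UTF-8 (check the Content-Type header or the page's <meta> tag, since that is a guess), and the helper names decode_page, extract_movie_image and fetch_page are made up for illustration. The idea is to fetch the raw bytes with LWP::UserAgent, turn them into Perl characters with Encode's decode, match the <div class="movie"><img src="..."> pattern with a regex, and save the image with the ':content_file' option.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn the raw bytes of a page into Perl characters.
# The charset must match what the server really sends; 'gb2312' is an assumption.
sub decode_page {
    my ($bytes, $charset) = @_;
    return decode($charset, $bytes);
}

# Hypothetical helper: return the src of the <img> that directly follows
# <div class="movie">, or undef if the pattern is not found.
sub extract_movie_image {
    my ($html) = @_;
    return $html =~ m{<div\s+class="movie">\s*<img\s+src="([^"]+)"}i ? $1 : undef;
}

# Hypothetical helper: fetch a URL and return its decoded text.
# LWP::UserAgent is loaded here so the other helpers work without it.
sub fetch_page {
    my ($url, $charset) = @_;
    require LWP::UserAgent;
    my $res = LWP::UserAgent->new(timeout => 30)->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    return decode_page($res->content, $charset);   # content() is raw bytes
}

# Runs only when a URL is given on the command line.
if (@ARGV) {
    my $url  = shift @ARGV;
    my $html = fetch_page($url, 'gb2312');          # assumed charset
    my $img  = extract_movie_image($html)
        or die "No movie image found in $url\n";
    print "Image URL: $img\n";

    # Save the JPEG under its own file name in the current directory.
    (my $file = $img) =~ s{.*/}{};
    my $res = LWP::UserAgent->new->get($img, ':content_file' => $file);
    die "Download failed: ", $res->status_line, "\n" unless $res->is_success;
}
```

Saved as, say, getimg.pl (a made-up name), it would be run as: perl getimg.pl http://www.tom365.com/movie_2004/html/5507.html. If the charset is not known in advance, $res->decoded_content can be used in place of decode_page, since it picks the charset from the response headers. Before printing decoded text, add binmode(STDOUT, ':encoding(UTF-8)') to avoid "Wide character" warnings.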