: How can I parse Word, Excel, PDF, ... files? Where can I find some good filters or something similar?
Check out wotsit.org; you'll find all the file format information you ever wanted there.
There is an interesting project called doc2pdf on SourceForge.net too. Might help you get off the ground.
-bugninja
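Whichever format reference you end up using, a usual first step before parsing is working out which format a file really is, since extensions can lie. Here is a minimal sketch in Python that checks the leading bytes against three well-known signatures: `%PDF-` for PDF, the OLE2 compound-file header used by legacy .doc/.xls files, and the ZIP header used by .docx/.xlsx. The names `SIGNATURES`, `sniff_format`, and `sniff_file` are made up for illustration and do not come from any library. It only identifies the container, it does not parse the contents.

```python
from typing import Optional

# Leading byte signatures ("magic numbers") for three common containers.
# This table is illustrative, not exhaustive.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (bytes.fromhex("D0CF11E0A1B11AE1"), "ole2"),  # legacy .doc/.xls container
    (b"PK\x03\x04", "zip"),                       # .docx/.xlsx are ZIP archives
]


def sniff_format(header: bytes) -> Optional[str]:
    """Return a container name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 8 bytes of a file and classify them (hypothetical helper)."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

Note that "ole2" and "zip" only tell you the container: a ZIP file is not necessarily an Office document, and an OLE2 file could be Word, Excel, or something else, so you would still need to look inside to tell them apart.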