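PDF→HTML conversion tools
Still investigating. Preferably Java. Preferably free.
Rather than converting to HTML as such, what I really want is to parse the text and tables inside the PDF.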
iText
Apparently it can't do this.
http://www.lowagie.com/iText/faq.html#parsepdf
Is it possible to parse an existing PDF-document and convert it to another format (HTML, DOC, EXCEL)?
No, the pdf format is just a canvas where text and graphics are placed without any structure information. As such there aren't any 'iText-objects' in a PDF file. For instance: you can't retrieve a table object from a PDF file. Tables are formed by placing text and lines at selected places.
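Since the FAQ says tables are only text placed at selected positions, one way to get rows back is to group text fragments by their coordinates. Below is a minimal sketch of that idea, not something iText provides: the `RowGrouper` class, the `Fragment` type and the `yTolerance` parameter are all hypothetical names made up for illustration, and it assumes some extraction step has already produced each fragment's text together with its x/y position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RowGrouper {

    /** Hypothetical type: a piece of text plus the position where it was drawn. */
    public static final class Fragment {
        final double x;
        final double y;
        final String text;

        public Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Groups fragments whose y coordinates lie within yTolerance of the first
     * fragment of a row, then orders each row from left to right.
     */
    public static List<List<String>> groupIntoRows(List<Fragment> fragments, double yTolerance) {
        // PDF's y axis points upward, so a larger y is nearer the top of the page.
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y).reversed());

        List<List<Fragment>> rows = new ArrayList<>();
        double rowY = 0;
        for (Fragment f : sorted) {
            // Start a new row when this fragment is too far below the current row.
            if (rows.isEmpty() || Math.abs(f.y - rowY) > yTolerance) {
                rows.add(new ArrayList<>());
                rowY = f.y;
            }
            rows.get(rows.size() - 1).add(f);
        }

        List<List<String>> result = new ArrayList<>();
        for (List<Fragment> row : rows) {
            // Within a row, cells are ordered by their x coordinate.
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            List<String> cells = new ArrayList<>();
            for (Fragment f : row) {
                cells.add(f.text);
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data: a 2x2 table with slightly uneven baselines.
        List<Fragment> fragments = Arrays.asList(
                new Fragment(200, 700.4, "Price"),
                new Fragment(50, 680, "Apple"),
                new Fragment(50, 700, "Name"),
                new Fragment(200, 679.8, "100"));
        System.out.println(groupIntoRows(fragments, 2.0));
        // prints [[Name, Price], [Apple, 100]]
    }
}
```

This only recovers rows. Column boundaries, merged cells and multi-line cells would need further heuristics, which is exactly the structure information the FAQ says a PDF does not carry.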
OOo
It can do HTML→PDF, but is the reverse impossible?