Analysis of the PDF File Structure - IT之家
tutorial.it55.com/ChengXuKaiFa/ASPNET/2010/05/11/17194640324.html



Analysis of the PDF File Structure - PDF Development Technology
oapdf.com/pdf_jishu/pdf_kaifa/162.html



Converting PDF to XML
discerning.com/hacks/docutils/pdf2xml/readme.html
 pdfbox


Wiki: Howto convert pdf to xml
pdfedit.petricek.net/wiki/HowtoPdfToXml?show_comments=1#comments



Apache PDFBox – Java PDF Library
pdfbox.apache.org/
PDFBox


Parsing/Reading a PDF file with C# and Asp.Net to text
naspinski.net/post/ParsingReading-a-PDF-file-with-C-and-AspNet-to-text.aspx
There is always more than one way to skin a cat when it comes to programming, but the easiest way I have found for PDFs is to use the fantastic open-source project PDFBox. The download works on all sorts of platforms, but you only need a few parts of it to use it with Asp.Net and C#.


PDF File Parser
xtractpro.com/articles/PDF-File-Parser.aspx?page=2
Cross-reference tables are skipped and ignored for now, because some applications use them only to access objects quickly in random order without loading the whole file. When the file is saved, the reference tables may not be necessary, and Acrobat will rebuild them on the fly anyway.
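
As a rough illustration of reading a file without its cross-reference table, the sketch below scans the raw bytes sequentially for indirect object headers of the form "N G obj". The class and method names are hypothetical and not taken from the article. It is a naive scan: a byte sequence inside a binary stream could also match, so a real parser would need to track stream boundaries.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: locates indirect objects by scanning, not via the xref table.
public class SequentialObjectScan {
    // Matches an indirect object header such as "12 0 obj".
    // "endobj" never matches because digits and whitespace must precede "obj".
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");

    /** Returns "objectNumber generation @byteOffset" for each header found. */
    public static List<String> findObjects(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so char index == byte offset.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        List<String> found = new ArrayList<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            found.add(m.group(1) + " " + m.group(2) + " @" + m.start());
        }
        return found;
    }
}
```

Calling `SequentialObjectScan.findObjects(bytes)` on the contents of a file yields one entry per object header, in file order, which is all a sequential loader needs before it starts parsing each object body.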


A PDF Forms Parser – CodeProject
www.codeproject.com/KB/recipes/mgpdfreader.aspx
Although PDF documents are most often used for static content, they can also be used to represent user-fillable forms, much like HTML forms. PDF forms can be created by taking an existing PDF document and placing form fields on it using e.g. Adobe® Acrobat®. In many scenarios the resulting PDF forms are filled out by human users using a PDF viewing tool such as Adobe Acrobat. The actual data can be separated from the PDF that contains the representation using FDF or XFDF files, the latter being an XML format that contains the content of the form fields of a particular document. By using FDF or XFDF it is easy to programmatically fill out PDF forms in scenarios where the content is generated or queried from a database.
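
To make the XFDF idea concrete, here is a minimal sketch that writes field values (for example, rows queried from a database) into an XFDF document. The class name, method names, and the field name used in the usage note are hypothetical. It handles only flat field names; hierarchical names, which XFDF expresses as nested field elements, are not covered.

```java
import java.util.Map;

// Hypothetical helper: serializes flat form-field values as XFDF.
public class XfdfWriter {
    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Builds an XFDF document with one field element per map entry. */
    public static String build(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<xfdf xmlns=\"http://ns.adobe.com/xfdf/\" xml:space=\"preserve\">\n");
        sb.append("  <fields>\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("    <field name=\"").append(escape(e.getKey())).append("\">")
              .append("<value>").append(escape(e.getValue())).append("</value>")
              .append("</field>\n");
        }
        sb.append("  </fields>\n</xfdf>\n");
        return sb.toString();
    }
}
```

Passing a map such as `{"customer_name": "Smith & Sons"}` to `XfdfWriter.build` produces a small XML file that a PDF tool with XFDF import support could merge into the matching form fields.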


Converting PDF to Text in C# – CodeProject
www.codeproject.com/KB/string/pdf2text.aspx
private static string parseUsingPDFBox(string filename)
{
    PDDocument doc = PDDocument.load(filename);
    try
    {
        PDFTextStripper stripper = new PDFTextStripper();
        return stripper.getText(doc);
    }
    finally
    {
        doc.close(); // release the underlying file once the text is extracted
    }
}


Parsing PDF files with pdfbox – Related reference material – pdf之家 (PDF reader and format file downloads)
www.haopdf.cn/viewthread-2179.html



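PDF and Java Technology (PDFBox)
www.360doc.com/content/09/1222/15/61497_11729087.shtml
# Extract text, including Unicode characters.
# Integrate easily with text search engines such as Jakarta Lucene.
# Encrypt/decrypt PDF documents.
# Import or export form data in FDF and XFDF formats.
# Append content to existing PDF documents.
# Split one PDF document into multiple documents.
# Overlay PDF documents.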


Repost: PDFBOX, PDF, parsing - Baidu Space (百度空间) application platform
apps.hi.baidu.com/share/detail/6354478
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;


/**
* @description Parses PDF files using the PDFBOX component
* @author ZhouJingxian
*
*/
public class PDF {
/**
* Parses the PDF and returns its content
*
* @param file_path_name
* @return
*/


 
