List of Java HTML Parsers

Swing HTML Parser
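This entry has no project URL because the parser ships with the JDK in the javax.swing.text.html packages. As a rough illustration only, the sketch below collects the href values of anchor tags using ParserDelegator and HTMLEditorKit.ParserCallback. The class name SwingLinkExtractor, the method extractLinks, and the sample markup are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; not part of the JDK.
class SwingLinkExtractor {

    // Returns the href value of every <a> start tag the parser reports.
    static List<String> extractLinks(String html) throws IOException {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // The third argument tells the parser to ignore charset declarations in the markup.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Sample markup invented for the example.
        String html = "<html><body><a href=\"http://jsoup.org/\">jsoup</a><p>text</p></body></html>";
        System.out.println(extractLinks(html));
    }
}
```

The callback style is event based, so there is no document tree to query afterwards; anything you want to keep has to be collected inside the callback.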

HTML Parser
http://sourceforge.net/projects/htmlparser/
CPL 1.0, LGPL

Java Mozilla Html Parser
http://sourceforge.net/projects/mozillaparser/
MPL 1.1

jsoup
http://jsoup.org/
MIT
http://stackoverflow.com/questions/3152138/what-are-the-pros-and-cons-of-the-leading-java-html-parsers
Supports jQuery-style selector queries, which are more convenient than XPath
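For contrast, here is roughly what the XPath side of that comparison looks like using only the JDK's javax.xml.xpath API. It is a sketch under two assumptions: the input is already well-formed XHTML (real-world HTML usually is not, which is why the parsers in this list exist), and the class name XPathLinkQuery and the sample markup are invented for the example. With jsoup, the same lookup would be a single selector call along the lines of Jsoup.parse(html).select("a[href]").

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; uses only JDK XML APIs.
class XPathLinkQuery {

    // Returns the href values of all <a> elements in a well-formed XHTML string.
    static List<String> hrefs(String xhtml) throws Exception {
        // Step 1: build a DOM tree (this fails on markup that is not well-formed XML).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        // Step 2: evaluate the XPath expression and ask for a node set.
        NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate("//a[@href]/@href", doc, XPathConstants.NODESET);

        // Step 3: copy the attribute values out of the untyped NodeList.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Sample markup invented for the example.
        String xhtml = "<html><body><a href=\"http://jsoup.org/\">jsoup</a><a>no link</a></body></html>";
        System.out.println(hrefs(xhtml));
    }
}
```

The factory setup, the cast from Object, and the manual NodeList loop are the kind of ceremony that a selector-based API avoids.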

Jericho HTML Parser
http://sourceforge.net/projects/jerichohtml/
EPL, LGPL

Java HTML parser
http://sourceforge.net/projects/javahtmlparser/
LGPL

JTidy
http://sourceforge.net/projects/jtidy/
License: not confirmed

TagSoup
http://home.ccil.org/~cowan/XML/tagsoup/
"Apache Tika switched from neko to TagSoup."
Apache License 2.0

Cobra
http://lobobrowser.org/cobra.jsp
LGPL

Apache Axiom
http://ws.apache.org/commons/axiom/
Apache License 2.0

CyberNeko HTML Parser
http://sourceforge.net/projects/nekohtml/
Apache License 2.0
"NekoHTML is the way to go. I've used it in many contexts for years to parse HTML into XML and it is always up to the task, often in places where JTidy fails to deliver."

HtmlCleaner
http://sourceforge.net/projects/htmlcleaner/
BSD
http://stackoverflow.com/questions/1184411/is-htmlcleaner-thread-safe
"an HtmlCleaner object is not thread safe."
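When a parser object is not thread safe, one common workaround is to give each thread its own instance instead of sharing one. The sketch below shows that pattern with ThreadLocal. StatefulCleaner is a hypothetical stand-in with mutable internal state, not HtmlCleaner's actual API; in real code the ThreadLocal would hold whatever parser object you use.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PerThreadCleaner {

    // Hypothetical stand-in for a parser that keeps mutable state between calls
    // and therefore must not be shared across threads. Not a real library class.
    static final class StatefulCleaner {
        private final StringBuilder buffer = new StringBuilder();

        String clean(String html) {
            buffer.setLength(0);          // reuse of internal state is what makes sharing unsafe
            buffer.append(html.trim());
            return buffer.toString();
        }
    }

    // Each thread lazily creates and then reuses its own instance.
    private static final ThreadLocal<StatefulCleaner> CLEANER =
            ThreadLocal.withInitial(StatefulCleaner::new);

    static String clean(String html) {
        return CLEANER.get().clean(html);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            final String html = "  <p>doc " + i + "</p>  ";
            pool.execute(() -> System.out.println(clean(html)));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Creating a fresh instance per call is the simpler alternative when construction is cheap; the per-thread approach only pays off if building the parser is expensive.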

HotSAX
http://sourceforge.net/projects/hotsax/
LGPL

VietSpider HTMLParser
http://sourceforge.net/projects/binhgiang/
License: not confirmed

HtmlUnit
http://sourceforge.net/projects/htmlunit/
Apache License 2.0

YG HTML Parser
http://jakarta.tistory.com/category/YG%20Html%20Parser

Validator.nu HTML Parser
http://about.validator.nu/htmlparser/
