Quagmire or Gold Mine?

COMMUNICATIONS OF THE ACM November 1996/Vol. 39, No. 11 65

Skeptics believe the Web is too unstructured for Web mining to succeed. Indeed, data mining has traditionally been applied to databases, yet much of the information on the Web lies buried in documents designed for human consumption, such as home pages or product catalogs. Furthermore, much of the information on the Web is presented in natural-language text with no machine-readable semantics; HTML annotations structure the display of Web pages, but provide little insight into their content.

Some have advocated transforming the Web into a massive layered database to facilitate data mining [12], but the Web is too dynamic and chaotic to be tamed in this manner. Others have attempted to hand-code site-specific “wrappers” that facilitate the extraction of information from individual Web resources (e.g., [8]). Hand coding is convenient but cannot keep up with the explosive growth of the Web. As an alternative, this article argues for the structured Web hypothesis: Information on the Web is sufficiently structured to facilitate effective Web mining.
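To make the idea of a hand-coded wrapper concrete, here is a minimal sketch in Python. It is not taken from [8] or from any system the article cites; the catalog markup, the `ITEM` pattern, and the `wrap_catalog` function are all hypothetical, invented only to show why such wrappers are convenient to write yet hard to maintain.

```python
import re

# Hypothetical catalog markup, invented for illustration only.
PAGE = """
<ul>
  <li><b>Desk lamp</b> - $19.95</li>
  <li><b>Bookshelf</b> - $84.00</li>
</ul>
"""

# Hand-coded, site-specific pattern: it assumes exactly the
# <li><b>name</b> - $price</li> layout used by this one (made-up) site.
ITEM = re.compile(
    r"<li><b>(?P<name>[^<]+)</b>\s*-\s*\$(?P<price>\d+\.\d{2})</li>"
)

def wrap_catalog(html):
    """Return (name, price) pairs from pages that follow the assumed layout."""
    return [(m.group("name"), float(m.group("price")))
            for m in ITEM.finditer(html)]

items = wrap_catalog(PAGE)
print(items)

# A small layout change (<strong> instead of <b>) defeats the wrapper:
# the pattern no longer matches, so nothing is extracted.
changed = wrap_catalog(PAGE.replace("b>", "strong>"))
print(changed)
```

The second call illustrates the maintenance problem: every site needs its own pattern, and each pattern breaks silently when the site's layout changes.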

Examples of Web structure include linguistic and typographic conventions, HTML annotations (e.g., <title>), classes of semi-structured documents (e.g., product catalogs), Web indices and directories, and much more. To support the structured Web hypothesis, this article will survey preliminary Web mining successes and suggest directions for future work.
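As one small illustration of structure that is already present in Web pages, the sketch below pulls the <title> annotation out of a document using Python's standard `html.parser` module. The sample page and the names `TitleExtractor` and `page_title` are assumptions made for this example; the article itself does not prescribe any particular implementation.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collects the text that appears between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text seen while inside the <title> element.
        if self.in_title:
            self.parts.append(data)

def page_title(html):
    """Return the page title, or an empty string if none is present."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()

# Hypothetical page, invented for illustration only.
SAMPLE = (
    "<html><head><title>Acme Product Catalog</title></head>"
    "<body><h1>Welcome</h1></body></html>"
)
print(page_title(SAMPLE))
```

Unlike a site-specific wrapper, this relies only on an annotation that any HTML page may carry, which is the kind of regularity the structured Web hypothesis points to.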

Web mining may be organized into the following subtasks:

• Resource discovery. Locating unfamiliar documents and services on the Web.
• Information extraction. Automatically

Oren Etzioni

TERRY WIDENER

The World-Wide Web: Quagmire or Gold Mine?

Is information on the Web sufficiently structured to facilitate effective Web mining?
