
Raymond Yee

"Pro Web 2.0 Mashups: Remixing Data and Web Services"

Bots come in a variety of well-known types
and engage in activities that range from positive and benign to illegal and destructive:
• "Chatterbots" that automatically reply to human users through instant messaging or IRC [19]
• Wikipedia bots that automate the monitoring, maintaining, and editing of Wikipedia [20]
• Ticket-purchasing bots that buy tickets on behalf of ticket scalpers
• Bots that generate spam or launch distributed denial-of-service attacks
Web spiders (also known as web crawlers and web harvesters) are a special type of Internet
bot. They typically focus on gathering collections of web pages, up to billions of pages, rather
than on focused extraction of data from a given page. It's the spiders from search engines such as
Google and Yahoo! that visit your web pages and collect them to build
their large indexes of the Web.
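
The page-collecting behavior of a spider can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular search engine works: the names LinkExtractor, crawl, and FAKE_WEB are hypothetical, and the three example.com pages are invented and held in memory so the code runs without network access. The spider starts from one URL, extracts every link on each page it fetches, and queues any URL it has not seen before.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; return {url: html} for visited pages.

    fetch is a caller-supplied function that maps a URL to its HTML text,
    or returns None when the page cannot be retrieved.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Hypothetical three-page site held in memory so the sketch runs offline.
FAKE_WEB = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.com/a.html": '<a href="/">home</a>',
    "http://example.com/b.html": '<a href="/a.html">A</a> <a href="/missing.html">?</a>',
}

collected = crawl("http://example.com/", FAKE_WEB.get)
```

Passing fetch as a parameter keeps the traversal logic separate from retrieval, so an HTTP-based fetch function could be swapped in for the dictionary lookup used here.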
There are some important technical challenges to screen-scraping. The vast majority of
data embedded in HTML is not marked up to be unambiguously and consistently parsed by
bots. Hence, screen-scraping depends on making rather brittle assumptions about what the
placement and presentation style of embedded data imply about the semantics of the data.
The author of a web page often changes its visual style without intending to change any underlying
semantics, but still ends up breaking, often inadvertently, screen-scraping code.
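
A small Python sketch can make this brittleness concrete. Everything in it is an assumption made for illustration: the CellCollector and scrape_price names are hypothetical, and the two HTML snippets are invented. The scraper assumes the price is always the second table cell; a purely visual redesign that adds a leading icon cell leaves the data unchanged, yet the scraper now returns the wrong field without raising any error.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def scrape_price(html):
    """Brittle assumption: the price is always the second table cell."""
    parser = CellCollector()
    parser.feed(html)
    cells = [c.strip() for c in parser.cells]
    return cells[1] if len(cells) > 1 else None


# Invented page: product name in the first cell, price in the second.
ORIGINAL = "<table><tr><td>Widget</td><td>$9.99</td></tr></table>"
# Same data after a purely visual redesign that adds a leading icon cell.
REDESIGNED = "<table><tr><td>*</td><td>Widget</td><td>$9.99</td></tr></table>"

before = scrape_price(ORIGINAL)   # the price, as intended
after = scrape_price(REDESIGNED)  # the product name: silently wrong
```

The failure is silent, which is what makes positional assumptions risky: the scraper keeps returning a value, just not the one it was written to extract.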

