| Package | Description |
|---|---|
| `us.codecraft.webmagic` | Main class `Spider` and models. |
| `us.codecraft.webmagic.downloader` | The downloader is the part that downloads web pages and stores them in a `Page` object. |
| `us.codecraft.webmagic.scheduler` | The scheduler is the part that manages URLs. |
| `us.codecraft.webmagic.scheduler.component` | Components of the scheduler. |
| `us.codecraft.webmagic.utils` | Static utilities of WebMagic. |

| Modifier and Type | Field and Description |
|---|---|
| `protected java.util.List<Request>` | `Spider.startRequests` |

| Modifier and Type | Method and Description |
|---|---|
| `Request` | `Request.addCookie(java.lang.String name, java.lang.String value)` |
| `Request` | `Request.addHeader(java.lang.String name, java.lang.String value)` |
| `Request` | `Page.getRequest()`<br>Gets the request of the current page. |
| `Request` | `ResultItems.getRequest()` |
| `<T> Request` | `Request.putExtra(java.lang.String key, T value)` |
| `Request` | `Request.setBinaryContent(boolean binaryContent)` |
| `Request` | `Request.setCharset(java.lang.String charset)` |
| `Request` | `Request.setExtras(java.util.Map<java.lang.String,java.lang.Object> extras)` |
| `Request` | `Request.setMethod(java.lang.String method)` |
| `Request` | `Request.setPriority(long priority)`<br>Sets the priority of the request for sorting. Requires a scheduler that supports priority. |
| `Request` | `Request.setUrl(java.lang.String url)` |

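
Every `Request` setter listed above returns `Request`, which suggests the calls can be chained. The sketch below illustrates that pattern with a hypothetical stand-in class, `RequestSketch`, that mirrors a few of the listed signatures. It is not the WebMagic class: the constructor, the field layout, and the `getHeaders`/`getExtra` accessors are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class RequestChainingDemo {
    // Builds one request in a single chained expression.
    static RequestSketch build() {
        return new RequestSketch("https://example.com/list")
                .setMethod("GET")
                .addHeader("Accept", "text/html")
                .putExtra("depth", 1)
                .setPriority(10L);
    }

    public static void main(String[] args) {
        RequestSketch r = build();
        System.out.println(r.getMethod() + " " + r.getUrl() + " priority=" + r.getPriority());
    }
}

// Hypothetical stand-in for Request; only the chaining pattern is the point.
class RequestSketch {
    private final String url;
    private String method;
    private long priority;
    private final Map<String, String> headers = new HashMap<>();
    private final Map<String, Object> extras = new HashMap<>();

    RequestSketch(String url) { this.url = url; }

    // Each setter returns this, so calls can be strung together.
    RequestSketch setMethod(String method) { this.method = method; return this; }
    RequestSketch setPriority(long priority) { this.priority = priority; return this; }
    RequestSketch addHeader(String name, String value) { headers.put(name, value); return this; }
    <T> RequestSketch putExtra(String key, T value) { extras.put(key, value); return this; }

    // Accessors below are assumed for the sketch.
    String getUrl() { return url; }
    String getMethod() { return method; }
    long getPriority() { return priority; }
    Map<String, String> getHeaders() { return headers; }
    Object getExtra(String key) { return extras.get(key); }
}
```

Returning the receiver from each setter keeps request construction in one expression, which is convenient when a request is created and handed off immediately.
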
| Modifier and Type | Method and Description |
|---|---|
| `java.util.List<Request>` | `Page.getTargetRequests()` |

| Modifier and Type | Method and Description |
|---|---|
| `Spider` | `Spider.addRequest(Request... requests)`<br>Adds requests (URLs with additional information) to crawl. |
| `void` | `Page.addTargetRequest(Request request)`<br>Adds a request to fetch. |
| `void` | `SpiderListener.onError(Request request)` |
| `protected void` | `Spider.onError(Request request)` |
| `void` | `SpiderListener.onSuccess(Request request)` |
| `protected void` | `Spider.onSuccess(Request request)` |
| `void` | `Page.setRequest(Request request)` |
| `ResultItems` | `ResultItems.setRequest(Request request)` |

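
The `onSuccess(Request)` and `onError(Request)` callbacks listed for `SpiderListener` lend themselves to simple bookkeeping. The following is a minimal sketch of a listener that counts successes and records failed URLs. `ListenerSketch` and `ListenedRequest` are hypothetical stand-ins that only mirror the two method signatures above, and `CountingListener` is an illustrative name, not part of WebMagic.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

class ListenerDemo {
    public static void main(String[] args) {
        CountingListener listener = new CountingListener();
        listener.onSuccess(new ListenedRequest("https://example.com/a"));
        listener.onError(new ListenedRequest("https://example.com/b"));
        System.out.println("ok=" + listener.successCount() + " failed=" + listener.failedUrls());
    }
}

// Hypothetical stand-in for Request: just a URL holder.
class ListenedRequest {
    final String url;
    ListenedRequest(String url) { this.url = url; }
}

// Hypothetical stand-in mirroring the two SpiderListener methods in the table.
interface ListenerSketch {
    void onSuccess(ListenedRequest request);
    void onError(ListenedRequest request);
}

// Counts successes and remembers failed URLs. Thread-safe collections are used
// on the assumption that callbacks may arrive from several crawler threads.
class CountingListener implements ListenerSketch {
    private final AtomicInteger successes = new AtomicInteger();
    private final List<String> failed = new CopyOnWriteArrayList<>();

    @Override public void onSuccess(ListenedRequest request) { successes.incrementAndGet(); }
    @Override public void onError(ListenedRequest request) { failed.add(request.url); }

    int successCount() { return successes.get(); }
    List<String> failedUrls() { return failed; }
}
```
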
| Modifier and Type | Method and Description |
|---|---|
| `Spider` | `Spider.startRequest(java.util.List<Request> startRequests)`<br>Sets the start requests of the `Spider`. These take precedence over the `startUrls` of `Site`. |

| Modifier and Type | Method and Description |
|---|---|
| `HttpClientRequestContext` | `HttpUriRequestConverter.convert(Request request, Site site, Proxy proxy)` |
| `Page` | `HttpClientDownloader.download(Request request, Task task)` |
| `Page` | `Downloader.download(Request request, Task task)`<br>Downloads a web page and stores it in a `Page` object. |
| `protected Page` | `HttpClientDownloader.handleResponse(Request request, java.lang.String charset, org.apache.http.HttpResponse httpResponse, Task task)` |
| `protected void` | `AbstractDownloader.onError(Request request)` |
| `protected void` | `AbstractDownloader.onSuccess(Request request)` |

| Modifier and Type | Method and Description |
|---|---|
| `Request` | `QueueScheduler.poll(Task task)` |
| `Request` | `Scheduler.poll(Task task)`<br>Gets a URL to crawl. |
| `Request` | `PriorityScheduler.poll(Task task)` |

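
To make the idea of a priority-aware `poll` concrete, here is a minimal sketch built on `java.util.concurrent.PriorityBlockingQueue`. It assumes that larger priority values are polled first. `PrioritySchedulerSketch` and `PrioritizedRequest` are hypothetical stand-ins, the `Task` parameter is omitted, and the real `PriorityScheduler` may store and order requests differently.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

class PriorityPollDemo {
    public static void main(String[] args) {
        PrioritySchedulerSketch scheduler = new PrioritySchedulerSketch();
        scheduler.push(new PrioritizedRequest("https://example.com/low", 1));
        scheduler.push(new PrioritizedRequest("https://example.com/high", 5));
        // The request with the larger priority value comes out first.
        System.out.println(scheduler.poll().url);
    }
}

// Hypothetical stand-in for Request with only the fields this sketch needs.
class PrioritizedRequest {
    final String url;
    final long priority;
    PrioritizedRequest(String url, long priority) {
        this.url = url;
        this.priority = priority;
    }
}

class PrioritySchedulerSketch {
    // Assumption: larger priority values are polled first, hence the reversed comparator.
    private final PriorityBlockingQueue<PrioritizedRequest> queue =
            new PriorityBlockingQueue<>(11,
                    Comparator.comparingLong((PrioritizedRequest r) -> r.priority).reversed());

    void push(PrioritizedRequest request) { queue.add(request); }

    // Returns null when nothing is queued, so callers can treat null as "no work right now".
    PrioritizedRequest poll() { return queue.poll(); }
}
```

A plain FIFO queue would ignore the value passed to `setPriority`, which is why the priority only has an effect when the scheduler orders its queue by it.
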
| Modifier and Type | Method and Description |
|---|---|
| `protected boolean` | `DuplicateRemovedScheduler.noNeedToRemoveDuplicate(Request request)` |
| `void` | `DuplicateRemovedScheduler.push(Request request, Task task)` |
| `void` | `Scheduler.push(Request request, Task task)`<br>Adds a URL to fetch. |
| `void` | `QueueScheduler.pushWhenNoDuplicate(Request request, Task task)` |
| `protected void` | `DuplicateRemovedScheduler.pushWhenNoDuplicate(Request request, Task task)` |
| `void` | `PriorityScheduler.pushWhenNoDuplicate(Request request, Task task)` |
| `protected boolean` | `DuplicateRemovedScheduler.shouldReserved(Request request)` |

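
The method names above (`push`, `pushWhenNoDuplicate`, `noNeedToRemoveDuplicate`) suggest a template-method arrangement: a base class filters duplicates in `push`, and each concrete scheduler decides how to store the request in `pushWhenNoDuplicate`. The sketch below shows one way that could fit together. It is an assumption about the design, not the library's code: all class names are hypothetical stand-ins, the `Task` parameter is omitted, and the rule that POST requests skip the duplicate check is assumed for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

class DedupPushDemo {
    public static void main(String[] args) {
        FifoSchedulerSketch scheduler = new FifoSchedulerSketch();
        scheduler.push(new QueuedRequest("https://example.com/a", "GET"));
        scheduler.push(new QueuedRequest("https://example.com/a", "GET")); // dropped as a duplicate
        System.out.println(scheduler.size());
    }
}

// Hypothetical stand-in for Request.
class QueuedRequest {
    final String url;
    final String method;
    QueuedRequest(String url, String method) {
        this.url = url;
        this.method = method;
    }
}

abstract class DedupSchedulerSketch {
    private final Set<String> seen = new HashSet<>();

    // Template method: decide whether to keep the request, then delegate storage.
    final synchronized void push(QueuedRequest request) {
        if (noNeedToRemoveDuplicate(request) || seen.add(request.url)) {
            pushWhenNoDuplicate(request);
        }
    }

    // Assumption for this sketch: POST requests bypass the duplicate check.
    protected boolean noNeedToRemoveDuplicate(QueuedRequest request) {
        return "POST".equalsIgnoreCase(request.method);
    }

    // Subclasses choose the storage (FIFO queue, priority queue, and so on).
    protected abstract void pushWhenNoDuplicate(QueuedRequest request);
}

class FifoSchedulerSketch extends DedupSchedulerSketch {
    private final Queue<QueuedRequest> queue = new ArrayDeque<>();

    @Override protected void pushWhenNoDuplicate(QueuedRequest request) { queue.add(request); }

    synchronized QueuedRequest poll() { return queue.poll(); }
    synchronized int size() { return queue.size(); }
}
```

Keeping the duplicate filter in the base class means a queue-based and a priority-based scheduler can share it and differ only in how they store accepted requests.
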
| Modifier and Type | Method and Description |
|---|---|
| `protected java.lang.String` | `HashSetDuplicateRemover.getUrl(Request request)` |
| `boolean` | `DuplicateRemover.isDuplicate(Request request, Task task)`<br>Checks whether the request is a duplicate. |
| `boolean` | `HashSetDuplicateRemover.isDuplicate(Request request, Task task)` |

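
As the name `HashSetDuplicateRemover` hints, a duplicate check can be built on a set of already-seen URLs. The sketch below relies on `Set.add` returning `false` when the element is already present, so one call both records the URL and answers the question. `SetDuplicateRemoverSketch` is a hypothetical stand-in that takes a URL string directly and omits the `Task` parameter, and `seenCount` is an assumed helper for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class DuplicateCheckDemo {
    public static void main(String[] args) {
        SetDuplicateRemoverSketch remover = new SetDuplicateRemoverSketch();
        System.out.println(remover.isDuplicate("https://example.com/a"));
        System.out.println(remover.isDuplicate("https://example.com/a"));
    }
}

// Hypothetical stand-in; not the WebMagic class.
class SetDuplicateRemoverSketch {
    // A concurrent set, on the assumption that several crawler threads may call isDuplicate.
    private final Set<String> urls = ConcurrentHashMap.newKeySet();

    // The first sighting records the URL and reports false; later sightings report true.
    boolean isDuplicate(String url) {
        return !urls.add(url);
    }

    // Assumed helper: number of distinct URLs seen so far.
    int seenCount() {
        return urls.size();
    }
}
```

Because the check and the insert happen in a single `add` call, two threads offering the same URL at the same time cannot both be told it is new.
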
| Modifier and Type | Method and Description |
|---|---|
| `static java.util.List<Request>` | `UrlUtils.convertToRequests(java.util.Collection<java.lang.String> urls)` |

| Modifier and Type | Method and Description |
|---|---|
| `static java.util.List<java.lang.String>` | `UrlUtils.convertToUrls(java.util.Collection<Request> requests)` |

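
The two `UrlUtils` helpers convert between collections of URL strings and lists of `Request` objects. A minimal sketch of such a pair, written with streams, is shown below. `UrlConversionSketch` and `UrlRequest` are hypothetical stand-ins that only mirror the signatures above, and the real helpers may be implemented differently.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

class UrlConversionDemo {
    public static void main(String[] args) {
        List<String> urls = Arrays.asList("https://example.com/a", "https://example.com/b");
        List<UrlRequest> requests = UrlConversionSketch.convertToRequests(urls);
        // Converting back should give the original URLs in the same order.
        System.out.println(UrlConversionSketch.convertToUrls(requests).equals(urls));
    }
}

// Hypothetical stand-in for Request: just a URL holder.
class UrlRequest {
    final String url;
    UrlRequest(String url) { this.url = url; }
}

class UrlConversionSketch {
    // Wraps each URL string in a request object, preserving iteration order.
    static List<UrlRequest> convertToRequests(Collection<String> urls) {
        return urls.stream().map(UrlRequest::new).collect(Collectors.toList());
    }

    // Extracts the URL from each request, preserving iteration order.
    static List<String> convertToUrls(Collection<UrlRequest> requests) {
        return requests.stream().map(r -> r.url).collect(Collectors.toList());
    }
}
```
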
Copyright © 2020. All rights reserved.