The API of Collect allows you to do most of the things you can do with the web interface.
This documentation assumes that you have a server running Collect. The API can be accessed at
http://yourserver:port/api/v1/endpoint
where “endpoint” is the endpoint you want to access.
All endpoints return JSON objects. There are several types of response objects.
A Site object describes a saved page. An example object looks like this:
{
"url": "http://example.com/some/page",
"pagepath": "example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe/html.html",
"id": "example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe",
"domain": "example.com",
"saved": "2018-01-07T13:30:34.030Z",
"title": "Example Domain",
"size": 1416
}
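If you consume the API from Python, the Site object maps naturally onto a small record type. The sketch below is only an illustration: `Site` and `parse_site` are hypothetical names, not part of Collect. It uses only the standard library and converts the `saved` timestamp into a timezone-aware `datetime`.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Site:
    """Hypothetical mirror of the Site object (field names as documented)."""
    url: str
    pagepath: str  # relative to Collect/public/s/
    id: str
    domain: str
    saved: datetime
    title: str
    size: int  # size of all content of the page, in bytes


def parse_site(obj: dict) -> Site:
    # "saved" ends in "Z"; swapping it for "+00:00" lets datetime.fromisoformat
    # parse it on Python versions older than 3.11 as well.
    saved = datetime.fromisoformat(obj["saved"].replace("Z", "+00:00"))
    return Site(
        url=obj["url"],
        pagepath=obj["pagepath"],
        id=obj["id"],
        domain=obj["domain"],
        saved=saved,
        title=obj["title"],
        size=int(obj["size"]),
    )


# The example object from the documentation.
example = json.loads("""{
    "url": "http://example.com/some/page",
    "pagepath": "example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe/html.html",
    "id": "example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe",
    "domain": "example.com",
    "saved": "2018-01-07T13:30:34.030Z",
    "title": "Example Domain",
    "size": 1416
}""")
site = parse_site(example)
```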
url
: The original url of the saved page
pagepath
: The path where the index/main page is saved (relative to Collect/public/s/)
id
: A unique id for this site (usually shorter than in the example; you can set the length in the config file as described here)
domain
: The domain of the original url
saved
: The date on which the page was saved
title
: The title displayed in the listing
size
: Size of all content of this page, in bytes

An Error object describes an error. An example object looks like this:
{
"status": 412,
"message": "Missing parameter \"url\""
}
status
: The same status as in the HTTP response
message
: Describes the error

Note: Whenever the status code is in the 4xx or 5xx range, an Error object is returned.
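Because every 4xx/5xx response carries an Error object, a client can turn those responses into exceptions in one place. This is a minimal sketch: `CollectError` and `check_response` are hypothetical helpers, and the HTTP status and raw body are assumed to come from whatever HTTP library you use.

```python
import json


class CollectError(Exception):
    """Hypothetical exception for Error objects returned with a 4xx/5xx status."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def check_response(http_status: int, body: str) -> dict:
    """Return the decoded JSON body, or raise CollectError for 4xx/5xx responses."""
    payload = json.loads(body)
    if 400 <= http_status < 600:
        # The Error object's "status" mirrors the HTTP status of the response.
        raise CollectError(payload.get("status", http_status), payload.get("message", ""))
    return payload
```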
A Processing object describes a started process. An example object looks like this:
{
"message": "Processing started",
"target": "http://example.com/some/page"
}
message
: A human-readable status message
target
: The url being processed/downloaded

All API requests must be authenticated, either with a cookie (when using the web interface) or by passing the API token.
When you make an API request, you need to pass your api_token (see config.json) as the url parameter token, e.g.
http://yourserver:port/api/v1/endpoint?token=my_example_token
Note: The token is omitted from the urls in this documentation.
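Since the token always travels as a query parameter, it is convenient to build every API url through one helper. The following is a sketch, not part of Collect: `api_url` is a hypothetical name, and `base_url` stands for your own `http://yourserver:port`.

```python
from urllib.parse import urlencode


def api_url(base_url: str, endpoint: str, token: str) -> str:
    """Build an authenticated Collect API url (hypothetical helper).

    base_url is e.g. "http://yourserver:port"; endpoint is the part after
    /api/v1/, e.g. "sites/". The token is appended as ?token=...
    """
    query = urlencode({"token": token})
    return f"{base_url.rstrip('/')}/api/v1/{endpoint.lstrip('/')}?{query}"
```

For example, `api_url("http://yourserver:8080", "sites/", "my_example_token")` yields `http://yourserver:8080/api/v1/sites/?token=my_example_token`.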
To see the sites that are saved on your Collect server, you can use the /sites/ endpoint.
http://yourserver:port/api/v1/sites/_domain_
The parameter domain is a domain name, e.g. example.com.
You can also request sites from multiple domains by joining them with a +, e.g. example.com+example2.com.
If you don’t give this parameter, all sites from all domains will be returned.
This method returns a list of Site objects or an Error.
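A sketch of calling /sites/ from Python with only the standard library. The function names `sites_url` and `list_sites` are hypothetical; each domain is percent-encoded on its own so the `+` separators stay literal in the path.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def sites_url(base_url: str, token: str, domains=()) -> str:
    """Build the /sites/ url; several domains are joined with "+"."""
    # With no domains the path ends in /sites/ and every saved site is returned.
    domain_part = "+".join(quote(d, safe="") for d in domains)
    query = urlencode({"token": token})
    return f"{base_url.rstrip('/')}/api/v1/sites/{domain_part}?{query}"


def list_sites(base_url: str, token: str, domains=()) -> list:
    """Fetch the list of Site objects as dicts; urlopen raises HTTPError on 4xx/5xx."""
    with urlopen(sites_url(base_url, token, domains)) as resp:
        return json.load(resp)
```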
To get details about a saved website, you can use the /details/ endpoint.
http://yourserver:port/api/v1/details/_id_
The parameter id is the id of an entry, e.g. example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe.
If you don’t give this parameter or the id doesn’t exist on the server, you’ll get a 404 response.
This method returns a Site object or an Error.
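Because an unknown id produces a 404, a client may prefer to treat that case as "not found" rather than as a failure. This is a minimal sketch with hypothetical names (`details_url`, `get_details`); it relies on `urllib.request.urlopen` raising `HTTPError` for error statuses.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def details_url(base_url: str, token: str, site_id: str) -> str:
    """Build the /details/<id> url with the token in the query string."""
    query = urlencode({"token": token})
    return f"{base_url.rstrip('/')}/api/v1/details/{quote(site_id, safe='')}?{query}"


def get_details(base_url: str, token: str, site_id: str):
    """Return the Site object as a dict, or None if the server answers 404."""
    try:
        with urlopen(details_url(base_url, token, site_id)) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            return None
        raise
```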
To add a site to your saved websites, you can use the /site/add endpoint.
Note: This endpoint returns a status code of 202 on success.
Note: Although this is a POST endpoint, the API token must be passed in the query string.
http://yourserver:port/api/v1/site/add
The parameter url is the link to the site you want to save, e.g. https://example.com/some/page.
The parameter depth is how many levels of hyperlinks to follow, starting at the specified url. It must be a number between 0 and 5. If omitted, depth is 0.
The parameter title is the title of the site. If omitted, Collect will use the title from the page's HTML.
The parameter samedomain sets whether hyperlinks to other domains should be followed. If true, only links to the same domain will be followed.
The parameter cookies defines cookies that should be sent with the request, e.g. cookie1=value1;cookie2=value2. By default, no cookies are sent.
The parameter useragent defines the User-Agent header that will be sent with the request, e.g. Mozilla/5.0 (X11; Linux x86_64). By default, no user agent is sent.
This method returns a Processing object or an Error.
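The sketch below builds a /site/add request with the standard library. Two points are assumptions not confirmed by this documentation: the parameters are sent as a form-encoded request body, and the boolean samedomain is sent as the string "true" or "false". Only the token placement in the query string comes from the note above. `add_site_request` and `add_site` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def add_site_request(base_url, token, url, depth=0, title=None,
                     samedomain=None, cookies=None, useragent=None):
    """Build (without sending) a POST request for /site/add.

    Assumption: parameters travel as a form-encoded body; the token stays
    in the query string, as the documentation requires.
    """
    if not 0 <= depth <= 5:
        raise ValueError("depth must be between 0 and 5")
    form = {"url": url, "depth": str(depth)}
    if title is not None:
        form["title"] = title
    if samedomain is not None:
        # Assumption: booleans are sent as the strings "true" / "false".
        form["samedomain"] = "true" if samedomain else "false"
    if cookies is not None:
        form["cookies"] = cookies  # e.g. "cookie1=value1;cookie2=value2"
    if useragent is not None:
        form["useragent"] = useragent
    endpoint = f"{base_url.rstrip('/')}/api/v1/site/add?{urlencode({'token': token})}"
    return Request(endpoint, data=urlencode(form).encode("utf-8"), method="POST")


def add_site(base_url, token, url, **options):
    """Send the request; on success Collect answers 202 with a Processing object."""
    with urlopen(add_site_request(base_url, token, url, **options)) as resp:
        return resp.status, json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect or log the exact request before it reaches the server.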
To change the displayed title of a saved site, you can use the /site/_id_/settitle endpoint.
Note: Although this is a POST endpoint, the API token must be passed in the query string.
http://yourserver:port/api/v1/site/_id_/settitle
The parameter id is the id of a saved site, e.g. example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe.
The parameter title is the new title of the site.
This method returns an Error object. If successful, the Error object has a status of 200.
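A minimal sketch of /site/_id_/settitle, assuming (as for /site/add) that title is sent as a form-encoded body while the id sits in the path and the token in the query string. Note that success is detected through the status field of the returned Error object. `set_title_request` and `set_title` are hypothetical names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def set_title_request(base_url, token, site_id, title):
    """Build a POST request for /site/<id>/settitle (form-encoded body is an assumption)."""
    endpoint = (f"{base_url.rstrip('/')}/api/v1/site/{quote(site_id, safe='')}/settitle"
                f"?{urlencode({'token': token})}")
    return Request(endpoint, data=urlencode({"title": title}).encode("utf-8"), method="POST")


def set_title(base_url, token, site_id, title):
    """Return True if Collect reports success (an Error object with status 200)."""
    with urlopen(set_title_request(base_url, token, site_id, title)) as resp:
        return json.load(resp).get("status") == 200
```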
To delete a saved site, you can use the /site/_id_/delete endpoint.
Note: Although this is a POST endpoint, the API token must be passed in the query string.
http://yourserver:port/api/v1/site/_id_/delete
The parameter id is the id of a saved site, e.g. example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe.
This method returns an Error object. If successful, the Error object has a status of 200.
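Deleting follows the same pattern: the id goes in the path, the token in the query string, and success is reported as an Error object with status 200. In this sketch `delete_request` and `delete_site` are hypothetical names, and sending an empty POST body is an assumption.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def delete_request(base_url, token, site_id):
    """Build a POST request for /site/<id>/delete."""
    endpoint = (f"{base_url.rstrip('/')}/api/v1/site/{quote(site_id, safe='')}/delete"
                f"?{urlencode({'token': token})}")
    # Assumption: no body parameters are needed, so an empty body is sent.
    return Request(endpoint, data=b"", method="POST")


def delete_site(base_url, token, site_id):
    """Return True if Collect reports success (an Error object with status 200)."""
    with urlopen(delete_request(base_url, token, site_id)) as resp:
        return json.load(resp).get("status") == 200
```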
To download an archive of all sites currently saved in Collect, visit
http://yourserver:port/api/v1/backup._extension_
The server will generate an archive that can be downloaded.
Note: If you are logged in, you can download the archive directly. If you aren’t logged in, you can access it by adding your api_token to the url.
The parameter extension defines the format of the archive. It can be either zip, tar or tar.gz.
Using tar.gz or zip is recommended because tar files are not compressed.
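Backups of large collections can be big, so it helps to stream the archive to disk instead of reading it into memory. The sketch below validates the extension against the three documented formats and copies the response in chunks with `shutil.copyfileobj`. `backup_url` and `download_backup` are hypothetical names.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

ALLOWED_EXTENSIONS = ("zip", "tar", "tar.gz")


def backup_url(base_url: str, token: str, extension: str = "tar.gz") -> str:
    """Build the /backup.<extension> url; only the documented formats are accepted."""
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension must be one of {ALLOWED_EXTENSIONS}")
    return f"{base_url.rstrip('/')}/api/v1/backup.{extension}?{urlencode({'token': token})}"


def download_backup(base_url: str, token: str, dest_path: str, extension: str = "tar.gz"):
    """Stream the generated archive to dest_path in chunks."""
    with urlopen(backup_url(base_url, token, extension)) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```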
To restore a backup, extract the contents of the backup file into the Collect/Collect/public/s/ directory.
Afterwards, there should be a file at Collect/Collect/public/s/content.json and a directory for each entry.
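The restore step can also be scripted. This is a sketch, not an official tool: `restore_backup` is a hypothetical name, and `collect_root` is assumed to be the outer Collect directory, following the Collect/Collect/public/s/ layout described above. `shutil.unpack_archive` selects the format from the file extension (.zip, .tar, .tar.gz). The function then checks that content.json ended up where Collect expects it.

```python
import os
import shutil


def restore_backup(archive_path: str, collect_root: str) -> str:
    """Extract a Collect backup into <collect_root>/Collect/public/s/.

    collect_root is the outer Collect directory (an assumption based on the
    documented Collect/Collect/public/s/ path).
    """
    target = os.path.join(collect_root, "Collect", "public", "s")
    os.makedirs(target, exist_ok=True)
    # The archive format is chosen from the extension: .zip, .tar or .tar.gz.
    shutil.unpack_archive(archive_path, target)
    content = os.path.join(target, "content.json")
    if not os.path.isfile(content):
        raise FileNotFoundError(f"expected {content} after extraction")
    return target
```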
Windows Explorer sometimes can’t open these archives because of their size: if your zip file is larger than 4 GB, some zip implementations can’t decode it and report that the file is corrupt. Other programs, e.g. 7-Zip or WinRAR, can open these zip files.
To see these API requests in action, open your browser console and enable XHR (Firefox), or open the “Network” tab (Chrome & Firefox).
You should see something like this (Firefox):
If you have any questions or something doesn’t work the way you expect it to, feel free to open an issue.