The Collect API lets you do most of the things you can do in the web interface.
The documentation assumes that you have a server running Collect. The API can be accessed using
http://yourserver:port/api/v1/endpoint
where “endpoint” is the endpoint you want to access.
All responses are JSON objects. There are several types of response objects.
A Site object describes a saved page. An example object looks like this:
{
"url": "http://example.com/some/page",
"pagepath": "example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe/html.html",
"id": "example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe",
"domain": "example.com",
"saved": "2018-01-07T13:30:34.030Z",
"title": "Example Domain",
"size": 1416
}
url: The original URL of the saved page
pagepath: The path where the index/main page is saved (relative to Collect/public/s/)
id: A unique id for this site (ids are usually shorter than in the example; you can set their length in the config file)
domain: The domain of the original URL
saved: The date on which the page was saved
title: The title displayed in the listing
size: The size of all content of this page, in bytes
An Error object describes an error. An example object looks like this:
{
"status": 412,
"message": "Missing parameter \"url\""
}
status: The same status as in the HTTP response
message: Describes the error
Note: When the status code is in the 4xx or 5xx range, an Error object is returned.
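Because every 4xx/5xx response carries an Error object, a client can turn those responses into exceptions. The following is a minimal Python sketch, assuming the response body has already been read into a string; `CollectError` and `check_response` are hypothetical names, not part of Collect.

```python
import json
from typing import Any


class CollectError(Exception):
    """Hypothetical exception wrapping a Collect Error object."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def check_response(http_status: int, body: str) -> Any:
    """Parse a JSON response body; raise CollectError for 4xx/5xx statuses."""
    data = json.loads(body)
    if 400 <= http_status < 600:
        # The Error object repeats the HTTP status; fall back to it if missing.
        raise CollectError(data.get("status", http_status), data.get("message", ""))
    return data
```

For example, a 412 response with the Error object shown above raises `CollectError` with `status == 412`.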
A Processing object describes a started process. An example object looks like this:
{
"message": "Processing started",
"target": "http://example.com/some/page"
}
message: A status message
target: The URL being processed/downloaded
All API requests must be authenticated, either with a cookie (web interface) or by passing the API token.
When making an API request, pass your api_token (see config.json) as the URL parameter token, e.g.
http://yourserver:port/api/v1/endpoint?token=my_example_token
Note: The token is omitted from the URLs in this documentation.
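Since the token has to be appended to every request, it can help to build URLs in one place. A minimal Python sketch; `api_url` is a hypothetical helper, and `base` stands for your own `http://yourserver:port`.

```python
from urllib.parse import urlencode


def api_url(base: str, endpoint: str, token: str, **params: str) -> str:
    """Build a Collect API URL with the token in the query string.

    `base` is e.g. "http://yourserver:port"; any extra keyword arguments
    are added as further query parameters.
    """
    query = urlencode({"token": token, **params})
    return f"{base.rstrip('/')}/api/v1/{endpoint.lstrip('/')}?{query}"
```

For example, `api_url("http://yourserver:port", "sites/", "my_example_token")` yields the sites URL with `?token=my_example_token` appended.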
To see the sites that are saved on your Collect server, you can use the /sites/ endpoint.
http://yourserver:port/api/v1/sites/_domain_
The parameter domain is a domain name, e.g. example.com.
You can also request sites from multiple domains by joining them using a +, e.g. example.com+example2.com.
If you don’t give this parameter, all sites from all domains will be returned.
This method returns a list of Site objects or an Error.
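A minimal Python sketch of building the /sites/ path for several domains and turning the returned list into typed objects. The `Site` dataclass, `sites_path` and `parse_sites` are hypothetical; the field names follow the Site object described above.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Site:
    url: str
    pagepath: str
    id: str
    domain: str
    saved: datetime
    title: str
    size: int


def sites_path(*domains: str) -> str:
    """Endpoint path for /sites/; several domains are joined with '+'.

    With no domains the path is just "sites/", which returns all sites.
    """
    return "sites/" + "+".join(domains)


def parse_sites(body: str) -> List[Site]:
    """Turn the JSON list returned by /sites/ into Site objects."""
    sites = []
    for item in json.loads(body):
        # "saved" ends in "Z"; fromisoformat() before Python 3.11 rejects "Z",
        # so it is replaced with an explicit UTC offset.
        saved = datetime.fromisoformat(item["saved"].replace("Z", "+00:00"))
        sites.append(Site(item["url"], item["pagepath"], item["id"],
                          item["domain"], saved, item["title"], int(item["size"])))
    return sites
```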
To get details about a saved website, you can use the /details/ endpoint.
http://yourserver:port/api/v1/details/_id_
The parameter id is the id of an entry, e.g. example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe.
If you don’t give this parameter or the id doesn’t exist on the server, you’ll get a 404 response.
This method returns a Site object or an Error.
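A minimal Python sketch of calling /details/ with the standard library and mapping the 404 case to `None`. `details_url` and `get_details` are hypothetical helpers; percent-encoding the id as a single path segment is a defensive choice, not something the documentation requires.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def details_url(base: str, site_id: str, token: str) -> str:
    """URL for /details/<id>, with the id encoded as one path segment."""
    return f"{base}/api/v1/details/{quote(site_id, safe='')}?{urlencode({'token': token})}"


def get_details(base: str, site_id: str, token: str):
    """Return the Site object as a dict, or None if the id does not exist."""
    try:
        with urlopen(details_url(base, site_id, token)) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            return None
        raise
```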
To add a site to your saved websites, you can use the /site/add endpoint.
Note: This endpoint returns a status code of 202 on success.
Note: Although this is a POST endpoint, the API token must be passed in the query string.
http://yourserver:port/api/v1/site/add
The parameter url is the link to the site you want to save, eg https://example.com/some/page.
The parameter depth is the number of levels of hyperlinks to follow from the specified URL. It must be a number between 0 and 5.
If omitted, depth is 0.
The parameter title is the title of the site. If omitted, Collect will use the title in the html file.
The parameter samedomain sets whether hyperlinks to other domains should be followed.
If true, only links to the same domain will be followed.
The parameter cookies defines cookies that should be sent with the request, eg cookie1=value1;cookie2=value2. By default, no cookies are sent.
The parameter useragent defines the User-Agent header that will be sent with the request, e.g. Mozilla/5.0. By default, no user-agent will be sent.
This method returns a Processing object or an Error.
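A minimal Python sketch of building the /site/add request. It assumes the parameters are sent form-encoded in the POST body, which this page does not specify; only the token placement in the query string comes from the note above. `add_site_request` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def add_site_request(base: str, token: str, url: str, depth: int = 0,
                     title: Optional[str] = None,
                     samedomain: Optional[bool] = None,
                     cookies: Optional[str] = None,
                     useragent: Optional[str] = None) -> Request:
    """Build a POST request for /site/add.

    Assumption: parameters go form-encoded in the body; the token goes in
    the query string, as required for POST endpoints.
    """
    if not 0 <= depth <= 5:
        raise ValueError("depth must be between 0 and 5")
    fields = {"url": url, "depth": str(depth)}
    # Optional parameters are only sent when set, so server defaults apply.
    if title is not None:
        fields["title"] = title
    if samedomain is not None:
        fields["samedomain"] = "true" if samedomain else "false"
    if cookies is not None:
        fields["cookies"] = cookies
    if useragent is not None:
        fields["useragent"] = useragent
    return Request(
        f"{base}/api/v1/site/add?{urlencode({'token': token})}",
        data=urlencode(fields).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` should give a 202 response whose body is a Processing object.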
To change the displayed title of a saved site, you can use the /site/_id_/settitle endpoint.
Note: Although this is a POST endpoint, the API token must be passed in the query string.
http://yourserver:port/api/v1/site/_id_/settitle
The parameter id is the id of a saved site, eg example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe
The parameter title is the new title of the site
This method returns an Error object. If successful, the Error object has a status of 200.
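A minimal Python sketch for /site/_id_/settitle, including a check of the returned Error object's status. Sending title form-encoded in the body is an assumption; `settitle_request` and `succeeded` are hypothetical helpers.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def settitle_request(base: str, token: str, site_id: str, title: str) -> Request:
    """POST /site/<id>/settitle; title form-encoded (assumed), token in query."""
    path = f"/api/v1/site/{quote(site_id, safe='')}/settitle"
    # urllib sets Content-Type to application/x-www-form-urlencoded for data.
    return Request(f"{base}{path}?{urlencode({'token': token})}",
                   data=urlencode({"title": title}).encode(), method="POST")


def succeeded(body: str) -> bool:
    """The endpoint answers with an Error object; status 200 means success."""
    return json.loads(body).get("status") == 200
```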
To delete a saved site, you can use the /site/_id_/delete endpoint.
Note: Although this is a POST endpoint, the API token must be passed in the query string.
http://yourserver:port/api/v1/site/_id_/delete
The parameter id is the id of a saved site, eg example.com-4c52804bf1541a1f1ef789bf402f7112f91a066dd58c7fb1fe
This method returns an Error object. If successful, the Error object has a status of 200.
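A minimal Python sketch for deleting a site: it sends an empty POST with the token in the query string and reports success when the returned Error object has status 200. `delete_request` and `delete_site` are hypothetical helpers.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def delete_request(base: str, token: str, site_id: str) -> Request:
    """POST /site/<id>/delete with the token in the query string."""
    url = (f"{base}/api/v1/site/{quote(site_id, safe='')}/delete"
           f"?{urlencode({'token': token})}")
    # An empty body; method="POST" makes the verb explicit.
    return Request(url, data=b"", method="POST")


def delete_site(base: str, token: str, site_id: str) -> bool:
    """Send the request; True if the returned Error object has status 200."""
    try:
        with urlopen(delete_request(base, token, site_id)) as resp:
            return json.load(resp).get("status") == 200
    except HTTPError:
        # 4xx/5xx responses also carry an Error object; treat them as failure.
        return False
```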
To download an archive of all sites currently saved in Collect, visit
http://yourserver:port/api/v1/backup._extension_
The server will generate an archive that can be downloaded.
Note: If you are logged in, you can download the archive directly. If you aren’t logged in, you can access it by adding your api_token to the URL as the token parameter.
The parameter extension defines the format of the archive.
It can be either zip, tar or tar.gz.
Use of tar.gz or zip is recommended because tar files are not compressed.
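A minimal Python sketch that validates the extension and streams the backup to disk, so a large archive is never held in memory at once. `backup_url` and `download_backup` are hypothetical helpers; the default of tar.gz follows the recommendation above.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

# The three formats listed in this documentation.
EXTENSIONS = ("zip", "tar", "tar.gz")


def backup_url(base: str, token: str, extension: str = "tar.gz") -> str:
    """URL for /backup.<extension>, with the token in the query string."""
    if extension not in EXTENSIONS:
        raise ValueError(f"extension must be one of {EXTENSIONS}")
    return f"{base}/api/v1/backup.{extension}?{urlencode({'token': token})}"


def download_backup(base: str, token: str, dest: str, extension: str = "tar.gz") -> None:
    """Stream the archive to `dest` in chunks."""
    with urlopen(backup_url(base, token, extension)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```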
To restore a backup, extract the contents of the backup file into the Collect/Collect/public/s/ directory.
There should be a file called content.json at the path Collect/Collect/public/s/content.json and a directory for each entry.
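The restore steps above can be scripted. A minimal Python sketch, assuming `collect_dir` is the outer Collect directory (so the target is Collect/Collect/public/s/) and that content.json sits at the root of the archive; `restore_backup` is a hypothetical helper. `shutil.unpack_archive` picks the format from the file extension (.zip, .tar, .tar.gz).

```python
import shutil
from pathlib import Path


def restore_backup(archive: str, collect_dir: str) -> Path:
    """Extract a backup into <collect_dir>/Collect/public/s and verify it."""
    target = Path(collect_dir) / "Collect" / "public" / "s"
    target.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(archive, str(target))
    # A valid restore leaves content.json next to one directory per entry.
    if not (target / "content.json").is_file():
        raise FileNotFoundError(f"content.json not found in {target}")
    return target
```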
Windows Explorer sometimes can’t open these files because of their size: if your zip file is larger than 4 GB, some zip implementations can’t decode it and report that the file is corrupt. You can use other programs to open these zip files, e.g. 7-Zip or WinRAR.
To see these API requests in action, open your browser’s developer tools and either enable the XHR filter in the console (Firefox) or open the “Network” tab (Chrome & Firefox).
You should see something like this (Firefox):

If you have any questions or something doesn’t work the way you expect it to, feel free to open an issue.