Collect is a NodeJS server for collecting and archiving websites.
It does not download entire sites, but rather single pages and all content needed to display them. This means that Collect stores a static copy of the website (and its assets) on your disk. It also hosts these pages so you can access them over the network.
Before installing Collect, please make sure that git, node and npm are installed.
Note: This install process is tested with Node versions 12, 14 and 16. The “Test” badge shows the current test status. If it is green, everything should work!
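A quick way to confirm the prerequisites is to ask the shell whether each tool is on your PATH. The following is a minimal sketch, not part of Collect itself; the function name check_tools is made up for this example.

```shell
# Hypothetical helper: report whether each named tool is on PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Print the first line of the tool's version output, if it has one.
      echo "$tool: $("$tool" --version 2>/dev/null | head -n 1)"
    else
      echo "$tool: not found" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The three tools named above.
check_tools git node npm || echo "Install the missing tools before continuing."
```

If node is found, compare the printed version with the tested versions listed in the note above.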
Start by cloning the repository to your computer/server:
git clone https://github.com/xarantolus/Collect.git
Switch to the Collect directory:
cd Collect/Collect
Install dependencies:
npm install
Start the server in production mode (recommended):
npm start production
or
node app production
Expected output:
Preparing integrity check...
Checking cookie file...
Checking if folders for ids exist...
All folders exist.
Checking if ids for folders exist...
All entrys exist.
Finished integrity check.
Collect-Server(1.17.0-production) listening on port 80
Now open the website in your browser by visiting http://localhost:80 if running on the same computer or http://yourserver:80, where yourserver is the network name of your server.
You will notice that you need to authenticate with a username and password. That can be set up as shown in the next section.
To change settings, edit Collect/config.json. There, you can set a port, username, password, id_length, api_token, allow_public_view and allow_public_all. Note that you need to restart the server to apply changes.
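After editing the file, it can be useful to check that none of the setting names were removed or misspelled. The sketch below only looks for the quoted key names listed above; it assumes the file is ordinary JSON and does not validate the values. The function name check_config_keys is made up for this example.

```shell
# Hypothetical helper: report which documented setting names appear in a config file.
# It only searches for the quoted key name and does not parse the JSON.
check_config_keys() {
  config_file=$1
  for key in port username password id_length api_token allow_public_view allow_public_all; do
    if grep -q "\"$key\"" "$config_file"; then
      echo "$key: present"
    else
      echo "$key: missing"
    fi
  done
}

# Run from the repository root; skipped if the file is not there.
if [ -f Collect/config.json ]; then
  check_config_keys Collect/config.json
fi
```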
After setting up the server, you can read the user guide to find out more about general usage, keyboard shortcuts and download options.
If you already have Collect installed on your computer/server and want to update to the latest version, follow these steps.
Go to the directory where Collect is installed:
cd /path/to/Collect
You might want to back up your settings file.
Windows:
move Collect\config.json ..\
Linux/Unix:
mv Collect/config.json ../config.json
Download the latest version:
git fetch --all
Apply all changes (this usually overwrites your cookies file, but not the directory where your sites are saved):
git reset --hard origin/master
Restore the settings file.
Windows:
move ..\config.json Collect\
Linux/Unix:
mv ../config.json Collect/config.json
Go to the directory that contains package.json.
cd Collect
Install all required packages.
npm install
After restarting your server, the new version should be up & running.
If it doesn’t start, delete the node_modules directory and re-run npm install.
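The Linux/Unix update steps above can be collected into one function. This is a minimal sketch using exactly the commands listed; the names run, update_collect and DRY_RUN are made up for this example. With DRY_RUN=1 the commands are only printed, so you can review them before running anything.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Update steps for Linux/Unix, stopping at the first failing command.
update_collect() {
  repo_dir=$1
  run cd "$repo_dir" || return 1
  run mv Collect/config.json ../config.json || return 1    # back up settings
  run git fetch --all || return 1                          # download latest version
  run git reset --hard origin/master || return 1           # apply all changes
  run mv ../config.json Collect/config.json || return 1    # restore settings
  run cd Collect || return 1                               # directory with package.json
  run npm install
}

# Preview only: prints the seven commands without executing them.
DRY_RUN=1 update_collect /path/to/Collect
```

Note that if a step fails after the settings file was moved, the backup stays in the parent directory and has to be restored by hand.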
See the contributing file.
[…] SSH or any malicious program) can read your credentials.
[…] HTTPS.
You’re using this tool at your own risk. I am not responsible for any lost data like passwords or websites.
Website Scraper Module: MIT License. This server is mostly a user interface to this module and would never have been possible without their work.
Website Scraper Module PhantomJS Plugin: MIT License. Makes processing dynamic pages as easy as pie.
The UIkit library: Copyright YOOtheme GmbH under the MIT License. I really love this UI framework.
ArchiverJS: MIT License. node-archiver is a nice module for generating all kinds of archive files. It is used to create backups.
Ionicons: MIT License. The icons are really nice. I used the ion-ios-cloudy-outline icon.
Notification Sound: CC0 1.0 Universal License
See the License file.