I use wget like this:
wget --load-cookies cookies.txt downloadlink
The download is successful, but the problem is that it keeps storing unwanted files from the link in that directory, even after I delete them many times. It automatically recreates a www.mylink.com directory no matter how often I delete it. I tried to find where the cookies are saved on my server so I could delete them, but I can't find them.
I hope someone can help, as the automatically stored files are getting bigger every second. There is also a file called .fuse_hidden0345bd8d000004e3 which keeps growing too, and I can't delete it either.
The OP solved the problem and posted the solution here on Ubuntu Forums.
(Thanks to vasa1 for finding this.)
While the problem remains somewhat vague, the solution can be applied easily, so this may help others in the same situation.
The solution was to kill wget (probably with killall wget, assuming no other important wget instances are running). Apparently the problem was related to a session or sessions that had remained open because of still-running wget instances.
More information does not appear to be available.
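If you hit the same symptom, the commands below may help confirm the diagnosis before killing anything. This is only a sketch of standard shell usage; it assumes the stuck file really is being held open by a wget process, as in the OP's case:

# show any wget processes that are still running
pgrep -a wget

# show which process keeps the deleted file open
# (.fuse_hidden* placeholders exist only while some process
#  still has the deleted file open on a FUSE filesystem)
lsof | grep fuse_hidden

# kill the stray downloads, as in the OP's solution
killall wget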
@lance: Please still post an answer if you can give more information, for example, if you know why your solution worked.
I want to download an entire website. It has a very simple structure: pure HTML+CSS, no CMS or anything like that. However, instead of most pages I get the same 31-byte file containing the string "Requested range not satisfiable".
I run wget as follows:
wget -m -p -k -U Mozilla http://address
What options can I try?
wget is one way to mirror a website; another is httrack. Maybe you can give it a try.
That error means
416 Requested Range Not Satisfiable
The client has asked for a portion of the file, but the server cannot supply that portion. For example, if the client asked for a part of the file that lies beyond the end of the file.
This is taken from http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
Some sites also block the use of robots using robots.txt. More info here - http://en.wikipedia.org/wiki/Robots_exclusion_standard
Also, since @sh4d0w mentioned httrack, here is the link for it: http://www.httrack.com/
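Building on the robots.txt hint above, one thing worth trying is re-running the mirror with robots.txt ignored, after removing whatever wget saved on the failed run (if a partial file is left over and wget tries to resume it, that is exactly the kind of ranged request a 416 answers). A sketch only; http://address is still the placeholder from the question:

# clear out the 31-byte placeholder files from the failed run,
# then mirror again while ignoring robots.txt
wget -m -p -k -U Mozilla -e robots=off http://address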
To download an entire website, this is the command I use with wget:
wget -r http://www.youtube.com/
This downloads every page on www.youtube.com and its subdirectories youtube.com/?/?/?/?/
This usually works for me.
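Two related flags worth knowing (a sketch only; the depth and URL are just examples): -r recurses at most 5 levels deep by default, so -l raises or removes that limit, and -np keeps wget from climbing above the starting directory.

# recurse without a depth limit, but never go above the start URL
wget -r -l inf -np http://www.youtube.com/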
wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains website.org \
    --no-parent \
    www.website.org/tutorials/html/
The options are:
--recursive: download the entire Web site.
--domains website.org: don't follow links outside website.org.
--no-parent: don't follow links outside the directory tutorials/html/.
--page-requisites: get all the elements that compose the page (images, CSS and so on).
--html-extension: save files with the .html extension.
--convert-links: convert links so that they work locally, off-line.
--restrict-file-names=windows: modify filenames so that they will work in Windows as well.
--no-clobber: don't overwrite any existing files (used in case the download is interrupted and resumed).
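For reference, most of the long options above have short equivalents, so the same download can be written more compactly. This is just an equivalent sketch of the command shown earlier, with website.org and tutorials/html/ kept as placeholder names:

wget -r -nc -p -E -k --restrict-file-names=windows -D website.org -np www.website.org/tutorials/html/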