Files hosted on Google Drive can be shared publicly. Nevertheless, if they’re too big, Google's pages warn that “Google Drive can’t scan this file for viruses” and ask the user for confirmation before proceeding with the download.
If for any reason you’re attempting an unattended download of the file, or downloading it in a resource-limited environment, this confirmation step prevents direct use of the link.
It seems that Google exposes a confirmation key on the warning page, which must match a value returned with the next request. So the process passes through four main URLs, starting from the (supposedly) direct public link of the file:
- https:// docs.google.com/open?id=…
- https:// docs.google.com/uc?id=…&export=download&revid=…
- https:// docs.google.com/uc?export=download&confirm=SqLb&id=…&revid=…/…
- https:// doc-0s-ak-docs.googleusercontent.com/docs/securesc/…
After various steps, the first URL redirects to a page from which the next URL of the process can be captured.
This will be the second URL of the process: it is the one that ends up setting a random “confirm” value inside the URL of the next step. This value seems to have a corresponding, but different, value in the response URL inside the page. The value inside the page must be captured in order to form a new URL similar to that of step 3. This time the cookie (previously set by Google) will match the value in the URL, and the request will eventually (after some redirections) end at a URL like that of step 4, which is finally the one that downloads the file directly.
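The capture-and-rebuild step described above can be sketched in Python. This is only an illustration under stated assumptions, not the code of gdown.pl: the function names, the regular expression and the sample HTML fragment are all hypothetical, and the real warning page may be laid out differently.

```python
import re
from html import unescape
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Assumption: the warning page links to the next step with an href that
# starts with /uc?export=download and carries a confirm=... parameter.
HREF_RE = re.compile(r'href="(/uc\?export=download[^"]+)"')


def find_confirm_url(page_html: str, host: str = "https://docs.google.com") -> Optional[str]:
    """Return the absolute URL of the confirmation link, or None if absent."""
    match = HREF_RE.search(page_html)
    if match is None:
        return None
    # HTML attributes escape '&' as '&amp;'; undo that before reusing the URL.
    return host + unescape(match.group(1))


def confirm_value(url: str) -> Optional[str]:
    """Extract the value of the 'confirm' query parameter from a URL."""
    values = parse_qs(urlsplit(url).query).get("confirm")
    return values[0] if values else None


# Hypothetical fragment of a warning page, for illustration only.
SAMPLE_PAGE = (
    '<a id="dl" href="/uc?export=download&amp;confirm=SqLb&amp;id=FILE_ID">'
    "Download anyway</a>"
)

next_url = find_confirm_url(SAMPLE_PAGE)
print(next_url)
print(confirm_value(next_url))
```

The request to the rebuilt URL still has to carry the cookie that Google set earlier, so a sketch like this only covers the parsing half of the job.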
Too much hassle for a single clean command line…
So, I’ve written a Perl script that does all the work. With the help of wget, of course, which can follow “302” HTTP redirections and load and store cookies. Neat tool!
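To illustrate the role wget plays here, the following Python sketch builds a wget command line that keeps cookies between requests. It is an illustration under assumptions rather than the script's actual code: the function names `build_wget_command` and `fetch`, the cookie file name and the placeholder URL are hypothetical, while `--load-cookies`, `--save-cookies`, `--keep-session-cookies` and `-O` are standard wget options.

```python
import subprocess
from typing import List, Optional


def build_wget_command(url: str, cookie_file: str, output: Optional[str] = None) -> List[str]:
    """Build a wget argument list that reuses one cookie file across calls."""
    command = [
        "wget",
        "--load-cookies", cookie_file,   # send cookies saved by a previous call
        "--save-cookies", cookie_file,   # store cookies set by this response
        "--keep-session-cookies",        # also keep cookies without an expiry date
    ]
    if output is not None:
        command += ["-O", output]        # write the response body to this file
    command.append(url)
    return command


def fetch(url: str, cookie_file: str, output: str) -> int:
    """Run wget and return its exit status; wget follows 302 redirects itself."""
    return subprocess.run(build_wget_command(url, cookie_file, output)).returncode


# Placeholder URL and file names, for illustration only.
print(build_wget_command(
    "https://docs.google.com/uc?id=FILE_ID&export=download",
    "cookies.txt",
    "file.bin",
))
```

Passing the arguments as a list, instead of one shell string, avoids having to quote the “&” characters in the URL.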
Here it is, in its git repository.
The script runs on *nix… and also on Windows with Perl and wget in the PATH.
There is a new v2.0 version intended to resume partially downloaded files.
For now, it lives on a different branch from the usual v1.x gdown.pl. Note that v2.0 may require at least wget v17. It can also introduce a short initial delay in the download, because it must detect when the real download begins after all the Google Drive redirections (this is not needed in v1.x).
I’d have liked to post the code here, but WordPress templates are a complete mess… I’d prefer plain text. Really.
The following lines apply to a previous version of gdown.pl: from v1.1 on, it already uses --no-check-certificate by default.
Certificates can pose a problem on your system, because Google uses HTTPS all the time:
ERROR: Certificate verification error for drive.google.com: unable to get local issuer certificate
To connect to drive.google.com insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.
If this is the case, install the “ca-certificates” package on your system, or replace “wget” with “wget --no-check-certificate” in one line at the end of the script.