[common] print user agent when dumping URLs (-u, --url)
=======================================================
This is just a suggestion.
Certain sites (known example: tudou.com; other sites may be affected too, currently or in the future) deny access to video URLs unless the user agent matches the one used when retrieving the URLs from the API, effectively rendering the URLs useless without the proper user agent. Therefore, exposing the user agent (which may not even be easy to find for casual users) in `-u, --url` output could come in handy at times.
This commit does not affect `--json` output.
Sample sessions:
```
$ you-get -u http://www.tudou.com/programs/view/QjZZ5dzxR9s/
Site: Tudou.com
Title: 中华人民共和国国歌(义勇军进行曲)
Type: MPEG-4 video (video/mp4)
Size: 0.64 MiB (672037 Bytes)
User Agent: Python-urllib/3.6
Real URLs:
http://58.205.218.5/mp4/61/125003061.h264_1.mp4?key=a0cbde8160cf77affa236658b3b93f0011c406d8cf&qrs=98225466&p=4149345344195641307&playtype=52&tk=142762909734852181110599083&brt=52&bc=0&xid=040052010051A775CC553C6DEE9CA13C870B90-2AF7-AA76-622A-58F2D9D1CABF&nt=0&nw=1&bs=0&ispid=1002&rc=200&inf=12&si=un&npc=1168&pp=0&ul=0&mt=0&sid=0&pc=0&cip=140.180.188.12&id=tudou&hf=0&hd=0&sta=0&ssid=0&cvid=&itemid=87497779&fi=0&sz=672037
```
```
$ you-get -u https://vimeo.com/118190268
Site: Vimeo.com
Title: 1Password 5 for iOS – Setting Up One-Time Passwords from AgileBits on Vimeo
Type: MPEG-4 video (video/mp4)
Size: 6.4 MiB (6710452 Bytes)
User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0
Real URLs:
https://13-lvl3-pdl.vimeocdn.com/01/3638/4/118190268/331170575.mp4?expires=1488177190&token=0cc335f9ce7c7759a09a2
```
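For illustration, here is a minimal sketch of how a script might reuse the printed user agent when fetching one of the real URLs. It uses only the standard library; `build_request` is a hypothetical helper, and the URL is a placeholder rather than a real video URL.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a Request that sends the given User-Agent header.

    Hypothetical helper for illustration; not part of you-get.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Both values would be copied from `you-get -u` output; these are placeholders.
req = build_request("http://example.com/video.mp4", "Python-urllib/3.6")

# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))

# To actually download, one would call urllib.request.urlopen(req).read().
```

The same idea applies to other clients, e.g. by passing the value to `curl -A` or `wget -U`.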
## A note on programmatic parsing of `you-get -u` output
This change slightly breaks the expectations of `you-get -u` output. However, since the output includes video info to begin with, which is rather irregular across different sites or even across videos from the same site, hopefully users who parse this output already have proper filtering in place (e.g. `sed -n '/^Real URLs:$/,$ { /^Real URLs:$/n; p; }'`, or even simpler, `grep '^http'`) that won't be affected by the extra line.
There's also `--json` which isn't affected.
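For scripts that want both the user agent and the URLs, a rough sketch of a parser follows. It assumes the output layout shown in the sample sessions (a `User Agent:` line, then a `Real URLs:` line followed by one URL per line) and a single video per invocation; `parse_url_dump` and the sample text are hypothetical, not part of you-get.

```python
def parse_url_dump(text):
    """Extract (user_agent, urls) from `you-get -u` style output.

    Assumes the layout from the sample sessions: every non-empty line
    after "Real URLs:" is treated as a URL.
    """
    user_agent = None
    urls = []
    in_urls = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if in_urls:
            if line:
                urls.append(line)
        elif line.startswith("User Agent:"):
            user_agent = line[len("User Agent:"):].strip()
        elif line == "Real URLs:":
            in_urls = True
    return user_agent, urls


# Made-up output in the same shape as the sample sessions.
sample = """Site: Example.com
Title: Some video
Type: MPEG-4 video (video/mp4)
Size: 0.64 MiB (672037 Bytes)
User Agent: Python-urllib/3.6
Real URLs:
http://example.com/a.mp4
http://example.com/b.mp4
"""

user_agent, urls = parse_url_dump(sample)
print(user_agent)
print(urls)
```

Parsing of this kind is inherently fragile, which is why `--json` remains the safer choice when it provides what is needed.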
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/soimort/you-get/1715)
<!-- Reviewable:end -->
> The whole reason for this feature exists is to stop people from screwing themselves.
Real people who actually need or benefit immensely from a curlrc would be screwed to stop a couple of people who don't need it from screwing themselves (I looked at almost all threads containing the keyword curlrc, and only a few of them are failures that might be attributed to curlrc). That doesn't sound fair to me.
Anyway, this is not the thread to argue about curlrc, and I assume `CURL_HOME` can be whitelisted on a case-by-case basis, however that whitelisting is implemented. (This PR only invalidated `$CURL_HOME/.curlrc`; it doesn't even affect `$HOME/.curlrc`, and it would take another PR to throw out #516.)
> And environment variables can be added as HOMEBREW_ and used by specific commands on a case-by-case basis which is the option that means they are applied where needed rather than universally.
Sure, then it's just a more conservative and complicated way to do the same thing. "Growing list" concern applies there, too, assuming it is a concern.
> CURL_HOME. Without this user's curlrc may be ignored.
>
> I consider this a feature rather than a bug, I'm afraid.
Except respecting the user's curlrc was a design decision (#516). You and most users may not need a curlrc, but it can be very useful for people with bad connections, for instance (I say this from first-hand experience).
> It remains to be seen how many are truly harmless i.e. misconfiguration of them cannot break Homebrew.
They haven't been breaking Homebrew so far, AFAIK. (With the exception of misconfiguring curlrc, which is like misconfiguring your home router and ending up unable to connect to the Internet; there's always a way one can screw oneself.)
One can also present convincing arguments on a case-by-case basis.
> It's also going to introduce a list that's going to be endlessly added to.
1. I quote:
> harmless environment variables that are used by external commands invoked by brew that users are allowed to customize (e.g., curl, emacs, zsh)
That list won't be endless (with an asterisk), because brew only ever invokes a handful of external commands that aren't dumb plumbing commands.
2. Even if the list becomes long (it won't), the practical downside (as opposed to theoretical support burden) is ...? It's just a straightforward, highly visible list.
> It's also worth noting that all those variables you mention above would not be used by all Homebrew commands.
Of course. I was proposing the simplest approach (which IMO has the biggest chance of winning favor).
> Let's stick with the current approach for now and we can iterate on it.
What I proposed effortlessly fits into the current approach though.