Sorry for the late reply on a closed issue, but I don't think this is a duplicate of #352.
#352 is about truly invalid characters in hrefs, e.g. `|`, which isn't a valid URI character per either RFC 3986 or RFC 3987. The problem in that case is that tidy doesn't spot the invalid character and doesn't do anything, not even emit an error.
The issue here is different. `http://example.com/é` is a valid IRI per RFC 3987, and tidy is (arguably wrongly, by some evolving standards) flagging it as malformed.
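To make the distinction concrete, here is a minimal Python sketch that checks only the character repertoire of the two RFCs. The helper names `invalid_chars` and `is_ucschar` are hypothetical and have nothing to do with tidy's code; position-dependent rules (e.g. `%` needing two hex digits, or `iprivate` characters being allowed only in the query) are deliberately ignored.

```python
import string

# Characters RFC 3986 allows to appear literally somewhere in a URI:
# unreserved, gen-delims, sub-delims, and "%" (for percent-encoding).
URI_CHARS = frozenset(
    string.ascii_letters + string.digits + "-._~" + ":/?#[]@" + "!$&'()*+,;=" + "%"
)

# RFC 3987 `ucschar` ranges: the extra non-ASCII characters an IRI may contain.
UCSCHAR_RANGES = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    # Planes 1 through 13, each minus its last two code points.
    + [(plane << 16, (plane << 16) + 0xFFFD) for plane in range(0x1, 0xE)]
    + [(0xE1000, 0xEFFFD)]
)


def is_ucschar(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


def invalid_chars(s, allow_iri):
    """Return the characters of `s` not allowed in a URI (or, with allow_iri, an IRI).

    Only the character repertoire is checked, not where each character appears.
    """
    return [ch for ch in s if ch not in URI_CHARS and not (allow_iri and is_ucschar(ch))]


print(invalid_chars("http://example.com/\u00e9", allow_iri=False))  # ['é']
print(invalid_chars("http://example.com/\u00e9", allow_iri=True))   # []
print(invalid_chars("http://example.com/a|b", allow_iri=True))      # ['|']
```

`é` (U+00E9) falls in the first `ucschar` range, so it is rejected as a URI character but accepted as an IRI character, while `|` is rejected either way.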
1. It's rather pointless to echo user input. Echoing back the URL is at best slightly clearer when users are downloading multiple videos, but even then, due to the sequential nature of you-get, it's extremely easy to know exactly which URL the warning applies to.
Note that error messages currently don't contain the original URLs either. Not following that convention would make the project (frankly already quite inconsistent, given its nature) even less consistent.
2. > Users should not be informed with segments.
The progress bar already shows users the total number of (non-skipped) segments and the index of the segment currently being downloaded, so this assertion is unfounded.
> Number of segments skipped does not help the user.
As a user myself, it totally helps. When a video that is supposedly 1GB (that figure isn't available across the board, and it could be wrong) ends up with only 3 segments totaling 100MB, knowing that 27 segments were skipped due to the paywall instantly makes the situation clear and helps me estimate how much I'm missing out on.
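The estimate is a simple proportion. A minimal Python sketch, with a hypothetical helper name and under the assumption that skipped segments are roughly the same size as the downloaded ones (segment sizes can vary in practice):

```python
def estimate_missing_mb(downloaded_mb, downloaded_count, skipped_count):
    """Rough size of the skipped content, assuming roughly equal-sized segments."""
    if downloaded_count == 0:
        return None  # nothing was downloaded, so there is no basis for an estimate
    return downloaded_mb * skipped_count / downloaded_count


print(estimate_missing_mb(100, 3, 27))  # 900.0
```

With the numbers above, about 900MB is missing; together with the 100MB downloaded, that roughly matches the claimed 1GB.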
Not sure what you mean. This is just a warning, essentially telling the user "look, there are this many segments we can't do anything about". All intermediate URLs are useless without keys.
By the way, testing membership of `None` is perfectly fine: (1) an empty sequence or set is vastly different from `None`; (2) `in` is a membership test operator, and it doesn't test for subsets or subsequences. (Of course, I'm talking about its default semantics; `__contains__` can be implemented however you like, `str` being a good example, since it tests for substrings.)
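Both points can be checked directly in a Python shell. In this sketch, `keys` is a hypothetical per-segment list where `None` marks a segment without a key:

```python
# `in` on a list tests membership: is some element equal to the left operand?
keys = ["k0", None, "k2"]       # hypothetical per-segment keys; None = no key
print(None in keys)             # True: one element is None
print(None in [])               # False: an empty list has no members at all
print([] is None)               # False: an empty sequence is not None
print([1, 2] in [1, 2, 3])      # False: `in` does not test for a sub-list
print([1, 2] in [[1, 2], 3])    # True: the list [1, 2] is itself an element
print("ab" in "abc")            # True: str.__contains__ tests for substrings
```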
log: mark xterm* terminals as ANSI escape sequences-compatible
==============================================================
`xterm-color`, `xterm-16color`, `xterm-88color` and `xterm-256color` are now covered.
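A minimal sketch of the kind of check involved, assuming ANSI support is decided by looking up the `TERM` environment variable in an allow-list. The function name `supports_ansi` is hypothetical, and the list below includes only plain `xterm` plus the variants named above; it is not you-get's actual list.

```python
import os

# Hypothetical allow-list: plain "xterm" plus the variants this PR names.
# A real implementation would keep whatever other terminals it already accepts.
ANSI_TERMS = frozenset({
    "xterm", "xterm-color", "xterm-16color", "xterm-88color", "xterm-256color",
})


def supports_ansi(term=None):
    """Return True if TERM names a terminal assumed to understand ANSI escape sequences."""
    if term is None:
        term = os.environ.get("TERM", "")
    return term in ANSI_TERMS


print(supports_ansi("xterm-256color"))  # True
print(supports_ansi("dumb"))            # False
```

An explicit set is used here rather than `term.startswith("xterm")`, because a prefix match would also accept entries such as `xterm-mono`, which has no color support.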
youku: warn about segments skipped due to paywall
=================================================
This is especially helpful in cases where the entire video is blocked by a paywall, which otherwise results in the unhelpful error message:
```
you-get: [Failed] Cannot extract video source.
```
I spent quite a few minutes debugging a URL like that,^ time that could have been saved by the message
```
you-get: Skipping 27 out of 27 segments due to paywall
```
added by this commit.
^The video is completely region-blocked where I am, so I didn't know it was behind a paywall before feeding it to you-get (with a Mainland-based extractor proxy, of course). That should explain my initial confusion; it would have been obvious had I first checked its availability in an Unblock Youku-enabled browser session. Still, more self-contained information makes you-get better.
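A minimal Python sketch of how such a warning might be produced. The data model is an assumption, not the extractor's actual code: each segment is a hypothetical dict whose `"key"` is `None` when the segment is behind the paywall (its URL being useless without a key), and `paywall_warning` is a hypothetical helper.

```python
import sys


def paywall_warning(segments):
    """Return a warning for paywalled segments, or None if nothing was skipped."""
    skipped = sum(1 for seg in segments if seg.get("key") is None)
    if not skipped:
        return None
    return "Skipping {} out of {} segments due to paywall".format(skipped, len(segments))


# A fully paywalled video: all 27 hypothetical segments lack a key.
segments = [{"url": "seg-%d" % i, "key": None} for i in range(27)]
message = paywall_warning(segments)
if message:
    # Mimics the "you-get: " prefix seen in the output above.
    print("you-get: " + message, file=sys.stderr)
```

Counting skipped segments rather than just flagging their presence is what makes the "27 out of 27" case immediately recognizable as a fully blocked video.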