GHC 2020-06-08

1 comment.

https://git.io/JfD0s in zmwangx/caterpillar
I have implemented the feature requests I deem appropriate.

- ffmpeg_loglevel: appropriate. Revamped in e3ebefc, should be completely consistent now.

  You were seeing extraneous output despite the filtering effort for two reasons. First, suppression was previously based on a single pattern (repurposed, and half-assed in this context), so a lot of stuff slipped through the cracks. Second, the final merging pass wasn't filtered at all, and that was the only part you were fixing. The implementation here covers both.

- progress_hooks: appropriate. Instead of progress hooks specifically, I implemented more general, strictly typed event hooks in bbe9bb4. I wrote a simple adapter to the reporting scheme you implemented:

  ```py
  #!/usr/bin/env python3
  
  import math
  import pathlib
  
  from caterpillar import caterpillar
  from caterpillar.events import (
      Event,
      SegmentDownloadSucceededEvent,
      SegmentDownloadFailedEvent,
      SegmentsDownloadInitiatedEvent,
      SegmentsDownloadFinishedEvent,
  )
  
  
  class ProgressReporter:
      def __init__(self):
          self.segment_count = 0
          self.handled_count = 0
          self.downloaded_bytes = 0
  
      def event_hook(self, event: Event) -> None:
          # An alternative to isinstance: check event.event_type against
          # the EventType enum.
          if isinstance(event, SegmentsDownloadInitiatedEvent):
              self.segment_count = event.segment_count
              return
          elif isinstance(event, SegmentsDownloadFinishedEvent):
              data = dict(
                  status="finished",
                  fragment_index=self.handled_count,
                  fragment_count=self.segment_count,
                  downloaded_bytes=self.downloaded_bytes,
                  # Warning: this is wrong, since errored segments haven't
                  # been counted.
                  total_bytes=self.downloaded_bytes,
              )
          elif isinstance(event, SegmentDownloadSucceededEvent):
              self.handled_count += 1
              self.downloaded_bytes += event.path.stat().st_size
              data = dict(
                  status="downloading",
                  # Warning: caterpillar uses imap_unordered which returns
                  # results out of order, so calling it fragment_index is
                  # not accurate.
                  fragment_index=self.handled_count,
                  fragment_count=self.segment_count,
                  downloaded_bytes=self.downloaded_bytes,
                  total_bytes_estimate=math.ceil(
                      self.downloaded_bytes / self.handled_count * self.segment_count
                  ),
              )
          elif isinstance(event, SegmentDownloadFailedEvent):
              self.handled_count += 1
              return
          else:
              return
          ...
  
  
  def main():
      reporter = ProgressReporter()
      caterpillar.process_entry(
          "...",
          pathlib.Path("..."),
          force=True,
          keep=True,
          event_hooks=[reporter.event_hook],
      )
  
  
  if __name__ == "__main__":
      main()
  ```

  You should probably consider implementing the event hook API natively though.

- download_m3u8_playlist_or_variant: appropriate. Implemented variant streams support in e1dd54c. Instead of blindly selecting the first variant, I try to find the best variant. See `variants.py`.

- invoke: not appropriate. Reason: there are already `caterpillar.caterpillar.process_entry` and `caterpillar.caterpillar.process_batch`, which basically do this. The two are different enough that an application programmer probably won't use both interchangeably, and if they do use both (which they probably don't), they'll need their own dispatch anyway. Adding another trivial dispatcher or two seems pointless.
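To illustrate the variant selection idea above: this is a rough sketch, not the actual logic in `variants.py`. It assumes "best" simply means the highest `BANDWIDTH` attribute on the `#EXT-X-STREAM-INF` tags of a master playlist. The playlist contents and the function names `parse_variants` and `pick_best_variant` are made up for the example.

```python
import re

# Hypothetical master playlist; the URIs are placeholders.
MASTER_PLAYLIST = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=1280000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5120000,RESOLUTION=1920x1080
high/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2560000,RESOLUTION=1280x720
mid/index.m3u8
"""


def parse_variants(playlist):
    """Return (bandwidth, uri) pairs found in an HLS master playlist."""
    variants = []
    lines = [line.strip() for line in playlist.splitlines() if line.strip()]
    # Each variant is an EXT-X-STREAM-INF tag followed by its URI line.
    for tag, uri in zip(lines, lines[1:]):
        if not tag.startswith("#EXT-X-STREAM-INF:") or uri.startswith("#"):
            continue
        # Require ':' or ',' before BANDWIDTH so AVERAGE-BANDWIDTH is skipped.
        match = re.search(r"(?:^|[:,])BANDWIDTH=(\d+)", tag)
        if match:
            variants.append((int(match.group(1)), uri))
    return variants


def pick_best_variant(playlist):
    """Return the URI with the highest BANDWIDTH, or None if there is none."""
    variants = parse_variants(playlist)
    if not variants:
        return None
    return max(variants, key=lambda variant: variant[0])[1]


print(pick_best_variant(MASTER_PLAYLIST))  # high/index.m3u8
```

A real selector would likely weigh more than bandwidth (resolution, codecs), so treat this only as the simplest improvement over taking the first entry.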

I think those are all the substantial requests. Let me know if you have questions or need anything else. I'll cut a release if there are no additional requests.

---

Unrelated Python clear-up:

> I thought using double underscores to hide those function from external code - those functions should not be called directly.

No, you were thinking about name mangling, which only applies to names used inside a class definition (methods and attributes), not to module-level functions and variables, and even then it only makes access difficult, not impossible. There's a very narrow use case that you'll know when you really, *really* need it (I was exaggerating a bit when I said "it never is", but the point is that if you just want to mark something private, you shouldn't do this). See https://docs.python.org/3/tutorial/classes.html#private-variables
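For reference, the narrow use case looks roughly like this: a base class that must keep calling its own helper even when a subclass happens to define a method with the same name. The class and method names here are made up for illustration.

```python
class Base:
    def __init__(self):
        # Inside the class body this is compiled as self._Base__setup().
        self.__setup()

    def __setup(self):
        self.origin = "Base"


class Child(Base):
    # Stored as _Child__setup, so it does not override Base's helper.
    def __setup(self):
        self.origin = "Child"


child = Child()
print(child.origin)  # Base
```

With a single underscore (`_setup`), `Child` would override the helper and `child.origin` would be `"Child"`. Avoiding that kind of accidental clash is what mangling is for, not privacy.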

If you just want to "hide" module functions from `help` and `import *`, then a single underscore is enough.

A simple example you can try:

`mod.py`:

```py
__hidden_var = "nope, not hidden"


def __hidden_func():
    print("nope, not hidden")


class Foo:
    def __hidden_method(self):
        print("nope, still here")
```

`main.py`:

```py
import mod


def main():
    print("mod.__hidden_var:", end="\t")
    print(mod.__hidden_var)
    print("mod.__hidden_func:", end="\t")
    mod.__hidden_func()
    foo = mod.Foo()
    print("mod.Foo.__hidden_method:", end="\t")
    try:
        foo.__hidden_method()
    except AttributeError as e:
        print(f"AttributeError: {e}")
    print("mod.Foo._Foo__hidden_method:", end="\t")
    foo._Foo__hidden_method()


if __name__ == "__main__":
    main()
```

```console
$ python3 main.py
mod.__hidden_var:	nope, not hidden
mod.__hidden_func:	nope, not hidden
mod.Foo.__hidden_method:	AttributeError: 'Foo' object has no attribute '__hidden_method'
mod.Foo._Foo__hidden_method:	nope, still here
```
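The single-underscore convention mentioned earlier can be checked in a similar way. This sketch builds a throwaway module in memory so it fits in one file; the module name `demo_mod` and its functions are made up for the example.

```python
import sys
import types

# Source of a hypothetical module with one public and one "private" function.
SOURCE = '''
def public_func():
    return "public"


def _private_func():
    return "private by convention"
'''

# Register the module so that a normal import statement can find it.
demo_mod = types.ModuleType("demo_mod")
exec(SOURCE, demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod

# Run a star import into a fresh namespace and inspect what arrived.
namespace = {}
exec("from demo_mod import *", namespace)

print("public_func" in namespace)    # True
print("_private_func" in namespace)  # False

# The underscore is only a convention: explicit access still works.
print(demo_mod._private_func())      # private by convention
```

This relies on the module not defining `__all__`; if it does, `import *` follows that list instead of the underscore rule.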