GHC 2019-06-23

6 comments.

, https://git.io/fjwX9 in jarun/googler
Okay, then one is limited by `ARG_MAX`, so many long feature articles probably won't work.

Of course, long features can be chunked.
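Here is a minimal chunking sketch. It assumes a per-chunk budget of 100,000 bytes, an arbitrary value picked to stay below Linux's 131,072-byte per-argument limit (`MAX_ARG_STRLEN` with 4 KiB pages). `chunk_text` is a hypothetical helper. It splits on whitespace so words stay whole, which also collapses newlines into single spaces.

```python
def chunk_text(text, max_bytes=100_000, encoding="utf-8"):
    """Split text into space-joined chunks whose encoded size is <= max_bytes.

    Splits on whitespace so words are never cut; a single word longer than
    max_bytes still ends up alone in an oversized chunk.
    """
    chunks = []
    current = []
    current_size = 0
    for word in text.split():
        size = len(word.encode(encoding))
        # Every word after the first in a chunk also costs one joining space.
        extra = size if not current else size + 1
        if current and current_size + extra > max_bytes:
            # Adding this word would overflow: flush and start a new chunk.
            chunks.append(" ".join(current))
            current, current_size = [word], size
        else:
            current.append(word)
            current_size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk would then get its own `pico2wave -w partNNN.wav ...` call, and the resulting WAV files would be played back in order.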

, https://git.io/fjwXH in jarun/googler
The TTS engine leaves much to be desired, though...

(Note that my quick-and-dirty approach of passing the text on the command line is limited by `ARG_MAX` and company. If pico2wave accepts text on stdin, that should work, too.)
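To see how close a given command comes to those limits, here is a rough pre-flight check. The sketch assumes Linux, where each single argument is capped at `MAX_ARG_STRLEN` (32 pages) in addition to the total `ARG_MAX` budget shared with the environment. `argv_bytes` and `command_fits` are hypothetical helpers. The estimate ignores the pointer arrays the kernel also stores.

```python
import os


def argv_bytes(argv):
    """Bytes the argv strings occupy in the new process (each is NUL-terminated)."""
    return sum(len(os.fsencode(arg)) + 1 for arg in argv)


def command_fits(argv):
    """Rough check of argv against ARG_MAX and the per-argument limit (Linux assumed)."""
    # Environment strings count against ARG_MAX too: "KEY=VALUE\0".
    env_bytes = sum(len(os.fsencode(k)) + len(os.fsencode(v)) + 2
                    for k, v in os.environ.items())
    total_ok = argv_bytes(argv) + env_bytes < os.sysconf("SC_ARG_MAX")
    # Linux: MAX_ARG_STRLEN = 32 pages per string, terminating NUL included.
    per_arg_limit = 32 * os.sysconf("SC_PAGE_SIZE")
    per_arg_ok = all(len(os.fsencode(arg)) + 1 <= per_arg_limit for arg in argv)
    return total_ok and per_arg_ok
```

For example, `command_fits(['pico2wave', '-w', path, '-l', 'en-GB', article.text])` could decide whether to fall back to chunking.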

, https://git.io/fjwXQ in jarun/googler
Here's a base64-encoded sample of what I got from https://techcrunch.com/2019/06/23/week-in-review-youtubes-awful-comments-and-googles-1b-tech-free-investment/

[tmpedrzr7pe.wav.b64.txt](https://github.com/jarun/googler/files/3318063/tmpedrzr7pe.wav.b64.txt)
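To turn the attachment back into a playable file, decode it. This is a minimal sketch using Python's standard `base64` module. `decode_attachment` and the output name `sample.wav` are hypothetical. The final check only confirms that the data starts like a RIFF/WAVE container.

```python
import base64
from pathlib import Path


def decode_attachment(src, dst):
    """Decode a base64 text file to binary; return True if it looks like a WAV."""
    # b64decode discards newlines and other non-alphabet characters by default.
    data = base64.b64decode(Path(src).read_bytes())
    Path(dst).write_bytes(data)
    # A WAV file is a RIFF container whose form type is "WAVE".
    return data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

Usage: `decode_attachment("tmpedrzr7pe.wav.b64.txt", "sample.wav")`. On Linux, GNU coreutils' `base64 -d tmpedrzr7pe.wav.b64.txt > sample.wav` does the same decoding.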

, https://git.io/fjwX7 in jarun/googler
Just tried

```py
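#!/usr/bin/env python3
"""Download articles and read them aloud into WAV files with pico2wave."""

import argparse
import os
import subprocess
import tempfile

import newspaper  # provided by the newspaper3k package


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('urls', metavar='URL', nargs='+')
    args = parser.parse_args()

    for url in args.urls:
        # Fetch the page and extract the main article text.
        article = newspaper.Article(url)
        article.download()
        article.parse()
        # Reserve a temporary .wav path; pico2wave writes the file itself,
        # so close our descriptor right away.
        fd, path = tempfile.mkstemp(suffix='.wav')
        os.close(fd)
        # The whole text is passed as a single argv entry, so very long
        # articles can run into ARG_MAX (and Linux's per-argument limit).
        subprocess.check_call(['pico2wave', '-w', path, '-l', 'en-GB', article.text])
        print(path)


if __name__ == '__main__':
    main()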
```

Works for me without a problem.

, https://git.io/fjwXI in jarun/googler
(Apparently, since it outputs a WAV file, I don't need a sound device on Linux. Will give it a spin.)
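Without a sound device, the generated file can still be sanity-checked. This is a minimal sketch using Python's standard `wave` module. `describe_wav` is a hypothetical helper, and the exact format pico2wave produces is not assumed here.

```python
import wave


def describe_wav(path):
    """Summarize a WAV file's format and length without playing it."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "frame_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

Calling it on the path the script above prints should show a non-zero duration if the speech was actually rendered.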

, https://git.io/fjwXL in jarun/googler
Sorry, unplanned things came up over the weekend, so this has to wait until tomorrow, again...

> Would it be possible to read it as well using pico2wave?
>
> I tried this using subprocess but I think the encoding to utf-8 has to be changed to pass the text to pico2wave. For me the output wav is just reading the characters one by one.

I don't have a Linux install connected to a sound device right now, so I can't test this (okay, I'm too lazy to reboot into my bare-metal Linux install...). What do you mean by "the encoding to utf-8 has to be changed"? Do you mean it only accepts ASCII, so code points U+0080 and above need to be stripped? As long as pico2wave accepts textual input, there's no reason googler, which prints text, can't work with it.
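If pico2wave really did choke on non-ASCII input (unconfirmed here), one workaround is to transliterate where possible and drop the rest. Note that Python's `subprocess` already encodes `str` arguments with the filesystem encoding (UTF-8 under typical Linux locales), so no manual UTF-8 step should be needed just to pass the text. `to_ascii` is a hypothetical helper.

```python
import unicodedata


def to_ascii(text):
    """Reduce text to ASCII, keeping base letters of accented characters."""
    # NFKD splits e.g. "é" into "e" plus a combining accent, so the
    # errors="ignore" encode below keeps the "e" and drops only the accent.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

This is lossy. Characters with no ASCII decomposition, such as curly quotes and dashes, are removed outright, so "don’t" becomes "dont".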