GHC 2020-10-12

2 comments.

https://git.io/JTL3F in jarun/googler
After stumbling upon a few discussions about Python performance, I tested a few things in a small rush of interest. Here are some pretty boring findings.

Good news: googler works with pypy3 (7.3.1, targeting CPython 3.6.9), seemingly perfectly, if anyone cares. The boring and unsurprising news is that it's frigging slow for our use case. Using the same prefetched HTML fixtures:

```console
$ for f in fixtures/googler-*.html; do time ( repeat 10 python3 ./googler --parse $f >/dev/null 2>&1 ); done
( repeat 10; do; python3 ./googler --parse $f > /dev/null 2>&1; done; )  1.35s user 0.14s system 98% cpu 1.515 total
( repeat 10; do; python3 ./googler --parse $f > /dev/null 2>&1; done; )  2.37s user 0.16s system 98% cpu 2.563 total
( repeat 10; do; python3 ./googler --parse $f > /dev/null 2>&1; done; )  1.31s user 0.14s system 98% cpu 1.478 total
( repeat 10; do; python3 ./googler --parse $f > /dev/null 2>&1; done; )  1.12s user 0.13s system 98% cpu 1.279 total
( repeat 10; do; python3 ./googler --parse $f > /dev/null 2>&1; done; )  1.28s user 0.14s system 98% cpu 1.439 total
( repeat 10; do; python3 ./googler --parse $f > /dev/null 2>&1; done; )  1.25s user 0.14s system 98% cpu 1.418 total
( repeat 10; do; python3 ./googler --parse $f > /dev/null 2>&1; done; )  1.48s user 0.15s system 98% cpu 1.652 total
```

```console
$ for f in fixtures/googler-*.html; do time ( repeat 10 pypy3 ./googler --parse $f >/dev/null 2>&1 ); done
( repeat 10; do; pypy3 ./googler --parse $f > /dev/null 2>&1; done; )  7.76s user 0.49s system 99% cpu 8.290 total
( repeat 10; do; pypy3 ./googler --parse $f > /dev/null 2>&1; done; )  14.35s user 0.62s system 99% cpu 15.006 total
( repeat 10; do; pypy3 ./googler --parse $f > /dev/null 2>&1; done; )  7.46s user 0.49s system 99% cpu 7.972 total
( repeat 10; do; pypy3 ./googler --parse $f > /dev/null 2>&1; done; )  5.89s user 0.48s system 99% cpu 6.400 total
( repeat 10; do; pypy3 ./googler --parse $f > /dev/null 2>&1; done; )  7.15s user 0.48s system 99% cpu 7.661 total
( repeat 10; do; pypy3 ./googler --parse $f > /dev/null 2>&1; done; )  7.04s user 0.49s system 99% cpu 7.564 total
( repeat 10; do; pypy3 ./googler --parse $f > /dev/null 2>&1; done; )  8.70s user 0.49s system 99% cpu 9.221 total
```

Now, a JIT only shines when you run the same code over and over again, so if I patch googler to parse the same page 100 times per run:

```console
$ # googler patched to parse the same page 100 times.
$ for f in fixtures/googler-*.html; do time ( python3 ./googler --parse $f >/dev/null 2>&1 ); done
( python3 ./googler --parse $f > /dev/null 2>&1; )  6.03s user 0.07s system 99% cpu 6.096 total
( python3 ./googler --parse $f > /dev/null 2>&1; )  16.72s user 0.13s system 99% cpu 16.865 total
( python3 ./googler --parse $f > /dev/null 2>&1; )  6.00s user 0.07s system 99% cpu 6.076 total
( python3 ./googler --parse $f > /dev/null 2>&1; )  3.94s user 0.04s system 99% cpu 3.991 total
( python3 ./googler --parse $f > /dev/null 2>&1; )  5.55s user 0.07s system 99% cpu 5.627 total
( python3 ./googler --parse $f > /dev/null 2>&1; )  5.42s user 0.06s system 99% cpu 5.493 total
( python3 ./googler --parse $f > /dev/null 2>&1; )  8.05s user 0.11s system 99% cpu 8.172 total
```

```console
$ # googler patched to parse the same page 100 times.
$ for f in fixtures/googler-*.html; do time ( pypy3 ./googler --parse $f >/dev/null 2>&1 ); done
( pypy3 ./googler --parse $f > /dev/null 2>&1; )  13.17s user 0.20s system 99% cpu 13.385 total
( pypy3 ./googler --parse $f > /dev/null 2>&1; )  20.03s user 0.29s system 99% cpu 20.348 total
( pypy3 ./googler --parse $f > /dev/null 2>&1; )  12.53s user 0.19s system 99% cpu 12.752 total
( pypy3 ./googler --parse $f > /dev/null 2>&1; )  8.22s user 0.14s system 99% cpu 8.368 total
( pypy3 ./googler --parse $f > /dev/null 2>&1; )  10.07s user 0.16s system 99% cpu 10.248 total
( pypy3 ./googler --parse $f > /dev/null 2>&1; )  10.74s user 0.17s system 99% cpu 10.917 total
( pypy3 ./googler --parse $f > /dev/null 2>&1; )  13.37s user 0.18s system 99% cpu 13.560 total
```

the gap narrows a lot. But of course there's no point in parsing the same page 100 times in a session, nor does anyone ever browse to the 100th result page, so it's purely an academic curiosity that validates the first thing about how a JIT works.
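To put a number on the gap, here's a quick sketch that divides the PyPy wall-clock totals by the CPython ones, fixture by fixture. The figures are copied from the `time` output above; the variable and function names are just mine for illustration.

```python
# Wall-clock totals in seconds, copied from the `time` output above,
# one entry per fixture, in the order the runs were printed.
cpython_10_runs = [1.515, 2.563, 1.478, 1.279, 1.439, 1.418, 1.652]
pypy_10_runs = [8.290, 15.006, 7.972, 6.400, 7.661, 7.564, 9.221]
cpython_100_parses = [6.096, 16.865, 6.076, 3.991, 5.627, 5.493, 8.172]
pypy_100_parses = [13.385, 20.348, 12.752, 8.368, 10.248, 10.917, 13.560]


def slowdowns(pypy, cpython):
    """Return how many times slower PyPy was than CPython for each fixture."""
    return [round(p / c, 2) for p, c in zip(pypy, cpython)]


# Ten separate interpreter launches per fixture: the JIT never gets warm.
cold = slowdowns(pypy_10_runs, cpython_10_runs)
# One launch parsing the same page 100 times: the JIT has code to reuse.
warm = slowdowns(pypy_100_parses, cpython_100_parses)

print("10 separate runs:         ", cold)
print("100 parses in one process:", warm)
```

That works out to PyPy being roughly 5x to 6x slower across ten cold starts, versus roughly 1.2x to 2.2x slower once a single process parses the page 100 times.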

https://git.io/JTIjj in jarun/googler
It's not like I haven't given this some substantial thought at one point.

- All the UI ideas I could come up with (including fetching the images, stitching them into larger grids and labelling them, or opening them in an image viewer or even a video player) were utter garbage. Why not use an actual graphical and interactive application, i.e. a graphical web browser, for something inherently graphical and interactive?

- Google Images search returns its image results, captions included, in a JS blob, so the architecture would be significantly different. The answer is probably regex (likely paired with a safe variant of `eval`), which takes the fragility to a whole new level.
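
For a sense of what the regex-plus-safe-`eval` approach might look like, here's a minimal sketch. Everything in it is an assumption for illustration: the `renderResults(...)` wrapper, the page fragment, and the `[url, caption]` layout are hypothetical and are not Google's actual markup, and `json.loads` only stands in for a "safe `eval`" if the blob happens to be valid JSON.

```python
import json
import re

# Hypothetical page fragment. The real markup is different and can change
# at any time, which is exactly the fragility problem.
PAGE = (
    "<script>renderResults("
    '[["https://example.com/cat.jpg", "A cat"], '
    '["https://example.com/dog.jpg", "A dog"]]'
    ");</script>"
)

# Grab the array literal passed to the (hypothetical) renderResults() call.
# A caption containing "]);" would already break this pattern.
BLOB_RE = re.compile(r"renderResults\((\[.*?\])\);", re.DOTALL)


def extract_results(page):
    """Return (url, caption) pairs found in the page, or [] if nothing parses."""
    match = BLOB_RE.search(page)
    if match is None:
        return []
    try:
        # json.loads never executes code, unlike eval().
        rows = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    return [(url, caption) for url, caption in rows]


for url, caption in extract_results(PAGE):
    print(caption, "->", url)
```

Even in this toy form, the parser depends on the wrapper name, the call's exact punctuation, and the blob being strict JSON, any of which could change without notice.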

Among all the rejected suggestions, image search would be a hard no.