GHC 2018-10-18

4 comments.

, https://git.io/fxwJj in jarun/googler
Ref #252.

, https://git.io/fxwUe in jarun/googler
[WIP] Parser rewrite
====================

The current version is already feature-complete (I believe).

Some highlights:

- Parser logic is <100 loc now.
- A new option `-D, --debugger` to open an IPython/Python shell to interactively inspect the DOM tree. Good for developing/debugging the parser in the case of upstream changes.
- "No results found for ... Results for ...:" now handled.
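
For illustration, the kind of shell fallback that a `--debugger` option implies can be sketched as below. This is a minimal sketch and not the code in the PR: `pick_shell`, `open_debugger`, and the `tree` variable name are hypothetical. It assumes only that IPython is preferred when it is installed and that the standard library's `code` module is the fallback.

```python
import code


def pick_shell():
    """Return (name, launcher); launcher(namespace) starts an interactive session.

    Hypothetical helper: prefers IPython when it is importable and falls back
    to the standard library's interactive console otherwise.
    """
    try:
        import IPython
    except ImportError:
        return "python", lambda ns: code.interact(banner="", local=ns)
    return "ipython", lambda ns: IPython.start_ipython(argv=[], user_ns=ns)


def open_debugger(tree, launcher=None):
    """Expose the parsed DOM tree as `tree` inside an interactive shell.

    `launcher` is injectable so the function can be exercised without
    actually opening a shell.
    """
    namespace = {"tree": tree}
    if launcher is None:
        _, launcher = pick_shell()
    launcher(namespace)
```

Calling `open_debugger(tree)` with a parsed document would then leave `tree` available for inspection at the prompt.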

, https://git.io/fxVAz in jarun/googler
Google News and more are broken in v3.7.1
=========================================

Google News (`-N`) is broken in v3.7.1: it returns no results. Some smaller things are broken too, e.g., file types are no longer shown in front of titles (`filetype:pdf` result titles should start with `[PDF] `, but now they do not).

These problems arise because the user agent `googler/3.7.1` is roughly equivalent to no user agent, so the HTML response needs to be parsed with the `--noua` variant, but that variant is guarded by `not ua` conditions.

One way to fix this is to drop the `not ua` guards, though the other branch would then need to be reviewed and some code may need to be removed. The other way is to wait for the v4.0 parser rewrite: I already have a functional version, but it hasn't undergone scrutiny yet.
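
The mismatch described above can be modeled in a few lines. This is a rough sketch under stated assumptions, not googler's actual code: the function names and layout labels are hypothetical, and the rule that only `Mozilla/`-prefixed user agents receive the other markup is an assumption made purely for illustration.

```python
# Hypothetical model of the bug; none of these names come from googler.
NOUA = "noua"  # markup served when the user agent is absent or unrecognized
UA = "ua"      # markup served to a recognized browser user agent


def served_layout(user_agent):
    """Assumed server behavior: only browser-like agents get the UA markup."""
    if user_agent and user_agent.startswith("Mozilla/"):
        return UA
    return NOUA


def parser_branch_guarded(ua_enabled):
    """Branch taken when the noua parsing code sits behind `not ua`."""
    return NOUA if not ua_enabled else UA


def parser_branch_unguarded(ua_enabled):
    """Branch taken once the `not ua` guards are dropped."""
    return NOUA


# Default run: a user agent is sent, but it is `googler/3.7.1`.
layout = served_layout("googler/3.7.1")
guarded_ok = parser_branch_guarded(True) == layout      # parser and markup disagree
unguarded_ok = parser_branch_unguarded(True) == layout  # parser and markup agree
```

In this model the guarded parser picks the wrong branch for the default user agent, which is the failure described above, and dropping the guards makes the two agree again.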

, https://git.io/fxVWE in jarun/googler
Sure. In fact, these days Google is increasingly liberal with paraphrasing queries, sometimes even quoted ones. It's annoying as hell.