Hey, finally took a look at this.
The result isn't satisfactory. The parser would be rather brittle: I've seen multiple "Top stories" layouts in use at the same time, and the carousel is generally hard to tell apart from the video carousel, the Twitter carousel, etc. (I only realized it would pick up video carousel results after I'd generated the patch below, and now I can't be bothered to develop it further.)
Also, you can only see the first three results. The rest are rendered by JS.
Apply the patch if you'd like to include this experimental functionality. I doubt we'll add it.
```diff
From 06e70e23f8086bb68d98893bbaeaa9df3639ef89 Mon Sep 17 00:00:00 2001
From: Zhiming Wang <[email protected]>
Date: Sun, 11 Oct 2020 23:19:12 +0800
Subject: [PATCH] Add experimental support for "Top stories"
---
googler | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/googler b/googler
index b1479a7..e2865d2 100755
--- a/googler
+++ b/googler
@@ -2343,6 +2343,35 @@ class GoogleParser(object):
         cw = lambda s: re.sub(r'[ \t\n\r]+', ' ', s) if s is not None else s
         index = 0
+
+        # Try to parse "Top stories".
+        #
+        # Detection doesn't work on all pages! E.g. when I search
+        # "covid" the layout for "Top stories" is simply different....
+        carousel = tree.select('g-section-with-header g-scrolling-carousel')
+        if carousel:
+            # Devise a really crappy strategy to tell a "Top stories"
+            # carousel apart from a Twitter carousel, which unfortunately
+            # shares the same structure.
+            section = next(el for el in carousel.ancestors() if el.tag == 'g-section-with-header')
+            if section.first_element_child().select('title-with-lhs-icon'):
+                # This section contains a title-with-lhs-icon (":newspaper
+                # icon: Top stories") which a Twitter carousel doesn't have,
+                # a good sign...
+                for card in carousel.select_all('g-inner-card'):
+                    heading = card.select('[role=heading]')
+                    title = heading.text
+                    a = card.select('a')
+                    url = a.attr('href')
+                    metadata_node = heading.parent.last_element_child()
+                    metadata = metadata_node.text if metadata_node is not heading else ''
+                    result = Result(index + 1, cw(title), url, '',
+                                    metadata=cw(metadata), sitelinks=[], matches=[])
+                    if result not in self.results:
+                        self.results.append(result)
+                        index += 1
+
+        # Regular results.
         for div_g in tree.select_all('div.g'):
             if div_g.select('.hp-xpdbox'):
                 # Skip smart cards.
--
2.28.0
```
Also available as a gist: https://gist.github.com/zmwangx/ce643da063bc6b259e83a46dfd719946.
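For illustration only, here's the gist of that heuristic as a standalone sketch. It runs on a hand-written, heavily simplified fragment (not real Google markup) and uses the standard library's `xml.etree.ElementTree` rather than googler's own DOM classes; `top_story_cards` and `SAMPLE` are made-up names. It has the same weakness as the patch: any carousel whose section header contains a `title-with-lhs-icon` gets picked up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one "Top stories" section, one Twitter section.
SAMPLE = """\
<div>
  <g-section-with-header>
    <div><title-with-lhs-icon>Top stories</title-with-lhs-icon></div>
    <g-scrolling-carousel>
      <g-inner-card>
        <a href="https://example.com/a"><div role="heading">Story A</div></a>
      </g-inner-card>
      <g-inner-card>
        <a href="https://example.com/b"><div role="heading">Story B</div></a>
      </g-inner-card>
    </g-scrolling-carousel>
  </g-section-with-header>
  <g-section-with-header>
    <div><h3>Example on Twitter</h3></div>
    <g-scrolling-carousel>
      <g-inner-card>
        <a href="https://example.com/tweet"><div role="heading">A tweet</div></a>
      </g-inner-card>
    </g-scrolling-carousel>
  </g-section-with-header>
</div>
"""


def top_story_cards(root):
    """Return (title, url) pairs from sections whose header has a title-with-lhs-icon."""
    stories = []
    for section in root.iter("g-section-with-header"):
        children = list(section)
        # Same signal as the patch: look for title-with-lhs-icon under the
        # section's first child; the Twitter section in SAMPLE lacks it.
        if not children or children[0].find(".//title-with-lhs-icon") is None:
            continue
        for card in section.iter("g-inner-card"):
            heading = card.find(".//*[@role='heading']")
            link = card.find(".//a")
            if heading is not None and link is not None:
                title = "".join(heading.itertext()).strip()
                stories.append((title, link.get("href")))
    return stories


stories = top_story_cards(ET.fromstring(SAMPLE))
```

Only the two cards from the first section end up in `stories`; the Twitter-style section is skipped because its header has no `title-with-lhs-icon`.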
@vstinner Thanks for the review.
> BrokenPipeError is more likely related to stdout than stderr.
Absolutely.
> I understand that your intent is to ignore the warning displayed on stderr when sys.stdout file is closed.
In fact, setting `sys.stdout` alone was enough to suppress the unraisable exception. I threw `sys.stderr` in there without much thought, just in case someone has stderr closed too (say, if stderr is redirected to stdout). Now I realize it's more trouble than it's worth.
> Since json.tool already has a similar code pattern, maybe it would be worth it to provide a helper function somewhere. I would prefer a private function.
Are you suggesting something like
```py
import sys

def run_with_broken_pipe_awareness(func, *args, **kwargs):
    try:
        return func(*args, **kwargs)
    except BrokenPipeError as exc:
        # Prevent the "Exception ignored" message at interpreter shutdown.
        sys.stdout = None
        sys.exit(exc.errno)
```
? If I were to add such a private helper, where should I place it? I can't seem to find a place in the code base where cross-module helpers live.
> By the way, it might be better to delete sys.stdout or set it to None
I mistakenly thought it needed to be set to something that still supports `write()`, or it would cause another exception. Apparently that's not the case; corrected.
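To illustrate why `None` suffices, here is a minimal in-process sketch. The helper is repeated so the snippet runs on its own; `fake_cli` and `demo` are made-up names, and the `BrokenPipeError` is raised by hand rather than by a real closed pipe.

```python
import errno
import sys


def run_with_broken_pipe_awareness(func, *args, **kwargs):
    try:
        return func(*args, **kwargs)
    except BrokenPipeError as exc:
        # No write()-capable replacement is needed: print() silently does
        # nothing when sys.stdout is None.
        sys.stdout = None
        sys.exit(exc.errno)


def fake_cli():
    # Hypothetical stand-in for a CLI whose write hit a closed pipe.
    raise BrokenPipeError(errno.EPIPE, "Broken pipe")


def demo():
    """Return (exit code, value of sys.stdout right after the handler ran)."""
    saved_stdout = sys.stdout
    try:
        try:
            run_with_broken_pipe_awareness(fake_cli)
        except SystemExit as exit_exc:
            code = exit_exc.code
        stdout_after = sys.stdout
        print("this goes nowhere")  # does not raise although stdout is None
    finally:
        sys.stdout = saved_stdout  # restore so the caller can keep printing
    return code, stdout_after


code, stdout_after = demo()
```

Here `code` is the errno carried by the exception and `stdout_after` is `None`; the later `print()` call is simply a no-op.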
Actually, it's not in the commit message; I only referenced [bpo-39828](https://bugs.python.org/issue39828) for context in the PR body. Sorry that it confused the bot.
bpo-42005: Fix CLI of cProfile and profile to catch BrokenPipeError
===================================================================
Catch `BrokenPipeError` in the CLI of `cProfile` and `profile` to reduce noise when piping through `head`.
Prior art for catching `BrokenPipeError` in a standard library CLI: [bpo-39828](https://bugs.python.org/issue39828) (#18779), targeting `json.tool`.
A test could be added, but I've held off for now since it would be rather unwieldy and of fairly limited value.
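For reference, a rough sketch of what such a test might involve, assuming a POSIX system. It does not exercise `cProfile` itself: `CHILD_SOURCE` is a made-up stand-in script that applies the same handling, and `run_with_early_close` is a hypothetical helper that imitates `head` by reading one line and then closing the pipe.

```python
import errno
import subprocess
import sys
import textwrap

# Hypothetical child script: prints far more than a pipe buffer can hold and
# handles BrokenPipeError the same way as this PR.
CHILD_SOURCE = textwrap.dedent("""
    import sys
    try:
        for i in range(1_000_000):
            print(i)
        sys.stdout.flush()
    except BrokenPipeError as exc:
        sys.stdout = None
        sys.exit(exc.errno)
""")


def run_with_early_close(source):
    """Run source in a child interpreter, read one line, then close the pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    first_line = proc.stdout.readline()
    proc.stdout.close()  # imitate `head` going away early
    stderr = proc.stderr.read()  # returns once the child has exited
    proc.stderr.close()
    returncode = proc.wait()
    return first_line, returncode, stderr


first_line, returncode, stderr = run_with_early_close(CHILD_SOURCE)
```

The child should exit with `errno.EPIPE` as its status, and `stderr` can then be inspected for an "Exception ignored" message. A real test would run `-m cProfile` or `-m profile` on a script instead of the stand-in.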
<!-- issue-number: [bpo-42005](https://bugs.python.org/issue42005) -->
https://bugs.python.org/issue42005
<!-- /issue-number -->