GHC 2020-10-14

1 comment.

https://git.io/JTm4H in indygreg/PyOxidizer
A mistake in my original report: using `configure_locale` actually does work, unlike `utf8_mode`.

---

Now that I think about it, the mangled version on macOS is a UTF-8 encoded string decoded using Latin-1 or something similar:

```python
>>> '中文'.encode('utf-8').decode('latin-1', 'surrogateescape')
'ä¸\xadæ\x96\x87'
```

and the mangled version on Linux is a UTF-8 encoded string decoded as ASCII with `surrogateescape`:

```python
>>> '中文'.encode('utf-8').decode('ascii', 'surrogateescape')
'\udce4\udcb8\udcad\udce6\udc96\udc87'
```
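As a quick sanity check of this diagnosis, both manglings should be reversible: re-encode with the codec that was wrongly applied, then decode the bytes as UTF-8. This is only a sketch; `undo_mangling` is a hypothetical helper for illustration, not anything from PyOxidizer.

```python
def undo_mangling(mangled: str, wrong_encoding: str) -> str:
    """Recover the original text from a string decoded with the wrong codec.

    Hypothetical helper: re-encode with the codec that was wrongly applied
    (surrogateescape turns lone surrogates back into the original bytes),
    then decode the resulting bytes as UTF-8.
    """
    raw = mangled.encode(wrong_encoding, "surrogateescape")
    return raw.decode("utf-8")


original = "中文"

# Reproduce the two manglings described above.
macos_like = original.encode("utf-8").decode("latin-1")
linux_like = original.encode("utf-8").decode("ascii", "surrogateescape")

# Both round-trip back to the original, so no bytes were lost;
# they were only interpreted with the wrong encoding.
recovered_macos = undo_mangling(macos_like, "latin-1")
recovered_linux = undo_mangling(linux_like, "ascii")
```

If both recover `'中文'`, the argv bytes themselves are intact and only the decoding step is wrong.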

So the problem is that `osstr_to_pyobject` tries to use the locale encoding (through `PyUnicode_DecodeLocaleAndSize`), whereas `sys.argv` is supposed to be decoded "with filesystem encoding and “surrogateescape” error handler". The two can differ, and the mismatch shows up [on macOS or when using UTF-8 mode elsewhere](https://docs.python.org/3/library/sys.html#sys.getfilesystemencoding) when no locale is configured (which is the default with pyoxidizer, if I understand correctly).
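To see whether the two encodings actually differ in a given build, something like the following could be run inside the embedded interpreter. It is a rough sketch: `encoding_report` is a hypothetical helper, and these Python-level values are only an approximation of what the C-level `PyUnicode_DecodeLocaleAndSize` call sees (in UTF-8 mode, for instance, `locale.getpreferredencoding` reports UTF-8 regardless of the C locale).

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings relevant to how sys.argv gets decoded.

    Hypothetical diagnostic helper; the locale value is Python's view and
    may not match the C locale exactly (notably in UTF-8 mode).
    """
    return {
        # Encoding and error handler Python uses for filenames and argv.
        "filesystem_encoding": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # Python's view of the locale encoding, without calling setlocale().
        "locale_encoding": locale.getpreferredencoding(False),
        # Whether the interpreter was started in UTF-8 mode.
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


report = encoding_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

A filesystem encoding of `utf-8` alongside an ASCII-like locale encoding would be consistent with the mismatch described above.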

Using `PyConfig_SetBytesArgv` with a raw argv pointer seems to be the safer bet here. Or maybe use [`Py_DecodeLocale`](https://docs.python.org/3/c-api/sys.html#c.Py_DecodeLocale) coupled with [`PyUnicode_FromWideChar`](https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_FromWideChar), seeing that `PyConfig_SetBytesArgv` calls `Py_DecodeLocale` under the hood?