GHC 2020-10-15

1 comment.

https://git.io/JT3vq in indygreg/PyOxidizer
PEP 587 has a [nice comparison](https://www.python.org/dev/peps/pep-0587/#comparison-of-python-and-isolated-configurations) of Python and isolated configurations:

|PyPreConfig|Python|Isolated|
|--- |--- |--- |
|coerce_c_locale_warn|-1|0|
|coerce_c_locale|-1|0|
|configure_locale|1|0|
|dev_mode|-1|0|
|isolated|0|1|
|legacy_windows_fs_encoding|-1|0|
|use_environment|0|0|
|parse_argv|1|0|
|utf8_mode|-1|0|

However, the isolated config is meant for embedding, where it enables very precise control; the Python interpreter itself never seems to call `PyPreConfig_InitIsolatedConfig`, even though it has an "isolated mode" enabled by the [`-I` option](https://docs.python.org/3/using/cmdline.html#id2). In [`Python/pylifecycle.c`](https://github.com/python/cpython/blob/b67cbbda3a022cec5e2ad929f0531162166e7c8d/Python/pylifecycle.c#L808-L823), the interpreter calls `_PyPreConfig_InitFromPreConfig`, which in turn invokes `PyPreConfig_InitPythonConfig` unconditionally. PEP 587 also documents `-I` (as well as `Py_IsolatedFlag`) as setting the `isolated` field to 1, without explicitly affecting any other preconfig fields.

I patched CPython a bit to check exactly what `-I` affects. The patch:

```diff
diff --git a/Python/pylifecycle.c b/Python/pylifecycle.c
index 75d57805c0..0ac3a73661 100644
--- a/Python/pylifecycle.c
+++ b/Python/pylifecycle.c
@@ -817,6 +817,17 @@ _Py_PreInitializeFromPyArgv(const PyPreConfig *src_config, const _PyArgv *args)
         return status;
     }
 
+    printf("\n");
+    printf("coerce_c_locale_warn = %i\n", config.coerce_c_locale_warn);
+    printf("coerce_c_locale = %i\n", config.coerce_c_locale);
+    printf("configure_locale = %i\n", config.configure_locale);
+    printf("dev_mode = %i\n", config.dev_mode);
+    printf("isolated = %i\n", config.isolated);
+    printf("use_environment = %i\n", config.use_environment);
+    printf("parse_argv = %i\n", config.parse_argv);
+    printf("utf8_mode = %i\n", config.utf8_mode);
+    printf("\n");
+
     status = _PyPreConfig_Write(&config);
     if (_PyStatus_EXCEPTION(status)) {
         return status;
```

Now, on macOS, running the patched Python:

```console
$ ./python.exe

coerce_c_locale_warn = 0
coerce_c_locale = 0
configure_locale = 1
dev_mode = 0
isolated = 0
use_environment = 1
parse_argv = 1
utf8_mode = 0

...
```

```console
$ ./python.exe -I

coerce_c_locale_warn = 0
coerce_c_locale = 0
configure_locale = 1
dev_mode = 0
isolated = 1
use_environment = 0
parse_argv = 1
utf8_mode = 0

...
```

So the isolated mode of the `python` executable keeps `configure_locale` enabled anyway; compared with the default run, only `isolated` and `use_environment` change. I think that's probably a saner default.
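For anyone who wants a rough cross-check without patching CPython, here is a minimal sketch that compares `sys.flags` in child interpreters started with and without `-I`. It assumes Python 3.7+; `probe_flags` and `PROBE` are hypothetical names of mine. `sys.flags` only exposes some of the fields (`ignore_environment` is the inverse of `use_environment`, and there is no equivalent of `configure_locale` or `parse_argv`), so this is not a substitute for the `printf` patch.

```python
import json
import subprocess
import sys

# Code run inside the child interpreter: dump the flags we care about as JSON.
PROBE = (
    "import json, sys; "
    "print(json.dumps({"
    "'isolated': sys.flags.isolated, "
    "'ignore_environment': sys.flags.ignore_environment, "
    "'dev_mode': bool(sys.flags.dev_mode), "
    "'utf8_mode': sys.flags.utf8_mode}))"
)


def probe_flags(*interpreter_args):
    """Start a child interpreter with the given options and return its flags."""
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print("default:", probe_flags())
    print("-I:     ", probe_flags("-I"))
```

The values in the default run depend on the environment (for example `PYTHONUTF8` or `PYTHONDEVMODE`), so only the `-I` run is expected to be stable across machines.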

UTF-8 mode, on the other hand, reduces a lot of I/O encoding woes, so I personally like it, but I'm neutral on whether to make it the default.
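To see what UTF-8 mode changes in practice, a small sketch along these lines can compare the encodings a child interpreter settles on with and without `-X utf8`. It assumes Python 3.7+; `probe_encodings` and `PROBE` are hypothetical names, and the encoding names are lowercased because their spelling varies between Python versions.

```python
import subprocess
import sys

# Code run inside the child interpreter: report UTF-8 mode and two encodings.
PROBE = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, "
    "sys.getfilesystemencoding(), "
    "locale.getpreferredencoding(False))"
)


def probe_encodings(*interpreter_args):
    """Start a child interpreter with the given options and return its encodings."""
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
    )
    utf8_mode, filesystem, preferred = result.stdout.split()
    return {
        "utf8_mode": int(utf8_mode),
        "filesystem": filesystem.lower(),
        "preferred": preferred.lower(),
    }


if __name__ == "__main__":
    print("default: ", probe_encodings())
    print("-X utf8: ", probe_encodings("-X", "utf8"))
```

Without `-X utf8` the result depends on the locale the script is run under, which is exactly the variability UTF-8 mode removes.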

Whatever the eventual solution, IMO it's a good idea to handle non-ASCII (and non-Latin-1, I guess) arguments properly by default. Too many developers don't bother to test them, so if they're not handled by default, the problems tend to be caught only by users after the programs have shipped. Sometimes the problems manifest in totally non-obvious ways, causing massive confusion and wasting significant troubleshooting time. I only happened upon this thanks to an abundance of caution which I don't always practice...