> In py27, `json.dumps` on a `str` gives you a `str`, and `json.dumps` on a `unicode` gives you a `unicode`.
Actually that's also false... It seems you only get a `unicode` when `ensure_ascii=False`. Anyway, very confusing. Glad I went all in on Python 3 years ago...
Cool, I'll give it a spin shortly. Regarding this:
> Here is the behavior with Python 2. The unicode version results in the unicode code-point escapes _regardless_ of `ensure_ascii`. It is only by serializing the UTF8-encoded `bytes` value with `ensure_ascii=False` that we get the equivalent of the "raw" characters:
Actually no, you seem to have been tricked by py27's confusing return types and by `repr` output that looks deceptively similar...
> ```py
> In [9]: json.dumps(ustr, ensure_ascii=False)
> Out[9]: u'"\u4e2d\u6587"'
> ```
That output actually contains the raw, unescaped code points U+4E2D and U+6587 (equivalent to the Python 3 `ensure_ascii=False` version); they only look escaped because py27's `repr` escapes non-ASCII characters. In py27, `json.dumps` on a `str` gives you a `str`, and `json.dumps` on a `unicode` gives you a `unicode`.
> Anyways...the end-result is potentially 2 or 3 extra bytes for these code-points. Is this the crux of the issue? The extra bytes?
That's one problem. The other problem is when you look at JSON values directly without `json_extract` (e.g. in a command line shell, or in a GUI like sqlitebrowser), escaped codepoints are not readable.
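For instance, compare the two forms under Python 3 (a quick illustration with just the standard `json` module):
```py
import json

doc = {"lang": "中文"}

# Default: non-ASCII characters are escaped to \uXXXX sequences,
# which is what ends up stored in the database.
print(json.dumps(doc))                      # {"lang": "\u4e2d\u6587"}

# With ensure_ascii=False the raw characters are kept, so the stored
# value stays readable in the sqlite3 shell or a GUI browser.
print(json.dumps(doc, ensure_ascii=False))  # {"lang": "中文"}
```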
Sorry, either I wasn't clear or I'm doing something wrong. Adapting your example a little bit:
```py
#!/usr/bin/env python3
import logging
from playhouse.sqlite_ext import *
logger = logging.getLogger("peewee")
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.DEBUG)
db = SqliteDatabase("/tmp/json1.db")
class KeyData(Model):
    key = TextField()
    data = JSONField()

    class Meta:
        database = db

KeyData.create_table()
unicode_str = "中文"
data = {"foo": unicode_str}
KeyData.create(key="k1", data=data)
```
Execution output:
```sql
('CREATE TABLE IF NOT EXISTS "keydata" ("id" INTEGER NOT NULL PRIMARY KEY, "key" TEXT NOT NULL, "data" JSON NOT NULL)', [])
('INSERT INTO "keydata" ("key", "data") VALUES (?, ?)', ['k1', '{"foo": "\\u4e2d\\u6587"}'])
```
Note that the value of `data` does NOT go through the `json` function.
Now, dumping the database:
```sql
$ sqlite3 /tmp/json1.db .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS "keydata" ("id" INTEGER NOT NULL PRIMARY KEY, "key" TEXT NOT NULL, "data" JSON NOT NULL);
INSERT INTO keydata VALUES(1,'k1','{"foo": "\u4e2d\u6587"}');
COMMIT;
```
Your raw SQL example is exactly what I have in mind (i.e., what should be done):
```sql
insert into foo (data) values (json('{ "foo": "bar" }'));
```
emphasis on `json(...)`.
> the library is doing the right thing:
Definitely, there's no bug here; it's just that the JSON is not stored efficiently.
Expose serialization options in sqlite_ext.JSONField (or use better defaults)
=============================================================================
Currently, `sqlite_ext.py` seems to use `json.dumps` without any options, so the defaults `separators=(', ', ': ')` and `ensure_ascii=True` apply. These waste space, and `ensure_ascii` also hurts the human readability of non-ASCII text without any real-world benefit (SQLite handles Unicode text safely).
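To illustrate the overhead with the standard `json` module (the relative saving is of course exaggerated for such a tiny document):
```py
import json

doc = {"lang": "中文"}

default = json.dumps(doc)  # '{"lang": "\u4e2d\u6587"}'
compact = json.dumps(doc, separators=(',', ':'), ensure_ascii=False)  # '{"lang":"中文"}'

print(len(default.encode('utf-8')))  # 24 bytes
print(len(compact.encode('utf-8')))  # 17 bytes
```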
Compare the result of SQLite JSON1 extension's builtin `json` function:
```sql
INSERT INTO entry(doc) VALUES(json('{ "lang": "中文" }'));
```
=>
```sql
INSERT INTO entry VALUES(1,'{"lang":"中文"}');
```
and peewee:
```py
Entry.insert(doc=dict(lang='中文'))
```
=>
```sql
INSERT INTO entry VALUES(1,'{"lang": "\u4e2d\u6587"}');
```
It would be nice if the serialization options were exposed, or the defaults simply changed to `separators=(',', ':')` and `ensure_ascii=False`.
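In the meantime, something along these lines should work as a user-side workaround (a rough sketch, assuming the standard peewee `Field.db_value()`/`python_value()` hooks; `CompactJSONField` is just an illustrative name):
```py
import json

from playhouse.sqlite_ext import JSONField


class CompactJSONField(JSONField):
    """JSONField variant that stores compact, unescaped UTF-8 JSON."""

    def db_value(self, value):
        # Serialize without spaces after separators and without \uXXXX escapes.
        if value is not None:
            return json.dumps(value, separators=(',', ':'), ensure_ascii=False)
        return value

    def python_value(self, value):
        if value is not None:
            return json.loads(value)
        return value
```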