When to use unstructured data types in PostgreSQL? Comparing hstore vs. JSON vs. JSONB

Ever since PostgreSQL began supporting NoSQL-style storage (via hstore, JSON, and JSONB), the question of when to use PostgreSQL in relational mode and when to use one of its NoSQL modes has come up quite often. Will you have to abandon traditional table structures entirely and work with document representations in the future? Or mix both approaches? The answer is not surprising: it depends on many factors. Each of the newer storage models, hstore, JSON, and JSONB, has its ideal applications. Below we dig deeper into the features of each and look at when to use it.



hstore


If you exclude XML, hstore was the first truly unstructured data type added to PostgreSQL. It arrived way back in Postgres 8.3, before upsert, before streaming replication, and before window functions. hstore is essentially a key/value store directly inside PostgreSQL. With hstore you are limited in your choice of data types: in fact, you only have strings. You do not get nested data either; in short, it is a flat, single-level key/value type.


The advantage of hstore is that you do not have to define the keys in advance (unlike columns). You can simply insert a record, and it will store whatever keys it is given. Let's say you have the following script to create a table:


-- hstore ships as an extension and must be enabled once per database
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE products (
    id serial PRIMARY KEY,
    name varchar,
    attributes hstore
);

With hstore you can insert anything you want into the attributes column. In this case, the query that adds these keys and values looks as follows:


INSERT INTO products (name, attributes) VALUES (
    'Geek Love: A Novel',
    'author => "Katherine Dunn",
     pages => 368,
     category => fiction'
);

A SELECT query looks like this:


SELECT name, attributes->'author' AS author
FROM products
WHERE attributes->'category' = 'fiction';

The obvious advantage of this approach is flexibility, but where it really shines is in being able to use different index types. In particular, a GIN or GiST index will index every key and value within an hstore. That way, when you filter on something, PostgreSQL will use the index if the planner decides it is worthwhile.
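As a minimal sketch of that indexing, assuming the products table from the earlier example (the index name idx_products_attributes is only illustrative), a GIN index on the hstore column can serve containment and key-existence filters:

```sql
-- Assumes the hstore extension is enabled and the products table exists.
-- The index name is illustrative, not taken from the original article.
CREATE INDEX idx_products_attributes ON products USING gin (attributes);

-- Containment: rows whose attributes include the pair category => fiction.
SELECT name
FROM products
WHERE attributes @> 'category => fiction';

-- Key existence: rows that have an "author" key at all.
SELECT name
FROM products
WHERE attributes ? 'author';
```

The containment form is used here because the GIN operator class for hstore covers operators such as @> and ?, while a plain equality test on attributes->'category' is not one of the operators the index supports. Whether the index is actually used remains up to the planner.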


Because hstore is not a full document equivalent, it is a stretch to use it as one, so it is important to understand when it pays off.
If you have relational data plus some data that may not always exist for a row, this approach can be a great solution. The attributes of a product catalog are a good example of this kind of data. For some products, such as books (which you store in the products table), you might define attributes such as genre and year of publication. For other products, such as clothing, stored in that same table, you would define different attributes such as size and color. Adding a column to the products table for every possible attribute is redundant and unnecessary.
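To make the catalog case concrete, here is a small sketch that assumes the products table from the earlier example; the clothing row and its values are invented for illustration:

```sql
-- A clothing item stored next to the book, with a different set of keys.
-- The row contents are hypothetical sample data.
INSERT INTO products (name, attributes) VALUES (
    'Plain T-Shirt',
    'size => M, color => blue'
);

-- akeys() returns the keys of an hstore value as a text array,
-- which shows that each row carries only the attributes it needs.
SELECT name, akeys(attributes) AS attribute_keys
FROM products;

-- A key that a row does not have simply comes back as NULL.
SELECT name, attributes->'size' AS size
FROM products;
```

No ALTER TABLE is needed when a new kind of product brings new attributes, which is the point of keeping them in a single hstore column.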


JSON


Postgres has supported JSON since version 9.2. With that, PostgreSQL could compete with MongoDB. (Although the JSON functionality in PostgreSQL 9.2 was, of course, somewhat oversold. More on that below.)


The JSON data type in Postgres is, if you look closely, pretty much just a text field. All you get with the JSON type is validation of the value on insert: Postgres enforces the JSON format. One small potential advantage over JSONB (which we discuss next) is that JSON preserves the whitespace of the data exactly as it entered the database. So if you are very picky about the formatting of your data, or you need to keep a record exactly in its original form, JSON can be useful.
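The difference can be seen with a quick sketch that casts the same literal to both types (the jsonb cast requires Postgres 9.4 or later; the literal itself is made up for illustration):

```sql
-- json stores the input text as-is: whitespace, key order,
-- and duplicate keys are all kept.
SELECT '{"b": 1,    "a": 2, "a": 3}'::json;

-- jsonb parses the value on input: whitespace and key order are not
-- preserved, and only the last value of a duplicate key is kept.
SELECT '{"b": 1,    "a": 2, "a": 3}'::jsonb;

-- Both types validate on input, so malformed JSON raises an error.
-- Uncomment to see the failure:
-- SELECT '{"a": }'::json;
```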


JSONB


Finally, in Postgres 9.4 we got real and proper JSON support in the form of JSONB. The B stands for "better". JSONB is a binary representation of JSON data. This means the data is stored in a decomposed binary form that is faster to process than text, because it does not have to be reparsed on every access. In addition, under the hood it has a mechanism similar to hstore. Technically, at one point an hstore2 type and a separate JSON type were nearly finished in development, and they were subsequently merged into JSONB as it exists today.


The JSONB type is largely what you would expect from a JSON data type. It lets you build nested structures, use the basic data types, and comes with a number of built-in functions for working with it. The best part, which it shares with hstore, is indexing. Creating a GIN index on a JSONB column indexes every key and value within the JSON document. Indexing combined with nesting means that JSONB is superior to hstore in most cases.
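A brief sketch of nesting and the built-in operators, run against an inline sample document so it needs no table (the document shape and the tags array are invented for illustration):

```sql
SELECT
    doc -> 'author' ->> 'name'   AS author_name,  -- step into a nested object, return text
    doc #>> '{tags,0}'           AS first_tag,    -- follow a path, return text
    jsonb_typeof(doc -> 'pages') AS pages_type    -- reports the JSON type of the value
FROM (
    SELECT '{"author": {"name": "Katherine Dunn"},
             "pages": 368,
             "tags": ["fiction", "novel"]}'::jsonb AS doc
) AS sample;
```

Unlike hstore, the pages value stays a JSON number rather than a string, and the author is a nested object rather than a flat key.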


That still leaves the small question of when it makes sense to use JSONB exclusively. Let's say you are building a document database and, out of all the options, you choose Postgres. With a package like MassiveJS this can be quite convenient.


The most common use cases:


  1. Event tracking data, where the event payload varies from event to event.
  2. Game data, which is quite common, especially for single-player games where the schema changes based on the state of the user.
  3. Tools that integrate multiple data sources, for example a tool that integrates multiple customer databases with Salesforce, Zendesk, or something else. The mix of schemas makes doing this with relational tables more painful than it should be.

Let's look at another example of JSONB. The script creates a table and inserts some sample data:


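-- uuid_generate_v4() is provided by the uuid-ossp extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE integrations (id UUID, data JSONB);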

INSERT INTO integrations VALUES (
uuid_generate_v4(),
'{
"service": "salesforce",
"id": "AC347D212341XR",
"email": "craig@citusdata.com",
"occurred_at": "8/14/16 11:00:00",
"added": {
"lead_score": 50
},
"updated": {
"updated_at": "8/14/16 11:00:00"
}
}');

INSERT INTO integrations VALUES (
uuid_generate_v4(),
'{
"service": "zendesk",
"email": "craig@citusdata.com",
"occurred_at": "8/14/16 10:50:00",
"ticket_opened": {
"ticket_id": 1234,
"ticket_priority": "high"
}
}');

In the above case, you can easily find all events that occurred for the user with the email craig@citusdata.com and then act on them. For example, you could run some form of behavioral analytics and count the users who did foo and then bar, or produce a simple report.
Adding a GIN index automatically indexes all the data within the JSONB field:


CREATE INDEX idx_integrations_data ON integrations USING gin(data);
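As a sketch of the lookups this enables, assuming the integrations table and sample rows above, the containment operator @> is one of the operators a default GIN index on a JSONB column can serve:

```sql
-- All events for one user, across services.
SELECT data ->> 'service'     AS service,
       data ->> 'occurred_at' AS occurred_at
FROM integrations
WHERE data @> '{"email": "craig@citusdata.com"}';

-- Containment also works on nested structures,
-- for example only high-priority Zendesk tickets.
SELECT id
FROM integrations
WHERE data @> '{"ticket_opened": {"ticket_priority": "high"}}';
```

On a table this small the planner will most likely choose a sequential scan anyway; the index starts to matter as the number of events grows.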

Conclusion


In most cases JSONB is probably just what you are looking for when you plan to use a non-relational data type. hstore and JSON also have good uses, though in rarer cases. JSONB does not always fit the data model. If you can normalize the schema, you will benefit from doing so, but if your schema has a large number of optional columns (e.g., event data) or one schema differs greatly from another, then JSONB is a much better fit.


In summary, a rule of thumb for choosing:


JSONB: in most cases.
JSON: if you are just processing logs, rarely need to query the data, or do not use it for anything beyond logging.
hstore: works fine for text data in key/value form, but in general JSONB handles this task as well.

Article based on information from habrahabr.ru
