Comparing Windows Azure Table Storage and Amazon DynamoDB

Hello.
This is a translation of the first article in a series comparing the services offered by Windows Azure and Amazon, written by Gaurav Mantri, who is well known among cloud specialists.

In this article I compare Windows Azure Table Storage (WATS) and Amazon DynamoDB (ADDB).

From a functional point of view, WATS and ADDB offer similar capabilities. Both are NoSQL systems designed to store large amounts of data. Amazon also has another NoSQL database, SimpleDB.

An important point is that ADDB is not just a NoSQL database: it is a database service. It is used to manage data, but you also control how the system scales by specifying the throughput you need. In this sense it is very similar to compute instances in Amazon or Windows Azure. With a compute instance, you choose the instance size you need and the system provides it. ADDB works the same way: you tell the system how many reads and writes your application will perform against an ADDB table, and ADDB provides the necessary capacity.

Conceptually, the two systems are similar:

Both are non-relational NoSQL systems.
Both are essentially key-value stores.
Neither supports the relations that are available in a relational database.
Both have built-in support for high availability and flexibility.
Both provide a REST API for working with tables and their data, plus libraries for high-level languages, which are usually wrappers around the REST API. In both systems each API release has a version expressed as a date. At the time of writing the versions are: WATS 2011-08-18, ADDB 2011-12-05.

Of course, there are some major differences:

In ADDB you provision the throughput you need when you start working with the system, whereas in WATS throughput is controlled by the system. ADDB is therefore more flexible, but requires more careful planning.
Unlike SimpleDB, where a domain is limited to 10 GB, ADDB has no such limit: you can store as much data as you like. WATS also places no hard limit on the data in a table, but you are limited by the size of the storage account (currently 100 TB).
ADDB indexes data on the primary key attributes you choose. WATS also indexes your data, but only on fixed attributes (PartitionKey and RowKey), and secondary indexes are one of the most requested WATS features.

Concept

Table: when we think of a table, the first thing that comes to mind is “something that consists of rows and columns”. A table in WATS or ADDB may look like that, but it is not. Think of a table as a container that holds a collection of key-value pairs representing the data. In the relational model we define columns for a table and the rows contain the data, so to store data in a table you must first define its columns. A table in WATS or ADDB has no schema, so you do not need to define columns. In short, think of a table as a bag into which you put whatever data you want.

Although tables in both systems are conceptually containers for data, there are a few differences between them:

By default ADDB limits an account to 256 tables (this can be increased on request). WATS has no such restriction: you can have any number of tables, subject to the storage account limit (currently 100 TB).
When creating a table in ADDB you must specify its provisioned throughput (the number of reads and writes), which has no equivalent in WATS. You can change this capacity later using the ADDB API. When the provisioned throughput is exhausted, ADDB starts throttling requests.

Entity / item: this is what holds the data in a table. Each entity (in WATS) or item (in ADDB) consists of one or more attributes. Each attribute is a key-value pair (key, value and data type in WATS). In a relational database this would be a row. Here the rows of a table have no links to other rows. Each entity in WATS is uniquely identified by two attributes, PartitionKey and RowKey; think of them as a composite primary key. Each entity must have a unique combination of these attributes. In ADDB each item is uniquely identified by its primary key, which is one of the item's attributes. Every item in an ADDB table must have a primary key.

There are a few differences between an entity and an item:

In WATS an entity has at most 256 attributes; in ADDB there is no limit. Every WATS entity has three system attributes: PartitionKey, RowKey and Timestamp, which reduces the number of user-defined attributes to 253. You set the values of PartitionKey and RowKey yourself, whereas Timestamp is set by the system and contains the date and time (UTC) at which the entity was created or last updated. PartitionKey and RowKey are of type String.
The maximum size of an entity in WATS is 1 MB; the maximum size of an item in ADDB is 64 KB.
In WATS attribute values can have one of 8 data types: Binary, Boolean, DateTime, Double, Int32, Int64, Guid and String, which makes for a rich data model. In ADDB the available types are String, Number, and String/Number Sets (sets of strings or numbers).
In WATS data is indexed only on PartitionKey and RowKey; indexing on other attributes is not yet available. Data in Windows Azure is partitioned by PartitionKey value, so the key must be chosen carefully: a poor choice can significantly reduce performance. A great article on this can be read here. In ADDB data is indexed on the attributes that make up the table's primary key.

ADDB supports two types of primary key:

Hash Type Primary Key: the primary key consists of a single attribute, the hash attribute. ADDB builds an unordered hash index on this attribute.
Hash and Range Type Primary Key: the primary key consists of two attributes. The first is the hash attribute and the second is the range attribute. ADDB builds an unordered hash index on the hash attribute and a sorted range index on the range attribute.

Provisioned throughput

One of the most important features of ADDB is provisioned throughput, which lets you set the throughput your application needs. In short, provisioned throughput determines how many reads and writes per second can be made against an ADDB table. Based on the values you provide, ADDB allocates the appropriate resources. You can update the configuration on the fly using the API or the Amazon Management Console.

Provisioned throughput is expressed in two terms: Read Capacity Units for read operations and Write Capacity Units for write operations.

A Read Capacity Unit (RCU) is defined as one consistent read per second of an item up to 1 KB in size. So if you request 10 RCU, you can perform consistent reads of 10 items of up to 1 KB each per second. If items are larger than 1 KB, the number you can read per second is lower. For example, if your items are between 1 and 2 KB, you can perform only 5 consistent reads per second before the system starts throttling you. If you use eventually consistent reads instead of consistent reads, the throughput is doubled: with 10 RCU you can perform 20 eventually consistent reads per second on items of 1 KB or less.
Similarly, a Write Capacity Unit (WCU) is one write per second of an item up to 1 KB in size. If you request 10 WCU, you can write 10 items of up to 1 KB each per second. If items are larger than 1 KB, the number of items you can write per second decreases. For example, if your items are between 1 and 2 KB, you can perform 5 writes per second before the system starts throttling you.
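The capacity arithmetic above can be sketched in a few lines. This is only an illustration under the rules stated in this article (one unit per started 1 KB, eventually consistent reads costing half); the function name `max_ops_per_second` is my own and is not part of any ADDB API.

```python
import math


def max_ops_per_second(capacity_units: int, item_size_kb: float,
                       eventually_consistent: bool = False) -> int:
    """Estimate how many operations per second fit into the provisioned units.

    Assumption (from the text): every started 1 KB of an item consumes one
    capacity unit, and an eventually consistent read costs half as much.
    """
    units_per_op = max(1, math.ceil(item_size_kb))
    budget = capacity_units * 2 if eventually_consistent else capacity_units
    return budget // units_per_op


print(max_ops_per_second(10, 1.0))                              # items up to 1 KB
print(max_ops_per_second(10, 1.5))                              # items between 1 and 2 KB
print(max_ops_per_second(10, 1.0, eventually_consistent=True))  # eventually consistent reads
```

For writes, leave `eventually_consistent` at its default, since the doubling applies only to reads.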

Note that provisioned throughput matters especially for pricing, since ADDB is priced differently from other services. In effect, you pay for the read and write capacity you have reserved. At the time of writing you pay $0.01 per hour for every 10 units of write capacity and $0.01 per hour for every 50 units of read capacity in the US East (Virginia) data center. In principle this is the same as pricing for compute instances: you request a virtual machine of a certain size (with a defined capacity and RAM) and pay for it hourly, whether you load it or not. Similarly, in ADDB you pay hourly for the throughput you have requested from Amazon, regardless of how much of it you use.
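As a rough sketch, the hourly charge can be computed from the two rates quoted in this article: $0.01 per hour per 10 write units and $0.01 per hour per 50 read units (the minimum-cost arithmetic further down, $0.01 * 5 / 50, confirms that the 50-unit rate is the read rate). The function and constant names are hypothetical, and the prices are those at the time of writing.

```python
# Rates quoted in the article for US East (Virginia) at the time of writing.
PRICE_PER_BLOCK = 0.01       # dollars per hour
READ_UNITS_PER_BLOCK = 50    # read capacity units covered by one block
WRITE_UNITS_PER_BLOCK = 10   # write capacity units covered by one block


def hourly_cost(read_units: int, write_units: int) -> float:
    """Hourly charge in dollars for the provisioned read and write capacity."""
    read_cost = PRICE_PER_BLOCK * read_units / READ_UNITS_PER_BLOCK
    write_cost = PRICE_PER_BLOCK * write_units / WRITE_UNITS_PER_BLOCK
    return read_cost + write_cost


# A table at the minimum of 5 RCU and 5 WCU: $0.001 for reads plus $0.005 for writes.
print(f"{hourly_cost(5, 5):.3f}")
```

The charge depends only on what is provisioned, not on how many requests are actually made.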

When it comes to provisioned throughput, there are several points to consider:

It is configured per table.
The minimum throughput is 5 RCU and 5 WCU per table, so for each table you pay at least $0.001 per hour ($0.01 * 5 / 50) for consistent reads and $0.005 per hour for writes, even if you do not use the table.
An increase or decrease in provisioned throughput must differ from the previous value by at least 10%. For example, if you have 100 read capacity units and want to increase this value, the new value must be 110 or greater.
A single request can at most double the current value. For example, if you have 100 read capacity units, you can increase this to a maximum of 200.
You can decrease provisioned throughput only once per day.
By default you can provision at most 10,000 read capacity units and 10,000 write capacity units per table, and at most 20,000 read capacity units and 20,000 write capacity units across all tables in your account. These limits can be increased by contacting Amazon.
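The rules in this list can be checked before sending an update request. The sketch below encodes only the limits stated above (minimum 5, default per-table maximum 10,000, at least a 10% change, at most doubling per request); it does not track the once-per-day decrease rule, and `check_throughput_change` is a hypothetical helper, not an ADDB call.

```python
def check_throughput_change(current: int, new: int,
                            table_min: int = 5,
                            table_max: int = 10_000) -> list:
    """Return a list of rule violations for changing one capacity value.

    An empty list means the change satisfies the rules described in the text.
    """
    problems = []
    if new < table_min:
        problems.append(f"below the minimum of {table_min} units")
    if new > table_max:
        problems.append(f"above the default per-table maximum of {table_max} units")
    # Integer arithmetic avoids floating-point surprises at exactly 10%.
    if abs(new - current) * 10 < current:
        problems.append("change is smaller than 10% of the current value")
    if new > 2 * current:
        problems.append("a single request can at most double the current value")
    return problems


print(check_throughput_change(100, 110))  # allowed
print(check_throughput_change(100, 105))  # too small a change
print(check_throughput_change(100, 250))  # more than double
```

Returning a list rather than raising on the first problem makes it easy to report every violated rule at once.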

Price

Before we look at the functionality each system provides, let's look at pricing. Neither system has any up-front costs. Pricing consists of the following components:

Transactions: in WATS you pay for the number of transactions at a fixed rate ($0.01 per 10,000 transactions). To calculate the total, multiply the number of transactions by that rate.

Provisioned throughput: in ADDB you pay for provisioned throughput at fixed hourly rates for reads and writes. To calculate the total, multiply the provisioned RCU and WCU by the hourly price.

Data transfer: you pay for the amount of data transferred into and out of the system. At the time of writing, inbound traffic is free in both systems. Data transferred between ADDB and Amazon EC2 within a single region is free; data transferred between ADDB and Amazon EC2 in different regions is charged at the standard rates. In WATS you pay only for outbound traffic.

Pricing in ADDB is more predictable than in WATS, but you must estimate the required throughput correctly to avoid either paying for unused capacity or being throttled.

List of features

	WATS	ADDB
Create Table/CreateTable	Yes	Yes
Query Tables/ListTables	Yes	Yes
Delete Table/DeleteTable	Yes	Yes
UpdateTable	No	Yes
DescribeTable	No	Yes
CRUD on one entity/item	Yes	Yes
CRUD operations on multiple entities/items	Yes	Yes
Query Entities/Query (Scan)	Yes	Yes

Let us consider all the functions from the list.


	WATS	ADDB
Create Table/CreateTable	Yes	Yes

As the name suggests, this function creates a table in WATS and ADDB. Unlike SimpleDB, where the CreateDomain operation is idempotent, in ADDB it is not: if you try to create a table with the name of an existing table, the system returns an error.

There are several naming rules for tables; they are summarized in the table below.


	WATS	ADDB
Minimum/maximum length	3/63	3/255
Case sensitivity	Mixed case	Mixed case
Allowed characters	Alphanumeric	Alphanumeric, hyphen (-), underscore (_), period (.)

A few things to note:

In WATS a table name cannot start with a digit. Table names keep the case in which they were created, but are case-insensitive when used.
As mentioned above, by default you can create up to 256 tables per ADDB account. To increase this limit, you can submit a request to Amazon: (http://www.amazon.com/gp/html-forms-controller/DynamoDB_Limit_Increase_Form).
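The naming rules can be expressed as two regular expressions. This is a sketch that encodes only the rules listed in this article (length limits, allowed characters, no leading digit in WATS); the real services may apply further restrictions, and `is_valid_table_name` is a hypothetical helper.

```python
import re

# WATS: 3 to 63 alphanumeric characters, not starting with a digit.
WATS_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{2,62}")
# ADDB: 3 to 255 characters from letters, digits, hyphen, underscore, period.
ADDB_NAME = re.compile(r"[A-Za-z0-9_.\-]{3,255}")


def is_valid_table_name(name: str, service: str) -> bool:
    """Check a table name against the rules for 'WATS' or 'ADDB'."""
    pattern = {"WATS": WATS_NAME, "ADDB": ADDB_NAME}[service]
    return pattern.fullmatch(name) is not None


print(is_valid_table_name("Orders2012", "WATS"))
print(is_valid_table_name("1orders", "WATS"))
print(is_valid_table_name("my-table.v1", "ADDB"))
```

A name that must work in both systems should follow the stricter WATS rules.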

This operation is asynchronous in ADDB, whereas in WATS it is synchronous. When ADDB receives a request to create a table, it starts several processes (resource allocation), and you cannot use the table until all of them have completed and the table has moved to the Active state.
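Because table creation in ADDB is asynchronous, client code typically polls the table status before using the table. The sketch below shows the polling loop only: `get_status` is a hypothetical callable that you would implement on top of DescribeTable with whatever client library you use, and the poll interval and retry count are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_until_active(get_status: Callable[[], str],
                      poll_seconds: float = 2.0,
                      max_polls: int = 60,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the table reports ACTIVE; return False if it never does.

    get_status is assumed to return the table status string, for example
    'CREATING' or 'ACTIVE'.
    """
    for _ in range(max_polls):
        if get_status() == "ACTIVE":
            return True
        sleep(poll_seconds)
    return False


# Usage with a stand-in status source instead of a real service call.
statuses = iter(["CREATING", "CREATING", "ACTIVE"])
print(wait_until_active(lambda: next(statuses), sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.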

When creating a table in ADDB you must specify its primary key and the required throughput. The throughput can later be changed using UpdateTable, but the primary key cannot be changed.

	WATS	ADDB
Query Tables/ListTables	Yes	Yes

This function returns the list of tables in the account.

	WATS	ADDB
Maximum number of entries returned per call	1000
Return continuation token.

	WATS	ADDB
Delete Table/DeleteTable	Yes	Yes

	WATS	ADDB
UpdateTable	No	Yes

The new throughput values must stay within the limits and follow the rules described above under “Provisioned throughput”.

The table must be in the Active state.

	WATS	ADDB
DescribeTable	No	Yes

This function returns information about a table, including:

CreationDateTime: Date of creation in UNIX epoch time.

ItemCount: Number of items in the table. ADDB updates this value approximately every 6 hours, so recent changes may not be reflected immediately.

KeySchema: Structure of the primary key (simple or composite).

ProvisionedThroughput: Provisioned throughput of the table, consisting of LastIncreaseDateTime (if applicable), LastDecreaseDateTime (if applicable), ReadCapacityUnits and WriteCapacityUnits. If the throughput of the table has never been changed, ADDB does not return the LastIncreaseDateTime and LastDecreaseDateTime elements.

TableSizeBytes: Overall size of the table in bytes. ADDB updates this value approximately every 6 hours, so recent changes may not be reflected immediately.

TableStatus: Current state of the table (CREATING, ACTIVE, DELETING or UPDATING).

	WATS	ADDB
CRUD on one entity/item	Yes	Yes

An entity in WATS is limited to 256 attributes; ADDB has no such limit. Since WATS has 3 system attributes (PartitionKey, RowKey and Timestamp), you can define up to 253 attributes of your own.
In WATS attribute values can be of 8 types: Binary, Boolean, DateTime, Double, Int32, Int64, Guid and String. In ADDB: String, Number, and String/Number Sets (sets of strings or numbers).
The maximum size of an item in ADDB is 64 KB; an entity in WATS can be up to 1 MB in size.

Create

ADDB: PutItem

WATS:
Insert Entity: Creates a new entity in the table. If an entity with the specified PartitionKey and RowKey already exists, an error is returned.
Insert or Merge Entity: Creates a new entity in the table. If an entity with the specified PartitionKey and RowKey already exists, it is merged with the new entity: attributes present in both are updated, attributes present only in the new entity are added, and attributes present only in the old entity are left unchanged.
Insert or Replace Entity: Creates a new entity in the table. If an entity with the specified PartitionKey and RowKey already exists, it is replaced: the old entity is deleted and a new one is created with the specified PartitionKey and RowKey.
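The difference between the three WATS insert variants is easiest to see on plain dictionaries. The sketch below models a table as an in-memory dict keyed by (PartitionKey, RowKey); it illustrates the semantics described above only and makes no calls to WATS. All function names are my own.

```python
def insert_entity(table: dict, key: tuple, entity: dict) -> None:
    """Insert Entity: fail if the (PartitionKey, RowKey) pair already exists."""
    if key in table:
        raise KeyError(f"entity {key} already exists")
    table[key] = dict(entity)


def insert_or_merge(table: dict, key: tuple, entity: dict) -> None:
    """Insert or Merge Entity: update shared attributes, add new ones, keep the rest."""
    merged = dict(table.get(key, {}))
    merged.update(entity)
    table[key] = merged


def insert_or_replace(table: dict, key: tuple, entity: dict) -> None:
    """Insert or Replace Entity: the old entity is discarded entirely."""
    table[key] = dict(entity)


table = {("customers", "42"): {"Name": "Ann", "City": "Oslo"}}
insert_or_merge(table, ("customers", "42"), {"City": "Bergen", "Tier": "gold"})
print(table[("customers", "42")])   # Name kept, City updated, Tier added
insert_or_replace(table, ("customers", "42"), {"Tier": "silver"})
print(table[("customers", "42")])   # only Tier remains
```

ADDB's PutItem behaves like the replace variant, since it overwrites an existing item with the same primary key.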

Read

WATS: Query Entities, addressing the entity by PartitionKey and RowKey
ADDB: GetItem

Update

ADDB:
PutItem: Creates an item or, if an item with the specified primary key already exists, replaces it completely.
UpdateItem: If you want to change some attributes of an existing item instead of replacing it completely, use this function, which gives flexible control over how attributes are changed.

WATS:
Merge Entity: If an entity with the specified PartitionKey and RowKey already exists, it is merged with the new entity: attributes present in both are updated, attributes present only in the new entity are added, and attributes present only in the old entity are left unchanged.
Update Entity: Replaces an existing entity with a new one: the old entity is deleted and a new one is created with the specified PartitionKey and RowKey.
Insert or Replace Entity: Creates a new entity in the table. If an entity with the specified PartitionKey and RowKey already exists, it is replaced: the old entity is deleted and a new one is created with the specified PartitionKey and RowKey.

Conditional updates

Delete

WATS: Delete Entity, addressing the entity by PartitionKey and RowKey
ADDB: DeleteItem (idempotent)

Conditional deletes

	WATS	ADDB
CRUD operations on multiple entities/items	Yes	Yes

WATS: Entity Group Transactions
ADDB: BatchWriteItem, BatchGetItem

In WATS this is an atomic transaction: either the whole operation succeeds or none of it does. In ADDB it is not. Some items may fail to be processed, in which case ADDB returns the list of items for which the operation failed so that you can process them later.
You cannot update an item using BatchWriteItem, only create and delete.
In ADDB a single BatchWriteItem operation is limited to 25 items. In WATS a single entity group transaction is limited to 100 entities.
BatchWriteItem in ADDB lets you work with multiple tables in a single request, whereas an entity group transaction in WATS must stay within a single table, and all entities in the operation must have the same PartitionKey value.
BatchGetItem in ADDB provides only eventually consistent reads. Consistent reads are not possible.
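Since an ADDB batch write is not atomic and is limited to 25 items, the caller has to split the work into chunks and resubmit whatever comes back as unprocessed. The sketch below shows that loop under those two assumptions; `send_batch` is a hypothetical callable standing in for a BatchWriteItem call that returns the items it could not process, and the retry count is arbitrary.

```python
from typing import Callable, List, Sequence

MAX_BATCH_ITEMS = 25  # per-request limit stated in the text


def write_all(items: Sequence, send_batch: Callable[[List], List],
              max_rounds: int = 5) -> List:
    """Send items in chunks of 25 and retry the ones reported as unprocessed.

    Returns the items still unprocessed after max_rounds (empty on success).
    """
    pending = list(items)
    for _ in range(max_rounds):
        if not pending:
            break
        failed: List = []
        for start in range(0, len(pending), MAX_BATCH_ITEMS):
            chunk = pending[start:start + MAX_BATCH_ITEMS]
            failed.extend(send_batch(chunk))
        pending = failed
    return pending


# Usage with a stand-in sender that rejects one item on its first call only.
calls = []

def fake_send(chunk):
    calls.append(len(chunk))
    return chunk[:1] if len(calls) == 1 else []

print(write_all(list(range(60)), fake_send), calls)
```

In a real client it would be sensible to wait between rounds, because unprocessed items are often a sign of throttling.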

	WATS	ADDB
Query Entities/Query (Scan)	Yes	Yes

In WATS, Query Entities is used to retrieve a list of entities; in ADDB two functions serve this purpose, Query and Scan. The difference between them is that Scan does not require a primary key. Because data in ADDB is indexed by primary key, Query is much faster than Scan, which simply scans the whole table. Query is only available on tables with a hash-and-range primary key. I think, though I could be wrong, that Query in ADDB can filter only on the attributes that make up the primary key; to filter on other attributes you must use Scan. In WATS you specify the filter with the $filter query option (WCF Data Services syntax).
Both systems are designed for high availability, so they time out long-running queries and return results in parts. The documentation does not make clear when a request times out in ADDB. In WATS a request times out after 5 seconds of execution.
In WATS a query may also return early if it crosses a PartitionKey boundary.
In ADDB the response size is limited to 1 MB. If a query matches a larger data set, only part of the result is returned.
You may receive all of the requested records, part of the result, or no result at all, even when matching data exists. When the service returns a partial result or an empty set despite matching data, it also returns a continuation token. Handling this continuation token correctly in your application is therefore very important.
Both systems can return either all attributes or a subset of them. In ADDB you specify this with the “AttributesToGet” parameter. In WATS you list the attributes to return in the $select query option, e.g. $select=PartitionKey,RowKey,Attribute1,...
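The continuation-token handling described above comes down to one loop: keep requesting pages until no token is returned, and do not stop just because a page is empty. In this sketch `fetch_page` is a hypothetical callable wrapping a Query Entities, Query or Scan request; it is assumed to take the previous token (or None) and return a pair of the page's items and the next token.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple


def fetch_all(fetch_page: Callable[[Optional[Any]], Tuple[List, Optional[Any]]]) -> Iterator:
    """Yield every item of a paged query, following continuation tokens.

    An empty page with a token is not the end of the result set, so the loop
    stops only when the service returns no token.
    """
    token = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if token is None:
            break


# Usage with canned pages: the second page is empty but still carries a token.
pages = {None: (["a", "b"], "t1"), "t1": ([], "t2"), "t2": (["c"], None)}
print(list(fetch_all(lambda token: pages[token])))
```

Writing the loop as a generator lets the caller stop early without fetching the remaining pages.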

Summary

Article based on information from habrahabr.ru
