Comparing Windows Azure Table Storage and Amazon DynamoDB

Hello.
I offer a translation of the first article in a series comparing the services provided by Windows Azure and Amazon, written by Gaurav Mantri, who is quite well known among cloud specialists.

In this article, I compare Windows Azure Table Storage and Amazon DynamoDB, abbreviated WATS and ADDB respectively.


In terms of functionality, WATS and ADDB offer similar capabilities. Both are NoSQL systems designed to store large amounts of data. Amazon also has another NoSQL database, SimpleDB.

An important point is that ADDB is not just a NoSQL database; it is a database service. It is used to manage data, but you also control the scalability of the system by specifying the throughput you need. In this sense it is very similar to compute instances in Amazon or Windows Azure: you choose the instance size you need and the system provides it. Similarly, with ADDB you tell the system how many reads and writes your application will perform against a table, and ADDB provisions the necessary capacity.

Conceptually, both systems are similar:
  1. Both are non-relational NoSQL systems.
  2. Both are essentially key-value stores.
  3. Neither supports the relationships available in a relational database.
  4. Both provide built-in high availability and flexibility.
  5. Both provide a REST API for working with tables and the data in them, as well as libraries for high-level languages, which are usually wrappers around the REST API. In both systems each API release has a version expressed as a date. At the time of writing these versions are: WATS 2011-08-18, ADDB 2011-12-05.

Of course, there are some major differences:
  1. In ADDB you specify the throughput you need when you start working with the system, whereas in WATS throughput is controlled by the system. ADDB is therefore more flexible, but requires more careful planning.
  2. In contrast to SimpleDB, where a domain is limited to 10 GB, ADDB has no such limitation: you can store as much data as you like. WATS also puts no hard limit on the data in a table, but you are limited by the size of the storage account (currently 100 TB).
  3. ADDB lets you decide how your data is indexed, in contrast to WATS. Technically WATS also indexes your data, but only on specific attributes (PartitionKey and RowKey); the ability to define secondary indexes is one of the most requested WATS features.

Concept

Table:
When we think about a table, the first thing that comes to mind is “something that consists of rows and columns”. A table in WATS or ADDB may look like that, but in fact it is not. Think of a table as a container holding a collection of key-value pairs that represent the data. In the relational model, we define columns for a table and the rows contain the data; to store data in a table you must first define its columns. A table in WATS or ADDB has no schema, so there is no need to define columns. In short, think of the table as a bag into which you put whatever data you want.

Although tables in both systems are conceptually containers for data, there are a few differences between them:
  1. By default, ADDB limits the number of tables to 256 (this can be increased on request). WATS has no such restriction: you can have any number of tables, subject to the storage account limit (currently 100 TB).
  2. When creating a table in ADDB, you must specify the provisioned throughput (the number of reads and writes), which is not the case in WATS. You can later change this capacity using the ADDB API. Once the provisioned throughput is exhausted, ADDB begins to throttle requests.

Entity/Object:
This is what defines the data in a table. Each entity (in WATS) or object (an item, in ADDB terms) consists of one or more attributes. An attribute is a key-value pair (key, value and data type in WATS). In a relational database this would be a row. Here, a row in a table has no relationship to other rows. Each entity in WATS is uniquely identified by two attributes, PartitionKey and RowKey; think of them as a composite primary key. Each entity must have a unique combination of these attributes. In ADDB, each object is uniquely identified by its primary key, which is one of the object's attributes. All objects in an ADDB table must have a primary key.

There are a few differences between an entity and an object:
  1. In WATS an entity has a maximum of 256 attributes; in ADDB there is no such restriction. Each WATS entity has three system attributes (PartitionKey, RowKey, and Timestamp), so the number of user-defined attributes is reduced to 253. You set the values of PartitionKey and RowKey yourself, while Timestamp is set by the system and contains the date and time (UTC) when the entity was created or last updated. PartitionKey and RowKey are of type String.
  2. The maximum size of an entity in WATS is 1 MB; of an object in ADDB, 64 KB.
  3. In WATS attribute values can have one of 8 data types: Binary, Boolean, DateTime, Double, Int32, Int64, Guid, and String, which provides a rich data model. In ADDB the available types are String, Number, and String/Number Sets (arrays of strings or numbers).
  4. In WATS data is indexed only on PartitionKey and RowKey; indexing on other attributes is not yet available. Data in WATS is partitioned by the PartitionKey value, so it must be chosen carefully: a poor choice can significantly reduce performance. A good article on this topic is worth reading. In ADDB data is indexed on the attributes that make up the table's primary key.

ADDB supports two types of primary keys:
  1. Hash Type Primary Key: the primary key consists of a single attribute, the hash attribute. ADDB builds an unordered hash index on this attribute.
  2. Hash and Range Type Primary Key: the primary key consists of two attributes. The first is the hash attribute and the second is the range attribute. ADDB builds an unordered hash index on the hash attribute and a sorted range index on the range attribute.

Provisioned Throughput

One of the most important features of ADDB is provisioned throughput, which lets you tune the capacity your application needs. In short, provisioned throughput determines how many reads and writes per second can be made against an ADDB table. Based on the values you provide, ADDB allocates the appropriate resources, and you can update the configuration on the fly using the API or the Amazon Management Console.

Provisioned throughput is expressed in two units: Read Capacity Units (RCU) for read operations and Write Capacity Units (WCU) for write operations.

A Read Capacity Unit is defined as one consistent read per second of an object up to 1 KB in size. So if you request 10 RCU, you can perform consistent reads of 10 objects of up to 1 KB each per second. If the objects are larger than 1 KB, the number of objects you can read per second is lower. For example, if your objects are between 1 and 2 KB in size, you can perform only 5 consistent reads per second before the system starts throttling you. If you use eventually consistent reads instead of consistent reads, the capacity is usually doubled: with 10 RCU you can perform 20 eventually consistent reads per second on objects of 1 KB or less.
Similarly, a Write Capacity Unit is one write per second of an object up to 1 KB in size. If you request 10 WCU, you can write 10 objects of up to 1 KB each per second. If the objects are larger than 1 KB, the number of objects you can write per second decreases. For example, if your objects are between 1 and 2 KB in size, you can perform 5 writes per second before the system starts throttling you.
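
The capacity arithmetic above can be sketched in a few lines of Python. This is only an illustration of the 1 KB unit rule as described in this article: the function name ops_per_second is made up for the example, and rounding the object size up to whole kilobytes is an assumption.

```python
import math


def ops_per_second(capacity_units, object_size_kb, eventually_consistent=False):
    """Estimate how many objects per second fit into the given capacity units."""
    if capacity_units < 0 or object_size_kb <= 0:
        raise ValueError("capacity must be >= 0 and object size must be > 0")
    # Assumption: every started kilobyte of an object consumes one capacity unit.
    units_per_object = math.ceil(object_size_kb)
    operations = capacity_units // units_per_object
    # Eventually consistent reads get about twice the capacity (reads only).
    return operations * 2 if eventually_consistent else operations


print(ops_per_second(10, 1))        # consistent reads or writes of 1 KB objects
print(ops_per_second(10, 1.5))      # objects between 1 and 2 KB
print(ops_per_second(10, 1, True))  # eventually consistent reads
```

The three calls print 10, 5 and 20, matching the examples in the text.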

Note that provisioned throughput matters especially for pricing, since ADDB is priced differently from other services. In effect, you pay for the read and write capacity you reserve. At the time of writing, in the US East (Virginia) data center you pay $0.01 per hour for every 10 units of write capacity and $0.01 per hour for every 50 units of read capacity. In principle, the pricing model is the same as for compute instances: you request a virtual machine of a certain size (with a defined CPU capacity and RAM) and pay hourly for it, whether you load it or not. Similarly, in ADDB you pay hourly for the throughput you have requested from Amazon, regardless of how much of it you use.
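
As a rough illustration of this pricing model, the sketch below computes the hourly and monthly charge for a table. It uses only the US East prices quoted in this article; the function names, the pro-rata treatment of partial blocks of units, and the 30-day month are assumptions made for the example.

```python
READ_UNITS_PER_CENT = 50   # $0.01 per hour buys 50 read capacity units
WRITE_UNITS_PER_CENT = 10  # $0.01 per hour buys 10 write capacity units


def hourly_cost_usd(read_units, write_units):
    """Hourly charge in USD for the provisioned read and write capacity."""
    # Assumption: capacity is charged pro rata rather than in whole blocks.
    cents = read_units / READ_UNITS_PER_CENT + write_units / WRITE_UNITS_PER_CENT
    return cents / 100


def monthly_cost_usd(read_units, write_units, hours=24 * 30):
    """Charge for a month of the given length (30 days assumed by default)."""
    return hourly_cost_usd(read_units, write_units) * hours


# A hypothetical table with 100 RCU and 20 WCU, used or not.
print(round(hourly_cost_usd(100, 20), 4))
print(round(monthly_cost_usd(100, 20), 2))
```

The charge depends only on what is provisioned, which is why an idle table still costs money.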

When it comes to provisioned throughput, there are several points to consider:

  1. It is configured per table.
  2. The minimum throughput is 5 RCU and 5 WCU per table, so for each table you pay at least $0.001 ($0.01 * 5 / 50) per hour for consistent reads and $0.005 ($0.01 * 5 / 10) per hour for writes, even if you do not use the table.
  3. An increase or decrease of provisioned throughput must differ from the previous value by at least 10%. For example, if you have 100 read capacity units and want to increase this value, the new value must be 110 or greater.
  4. A single request can at most double the current value. For example, if you have 100 read capacity units, this value can be increased to a maximum of 200.
  5. Provisioned throughput can be reduced only once a day.
  6. By default, a single table can be allocated a maximum of 10,000 read capacity units and 10,000 write capacity units, and all tables in your account together a maximum of 20,000 read capacity units and 20,000 write capacity units. These values can be increased by contacting Amazon.
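
The rules in this list can be checked before a change is requested. The sketch below is a hypothetical client-side helper, not part of any Amazon library; it covers only the minimum, the per-table maximum, the 10% rule and the doubling rule, and it does not track the once-a-day limit on decreases or the account-wide totals.

```python
from typing import List

MIN_UNITS = 5                 # minimum capacity units per table
MAX_UNITS_PER_TABLE = 10_000  # default per-table maximum


def check_throughput_change(current: int, new: int) -> List[str]:
    """Return a list of rule violations; an empty list means the change looks valid."""
    problems = []
    if new < MIN_UNITS:
        problems.append("new value is below the minimum of 5 units")
    if new > MAX_UNITS_PER_TABLE:
        problems.append("new value exceeds the default per-table maximum")
    # The change must be at least 10% of the current value, in either direction.
    if abs(new - current) * 10 < current:
        problems.append("change is smaller than 10% of the current value")
    # A single request can at most double the current value.
    if new > current * 2:
        problems.append("increase is more than double the current value")
    return problems


print(check_throughput_change(100, 110))  # allowed
print(check_throughput_change(100, 105))  # too small a change
print(check_throughput_change(100, 250))  # more than double
```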

Price

Before we talk about the functionality provided by each system, let's look at pricing. Neither system has any upfront (“capital”) costs. The components that make up the price are:

    Transactions: in WATS you pay for the number of transactions at a fixed cost ($0.01 per 10,000 transactions). To calculate the total, multiply the number of transactions by their cost.

    Provisioned throughput: in ADDB you pay for provisioned throughput at fixed hourly prices for reads and writes. To calculate the total, multiply the number of provisioned RCU and WCU by the hourly price.

    Data transfer: you pay for the amount of data transferred in and out of the system. At the time of writing, both systems offer free inbound traffic. Data transferred between ADDB and Amazon EC2 within a single region is free; data transferred between ADDB and Amazon EC2 in different regions is charged at the standard rates. In WATS you pay only for outbound traffic.

Pricing in ADDB is more predictable than in WATS; however, you must estimate the required throughput correctly to avoid either paying for unused capacity or being throttled.

List of features

Function                                       WATS  ADDB
Create Table/CreateTable                       Yes   Yes
Query Tables/ListTables                        Yes   Yes
Delete Table/DeleteTable                       Yes   Yes
UpdateTable                                    No    Yes
DescribeTable                                  No    Yes
CRUD on one entity/object                      Yes   Yes
CRUD operations on multiple entities/objects   Yes   Yes
Query Entities/Query (Scan)                    Yes   Yes

Let us look at each function in the list.

Function                   WATS  ADDB
Create Table/CreateTable   Yes   Yes

As the name suggests, this function creates a table in WATS or ADDB. Unlike SimpleDB, where the CreateDomain operation is idempotent, in ADDB it is not: if you try to create a table with the name of an existing table, the system returns an error.

There are several naming rules for tables, summarized in the table below.

                         WATS           ADDB
Minimum/maximum length   3/63           3/255
Case sensitivity         Mixed case     Mixed case
Allowed characters       Alphanumeric   Alphanumeric, hyphen (-), underscore (_), period (.)


There are a few things to note:

  • In WATS a table name cannot start with a digit. Table names preserve the case in which they were created, but are case-insensitive when used.
  • As mentioned above, by default you can create up to 256 tables per ADDB account. To increase this value, you can submit a request to Amazon: (http://www.amazon.com/gp/html-forms-controller/DynamoDB_Limit_Increase_Form).
  • This operation is asynchronous in ADDB, whereas in WATS it is synchronous. When ADDB receives a request to create a table, it starts several processes (such as resource allocation), and you cannot use the table until all of them have completed and the table has moved to the Active state.
  • When creating a table in ADDB, you must specify a primary key for the table and the required throughput. The throughput can later be changed using UpdateTable, but the primary key cannot be changed.
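
Because table creation in ADDB is asynchronous, an application usually has to wait for the Active state before using a new table. The sketch below polls for it. It assumes a client object whose describe_table(TableName=...) call returns a dictionary shaped like {"Table": {"TableStatus": ...}}, as the boto3 DynamoDB client does; FakeClient is a hypothetical stand-in so that the example runs without an AWS account.

```python
import time


def wait_until_active(client, table_name, poll_seconds=1.0, max_attempts=60):
    """Poll describe_table until the table is ACTIVE; return False on giving up."""
    for _ in range(max_attempts):
        description = client.describe_table(TableName=table_name)
        if description["Table"]["TableStatus"] == "ACTIVE":
            return True
        time.sleep(poll_seconds)
    return False


class FakeClient:
    """Hypothetical stand-in: reports CREATING twice, then ACTIVE."""

    def __init__(self):
        self.calls = 0

    def describe_table(self, TableName):
        self.calls += 1
        status = "ACTIVE" if self.calls > 2 else "CREATING"
        return {"Table": {"TableName": TableName, "TableStatus": status}}


client = FakeClient()
print(wait_until_active(client, "Customers", poll_seconds=0))
```

The same loop applies after UpdateTable, where the table passes through the UPDATING state before becoming active again.
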

Function                  WATS  ADDB
Query Tables/ListTables   Yes   Yes

This function returns a list of tables. A single call returns up to 1,000 tables in WATS and up to 100 tables in ADDB; if more tables remain, a continuation token is also returned, which lets you fetch the next set of tables.

                                     WATS   ADDB
Maximum number of entries per call   1000   100
Returns continuation token           Yes    Yes
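
Listing all tables therefore means following continuation tokens until none is returned. The loop below is a minimal sketch of that pattern. Both fetch_page and the fake service are hypothetical: a real fetch_page would wrap the Query Tables or ListTables call and return the names on the page together with the next token, or None when nothing remains.

```python
def list_all_tables(fetch_page):
    """Collect table names from every page by following continuation tokens."""
    names, token = [], None
    while True:
        page, token = fetch_page(token)
        names.extend(page)
        if token is None:  # no continuation token means this was the last page
            return names


def make_fake_service(all_names, page_size):
    """Hypothetical paged service that uses the next offset as its token."""
    def fetch_page(token):
        start = token or 0
        end = start + page_size
        next_token = end if end < len(all_names) else None
        return all_names[start:end], next_token
    return fetch_page


tables = ["table%03d" % i for i in range(250)]
print(len(list_all_tables(make_fake_service(tables, 100))))
```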



Function                   WATS  ADDB
Delete Table/DeleteTable   Yes   Yes

This function deletes a table. In ADDB it is not idempotent.
To delete a table in ADDB, the table must be in the Active state. The operation is asynchronous in ADDB. In WATS, although it appears to be synchronous, it is also asynchronous: when you send a request to delete a table, the system marks the table for deletion and makes it inaccessible, and the table is actually removed during garbage collection, so the time this takes can vary with the amount of data in the table. In my experience, deleting a very large table can take hours. During this time, an attempt to create a table with the same name results in an error (Conflict error, HTTP Status Code 409).

Function      WATS  ADDB
UpdateTable   No    Yes

This function is used to update the provisioned throughput of a table in ADDB. You can increase or decrease the provisioned throughput, provided that:

  • The new throughput value is within the limits and does not break the rules (see “Provisioned Throughput” above).
  • The table is in the Active state.


Function        WATS  ADDB
DescribeTable   No    Yes

The DescribeTable function is used to obtain the following information about a table:

  • CreationDateTime: date the table was created, in UNIX epoch time.
  • ItemCount: number of objects in the table. It is updated approximately every 6 hours, so recent changes may not be reflected in this value.
  • KeySchema: structure of the primary key (simple or composite).
  • ProvisionedThroughput: throughput for the table, consisting of LastIncreaseDateTime (if applicable), LastDecreaseDateTime (if applicable), ReadCapacityUnits, and WriteCapacityUnits. If the throughput of the table has never been changed, ADDB does not return the LastIncreaseDateTime and LastDecreaseDateTime values.
  • TableSizeBytes: total size of the table in bytes. Amazon DynamoDB updates this value approximately every 6 hours, so recent changes may not be reflected in this value.
  • TableStatus: current state of the table (CREATING, ACTIVE, DELETING, or UPDATING).

Please note that the results of this operation are eventually consistent, so you are not guaranteed to see the latest updates.

Function                    WATS  ADDB
CRUD on one entity/object   Yes   Yes

Both systems let you perform Create, Read, Update, and Delete (CRUD) operations on a single entity/object.
Things to remember:

  • A WATS entity is limited to 256 attributes; ADDB has no such limit. Since WATS has 3 system attributes (PartitionKey, RowKey, and Timestamp), you can define up to 253 attributes of your own.
  • In WATS attribute values can be of 8 types: Binary, Boolean, DateTime, Double, Int32, Int64, Guid, and String. In ADDB: String, Number, and String/Number Sets (arrays of strings or numbers).
  • The maximum size of an object in ADDB is 64 KB; a WATS entity can be up to 1 MB in size.


Create

In WATS you can use several operations to create an entity; in ADDB they are combined into a single function, PutItem. PutItem creates an object or, if the table already contains an object with the specified primary key, replaces that object completely. In WATS there are three functions for creating an entity:

  1. Insert Entity: creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, an error is returned.
  2. Insert or Merge Entity: creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, it is merged with the new entity: attributes present in both entities are updated, attributes present only in the new entity are added, and attributes present only in the old entity are left unchanged.
  3. Insert or Replace Entity: creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, it is replaced by the new entity: the old entity is deleted and a new one is created with the specified PartitionKey and RowKey values.
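
The difference between the three WATS operations is easiest to see on a toy model. The sketch below keeps a table as a plain Python dictionary keyed by (PartitionKey, RowKey). It is an in-memory illustration of the semantics described above, not the storage client API, and all names in it are made up for the example.

```python
class EntityExists(Exception):
    """Raised when a plain insert hits an existing key."""


def _key(entity):
    return (entity["PartitionKey"], entity["RowKey"])


def insert(table, entity):
    """Insert Entity: fail if the key is already present."""
    if _key(entity) in table:
        raise EntityExists(_key(entity))
    table[_key(entity)] = dict(entity)


def insert_or_merge(table, entity):
    """Insert or Merge Entity: keep old attributes, overwrite or add new ones."""
    merged = dict(table.get(_key(entity), {}))
    merged.update(entity)
    table[_key(entity)] = merged


def insert_or_replace(table, entity):
    """Insert or Replace Entity: the new entity fully replaces the old one."""
    table[_key(entity)] = dict(entity)


table = {}
insert(table, {"PartitionKey": "p", "RowKey": "r", "a": 1, "b": 2})
insert_or_merge(table, {"PartitionKey": "p", "RowKey": "r", "b": 3, "c": 4})
print(table[("p", "r")])  # a is kept, b is updated, c is added
insert_or_replace(table, {"PartitionKey": "p", "RowKey": "r", "d": 5})
print(table[("p", "r")])  # only the key attributes and d remain
```

In this model, PutItem in ADDB behaves like insert_or_replace.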


Read

In both systems a read operation fetches the attributes of an entity/object. In WATS this is done through Query Entities, passing PartitionKey and RowKey as arguments. In ADDB it is done through GetItem, passing the object's primary key as the argument.

Please note that by default GetItem performs an eventually consistent read. However, you can request a consistent read using the optional ConsistentRead parameter.



Update

In WATS there are several ways to update entities; in ADDB, only two:

  1. PutItem: creates an object or, if an object with the specified primary key already exists, replaces it completely.
  2. UpdateItem: if you want to change one or more attributes of an existing object instead of replacing it completely, you can use this function, which gives you flexible control over how attributes are changed.

WATS offers four functions for updating an entity:

  1. Merge Entity: if an entity with the specified PartitionKey and RowKey values already exists, it is merged with the new entity: attributes present in both entities are updated, attributes present only in the new entity are added, and attributes present only in the old entity are left unchanged.
  2. Update Entity: replaces an existing entity with a new one, deleting the old entity and creating a new one with the specified PartitionKey and RowKey values.
  3. Insert or Merge Entity: described under Create above; if the entity already exists, it is merged with the new one.
  4. Insert or Replace Entity: creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, it is replaced by the new entity: the old entity is deleted and a new one is created with the specified PartitionKey and RowKey values.


Conditional updates: both systems support conditional updates, but the mechanisms work differently. In ADDB you define conditions on the values of existing attributes; for example, you can tell ADDB to update the value of attribute1 only if the value of another attribute, attribute2, equals some value. Conditional updates in ADDB also support checking whether an attribute exists. In WATS things are different: everything depends on the entity's ETag value. To update an entity conditionally, you supply the entity's ETag in one of the request headers (when using the REST API); WATS compares this value with the current ETag of the entity being updated, and the update is committed only if the values match.
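
The two conditional mechanisms can be contrasted with a small in-memory sketch. Nothing here is a real client API: the function names, the store layout and the convention that None in the expected map means "the attribute must not exist" are assumptions made for illustration.

```python
import uuid


class PreconditionFailed(Exception):
    """Raised when the condition attached to an update does not hold."""


def etag_update(store, key, new_values, etag):
    """WATS style: update only if the caller holds the current ETag."""
    if store[key]["etag"] != etag:
        raise PreconditionFailed("ETag mismatch")
    store[key] = {"etag": uuid.uuid4().hex, "values": dict(new_values)}
    return store[key]["etag"]  # every successful write yields a new ETag


def expected_update(item, attribute, value, expected):
    """ADDB style: update only if the listed attributes have the expected values."""
    for name, required in expected.items():
        if required is None:
            if name in item:  # None is used here to mean "must not exist"
                raise PreconditionFailed(name + " already exists")
        elif item.get(name) != required:
            raise PreconditionFailed(name + " has an unexpected value")
    item[attribute] = value


store = {"k": {"etag": "v1", "values": {"a": 1}}}
new_etag = etag_update(store, "k", {"a": 2}, "v1")
print(new_etag != "v1")

item = {"attribute2": "x"}
expected_update(item, "attribute1", 10, {"attribute2": "x"})
print(item)
```

With ETags the condition is always "nobody changed the entity since I read it", while the ADDB style lets the caller state a condition on the data itself.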



Delete

To delete an entity in WATS, you use Delete Entity, passing the PartitionKey and RowKey of the entity as input arguments. Similarly, to delete an object in ADDB, you use DeleteItem, passing the primary key of the object as the input argument.

DeleteItem in ADDB is idempotent: if you try to delete a non-existent object, ADDB does not return an error, as long as you do not use a conditional delete. A conditional delete in ADDB is not idempotent. In WATS an attempt to delete a non-existent entity results in an error (NotFound error, HTTP Status Code 404).

Conditional deletes: both systems support conditional deletes, but the mechanisms work differently. In ADDB you define conditions on the values of existing attributes; for example, you can tell ADDB to delete the object only if the value of attribute2 equals some value. Conditional deletes in ADDB also support checking whether an attribute exists. In WATS things are different: everything depends on the entity's ETag value. To delete an entity conditionally, you supply the entity's ETag in one of the request headers (when using the REST API); WATS compares this value with the current ETag of the entity being deleted, and the deletion is performed only if the values match.

Function                                       WATS  ADDB
CRUD operations on multiple entities/objects   Yes   Yes

Both systems support CRUD operations on multiple entities/objects within a single service call.

In WATS you can use Entity Group Transactions for this. In ADDB you can use BatchWriteItem to write, and BatchGetItem to read multiple objects from multiple tables by primary key.

Comments:

  • In WATS this is an atomic transaction: either all operations in it succeed or none do. In ADDB this is not the case. Some objects may fail to be processed, and ADDB then returns a list of the objects for which the operation failed so that you can process them later.
  • You cannot update an object using BatchWriteItem, only create and delete.
  • In ADDB there is a limit of 25 objects per BatchWriteItem operation. In WATS the limit for a single entity group transaction is 100.
  • BatchWriteItem in ADDB lets you work with multiple tables in a single request, whereas an entity group transaction in WATS must stay within a single table, and all entities in it must have the same PartitionKey value.
  • BatchGetItem in ADDB performs eventually consistent reads. Consistent reads are not possible.
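
These limits shape the client code: ADDB writes have to be cut into batches of 25 with a retry loop for unprocessed objects, and WATS entities have to be grouped by PartitionKey into batches of 100. The sketch below shows both, under stated assumptions: send_batch is a hypothetical callable that wraps the real batch call and returns the objects the service did not process, and the retry count is arbitrary.

```python
from collections import defaultdict


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_in_batches(items, send_batch, batch_size=25, max_rounds=5):
    """ADDB style: send batches, retry unprocessed objects, return the leftovers."""
    pending = list(items)
    for _ in range(max_rounds):
        if not pending:
            break
        retry = []
        for batch in chunks(pending, batch_size):
            retry.extend(send_batch(batch))  # objects the service did not process
        pending = retry
    return pending


def group_for_wats(entities, batch_size=100):
    """WATS style: one transaction holds up to 100 entities of a single partition."""
    by_partition = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)
    return [batch
            for group in by_partition.values()
            for batch in chunks(group, batch_size)]


entities = [{"PartitionKey": "a", "RowKey": str(i)} for i in range(150)]
entities += [{"PartitionKey": "b", "RowKey": str(i)} for i in range(3)]
print([len(batch) for batch in group_for_wats(entities)])
```

Since ADDB batches are not atomic, the caller still has to decide what to do with anything write_in_batches returns.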

Function                      WATS  ADDB
Query Entities/Query (Scan)   Yes   Yes

This function is used to retrieve one or more entities/objects from a table based on criteria.

Comments:

  • In WATS Query Entities is used to retrieve a list of entities; ADDB has two functions for this, Query and Scan. The difference between them is that Scan does not require the primary key. Since data in ADDB is indexed by primary key, Query is much faster than Scan, which simply scans the whole table. Query is available only on tables with a hash-and-range primary key. I think (and I could be wrong) that with Query you can filter only on the attributes that make up the primary key; to filter on other attributes you must use Scan. In WATS you specify the filter in the query using the WCF Data Services syntax ($filter).
  • Both systems are designed for high availability, so they time out long-running queries and return results in parts. The documentation does not make clear when a request times out in ADDB. WATS times out a request after 5 seconds of execution.
  • In WATS a query may also return partial results if it crosses a partition boundary.
  • In ADDB the response size is limited to 1 MB. If a query could return a larger dataset, it returns a partial result.
  • A query may return all of the requested records, part of them, or none at all, even when matching data exists. When a query returns a partial result or an empty set while more matching data may exist, the service returns a continuation token. Handling this continuation token correctly in the application is therefore very important.
  • Both systems can return either all attributes or a subset of them. In ADDB you use the AttributesToGet parameter for this. In WATS you list the attributes you want returned in the $select query option, e.g. $select=PartitionKey,RowKey,Attribute1,...
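
As an illustration of the WATS side, the sketch below assembles the resource path and query string of a Query Entities request with $filter and $select. The table name Customers and the helper name are hypothetical, and the result is meant to be appended to the storage account's table endpoint; authentication headers and continuation handling are left out.

```python
from urllib.parse import quote, urlencode


def build_wats_query(table, filter_expr=None, select=None):
    """Build the resource path and query string for a Query Entities request."""
    params = {}
    if filter_expr:
        params["$filter"] = filter_expr
    if select:
        params["$select"] = ",".join(select)
    # Keep "$" and "," readable; spaces and quotes are percent-encoded.
    query = urlencode(params, quote_via=quote, safe="$,")
    return table + "()" + ("?" + query if query else "")


print(build_wats_query(
    "Customers",
    filter_expr="PartitionKey eq 'p1'",
    select=["PartitionKey", "RowKey", "Attribute1"],
))
```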

Summary

To sum up, the two systems are comparable in functionality. There are some differences, and a developer who keeps them in mind during planning and development can build a system that uses both services, perhaps even in an integrated way. Each system has its advantages and disadvantages, and we should weigh them to decide which system better suits our needs.

Article based on information from habrahabr.ru
