Testing Amazon SQS

There are already several reviews of this Amazon solution's performance on the web. In this article I did not aim to validate those results; I was interested in some features not covered in other sources, namely:

  1. the documentation says that Amazon tries to preserve message order; how well does this hold up?
  2. how quickly is a message received when using Long Polling?
  3. how much does batch processing speed things up?




Problem statement


The Erlang library for working with AWS is erlcloud [1]; to initialize it, you simply call the start and configure methods, as described on GitHub (see the sketch below). My messages will contain a set of random characters generated by the following function:

random_string(0) -> [];
random_string(Length) -> [random_char() | random_string(Length-1)].
random_char() -> random:uniform(95) + 31.
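
For reference, the initialization mentioned above might look roughly like this; a minimal sketch assuming the usual erlcloud setup from its README, with placeholder credentials:

%% Minimal initialization sketch (credential values are placeholders):
init() ->
    erlcloud:start(),
    erlcloud_sqs:configure("ACCESS_KEY_ID", "SECRET_ACCESS_KEY").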


To measure speed I use the well-known averaging function built on timer:tc, with some changes:

test_avg(M, F, A, R, N) when N > 0 ->
{Ret, L} = test_loop(M, F, A, R, N, []),
Length = length(L),
Min = lists:min(L),
Max = lists:max(L),
Med = lists:nth(round((Length / 2)), lists:sort(L)),
Avg = round(lists:foldl(fun(X, Sum) -> X + Sum end, 0, L) / Length),
io:format("Range: ~b - ~b mics~n"
"Median: ~b mics~n"
"Average: ~b mics~n",
[Min, Max, Med, Avg]),
Ret.

test_loop(_M, _F, _A, R, 0, List) ->
{R, List};
test_loop(M, F, A, R, N, List) ->
{T, Result} = timer:tc(M, F, [R|A]),
test_loop(M, F, A, Result, N - 1, [T|List]).


The changes concern how the tested function is called: in this version I added the argument R, which lets a run reuse the value returned by the previous run. This is needed to generate message numbers and to collect additional information about message reordering on receipt. The function that sends a numbered message therefore looks like this:

send_random(N, Queue) ->
erlcloud_sqs:send_message(Queue, [N + 1 | random_string(6000 + random:uniform(6000))]),
N + 1.


And its call, with statistics collection:

test_avg(?MODULE, send_random, [QueueName], 31, 20)


Here 31 is the number of the first message, and the value is not chosen at random: Erlang does not really distinguish between lists of numbers and strings, so in the message it becomes the character with that code. Smaller numbers can be passed to SQS, but the allowed contiguous ranges are narrow (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF], details in [2]), and stepping outside the allowed range raises an exception. So send_random generates and sends to the queue named Queue a message that starts with the identifying number, and returns the number for the next message, which is then fed into the next call. test_avg receives QueueName (which becomes the second argument of send_random), the starting number, and the number of repetitions.

A function that receives messages and checks their order looks like this:

checkorder(N, []) -> N;
checkorder(N, [H | T]) ->
[{body, [M | _]} | _] = H,
K = if M > N -> M;
true -> io:format("Wrong ~b less than ~b~n", [M, N]),
N
end,
checkorder(K, T).

receive_checkorder(LastN, Queue) ->
[{messages, List} | _] = erlcloud_sqs:receive_message(Queue),
remove_list(Queue, List),
checkorder(LastN, List).


Deleting messages:

remove_msg(_, []) -> wrong;
remove_msg(Q, [{receipt_handle, Handle} | _]) -> erlcloud_sqs:delete_message(Q, Handle);
remove_msg(Q, [_ | T]) -> remove_msg(Q, T).

remove_list(_, []) -> ok;
remove_list(Q, [H | T]) -> remove_msg(Q, H), remove_list(Q, T).


The list passed for deletion contains a lot of extra information (the message body and so on); the delete function finds the receipt_handle required to form the request, or returns wrong if no receipt_handle is found.
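
The same lookup could also be written with the standard proplists module; a small sketch equivalent to remove_msg/2 above (remove_msg2 is just an illustrative name):

remove_msg2(Q, Msg) ->
    %% Look up the receipt handle directly instead of walking the proplist.
    case proplists:get_value(receipt_handle, Msg) of
        undefined -> wrong;
        Handle -> erlcloud_sqs:delete_message(Q, Handle)
    end.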

Mixing messages


Looking ahead, I can say that even with a small number of messages the reordering was quite substantial, so an additional goal appeared: estimating the degree of mixing. Unfortunately, I could not find a good criterion, so I decided to report the maximum and the average deviation from the correct position. Knowing the size of this window, you can restore the order of messages on receipt, at the cost, of course, of processing speed.
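
As an aside, restoring order within a known window can be done with a small buffer; a rough sketch (push/3, the sequence number Seq and the window size W are illustrative names, not code from the article):

%% Keep at most W messages buffered, sorted by sequence number; once the
%% buffer overflows, release the message with the smallest sequence number.
push({Seq, Msg}, Buffer, W) ->
    Sorted = lists:keysort(1, [{Seq, Msg} | Buffer]),
    case length(Sorted) > W of
        true -> [Out | Rest] = Sorted, {Out, Rest};
        false -> {none, Sorted}
    end.
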
To calculate this deviation it is enough to change only the order-checking function:

checkorder(N, []) -> N;
checkorder({N, Cnt, Sum, Max}, [H | T]) ->
[{body, [M | _]}|_] = H,
{N1, Cnt1, Sum1, Max1} = if M < N ->
{N, Cnt + 1, Sum + N - M, if Max < N - M -> N - M; true -> Max end};
true -> {M, Cnt, Sum, Max}
end,
checkorder({N1, Cnt1, Sum1, Max1}, T).


The call that runs the series looks like this:

{_, Cnt, Sum, Max} = test_avg(?MODULE, receive_checkorder, [QueueName], {0, 0, 0, 0}, Size)


This gives the number of elements that arrived later than they should have, the sum of their distances from the largest element received, and the maximum displacement. The maximum displacement interests me most; the other characteristics are debatable and probably not computed in the best way (for example, if one element is read before all the elements that should precede it, every one of them is counted as misplaced). On to the results:

Size (pcs)            20   50    100   150  200   250   300   400   500   600   700   800   900    1000
Maximum offset (pcs)  11   32    66    93   65    139   184   155   251   241   218   249   359    227
Average offset (pcs)  5.3  10.5  23.9  43   25.6  45.9  48.4  65.6  74.2  74.2  78.3  72.3  110.8  82.8


The first row is the number of messages in the queue, the second is the maximum offset, and the third is the average offset.

The results surprised me: the messages are not merely shuffled, the shuffling simply has no bound, that is, as the number of messages grows, the window you have to look through has to grow as well. The same as a graph:



Long Polling


As I wrote, Amazon SQS does not support subscriptions. You can use Amazon SNS, but it is not suitable if you need a fast queue with multiple consumers. So that you do not have to keep polling the receive method, Amazon implemented Long Polling, which lets a receive call block for up to twenty seconds waiting for messages; since SQS is billed per method call, this can significantly reduce the cost of the queue. The problem is that with a small number of messages the call may (according to the official documentation) return nothing at all. This behavior is critical for queues that must react quickly to an event; generally speaking, if it happens often, Long Polling loses much of its point, since it becomes equivalent to periodic polling with SQS's reaction time.

To test this, I create two processes: one sends messages at random moments, the other sits in Long Polling; the moments of sending and receiving are stored for later comparison. To enable this mode, set Receive Message Wait Time = 20 seconds in the queue settings.

send_sleep(L, Queue) ->
timer:sleep(random:uniform(10000)),
Call = erlang:now(),
erlcloud_sqs:send_message(Queue, random_string(6000 + random:uniform(6000))),
[Call | L].


This function sleeps for a random number of milliseconds, then records the moment and sends the message.

remember_moment(L, []) -> L;
remember_moment(L, [_ | _]) -> [erlang:now() | L].

receive_polling(L, Queue) ->
[{messages, List} | _] = erlcloud_sqs:receive_message(Queue),
remove_list(Queue, List),
remember_moment(L, List).


These two functions retrieve messages and record the moments at which that happened. After running the sender and the receiver simultaneously via spawn, I get two lists whose element-wise difference shows the reaction time to a message. This ignores the fact that messages can get reordered, but that would only increase the measured reaction time further.
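
The exact harness is not shown in the article; a rough sketch of how the two processes might be started and their timestamp lists collected (run_polling_test/2 and the message tags are illustrative names):

run_polling_test(QueueName, Count) ->
    Parent = self(),
    %% One process sends at random moments, the other sits in Long Polling.
    spawn(fun() -> Parent ! {sent, test_avg(?MODULE, send_sleep, [QueueName], [], Count)} end),
    spawn(fun() -> Parent ! {received, test_avg(?MODULE, receive_polling, [QueueName], [], Count)} end),
    Sent = receive {sent, S} -> S end,
    Received = receive {received, R} -> R end,
    {Sent, Received}.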

Let's see what happened:

Maximum sleep interval (ms)  10000  7500  5000  2500
Minimum time (sec)           0.28   0.27  0.66
Maximum time (sec)           10.25  7.8   5.36  5.53
Average time (sec)           1.87   1.87  1.84  1.88


The first row is the maximum delay set for the sending process, i.e. 10 seconds, 7.5 seconds and so on. The remaining rows are the minimum, maximum and average waiting time before a message was received.

The same as a graph:



The average time turned out to be the same in every case: on average, about two seconds pass between sending a single message and receiving it. That is quite long. The sample in this test was rather small (20 messages), so the min and max values are more a matter of luck than of any real dependency.

Batch upload


First, let's check how significant the effect of "warming up" the queue is when sending messages:

Number of records    20    50    100   150   200   250   300   400   500   600   700   800   900   1000
Minimum time (sec)   0.1   0.1   0.1   0.09  0.09  0.09  0.09  0.1   0.09  0.1   0.1   0.09  0.09  0.09
Maximum time (sec)   0.19  0.37  0.41  0.41  0.37  0.38  0.37  0.43  0.39  0.66  0.74  0.48  0.53  0.77
Average time (sec)   0.12  0.12  0.12  0.12  0.12  0.12  0.12  0.12  0.12  0.12  0.12  0.12  0.12  0.12


The same as a graph:



We can say that no warm-up is observed: the queue behaves roughly the same at these data volumes. Only the maximum crept up somewhat, while the average and the minimum stayed put.
The same for reading:

Number of records    20     50    100   150    200   250    300   400   500   600   700   800   900   1000
Minimum time (sec)   0.001  0.14  0     0.135  0     0.135  0     0     0     0     0     0     0     0
Maximum time (sec)   0.72   0.47  0.65  0.65   0.69  0.51   0.75  0.75  0.76  0.73  0.82  0.79  0.74  0.91
Average time (sec)   0.23   0.21  0.21  0.21   0.21  0.21   0.21  0.21  0.21  0.2   0.2   0.2   0.2   0.21




There is no saturation here either; the average stays around 200 ms. Occasionally the read returned immediately (in under 1 ms), but that means no message was received: according to the documentation, an SQS server is allowed to do this, and you simply need to request the message again.
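
A minimal sketch of re-requesting when an empty result comes back (receive_with_retry/2 is an illustrative helper, not part of the article's code):

receive_with_retry(_Queue, 0) -> [];
receive_with_retry(Queue, Attempts) ->
    %% Retry while SQS returns an empty message list.
    case erlcloud_sqs:receive_message(Queue) of
        [{messages, []} | _] -> receive_with_retry(Queue, Attempts - 1);
        [{messages, List} | _] -> List
    end.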

Now let's proceed to batch and multi-threaded testing.

Unfortunately, the erlcloud library does not contain functions for batch sending, but they are not hard to implement on top of the existing ones: in the message-sending function, the request needs to be changed to the following:

Doc = sqs_xml_request(Config, QueueName, "SendMessageBatch",
encode_message_list(Messages, 1)),


and a function that encodes the list of messages into request parameters needs to be added:

encode_message_list([], _) -> [];
encode_message_list([H | T], N) ->
MessageId = string:concat("SendMessageBatchRequestEntry.", integer_to_list(N)),
[{string:concat(MessageId, ".Id"), integer_to_list(N)},
 {string:concat(MessageId, ".MessageBody"), H} | encode_message_list(T, N + 1)].
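
Put together, the added batch-send function might look roughly like this; a sketch that assumes the library's internal sqs_xml_request/4 helper shown above and erlcloud_aws:default_config/0 for credentials (the exact integration point inside erlcloud_sqs is not shown in the article):

send_message_batch(QueueName, Messages) ->
    send_message_batch(QueueName, Messages, erlcloud_aws:default_config()).

send_message_batch(QueueName, Messages, Config) ->
    %% Fire the SendMessageBatch request and return the raw response document.
    sqs_xml_request(Config, QueueName, "SendMessageBatch",
                    encode_message_list(Messages, 1)).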


You also need to pin the API version in the library, for example to 2011-10-01; otherwise Amazon will return Bad Request in response to your requests.

The test functions are similar to those used in the other tests:

gen_messages(0) -> [];
gen_messages(N) -> [random_string(5000 + random:uniform(1000)) | gen_messages(N - 1)].

send_batch(N, Queue) ->
erlang:display(erlcloud_sqs:send_message_batch(Queue, gen_messages(10))),
N + 1.


The only change here is the message length, so that the whole batch fits into 64 KB (ten messages of 5000-6000 characters each come to at most about 60 KB); otherwise an exception is thrown.
The following data were obtained for writing:

Number of threads        0      1      2      4      5      10     20     50      100
Maximum delay (sec)      0.452  0.761  0.858  1.464  1.698  3.14   5.272  11.793  20.215
Average delay (sec)      0.118  0.48   0.436  0.652  0.784  1.524  3.178  9.1     19.889
Time per message (sec)   0.118  0.048  0.022  0.017  0.016  0.016  0.017  0.019   0.02


Here 0 means sending messages one at a time in a single thread; then 1 means sending batches of 10 in 1 thread, 2 means batches of 10 in 2 threads, 4 in 4 threads, and so on. A sketch of the harness follows.
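
The exact multi-threaded harness is not shown in the article; a rough sketch of how it might be organised (spawn_senders/3 and its names are illustrative):

spawn_senders(QueueName, Threads, BatchesPerThread) ->
    Parent = self(),
    %% Start Threads workers, each sending BatchesPerThread batches of 10 messages.
    Pids = [spawn(fun() ->
                      test_avg(?MODULE, send_batch, [QueueName], 0, BatchesPerThread),
                      Parent ! {done, self()}
                  end) || _ <- lists:seq(1, Threads)],
    %% Wait for every worker to finish.
    [receive {done, Pid} -> ok end || Pid <- Pids].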

For reading:

Number of threads        0      1      2      4      5      10     20     50      100
Maximum delay (sec)      0.762  2.998  2.511  2.4    2.606  2.751  4.944  11.653  18.517
Average delay (sec)      0.205  1.256  1.528  1.566  1.532  1.87   3.377  7.823   17.786
Time per message (sec)   0.205  0.126  0.077  0.04   0.031  0.02   0.019  0.017   0.019


The graphs show the throughput for reading and writing (messages per second):



Blue is writing, red is reading.

From these data we can conclude that the maximum throughput is reached at around 10 threads for writing and about 50 for reading; increasing the number of threads further does not increase the number of messages delivered per unit of time.

Conclusions


It turns out that Amazon SQS noticeably reorders messages and offers neither particularly good response times nor throughput; what it gives in return is reliability and a small fee (for modest message volumes). So if speed is not critical, you don't mind that messages get mixed up, and you don't want to administer a queue server or hire someone to do it, it may well be your choice.

Links


  1. Erlcloud on GitHub: github.com/gleber/erlcloud
  2. www.w3.org/TR/REC-xml/#charsets
Article based on information from habrahabr.ru
