Aspera and Concurrency

by John Heaton

Concurrency

Depending on the workflow, an Aspera system may need several aspects optimized to support a given concurrency load. A system that has to support 5 concurrent transfers needs a vastly different hardware and software configuration from one that has to support 50. Below is a guide to sizing hardware, configuring the software, and other factors to consider.

Network

From a network perspective, concurrency is a simple equation: all flows have to split the available bandwidth equally, less congestion. Put another way, to support 50 users with a minimum of 2Mb/s per transfer, at least 100Mb/s of available bandwidth is required. While simplistic, this sets a framework for determining the minimum bandwidth needed to support business processes that require fast file transfer.
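The arithmetic can be sketched in a few lines of Python. This is only an illustration of the equal-split rule described here; the function names are hypothetical, and congestion and protocol overhead are ignored.

```python
def min_bandwidth_mbps(concurrent_transfers: int, per_transfer_mbps: float) -> float:
    """Minimum link bandwidth so every transfer gets its target rate."""
    return concurrent_transfers * per_transfer_mbps


def per_transfer_share_mbps(available_mbps: float, concurrent_transfers: int) -> float:
    """Equal share of the link per transfer, ignoring congestion."""
    return available_mbps / concurrent_transfers


# 50 users at a minimum of 2Mb/s each
print(min_bandwidth_mbps(50, 2))         # 100
# A 100Mb/s link split across 50 transfers
print(per_transfer_share_mbps(100, 50))  # 2.0
```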

Hardware

This advice is going to be generic. A lot of factors go into correctly sizing a system for a given workload, including simple preference, but the following information should always hold:

  • More processors: More processors or cores correspond to the ability to support more concurrent transfers. While a single core can support multiple transfers, the CPU will eventually reach a point where it cannot handle all transfer events. A system that supports multiple processors running simultaneously will scale well in high concurrency scenarios. Many x86_64 processors can support 2-8 cores, and newer UltraSPARC T2 systems can support 32-128 simultaneous threads of execution.
  • Optimize Storage for IOPS: Each storage system has multiple requirements, from raw size and data availability to pure read and write performance. Modern storage systems also make use of solid state "disks" that have vastly different performance characteristics from their spinning platter relatives. Then there is the issue of how that storage is accessed: direct attached or via the network, like a NAS. To support concurrency, a simple way to consider storage is the number of IOPS (I/O Operations Per Second) it can support. Systems that can support high IOPS typically perform well under high concurrency read, write, or mixed read/write loads. Many high IOPS capable systems achieve their performance through massive parallelization; this parallelization can also make handling concurrent workloads easier. As always, use a benchmarking tool like iozone to truly know the capabilities of the storage.
  • Make Crypto easier: While not concurrency related, it should be noted that the cryptographic operations used by fasp™ can be quite intensive on the CPU. The newer Xeon processors with the AES-NI instruction set allow the processor to handle more by offloading crypto operations.
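As a quick check before sizing, the core count and AES-NI support of a host can be inspected with a short script. The sketch below assumes an x86 Linux host, where CPU feature flags appear on the "flags" lines of /proc/cpuinfo; the function name has_aes_flag is illustrative and not part of any Aspera tooling.

```python
import os


def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


# Assumption: Linux exposes CPU feature flags in /proc/cpuinfo.
if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print("AES-NI advertised:", has_aes_flag(f.read()))

# Number of logical CPUs visible to the operating system.
print("Logical CPUs:", os.cpu_count())
```

On other operating systems the flag has to be read by other means, so treat this as a starting point only.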

The OS

Each operating system is different, and one system that requires additional configuration is Windows. Please see the KB article on setting the Windows Heap Sizing for Concurrency.

Knowing and Tuning Aspera

Basic info

  • The sender side is very lightweight. The focus on tuning for concurrency happens on the receive side.
    • The most CPU intensive operation on the sender side is encryption
  • The lion's share of the load happens on the receive side
    • Along with decryption, memory and CPU are used in rate control and file operations
    • Once memory is allocated, the ascp processes will not grow
    • Tuning for concurrency will deal with, in part, setting correct sizes for buffers (file, ring and socket)
    • The other component to tune is the type and extent of congestion control used.

Aspera Configurations

1-2 concurrent transfers, high speed

This example shows a baseline configuration for low concurrency, high speed transfers.

Settings
  • Socket buffers – recommended setting: 20MB
  • Application level buffers – recommended setting: 48 units
  • Application level file cache – recommended setting: max size 128 MB
  • Disk read/write buffer – large if iozone says it's better (1-2MB, not more)
  • Storage rate control – recommended setting: adaptive

How to set it in aspera.conf

... 
<default>
<transfer>
<protocol_options>
<max_sock_buffer>20971520</max_sock_buffer>
<min_sock_buffer>20971520</min_sock_buffer>
</protocol_options>
</transfer>
<file_system>
<read_block_size>1048576</read_block_size>
<write_block_size>1048576</write_block_size>
<use_file_cache>yes</use_file_cache>
<max_file_cache_buffer>134217728</max_file_cache_buffer>
<ring_buf_units>48</ring_buf_units>
<storage_rc>
<adaptive>true</adaptive>
</storage_rc>
</file_system>
...
</default>
...
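The byte values in this example are the listed settings expressed in binary megabytes (1 MB = 1024 × 1024 bytes). A small helper makes the conversion explicit; mb_to_bytes is a hypothetical name used only for this illustration.

```python
MIB = 1024 * 1024  # bytes in one binary megabyte


def mb_to_bytes(megabytes: int) -> int:
    """Convert a size in binary megabytes to bytes for use in aspera.conf."""
    return megabytes * MIB


print(mb_to_bytes(20))   # 20971520  (max_sock_buffer, min_sock_buffer)
print(mb_to_bytes(128))  # 134217728 (max_file_cache_buffer)
print(mb_to_bytes(1))    # 1048576   (read_block_size, write_block_size)
```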

10-20 concurrent transfers, 45Mb/s

The following configuration shows how to set the transfers to work with 10-20 concurrent connections.

Settings
  • Socket buffers – recommended setting: 5MB
  • Application level buffers – recommended setting: 12 units
  • Application level file cache – recommended setting: max size 16 MB
  • Storage rate control – recommended setting: adaptive
  • Windows OS desktop heap – recommended setting: 192 MB total, 256K per process (see References)

How to set it in aspera.conf

... 
<default>
<transfer>
<protocol_options>
<max_sock_buffer>5592064</max_sock_buffer>
</protocol_options>
</transfer>
<file_system>
<use_file_cache>yes</use_file_cache>
<max_file_cache_buffer>16777216</max_file_cache_buffer>
<ring_buf_units>12</ring_buf_units>
<storage_rc>
<adaptive>true</adaptive>
</storage_rc>
</file_system>
...
</default>
...

50+ concurrent transfers at low speed

The following configuration targets high concurrency, low speed transfers: 50 or more concurrent transfers at 1-10 Mb/s each.

Settings
  • Socket buffers – recommended setting: 500 KB
  • Application level buffers – recommended setting: 12 units
  • Application level file cache – recommended setting: max size 2 MB
  • Storage rate control – recommended setting: adaptive
  • Disk read/write buffers – small (64KB – which is the default)
  • Windows OS desktop heap – recommended setting: 192 MB total, 256K per process (see References)

How to set it in aspera.conf

... 
<default>
<transfer>
<protocol_options>
<max_sock_buffer>584000</max_sock_buffer>
</protocol_options>
</transfer>
<file_system>
<use_file_cache>yes</use_file_cache>
<max_file_cache_buffer>2097152</max_file_cache_buffer>
<ring_buf_units>12</ring_buf_units>
<storage_rc>
<adaptive>true</adaptive>
</storage_rc>
</file_system>
...
</default>
...
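To compare the three profiles above, the configured buffer sizes can be multiplied out by the number of concurrent transfers. The sketch below is a rough ceiling only: it assumes each transfer holds one socket buffer and one file cache at their configured maximums, which is a simplification and not a statement of how ascp actually allocates memory. Ring buffers are left out because their unit size is not given here.

```python
# Buffer sizes in bytes, copied from the three aspera.conf examples above.
PROFILES = {
    "1-2 high speed": {"max_sock_buffer": 20971520, "max_file_cache_buffer": 134217728},
    "10-20 at 45Mb/s": {"max_sock_buffer": 5592064, "max_file_cache_buffer": 16777216},
    "50+ low speed": {"max_sock_buffer": 584000, "max_file_cache_buffer": 2097152},
}


def estimated_buffer_bytes(profile: dict, transfers: int) -> int:
    """Assumed ceiling: one socket buffer plus one file cache per transfer."""
    per_transfer = profile["max_sock_buffer"] + profile["max_file_cache_buffer"]
    return per_transfer * transfers


for name, transfers in [("1-2 high speed", 2), ("10-20 at 45Mb/s", 20), ("50+ low speed", 50)]:
    total = estimated_buffer_bytes(PROFILES[name], transfers)
    print(f"{name}: {transfers} transfers, about {total / (1024 * 1024):.0f} MB of buffers")
```

Under these assumptions, the smaller buffers in the high concurrency profile keep the total for 50 transfers below the total for 2 transfers in the high speed profile.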

Conclusion

As in all engineering efforts, individual workflows and usage will ultimately determine exact configurations. The information provided here is meant as a starting point for tuning the system.

References
