Thursday, June 21, 2007

Speed of disk subsystem can be business critical

As a DELL System/Solution Consultant I designed the hardware infrastructure for a Czech commercial ISP that wants to provide IPTV and VoD. The ISP chose a software IPTV/VoD/DRM solution based on Linux. Together with the software provider we chose several PE 2970 servers, which are AMD (x86_64) based. The streamer server needs a cost-effective but sufficiently fast disk subsystem. Unfortunately, 2.5" hard disks are available only as 10k rpm, so the streamer has eight 2.5" SAS 73GB 10k rpm disks in RAID 5, where the RAID is provided by the internal DELL PowerEdge RAID Controller PERC 5/i (a re-branded LSI MegaRAID SAS). The software provider has load tests that can determine whether a system is good enough for the requested solution, and a reference hardware configuration with reference results. The reference hardware is a SuperMicro server with eight 3.5" SCSI 36GB 10k rpm disks connected to an Areca ARC-1260 RAID controller. The disk subsystem is tested with a real load test and also with the synthetic disk benchmark utility "iozone". We can try to tune the disk subsystem and check our progress with the synthetic benchmark.

Here are the results from the reference hardware (SuperMicro):
iozone -s 32g -r 1m -i 0 -i 1 -t 1 -b /tmp/test.xls
Initial write 299951.25 kB/s
Rewrite 254075.39 kB/s
Read 386271.91 kB/s
Re-read 388288.41 kB/s

iozone -s 16g -r 1m -i 0 -i 1 -t 2 -b /tmp/test.xls
Initial write 327977.77 kB/s
Rewrite 321502.41 kB/s
Read 312530.73 kB/s
Re-read 315091.88 kB/s

iozone -s 4g -r 1m -i 0 -i 1 -t 10 -b /tmp/test.xls
Initial write 293753.92 kB/s
Rewrite 281009.19 kB/s
Read 262086.25 kB/s
Re-read 260864.43 kB/s

The software provider tested the DELL PE 2970 with these results:
iozone -s 32g -r 1m -i 0 -i 1 -t 1 -b /tmp/test.xls
Initial write 339329.00 kB/s
Rewrite 325063.56 kB/s
Read 337726.91 kB/s
Re-read 320971.62 kB/s

iozone -s 16g -r 1m -i 0 -i 1 -t 2 -b /tmp/test.xls
Initial write 356046.38 kB/s
Rewrite 359441.64 kB/s
Read 193787.55 kB/s
Re-read 194154.85 kB/s

iozone -s 4g -r 1m -i 0 -i 1 -t 10 -b /tmp/test.xls
Initial write 296494.20 kB/s
Rewrite 281730.82 kB/s
Read 147723.06 kB/s
Re-read 148728.67 kB/s

We can see a significant difference between the reference hardware and the DELL hardware. DELL is better in write throughput but worse in read throughput, and the read gap grows with the number of threads. The default parameters of DELL servers are preconfigured for database servers, which usually have different requirements than streaming applications. Streaming applications need read performance rather than write performance. DELL received a request from the software provider to optimize the infrastructure for better read performance.
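To put a number on that read gap, here is a minimal sketch that expresses the untuned DELL read throughput as a percentage of the reference. The figures are copied from the iozone "Read" lines above; the dictionary and function names are only illustrative.

```python
# Read throughput in kB/s, copied from the iozone "Read" lines above.
# Keys are the thread counts passed to iozone with -t.
reference_read = {1: 386271.91, 2: 312530.73, 10: 262086.25}
dell_read = {1: 337726.91, 2: 193787.55, 10: 147723.06}


def percent_of_reference(dell, reference):
    """Return DELL throughput as a percentage of the reference, per thread count."""
    return {t: 100.0 * dell[t] / reference[t] for t in sorted(reference)}


read_pct = percent_of_reference(dell_read, reference_read)

for threads, pct in read_pct.items():
    print(f"{threads:2d} thread(s): DELL read is {pct:.0f}% of the reference")
```

With one thread the DELL box is within about 13% of the reference, but with ten threads it delivers only a little more than half of the reference read throughput.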

Tuning a disk subsystem is not an easy task and depends on many factors. In this particular environment we have Debian Linux 4.0 (kernel 2.6.x), the XFS filesystem, RAID 5 and the PERC 5/i. Debian is not a certified and supported operating system, so the customer cannot use standard DELL tech support, but DELL can sometimes help its customers with particularly complex enterprise solutions.

To increase read performance, especially for sequential reads, it is very important to set up the read-ahead cache. The PERC 5/i can operate in three modes: adaptive, read ahead and no read ahead. By default the PERC is in adaptive mode, which means it uses an internal algorithm to recognize automatically when to use read-ahead. In this particular solution we can explicitly set the "read ahead" mode in the RAID management. Another very important point is to set up read-ahead in the Linux block device layer.

Linux kernel 2.6
Set the value to 8192 sectors using the blockdev command, for example:
blockdev --setra 8192 /dev/sda
This example sets up a 4MB read-ahead (8192 sectors of 512 bytes each),
which is aligned with the default XFS parameters. See xfs_info for the current XFS parameters.
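
The arithmetic behind the 4MB figure can be checked with a short sketch. It assumes only that blockdev --setra counts in 512-byte sectors; the function names are made up for illustration. The second helper converts to kilobytes, the unit used by /sys/block/<device>/queue/read_ahead_kb on 2.6 kernels.

```python
SECTOR_BYTES = 512  # blockdev --setra / --getra count in 512-byte sectors


def readahead_bytes(sectors):
    """Convert a --setra value (in sectors) to bytes."""
    return sectors * SECTOR_BYTES


def readahead_kb(sectors):
    """Same value in kB (1024 bytes), as shown in queue/read_ahead_kb."""
    return readahead_bytes(sectors) // 1024


setra = 8192
print(f"--setra {setra} = {readahead_bytes(setra)} bytes "
      f"= {readahead_kb(setra)} kB "
      f"= {readahead_kb(setra) // 1024} MB")
```

The same helpers can be used to translate other candidate values while experimenting, for example when comparing what blockdev --getra reports with what appears under /sys.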

DELL iozone test results on the DELL PE 2970 with the tuned block device layer:
iozone -s 16g -r 1m -i 0 -i 1 -t 2 -b /tmp/test.xls
Initial write 290692.31 kB/s
Rewrite 359531.20 kB/s
Read 503044.62 kB/s
Re-read 496045.61 kB/s

iozone -s 4g -r 1m -i 0 -i 1 -t 10 -b /tmp/test.xls
Initial write 297497.92 kB/s
Rewrite 279933.16 kB/s
Read 473969.33 kB/s
Re-read 481384.16 kB/s

It is possible to tune the disk subsystem successfully by setting the read-ahead parameters. The read throughput of the DELL PE 2970 with PERC 5/i and 8 x 10k rpm hard disks increased approximately 3 times (about 2.6 times with 2 threads and 3.2 times with 10 threads), so we achieved great results in the synthetic benchmark (iozone) and surpassed the reference hardware results. What we can also see is that disk performance drops with more concurrent threads. Local RAID controllers are designed as disk storage for one server, where I/O stress is not so high. Anyone looking for storage without I/O stress issues should focus on SAN disk arrays, which are designed for environments with lots of servers, processes and threads.
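
As a rough cross-check of these conclusions, the sketch below computes the read speedup from tuning and how much read throughput is retained when going from 2 to 10 threads. The numbers are the iozone "Read" values reported above; the variable names are illustrative only, and a synthetic benchmark like this says nothing definitive about the real streaming load.

```python
# iozone "Read" throughput in kB/s for the 2-thread and 10-thread runs.
reference = {2: 312530.73, 10: 262086.25}
dell_untuned = {2: 193787.55, 10: 147723.06}
dell_tuned = {2: 503044.62, 10: 473969.33}

# Speedup of the tuned DELL configuration over the untuned one.
speedup = {t: dell_tuned[t] / dell_untuned[t] for t in dell_untuned}


def retained(results):
    """Fraction of 2-thread read throughput still available with 10 threads."""
    return results[10] / results[2]


for threads, factor in speedup.items():
    print(f"{threads:2d} threads: tuned read is {factor:.2f}x the untuned read")

for name, results in (("reference", reference),
                      ("DELL untuned", dell_untuned),
                      ("DELL tuned", dell_tuned)):
    print(f"{name}: {retained(results):.0%} of 2-thread read kept at 10 threads")
```

In these runs the tuned configuration loses the least read throughput as the thread count rises, but every configuration loses some, which is the point made above about concurrent load on a local RAID controller.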

1 comment:

Anonymous said...

Great post, seems to keep my filesystem from choking (PE1950 with MD1000 as NAS server) during heavy reading. Still need to find out the optimal value for setra when using reiserfs...