DeepBench GEMM Benchmark

Submitted by prabindh on Sun, 08/27/2017 - 19:05

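After making a few changes needed to run DeepBench (specifically gemm_bench), related to MPI and NCCL, I got the numbers below on a Quadro M1000M. The run aborted with an illegal memory access on the 512 x 8 x 500000 case, so the table stops there.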
Running training benchmark
Times
----------------------------------------------------------------------------------------
m n k a_t b_t precision time (usec)
1760 16 1760 0 0 half 512
1760 32 1760 0 0 half 464
1760 64 1760 0 0 half 693
1760 128 1760 0 0 half 943
1760 7000 1760 0 0 half 46700
2048 16 2048 0 0 half 664
2048 32 2048 0 0 half 544
2048 64 2048 0 0 half 1141
2048 128 2048 0 0 half 1103
2048 7000 2048 0 0 half 62164
2560 16 2560 0 0 half 1047
2560 32 2560 0 0 half 792
2560 64 2560 0 0 half 1806
2560 128 2560 0 0 half 1798
2560 7000 2560 0 0 half 96304
4096 16 4096 0 0 half 2629
4096 32 4096 0 0 half 2326
4096 64 4096 0 0 half 4037
4096 128 4096 0 0 half 4743
4096 7000 4096 0 0 half 245408
1760 16 1760 1 0 half 810
1760 32 1760 1 0 half 808
1760 64 1760 1 0 half 812
1760 128 1760 1 0 half 953
1760 7000 1760 1 0 half 48004
2048 16 2048 1 0 half 1052
2048 32 2048 1 0 half 985
2048 64 2048 1 0 half 996
2048 128 2048 1 0 half 1150
2048 7000 2048 1 0 half 64429
2560 16 2560 1 0 half 1636
2560 32 2560 1 0 half 1477
2560 64 2560 1 0 half 1510
2560 128 2560 1 0 half 1865
2560 7000 2560 1 0 half 100466
4096 16 4096 1 0 half 4050
4096 32 4096 1 0 half 3590
4096 64 4096 1 0 half 3703
4096 128 4096 1 0 half 4773
4096 7000 4096 1 0 half 256347
1760 7133 1760 0 1 half 46544
2048 7133 2048 0 1 half 63640
2560 7133 2560 0 1 half 99340
4096 7133 4096 0 1 half 248793
5124 9124 1760 0 0 half 191160
35 8457 1760 0 0 half 3392
5124 9124 2048 0 0 half 212186
35 8457 2048 0 0 half 3920
5124 9124 2560 0 0 half 264667
35 8457 2560 0 0 half 4923
5124 9124 4096 0 0 half 421545
35 8457 4096 0 0 half 7816
5124 9124 1760 1 0 half 188197
35 8457 1760 1 0 half 3506
5124 9124 2048 1 0 half 222694
35 8457 2048 1 0 half 3943
5124 9124 2560 1 0 half 282832
35 8457 2560 1 0 half 4861
5124 9124 4096 1 0 half 444422
35 8457 4096 1 0 half 7649
7680 16 2560 0 0 half 3125
7680 32 2560 0 0 half 2556
7680 64 2560 0 0 half 4630
7680 128 2560 0 0 half 5649
7680 16 2560 1 0 half 4863
7680 32 2560 1 0 half 4219
7680 64 2560 1 0 half 4239
7680 128 2560 1 0 half 6120
3072 16 1024 0 0 half 572
3072 32 1024 0 0 half 384
3072 64 1024 0 0 half 778
3072 128 1024 0 0 half 893
3072 16 1024 1 0 half 841
3072 32 1024 1 0 half 690
3072 64 1024 1 0 half 708
3072 128 1024 1 0 half 920
3072 7435 1024 0 1 half 50562
7680 5481 2560 0 1 half 226288
terminate called after throwing an instance of 'thrust::system::system_error'
what(): function_attributes(): after cudaFuncGetAttributes: an illegal memory access was encountered
512 8 500000 0 0Aborted (core dumped)
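
The raw times are easier to compare once converted to throughput. Below is a minimal sketch, assuming the usual 2*m*n*k operation count for a GEMM and that each reported time covers a single multiplication (the log itself does not say how the time is averaged). `gemm_gflops` is a hypothetical helper written for this post, not part of DeepBench, and the rows are copied by hand from the table above.

```python
def gemm_gflops(m, n, k, time_usec):
    """Achieved throughput in GFLOP/s for one m x n x k GEMM.

    Assumes 2*m*n*k floating point operations (one multiply and one add
    per inner-product term) and a time given in microseconds.
    """
    flops = 2 * m * n * k
    seconds = time_usec * 1e-6
    return flops / seconds / 1e9


# A few rows copied from the table above: (m, n, k, a_t, b_t, time in usec)
rows = [
    (1760, 16, 1760, 0, 0, 512),
    (1760, 7000, 1760, 0, 0, 46700),
    (4096, 7000, 4096, 0, 0, 245408),
    (5124, 9124, 4096, 0, 0, 421545),
]

for m, n, k, a_t, b_t, t in rows:
    rate = gemm_gflops(m, n, k, t)
    print(f"{m:5d} {n:5d} {k:5d} {a_t} {b_t} {rate:8.1f} GFLOP/s")
```

Under those assumptions, the 1760 x 16 x 1760 case works out to about 194 GFLOP/s, while the 4096 x 7000 x 4096 case reaches about 957 GFLOP/s, so the small-n cases run at a fraction of the throughput of the large ones on this GPU.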