tugrul_448cores
This is an open-source project that I pushed to GitHub a few days ago:
wiki: https://github.com/tugrul512bit/Cekirdekler/wiki
download: https://github.com/tugrul512bit/Cekirdekler/wiki
tutorial: https://www.codeproject.com/Articles/1181213/Easy-OpenCL-Multiple-Device-Load-Balancing-and-Pip
The API balances a workload across all selected OpenCL-enabled devices, for example 2x HD7870 + 1x HD7970 + 3x HD7770. If performance is still not satisfactory, the developer can enable the pipelining switch and run embarrassingly parallel C# code in parallel as well.
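To give a rough idea of what balancing a workload means here, below is a minimal sketch in plain C#. It is not the library's actual balancer: the class `LoadBalanceSketch`, the method `Partition` and the relative speed numbers are all made up for illustration. It assumes each device reports some throughput figure and that every device must receive a whole number of work-groups.

```csharp
using System;
using System.Linq;

public static class LoadBalanceSketch
{
    // Splits totalWorkItems among devices in proportion to their throughput.
    // Every share is a multiple of localRange, so each device gets whole work-groups.
    public static int[] Partition(int totalWorkItems, int localRange, double[] throughput)
    {
        if (localRange <= 0 || totalWorkItems % localRange != 0)
            throw new ArgumentException("totalWorkItems must be a multiple of localRange");
        double sum = throughput.Sum();
        if (throughput.Length == 0 || sum <= 0)
            throw new ArgumentException("need at least one device with positive throughput");

        int groups = totalWorkItems / localRange;
        int[] share = new int[throughput.Length];
        int assigned = 0;
        for (int d = 0; d < share.Length; d++)
        {
            // Round down first; leftovers are handed out below.
            share[d] = (int)Math.Floor(groups * throughput[d] / sum);
            assigned += share[d];
        }

        // Give the work-groups lost to rounding to the fastest devices first.
        int[] order = Enumerable.Range(0, share.Length)
                                .OrderByDescending(d => throughput[d])
                                .ToArray();
        for (int k = 0; assigned < groups; k++, assigned++)
            share[order[k % order.Length]]++;

        // Convert work-group counts back to work-item counts.
        return share.Select(g => g * localRange).ToArray();
    }

    public static void Main()
    {
        // Hypothetical relative speeds: one device twice as fast as the other two.
        int[] ranges = Partition(1024 * 1024, 64, new double[] { 2.0, 1.0, 1.0 });
        Console.WriteLine(string.Join(", ", ranges)); // 524288, 262144, 262144
    }
}
```

In a real run the throughput figures would come from timing each device on the previous iteration, so the shares shift over repeated calls as the measurements settle.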
A very simple example that adds the sine of each element's index to an array:
Code:
ClNumberCruncher cr = new ClNumberCruncher(
    AcceleratorType.GPU | AcceleratorType.CPU, // two device types to share the load
    @"
    __kernel void loadBalanceTest(__global float * a)
    {
        int i = get_global_id(0); // work-item id: 0, 1, 2, ..., 1M-1
        a[i] += sin((float)i);    // sine is somewhat expensive; the cast keeps the call unambiguous
    }
    ");
cr.performanceFeed = true; // print to the console what each device is doing
ClArray<float> a = new float[1024 * 1024]; // 1M elements = 4 MB of data
a.partialRead = true; // partition the array among the devices the same way the work is partitioned
for (int i = 0; i < 100; i++)
    a.compute(cr, 1, "loadBalanceTest", 1024 * 1024, 64); // 1M work-items, local range of 64
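To sanity-check the result, the same arithmetic can be repeated on the host and compared with whatever is read back from the devices. The sketch below is plain C# and does not touch the library; `SineReference`, `Compute` and `MaxAbsDiff` are names invented for this example. Device-side single-precision `sin` will not match `Math.Sin` bit for bit, so the comparison should use a tolerance rather than exact equality.

```csharp
using System;

public static class SineReference
{
    // Repeats a[i] += sin(i) on the host, starting from an all-zero array.
    public static float[] Compute(int length, int iterations)
    {
        float[] a = new float[length];
        for (int n = 0; n < iterations; n++)
            for (int i = 0; i < length; i++)
                a[i] += (float)Math.Sin(i);
        return a;
    }

    // Largest absolute difference between two arrays of equal length.
    public static float MaxAbsDiff(float[] x, float[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("arrays must have the same length");
        float max = 0f;
        for (int i = 0; i < x.Length; i++)
            max = Math.Max(max, Math.Abs(x[i] - y[i]));
        return max;
    }

    public static void Main()
    {
        // Small length so the check stays quick; the post uses 1024*1024 elements.
        float[] expected = Compute(1024, 100);
        // Stand-in for the device output: a second host run, so the difference is zero.
        float[] fromDevices = Compute(1024, 100);
        Console.WriteLine(MaxAbsDiff(expected, fromDevices));
    }
}
```

In real use `fromDevices` would be filled from the array the kernel wrote to, and the acceptable difference depends on the precision of each device's `sin`.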