Computing: anything related to trading algorithms, computers, C++, C#, Excel, VBA, Matlab, SAS, S+, R programming, etc.

#1
05-12-2008, 03:13 PM
Wallstyouth
Vice President
Join Date: May 2007
Location: Queens
Job: A Large IB
113 Posts, ranked 42
NVIDIA CUDA Toolkit for options pricing

We've been looking at CUDA for some options pricing applications we run here and were wondering how many other shops on the Street are using this toolkit.

What makes CUDA very attractive is that it executes code directly on the GPU instead of the CPU. For suitable workloads, code executed on the GPU seems to run many times faster than on a traditional CPU.

Some good applications for CUDA seem to be:
Binomial Option Pricing
Black-Scholes Option Pricing
Monte-Carlo Option Pricing
Parallel Mersenne Twister (random number generation)
Parallel Histogram
Image Denoising
Sobel Edge Detection Filter
Computational Finance
CUDA Zone - resource for C developers of applications that solve computing problems
Learn More about CUDA - NVIDIA
CUDA - Wikipedia, the free encyclopedia
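Black-Scholes pricing fits this list well. Each option's price is an independent closed-form evaluation, so a batch of options maps naturally onto one GPU thread per option. Below is a minimal CPU sketch in C++ of that structure. It assumes European calls on a non-dividend-paying underlying. `bs_call`, `OptionSpec` and `price_batch` are hypothetical names; in a CUDA version, the body of the loop in `price_batch` would become the kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard normal CDF expressed through the complementary error function.
double norm_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black-Scholes price of a European call on a non-dividend-paying underlying.
double bs_call(double S, double K, double r, double sigma, double T) {
    assert(S > 0.0 && K > 0.0 && sigma > 0.0 && T > 0.0);
    const double sqrtT = std::sqrt(T);
    const double d1 = (std::log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrtT);
    const double d2 = d1 - sigma * sqrtT;
    return S * norm_cdf(d1) - K * std::exp(-r * T) * norm_cdf(d2);
}

// Hypothetical per-option inputs: spot, strike, rate, volatility, maturity (years).
struct OptionSpec { double S, K, r, sigma, T; };

// Prices every option independently. No iteration depends on another,
// so in a CUDA version each iteration would be handled by its own thread.
std::vector<double> price_batch(const std::vector<OptionSpec>& opts) {
    std::vector<double> prices(opts.size());
    for (std::size_t i = 0; i < opts.size(); ++i) {
        const OptionSpec& o = opts[i];
        prices[i] = bs_call(o.S, o.K, o.r, o.sigma, o.T);
    }
    return prices;
}
```

The loop has no dependency between iterations. That property is what lets a GPU spread the work across many cores.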
#2
05-12-2008, 10:36 PM
Andy
Quant Network
Join Date: May 2006
Location: NY
4,409 Posts, ranked 1
Blog Entries: 29
I know of at least one options stat-arb prop desk running CUDA for its real-time pricing engine. I sit so close that every time they open their computer room, it sounds like an airplane taking off. That thing needs serious cooling.
#3
05-13-2008, 12:43 AM
alain
Older and Wiser
Join Date: Mar 2004
1,448 Posts, ranked 2
We are planning to use it. We are hiring a consultancy company that uses it already. I will let you know.
______________________________
"Greatness is not about someone who has the ability to be great. Greatness shows up when someone might not have that ability but finds a way to succeed. They outwork their opponents, they outhit their opponents, they outfight their opponents. They want it more."

Last edited by alain; 08-05-2008 at 02:21 PM.
#4
08-05-2008, 02:04 PM
Wallstyouth
Vice President
Join Date: May 2007
Location: Queens
Job: A Large IB
113 Posts, ranked 42
Alan Hanweck Associates, LLC?
#5
08-06-2008, 11:07 AM
Bastian Gross
German Mathquant
Join Date: Jan 2008
Location: Trier, Germany
Job: PhD student in research
179 Posts, ranked 26
Blog Entries: 2
I'm very interested, so tell me more!
______________________________
Energy can be likened to the bending of a bow, decision to letting the arrow fly. (Sun Tzu)
Prediction is very difficult, especially about the future. (Confucius / Mark Twain / Niels Bohr)
#6
08-09-2008, 01:55 PM
Andy
Quant Network
Join Date: May 2006
Location: NY
4,409 Posts, ranked 1
Blog Entries: 29
Quote:
Originally Posted by Wallstyouth
Interestingly enough, I went to their website, and the first article there is an interview with the guy who ran the options stat-arb desk. That's the desk I mentioned in post #2 a while back.
http://www.wallstreetandtech.com/dat...leID=208700219
#7
08-10-2008, 12:04 PM
Bastian Gross
German Mathquant
Join Date: Jan 2008
Location: Trier, Germany
Job: PhD student in research
179 Posts, ranked 26
Blog Entries: 2
There are some good papers on option pricing with graphics processing units by Wladimir Surkov.

And even some Germans are aware of GPU programming: "Turbo-Grafikchips machen PCs schneller" ("Turbo graphics chips make PCs faster").
#8
12-04-2008, 09:54 AM
JOELUI
Join Date: Nov 2008
1 Posts, ranked 1790
I have already implemented a Monte Carlo pricing engine using CUDA.
It is just a very simple exotic pricing engine. With 3 billion sample paths in just 20 seconds, I can find the delta to within 0.5% error. (I'm using a very cheap GF8600 GPU; with the most advanced chips, I guess the speed could improve further...)
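A common way to get a stable Monte Carlo delta is bump-and-reprice with common random numbers. The same draws are reused for the up-bumped and down-bumped spot, so most of the noise cancels. The sketch below is a C++ CPU illustration under stated assumptions, not the poster's engine: a plain European call under Black-Scholes dynamics stands in for the exotic, and `mc_call_delta` is a hypothetical name. On a GPU, paths would be split across threads, each with its own independent random-number stream, and the per-thread sums would be combined at the end. Monte Carlo error falls roughly as 1/sqrt(N), so very large path counts are what make tight delta estimates possible.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Monte Carlo delta of a European call under Black-Scholes dynamics, estimated by
// central bump-and-reprice with common random numbers: the same normal draw is
// applied to the up-bumped and the down-bumped spot, so most of the noise cancels.
double mc_call_delta(double S0, double K, double r, double sigma, double T,
                     long n_paths, std::uint64_t seed) {
    assert(n_paths > 0 && S0 > 0.0);
    std::mt19937_64 rng(seed);                         // one stream; a GPU needs one per thread
    std::normal_distribution<double> normal(0.0, 1.0);
    const double h = 0.01 * S0;                        // bump size: an arbitrary 1% of spot
    const double drift = (r - 0.5 * sigma * sigma) * T;
    const double vol = sigma * std::sqrt(T);
    double sum = 0.0;
    for (long i = 0; i < n_paths; ++i) {
        const double growth = std::exp(drift + vol * normal(rng));  // S_T / S_0
        const double up = std::max((S0 + h) * growth - K, 0.0);
        const double down = std::max((S0 - h) * growth - K, 0.0);
        sum += (up - down) / (2.0 * h);
    }
    return std::exp(-r * T) * sum / static_cast<double>(n_paths);
}
```

For S0 = K = 100, r = 0.05, sigma = 0.2 and T = 1, the closed-form Black-Scholes delta is N(0.35), about 0.637. This gives a quick sanity check on the estimate.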
#9
12-07-2008, 03:46 PM
Adam G
Join Date: Dec 2007
Location: Illinois
Job: Chemical process optimization/planning
15 Posts, ranked 210
Would CUDA speed up the calculations required for nonlinear regressions?
#10
12-07-2008, 06:35 PM
doug reich
Some guy
Join Date: Apr 2008
Location: New York, NY
787 Posts, ranked 4
Yes, it would.
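The reason is that, in an iterative fit such as Gauss-Newton (used here as an illustrative choice of method), most of the work is evaluating residuals and derivatives for every observation. Those evaluations are independent of each other. Below is a C++ sketch of one Gauss-Newton step for an illustrative model `y = a * exp(b * x)`. The model and the names `Params` and `gauss_newton_step` are assumptions for the example. The per-observation loop is the part that would map onto GPU threads, followed by a parallel sum (reduction).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Params { double a; double b; };

// One Gauss-Newton step for the illustrative model y = a * exp(b * x).
// Each loop iteration uses only observation i, so on a GPU the terms could be
// computed by separate threads and then summed with a parallel reduction.
Params gauss_newton_step(const std::vector<double>& x, const std::vector<double>& y, Params p) {
    assert(x.size() == y.size() && x.size() >= 2);
    double jaa = 0.0, jab = 0.0, jbb = 0.0;  // entries of J^T J
    double ga = 0.0, gb = 0.0;               // entries of J^T r
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double e = std::exp(p.b * x[i]);
        const double da = e;                 // d model / d a
        const double db = p.a * x[i] * e;    // d model / d b
        const double r = y[i] - p.a * e;     // residual
        jaa += da * da; jab += da * db; jbb += db * db;
        ga += da * r;   gb += db * r;
    }
    // Solve the 2x2 normal equations (J^T J) delta = J^T r by Cramer's rule.
    const double det = jaa * jbb - jab * jab;
    const double delta_a = (jbb * ga - jab * gb) / det;
    const double delta_b = (jaa * gb - jab * ga) / det;
    return {p.a + delta_a, p.b + delta_b};
}
```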
______________________________
Forum FAQ: http://www.quantnet.com/wiki/Forum_FAQ
Please contribute your wisdom on many subjects to the Quantnet Wiki (for example, Programming Languages - C++ - PhD - Research Papers).
#11
12-07-2008, 08:30 PM
Stefan Zota
Join Date: Apr 2008
Location: New York
Job: IT Analyst GS
355 Posts, ranked 14
This is an interesting subject. I've read only a bit of the material linked in this thread.
In graphics and hardware research, GPGPU is a hot topic. The computing power is incredible, the "price" being that you have to write parallel code.
For option pricing (finite differences, trees, etc.), we would need to change the single-threaded implementations to leverage parallel execution. Maybe I will do some research in my "spare time"...
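For trees, the usual way to expose parallelism is within a time step. During backward induction, each node at step n depends only on two nodes at step n+1. All nodes of one step can therefore be updated at once, with a synchronization point between steps. Below is a C++ sketch for a Cox-Ross-Rubinstein European call (`crr_call` is a hypothetical name). It uses two buffers so that no node reads a value that another node of the same step is overwriting.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Cox-Ross-Rubinstein binomial price of a European call.
// Backward induction is sequential across time steps, but within one step every
// node j depends only on nodes j and j+1 of the later step, so the inner loop could
// be split across GPU threads, with a barrier between steps.
double crr_call(double S0, double K, double r, double sigma, double T, int steps) {
    assert(steps > 0);
    const double dt = T / steps;
    const double u = std::exp(sigma * std::sqrt(dt));
    const double d = 1.0 / u;
    const double p = (std::exp(r * dt) - d) / (u - d);  // risk-neutral up probability
    const double disc = std::exp(-r * dt);
    std::vector<double> cur(steps + 1), next(steps + 1);
    for (int j = 0; j <= steps; ++j)  // payoff at maturity; j = number of up moves
        cur[j] = std::max(S0 * std::pow(u, j) * std::pow(d, steps - j) - K, 0.0);
    for (int n = steps - 1; n >= 0; --n) {
        for (int j = 0; j <= n; ++j)  // independent across j: the parallel loop
            next[j] = disc * (p * cur[j + 1] + (1.0 - p) * cur[j]);
        std::swap(cur, next);         // double buffering instead of in-place updates
    }
    return cur[0];
}
```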
#12
12-07-2008, 08:55 PM
Andy
Quant Network
Join Date: May 2006
Location: NY
4,409 Posts, ranked 1
Blog Entries: 29
I have a GeForce 8600 GTS in my computer, so I will try to see how fast it gets.
Here is CUDA.NET, which lets you use C# with CUDA:
http://www.gass-ltd.co.il/en/products/cuda.net/
#13
12-08-2008, 12:07 AM
satyag
Join Date: Mar 2008
Location: New Jersey
34 Posts, ranked 111
This is indeed an interesting subject. Recently, I chatted with some colleagues in the mortgage group at Bloomberg while I was researching hardware acceleration options for data compression. It gave me good insight into the power and limitations of GPU computing. It is amazing that NVIDIA now makes "headless" GPU hardware specifically for financial applications, i.e., these GPUs don't have any video-out socket at all. So, technically speaking, they are not GPUs and have nothing to do with graphics.

Though GPUs have far more cores than CPUs, that is not the main reason for using them. In fact, the typical clock speed of a GPU core is well below today's CPU clock speeds. Secondly, most financial problems are very sequential. However, they are also repetitive: pricing a single security is sequential, but you can price more securities with more cores. The real power of GPUs lies in their ability to handle floating point more efficiently and, more importantly, in their SIMD (single instruction, multiple data) support. Suppose you have to add two vectors. A CPU will take linear time to execute the add operation, because your code loops over the vectors and adds each element separately. A GPU, on the other hand, supports vector add instructions that can typically add up to 128 elements in constant time.
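The vector-add example can be written in the style CUDA uses: one lightweight thread per element, with each thread computing its global index from its block and thread indices. The hardware executes these threads in groups called warps, which are 32 threads on NVIDIA GPUs. The following C++ sketch only emulates that indexing on the CPU, by running every (block, thread) pair in a loop. `vector_add_kernel` and `launch_vector_add` are hypothetical names; in real CUDA, `blockIdx.x`, `blockDim.x` and `threadIdx.x` are built-in variables and the pairs run concurrently.

```cpp
#include <cassert>
#include <vector>

// Body of a hypothetical kernel: each (block, thread) pair handles one element.
void vector_add_kernel(int block_idx, int block_dim, int thread_idx,
                       const float* a, const float* b, float* c, int n) {
    const int i = block_idx * block_dim + thread_idx;  // global element index
    if (i < n) c[i] = a[i] + b[i];                     // guard: the last block may be partly unused
}

// Host-side "launch": enough blocks to cover n elements, emulated here sequentially.
void launch_vector_add(const std::vector<float>& a, const std::vector<float>& b,
                       std::vector<float>& c, int block_dim) {
    assert(a.size() == b.size() && block_dim > 0);
    const int n = static_cast<int>(a.size());
    c.resize(a.size());
    const int grid_dim = (n + block_dim - 1) / block_dim;  // round up
    for (int blk = 0; blk < grid_dim; ++blk)
        for (int t = 0; t < block_dim; ++t)
            vector_add_kernel(blk, block_dim, t, a.data(), b.data(), c.data(), n);
}
```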

But all this power comes at a cost.
1) You lose portability. GPU code is very much tied to the vendor and to specific hardware.
2) The programming paradigm is different. Once you are on a GPU, the OS has very little role in resource management. So, applications have to manage resources such as cores, several types of memory, and registers on the GPU themselves, and also make sure that they are not stepping on each other's resources.
3) The amount of memory on a GPU is limited. So your data structures have to be more compact and less fragmented, and the application on the CPU will have to move bits and pieces to the GPU and drive the algorithm.
4) Unless you are developing everything from scratch, integrating with existing code is going to be tricky and painful.
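Point 3 usually translates into a chunked loop on the host: copy a slice that fits into device memory, run the computation, copy the results back, and repeat. Here is a minimal C++ sketch built on a few assumptions. A plain `std::vector` stands in for device memory, and `std::copy` stands in for the host-device transfers; in CUDA, these would be `cudaMemcpy` calls and a kernel launch. `process_in_chunks` is a hypothetical name, and the squaring step is a placeholder computation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Processes a large host array in chunks that fit into a fixed-size buffer
// standing in for limited device memory.
std::vector<double> process_in_chunks(const std::vector<double>& host_in,
                                      std::size_t device_capacity) {
    assert(device_capacity > 0);
    std::vector<double> host_out(host_in.size());
    std::vector<double> device_buf(device_capacity);  // hypothetical "device" memory
    for (std::size_t start = 0; start < host_in.size(); start += device_capacity) {
        const std::size_t len = std::min(device_capacity, host_in.size() - start);
        // "Host to device" copy of the current slice.
        std::copy(host_in.data() + start, host_in.data() + start + len, device_buf.data());
        // Placeholder for the kernel: square each value in the slice.
        for (std::size_t i = 0; i < len; ++i)
            device_buf[i] = device_buf[i] * device_buf[i];
        // "Device to host" copy of the results.
        std::copy(device_buf.data(), device_buf.data() + len, host_out.data() + start);
    }
    return host_out;
}
```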
#14
12-08-2008, 01:55 PM
cw202
Join Date: Jul 2004
23 Posts, ranked 145
Response to NVIDIA CUDA

>But all this power comes at a cost.
>1) You lose portability. GPU code is very much tied to the vendor and to specific hardware.

Moot point... NVIDIA is not going away anytime soon. Many proprietary technical solutions implemented on a desk are tied to a vendor. This argument is often espoused by Java supporters and by open-source supporters in general. If this were truly a concern, the MSFT .NET framework would have gone the way of Windows ME. Moreover, Apple's Grand Central would not be getting the buzz it has been receiving in the GPU community.

>2) The programming paradigm is different. Once you are on a GPU, the OS has very little role in resource management. So, applications have to manage resources such as cores, several types of memory, and registers on the GPU themselves, and also make sure that they are not stepping on each other's resources.

An argument often espoused by those who dabble in languages that run inside a virtual machine :-). Here lies the difference between a coder and a programmer. Managing memory, threads, etc. is tedious, but not hard to implement. The heavy lifting comes from designing the program or framework.


>3) The amount of memory on a GPU is limited. So your data structures have to be more compact and less fragmented, and the application on the CPU will have to move bits and pieces to the GPU and drive the algorithm.

It depends on the skills of the programmer. Unless you are loading an entire database into memory, the memory on current GPUs is more than adequate. And if you are loading a huge dataset into the GPU, then you have to reconsider your program design.

The "headless" GPU cards and standalone systems come with 1 GB to 4 GB of RAM. That is more than enough to handle heavy computing.

>4) Unless you are developing everything from scratch, integrating with existing code is going to be tricky and painful.

CUDA is more or less C, which means it will talk to C++ programs (with some modifications), and if you look hard enough, you can find a way to wrap the interface for other languages.
At the enterprise level, if you have a robust messaging system (e.g., TIBCO Rendezvous), this becomes a non-issue.
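The usual way to make GPU code callable from C++ and other languages is to expose a small C-linkage entry point. The sketch below is hypothetical: `discount_cashflows` is an invented example, and a CPU loop stands in for the kernel launch that a real `.cu` file would contain. It only shows the shape of such an interface: plain pointers and sizes, an integer status code, and `extern "C"` to avoid C++ name mangling, so that C, C++, or other languages' C foreign-function interfaces (for example P/Invoke in C#) can call it.

```cpp
#include <cstddef>

// Hypothetical C-callable entry point. In a real project this would live in a .cu
// file, copy the inputs to the GPU, launch a kernel and copy the results back;
// here a plain CPU loop stands in for the kernel.
extern "C" int discount_cashflows(const double* cashflows, const double* discount_factors,
                                  double* out, std::size_t n) {
    if (cashflows == nullptr || discount_factors == nullptr || out == nullptr)
        return 1;  // error code: invalid arguments
    for (std::size_t i = 0; i < n; ++i)
        out[i] = cashflows[i] * discount_factors[i];
    return 0;      // success
}
```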


The GPU is a great computing resource. It does take more effort in the design and coding of programs. The pros far outweigh the cons.

Last edited by cw202; 12-08-2008 at 02:36 PM.
#15
12-08-2008, 02:27 PM
alain
Older and Wiser
Join Date: Mar 2004
1,448 Posts, ranked 2
As soon as I have some hard details, I will try to post them here. We are working with CUDA at the moment.

BTW, it is really, really fast. Also, we are not talking about graphics cards but about Tesla cards and Tesla machines.
#16
12-09-2008, 08:47 AM
parisjohn
Join Date: Aug 2008
Location: Paris
Job: Quantitative Analyst
22 Posts, ranked 151
Quote:
Originally Posted by satyag
But all this power comes at a cost.
1) You lose portability. GPU code is very much tied to the vendor and to specific hardware.
2) The programming paradigm is different. Once you are on a GPU, the OS has very little role in resource management. So, applications have to manage resources such as cores, several types of memory, and registers on the GPU themselves, and also make sure that they are not stepping on each other's resources.
3) The amount of memory on a GPU is limited. So your data structures have to be more compact and less fragmented, and the application on the CPU will have to move bits and pieces to the GPU and drive the algorithm.
4) Unless you are developing everything from scratch, integrating with existing code is going to be tricky and painful.
1) NVIDIA's CUDA, AMD's Brook, and IBM's tools for the Cell are three possible ways to use this new approach with the GPU (although the Cell is a little different).
However, OpenCL was launched today with its first header.
I think it will be the solution in the future.
OpenCL - Wikipedia, the free encyclopedia
2) Right.
3) You can see this monster: http://www.nvidia.com/object/persona...computing.html, but memory is a big problem for reads and writes.
As far as I am concerned, the big problem with the GPU is transferring data to the card and then copying it back to the CPU.
For example, for a good Monte Carlo with Sobol sequences, you need to have all your data on the GPU; otherwise, if you have to read and write host memory, you lose the power of the GPU.
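One reason Sobol sequences suit GPUs is that point n can be computed directly from n through its Gray code. Each thread can therefore start at its own index without generating the earlier points. Below is a C++ sketch for the first Sobol dimension only, with direction numbers 2^(32-k); `sobol_dim1` is a hypothetical name. Higher dimensions need primitive polynomials and initial direction numbers, which this sketch does not cover.

```cpp
#include <cstdint>

// Point n of the first Sobol dimension, computed directly from n.
// Bit k of the Gray code n ^ (n >> 1) selects direction number v_{k+1} = 2^(31-k)
// (as a 32-bit integer), and the selected direction numbers are XORed together.
double sobol_dim1(std::uint32_t n) {
    const std::uint32_t gray = n ^ (n >> 1);
    std::uint32_t x = 0;
    for (int k = 0; k < 32; ++k)
        if (gray & (1u << k)) x ^= (1u << (31 - k));
    return x / 4294967296.0;  // scale to [0, 1) by dividing by 2^32
}
```

The first points come out as 0, 0.5, 0.75, 0.25, 0.375, 0.875, 0.625, 0.125, which is the base-2 van der Corput set in Gray-code order.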
J
______________________________
The Quantitative Finance Library : http://www.quant-press.com
Live News with Facebook : http://www.facebook.com/quantpress
#17
12-09-2008, 09:35 AM
alain
Older and Wiser
Join Date: Mar 2004
1,448 Posts, ranked 2
FYI, in preliminary tests from a vendor, they were able to calculate implied vols for 500,000 options in 1.5 seconds.
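Batches of implied vols parallelize well because each option's root-finding problem is independent. Below is a CPU sketch in C++ that assumes European calls under Black-Scholes. The function names are hypothetical. Bisection is a swapped-in choice, not the vendor's method. It works because the call price increases with volatility, and a fixed iteration count keeps every GPU thread doing the same amount of work.

```cpp
#include <cassert>
#include <cmath>

double norm_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black-Scholes price of a European call on a non-dividend-paying underlying.
double bs_call(double S, double K, double r, double sigma, double T) {
    const double sqrtT = std::sqrt(T);
    const double d1 = (std::log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrtT);
    const double d2 = d1 - sigma * sqrtT;
    return S * norm_cdf(d1) - K * std::exp(-r * T) * norm_cdf(d2);
}

// Implied volatility of a European call by bisection on sigma in [1e-6, 5].
// The price is increasing in sigma, so halving the bracket 100 times converges
// whenever the market price lies between the prices at the bracket ends.
double implied_vol_call(double price, double S, double K, double r, double T) {
    assert(price > 0.0 && T > 0.0);
    double lo = 1e-6, hi = 5.0;
    for (int iter = 0; iter < 100; ++iter) {  // fixed count: same work for every option
        const double mid = 0.5 * (lo + hi);
        if (bs_call(S, K, r, mid, T) > price) hi = mid; else lo = mid;
    }
    return 0.5 * (lo + hi);
}
```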

BTW, Brook has been around for some time, but it has never caught on. GPU people are very secretive about their stuff, so I don't see cross-compatibility any time soon.
#18
12-09-2008, 01:21 PM
cw202
Join Date: Jul 2004
23 Posts, ranked 145
(1) OpenCL is an Apple initiative. Is there a Linux, Unix, or Win32/Win64 implementation?

(3) Please elaborate on the memory and system bus constraint. Do you expect to scale up to 16/32/64 GB and to allow for a 3 GHz memory pipe between the CPU, GPU, and RAM?
You can design your program around memory constraints. The bus speed of your system will allow your GPU to write to and from system RAM in an efficient manner, especially if you employ a PCIe 2.0 card and DDR3 RAM. The bottleneck is reduced to the point of being almost unnoticeable. You do not lose the power of the GPU; you get your results a little later, which to the end user is unnoticeable.
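You can estimate whether the bus is a bottleneck by comparing transfer time with kernel time. Here is a back-of-envelope C++ sketch using a hypothetical helper, `gpu_end_to_end_seconds`. It assumes the theoretical PCIe 2.0 x16 rate of about 8 GB/s per direction; sustained rates are lower.

```cpp
#include <cassert>

// Rough end-to-end time of one GPU job: upload the inputs, run the kernel,
// download the results. bandwidth is in bytes per second; the default is the
// theoretical PCIe 2.0 x16 rate per direction, so real transfers take longer.
double gpu_end_to_end_seconds(double kernel_s, double bytes_in, double bytes_out,
                              double bandwidth = 8e9) {
    assert(bandwidth > 0.0 && kernel_s >= 0.0);
    return bytes_in / bandwidth + kernel_s + bytes_out / bandwidth;
}
```

When inputs and results are small relative to the computation, the transfer terms are negligible. When data has to round-trip at every step, they can dominate the total.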

Please elaborate on the "good Monte Carlo with Sobol sequences" comment.


#19
12-10-2008, 05:42 AM
parisjohn
Join Date: Aug 2008
Location: Paris
Job: Quantitative Analyst
22 Posts, ranked 151
1) Just go to OpenCL.
OpenCL is universal.
3) "The bus speed of your system will allow your GPU to write to and from system RAM in an efficient manner, especially if you employ a PCIe 2.0 card and DDR3 RAM"
OK, but if for each evaluation of your payoff you need to move GPU data to the CPU, you will have a CUDA read and a CUDA write (cudaMemcpy calls), which are expensive compared to the kernel.
My point is that if you want to get the full power of the GPU, you need to rebuild all your code on the GPU card, and that is very expensive.
In my Monte Carlo example:
You generate Mersenne Twister random numbers on the GPU card.
Then, if you want Sobol sequences, you will need to copy them to the GPU card, and if you have a basket of 10 assets with 100 evaluation dates and 100,000 paths, these random numbers will take close to 1 GB of RAM.
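The memory figure for the basket example can be checked directly. 10 assets x 100 dates x 100,000 paths is 10^8 numbers. That comes to 800 MB in double precision (about 0.75 GiB) and 400 MB in single precision. Here is a compile-time C++ check, with `random_number_bytes` as a hypothetical helper:

```cpp
#include <cstddef>

// Bytes needed to store one random number per (asset, date, path) triple.
// bytes_per_number is 8 for double precision and 4 for single precision.
constexpr std::size_t random_number_bytes(std::size_t assets, std::size_t dates,
                                          std::size_t paths, std::size_t bytes_per_number) {
    return assets * dates * paths * bytes_per_number;
}

// 10 assets x 100 dates x 100,000 paths = 10^8 numbers.
static_assert(random_number_bytes(10, 100, 100000, 8) == 800000000, "800 MB in double precision");
static_assert(random_number_bytes(10, 100, 100000, 4) == 400000000, "400 MB in single precision");
```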
Another example: with a BGM model, the drift computation is very expensive on a CPU but very, very fast on the GPU. However, if you need your future discount curve on the CPU to do another computation, you will need to copy the forward curve to the CPU for each evaluation date, which is expensive, and so you lose the power of the GPU.
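For reference, under the spot LIBOR measure the drift of each forward in a lognormal BGM (LIBOR market) model is a sum over the forwards that are still alive. That makes it O(n^2) per time step and per path, while different paths remain independent. Below is a C++ sketch of that formula. It assumes scalar volatilities `sigma` and a correlation matrix `rho`; the function name and interface are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Drift coefficients mu_i in dL_i / L_i = mu_i dt + sigma_i dW_i under the spot
// LIBOR measure:
//   mu_i = sigma_i * sum_{j=first}^{i} tau_j * rho_ij * sigma_j * L_j / (1 + tau_j * L_j)
// where `first` is the index of the first forward that has not yet fixed.
// Forwards before `first` have fixed and are given zero drift.
std::vector<double> lmm_spot_drifts(const std::vector<double>& L,
                                    const std::vector<double>& tau,
                                    const std::vector<double>& sigma,
                                    const std::vector<std::vector<double>>& rho,
                                    std::size_t first) {
    assert(tau.size() == L.size() && sigma.size() == L.size() && rho.size() == L.size());
    std::vector<double> mu(L.size(), 0.0);
    for (std::size_t i = first; i < L.size(); ++i) {
        double sum = 0.0;
        for (std::size_t j = first; j <= i; ++j)  // inner loop makes the cost O(n^2)
            sum += tau[j] * rho[i][j] * sigma[j] * L[j] / (1.0 + tau[j] * L[j]);
        mu[i] = sigma[i] * sum;
    }
    return mu;
}
```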
So, you must rebuild your code for each case, and that is very risky.

Consequently, CUDA is a good solution, but firms cannot invest in it, because they would have to rebuild all their applications specifically for CUDA, which is very dangerous. If a year later Brook is better, you will need to throw away the CUDA code and start over with Brook... and the same with the Cell.
OpenCL brings standardization and will be the solution.
#20
12-10-2008, 05:45 AM
parisjohn
Join Date: Aug 2008
Location: Paris
Job: Quantitative Analyst
22 Posts, ranked 151
Quote:
Originally Posted by alain
FYI, in preliminary tests from a vendor, they were able to calculate implied vols for 500,000 options in 1.5 seconds.

BTW, Brook has been around for some time, but it has never caught on. GPU people are very secretive about their stuff, so I don't see cross-compatibility any time soon.
For compatibility, just look at OpenCL; I think OpenCL will be a standard next year.
However, Intel with Larrabee will be a good competitor.
500,000 options in 1.5 seconds is not a real case, because you will need cudaMemcpy for all your data: forwards, discount curves, etc. It's the famous Black-Scholes example from the CUDA SDK, but it does not change the data.
When a vendor says there is a 100x gain compared to the CPU, it's generally fake; when the vendor says 10x, it's more realistic.
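Amdahl's law is one way to see why an overall 10x is more plausible than a headline 100x. If only a fraction f of the original run time is accelerated by a factor s, the overall speedup is 1 / ((1 - f) + f / s). For example, accelerating 90% of the work by 100x gives only about 9.2x overall. Below is a C++ sketch; the function name is hypothetical, and the numbers are illustrative rather than measured.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction f of the original run time is
// accelerated by a factor s and the remaining (1 - f) runs at the old speed.
double amdahl_speedup(double f, double s) {
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```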
#21
12-10-2008, 10:24 AM
alain
Older and Wiser
Join Date: Mar 2004
1,448 Posts, ranked 2
Quote:
500,000 options in 1.5 seconds is not a real case, because you will need cudaMemcpy for all your data
I lied, I'm sorry. It really takes 1 minute to send the data across the wire to the server via HTTP services, calculate the implied vols, and get the results back. This is not fake; this is real in our case.
#22
12-10-2008, 11:35 AM
parisjohn
Join Date: Aug 2008
Location: Paris
Job: Quantitative Analyst
22 Posts, ranked 151
Quote:
Originally Posted by alain
I lied, I'm sorry. It really takes 1 minute to send the data across the wire to the server via HTTP services, calculate the implied vols, and get the results back. This is not fake; this is real in our case.
OK, and with a quad-core computer, how long does it take? With 8 cores?
Because in this example, the GPU's power is cancelled out by transferring the data for the 500,000 options...
#23
01-16-2009, 08:10 AM
larsp
Join Date: Jan 2009
1 Posts, ranked 1813
We use CUDA extensively in our main program, Zonar, for options calculations and in the portfolio system.
Based on the experience we have gained and the speed improvements, we are now implementing CUDA in an algorithmic trading solution.
Also nice: version 2 now works with VS 2008.

Used in Zonar:
SoftCapital - Zonar, learn more

Cheers,
Lars
#24
01-16-2009, 09:56 AM
parisjohn
Join Date: Aug 2008
Location: Paris
Job: Quantitative Analyst
22 Posts, ranked 151
I think OpenCL arrives at a good time, because NVIDIA is having many difficulties with the crisis.
#25
01-17-2009, 04:19 PM
Bastian Gross
German Mathquant
Join Date: Jan 2008
Location: Trier, Germany
Job: PhD student in research
179 Posts, ranked 26
Blog Entries: 2
Does anyone have experience with Star-P?
STAR-P™
All times are GMT -4.