Estimating Potential Future Exposure with QuantLib and AWS EMR – Part II

In my previous post, we saw how to submit a PySpark job to an AWS EMR cluster. In this post, I will go over the setup of the cluster itself.

Before we start with the cluster, we need an EC2 key pair (.pem file) and a security group set up in AWS. There are many resources available online covering these steps, so I will not go into the details.

Next, we select the custom AMI that I mentioned in my previous post as the base image for all nodes in the cluster, so that Anaconda Python and QuantLib come pre-installed on every node. The AMI I used is publicly available for now; it costs me about $5 per month to keep alive. The details of the AMI are shown below.

AMI ID: ami-c56a94bf

I selected one m3.xlarge (4 vCPUs, 15 GB) machine for my master node and three r4.8xlarge (32 vCPUs, 244 GB) machines for my worker nodes. I always select spot pricing for all my nodes.

Once the cluster is up and running (in Waiting mode), you first need to secure-copy the spark-pfe.py file (available in my spark-pfe GitHub repo) to the master node; then you can SSH into the master node and run the job with the spark-submit script from the previous post.

It took about 7 minutes for the PySpark job to complete; it computed the netting set NPV for 5,000 simulations across 454 future dates for 2 swaps and 1 FxFwd. The output of the simulation was written to an S3 bucket, and now it's time to pull it onto a local machine for analysis. Loading the NPV cube on a local machine is fine for demonstration purposes since we have only 5,000 simulations, but for production I would load the NPV cube into Amazon Redshift or AuroraDB.

We will use the boto3 library to download the files from S3.
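A minimal sketch of the download step is below, assuming the job wrote its output under a single prefix; the bucket name and prefix are placeholders for whatever your spark-submit run actually used.

```python
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('spark-pfe-output')  # placeholder: your S3 bucket name

# Download every object under the output prefix to the local working directory
for obj in bucket.objects.filter(Prefix='output/'):  # placeholder prefix
    filename = obj.key.split('/')[-1]
    if filename:  # skip zero-length "directory" keys
        bucket.download_file(obj.key, filename)
```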

Once we have the time grid and NPV cube in memory, we can make some plots to visualize the simulated exposure paths. The blue paths are collateralized exposures and the red paths are uncollateralized.
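Something along these lines produces the plot, assuming the output has been loaded into NumPy arrays time_grid of shape (num_dates,) and npv_cube of shape (num_sims, num_dates, 2), with the last axis holding uncollateralized and collateralized values; the exact layout depends on how your job wrote its files.

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot a readable subset of the 5,000 simulated paths
for i in range(100):
    plt.plot(time_grid, npv_cube[i, :, 0], 'r-', alpha=0.2)  # uncollateralized
    plt.plot(time_grid, npv_cube[i, :, 1], 'b-', alpha=0.2)  # collateralized
plt.xlabel('Time (years)')
plt.ylabel('Netting set NPV')
plt.title('Simulated exposure paths')
plt.show()
```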

Now we can calculate and plot the Expected Exposure, where we floor the simulated exposure paths above at zero to keep only the positive exposures, then average them across simulations at each date.
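Under the same assumed array layout as above, this takes just a couple of lines with NumPy:

```python
# Exposure floors each simulated NPV at zero: E(t) = max(NPV(t), 0)
exposure = np.maximum(npv_cube, 0.0)
# Expected Exposure averages the positive exposures across all paths per date
expected_exposure = exposure.mean(axis=0)

plt.plot(time_grid, expected_exposure[:, 0], 'r-', label='Uncollateralized')
plt.plot(time_grid, expected_exposure[:, 1], 'b-', label='Collateralized')
plt.xlabel('Time (years)')
plt.ylabel('Expected Exposure')
plt.legend()
plt.show()
```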

Now we can plot the PFE curves, where we take the 95% quantile across the positive exposure paths at each date.
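Continuing the sketch:

```python
# PFE(t): the 95th percentile of the positive exposures at each future date
pfe_curve = np.percentile(exposure, 95, axis=0)

plt.plot(time_grid, pfe_curve[:, 0], 'r-', label='Uncollateralized PFE')
plt.plot(time_grid, pfe_curve[:, 1], 'b-', label='Collateralized PFE')
plt.xlabel('Time (years)')
plt.ylabel('PFE')
plt.legend()
plt.show()
```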

We can now calculate the maximum PFE for both curves.
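The maximum is simply the peak of each PFE curve over the time grid:

```python
# Maximum PFE over the life of the netting set
max_uncollateralized_pfe = pfe_curve[:, 0].max()
max_collateralized_pfe = pfe_curve[:, 1].max()
print('Maximum Uncollateralized PFE: {:,.2f}'.format(max_uncollateralized_pfe))
print('Maximum Collateralized PFE: {:,.2f}'.format(max_collateralized_pfe))
```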

Maximum Uncollateralized PFE: 260,962.61

Maximum Collateralized PFE: 252,916.08

The spark-submit script that I have used is by no means optimized. I tuned the various Spark memory and executor-count parameters by trial and error within the limited time frame I had. The configuration I came up with works well for the cluster I used, but I am sure it can be improved.

Thanks for stopping by.