The NAG Blog

I posted here a week or two ago about my diary leading up to the year's biggest supercomputing event - SC11 in Seattle. I thought it would be handy to give a quick summary of the diary entries so far for those who haven't been reading along.

If you recall, I said: "On my twitter stream (@hpcnotes), I described it as: "the world of #HPC in one week & one place but hard to find time for all of technical program + exhibition + customer/partner/supplier meetings + social meetings + sleep!" To follow the news about SC11 on twitter, follow @supercomputing and/or the #SC11 hashtag."

"Any hope of "live" blogging or actively tweeting during the week of SC11 itself is almost nil - the week is just too busy with the day job. Even simply consuming the relevant news, comment and gossip is a challenge."

"So instead I am going to try to write a diary of the lead up to SC11."

If you've missed them, here are the 8 SC11 blogs so far:

Along the way, I have briefly alluded to a few things NAG will be doing at SC11. One of my colleagues will be along shortly to post here about our activities at SC11, but in the meantime, plan to visit us on booth 2622, or get in touch to arrange a conversation.

R is a widely-used environment for statistical computing and data analysis. It is one of the modern implementations of the S programming language (i.e. much code written in S will run unaltered in R) although its underlying semantics are derived from Scheme. R is Free Software.

The capabilities of R can be extended through the use of add-on packages and a large number of these are available from the Comprehensive R Archive Network (CRAN). Some users have expressed an interest in calling NAG Library routines from within R; accordingly, we have recently created a package which provides access to some of NAG's numerical functionality. More specifically, the NAGFWrappers R package contains the local and global optimization chapters of the NAG Fortran Library, together with a few nearest correlation matrix solvers and some other simpler routines. The package incorporates documentation in the so-called Rdoc style that will automatically produce help pages within the R system (see figure below), and also in HTML and PDF - for example, here is the full list of NAG routines that are contained in the package.

Help for the NAG routine e04uc, as displayed in R

For completeness, and to help R users further, we have also published more general instructions about how to use the R extension mechanisms to access any NAG routine from within R.

The original version of NAGFWrappers has been available since mid-2011; we have just updated it to use Mark 23 of the Fortran Library, and are releasing R binary packages for Windows 32 bit and Windows 64 bit, along with the R source package which can be used on other platforms (for example, we have built and run it on 64 bit Linux).

It should perhaps be noted that this is a preview release of the package, which is aimed at obtaining user feedback. Although it has been built and run on the platforms mentioned above, it is not a NAG product. We are keen to receive user feedback, and will respond to technical queries and problem reports via support@nag.co.uk so that we can further refine this package and make it still more useful to the R community.

Given the origins of NAG and our mission, it’s natural for us to take the “long view” in giving back to the communities in which we operate. In the US, we’ve just finished our second year as sponsor of the DemandTec Retail Challenge (DTRC) scholarship competition for high school students in the Chicago area. DemandTec, which provides demand management software for major retailers around the world, is one of our earliest and strongest software company partners and incorporates NAG components into its software products.

The DTRC puts 2-person teams of high school seniors in the role of category managers for a retail store in a 2-week computer simulation where each day represents a week in the “real world”. The students are given many weeks of data showing the price, inventory, unit sales, promotions used and the profit earned on each product they are managing. In the contest, they are responsible for two brands of coffee, one brand of tea and coffee filters. During the competition, they must analyze prior results to set the new price of each item, decide whether to run promotions and decide how much inventory to purchase. The simulation creates an interaction both with consumers and with the other teams in the competition. Ultimately, the three teams with the highest profit at the end of two weeks advance to the regional finals here in Chicago.

At the Chicago Regional finals held November 10th, teams from Glenbard West High School and Wheaton Warrenville South High School gave presentations of their data analysis, strategy formulation and how they worked as a team as they adapted their strategy in the course of the contest. The judges for the contest were experienced professionals in retail analytics from DemandTec and other local companies. Our winners were team “Price Lords” consisting of Peter Ericksen and Greg Grabarek. Each of them received $2500 toward their college expenses next year from NAG and our co-sponsor software company Informatica. Our third sponsor was the Network of Executive Women (NEW), which supports the education and advancement of women in leadership roles.

This contest is a labor of love for those of us involved. We have cultivated relationships with teachers of mathematics and statistics at local schools and have made presentations on the contest to a number of high school classes. It not only engages us with the students and teachers but it also helps us educate the next generation of applied scientists who might one day be NAG users. We wish Peter and Greg good luck as they compete in the national semi-finals in early December and are already making plans for next year’s contest.

The end of the year at NAG is always celebrated with a rather splendid Christmas lunch, which is generously paid for by the company as an acknowledgement of the hard work its employees have put in over the previous twelve months. Accordingly, it provides an occasion for some much-needed relaxation and refreshment before the Christmas break. It's also the time for the NAG Christmas Quiz which, whilst not necessarily contributing to the participants' relaxation, usually provides some entertainment or diversion for those who care about such arcane matters as the number of hearts an octopus has or Paul McCartney's middle name. One of those tortured souls is the present author, who cleverly realized some time ago that the only way to be sure of knowing all the answers to the questions was to set them.

Setting quiz questions in the connected age - particularly for a technically savvy band such as the employees of NAG - can present a few challenges, however. For example, anyone with internet access (via, say, a smartphone's web browser) would be able to find the answers to the questions indicated above in a matter of moments. More direct questions, such as

Who wrote "A Child's Christmas In Wales"?
What kind of logs did Good King Wenceslas ask for?
When was "Merry Xmas Everybody" number 1 in the UK?

are even easier to answer, though some skill and judgement may still be required in the selection of the correct response to an ambiguous or ill-posed question such as the first one on this list (does it refer to Dylan Thomas's prose piece or John Cale's song?). Whilst this is clearly a valid and imaginative use of technology, I wondered whether it would provide an unfair advantage over those participants who wouldn't be using their phones in this fashion (or texting more knowledgeable friends for answers) and started wondering about ways to obviate their effectiveness.

I realized fairly quickly that confiscating all phones on entry to the restaurant didn't fall within my powers as quizmaster (chiefly because I didn't have any) and, for a similar reason, there were no funds in the quizmaster's budget for the purchase of a mobile phone jammer. My suggestion that we should move the location of the Christmas lunch to the interior of a Faraday cage in order to attenuate the phone signal wasn't looked on too kindly by senior management (or the restaurant) either. Accordingly, I began thinking about using questions that were more indirect, which might make searching for an answer more difficult. For example: what's the connection between these three things?

Queen Jezebel
The Regents of Prague
Chopin's piano

This type of query is - very roughly - analogous to a so-called inverse problem in science, in which we're asked to use observed data or results to deduce something about an underlying system or model. It can be harder to answer because there may be several models that are consistent with the observations - for example, one thing that connects the three things is that each contains the letter 'n', but that's not necessarily the right answer (which is ever-so-vaguely computer related, lest I be accused of straying too far off-topic).

Another type of problem that might present more challenges in the search for an answer is image recognition. Whilst mobile tools such as Google Goggles are already trying to make this easier for specific examples (e.g. in the identification of labels and landmarks), it's still either very difficult or impossible to recognize arbitrary objects. In the context of the quiz, this means questions such as: identify these people:

or name these films:

I'm not sure how effective my preparations were, but the quiz appeared to provide the usual amount of stimulation, along with a certain degree of exaltation and frustration (only for those who care about this sort of thing, naturally). Some of the participants have even started talking to me again, but that might be just the supervening effect of the Christmas break, and all the goodwill-to-all associated with that happy season.

PS If you'd like to answer any of the questions above, please feel free to add a comment below, although I regret to say that the unbelievably fabulous prizes have long been consumed by the winners of the quiz. Don't use your phone, though.

Winston Churchill once said "The pessimist sees difficulty in every opportunity. The optimist sees the opportunity in every difficulty." (for you confirmed pedants, it may have been L.P. Jacks)

My custom is to use the time away from work at the end of the year to think about what I want to do differently in the year ahead. Among the topics that came up was e-mail, the bane of my life and perhaps yours as well. I get hundreds every day (and that doesn't include the SPAM).

Being a lifelong optimist I've decided to make e-mail less of a pain in my life, both work and personal. Being an occasional realist, I recognize that I have a limited number of options and they must focus on what I can do.

So, here's my plan for 2012 (with acknowledgement to Scott Belsky and Stever Robbins who supplied some of the ideas and got me thinking). See Disrupt Your Inbox and What you should never say in an e-mail

Experiment with three-sentence e-mails when I need an answer from someone (to improve the likelihood that someone will actually read and answer).
Start e-mails with the action I want. Don't leave the reader guessing until the end.
Use subject lines that intrigue the reader and actually invite them to open and read the e-mail.
Take disagreements offline. There are volumes that could be written on this.
Don't "reply all" unless everyone needs to be involved. How do you feel when you are one of 92 people copied on an e-mail that doesn't interest or pertain to you? Resist the urge; it's probably illegal.
If you need to make several points and expect a response, use numbers for reference to reduce length and the opportunity for confusion.

What's your plan?

More on managing e-mail in a future installment.

Enable innovation and efficiency in product design and manufacture by using more powerful simulations. Apply more complex models to better understand and predict the behaviour of the world around us. Process datasets faster and with more advanced analyses to extract more reliable and previously hidden insights and opportunities.

All ambitions that will probably resonate with those seeking scientific advances, commercial innovation, industrial growth and more cost-effective research. Underpinning all of the above is the use of more powerful computing methods and technologies. Faster and more capable computers - but equally important - more advanced and better performing algorithms and software implementations.

It's a pretty convincing story for those who take the time to listen - whether business leaders, governments, or research funders. Even in these challenging economic times, it has led to investments from industry and governments for this reason - the potential return is well documented and significant. It is even enticing enough to interest the media and the public - especially when we use emotive descriptions like "world's fastest supercomputer", "international competitiveness in digital economy", "personal supercomputing", and so on.

And it is this last thought that causes me to diverge from the grand theme to explore names and attention. I will come back to the main theme later (a future blog), as it is both important and timely. But on to my side topic.

Anybody using, selling or funding the technologies and methods described above will know what I am talking about. But the names and labels applied to it can vary significantly across the diverse audience. High performance computing (HPC), supercomputing, computational science and engineering, technical computing, advanced computer modelling, advanced research computing, etc. The range of names/labels and the diversity of the audience involved mean that what is a common everyday term for many (e.g. HPC) is an unrecognised meaningless acronym to others - even though they are doing "HPC".

This can create a barrier to engaging politicians, companies that could benefit, the media, and people in search of solutions for their day-to-day modelling/simulation/data processing challenges.

Let's play.

Most of us who see this as part of our daily life use the terms HPC or supercomputing. How do these stack up with the wider world? Let's turn to Google Trends as an arbitrary tool of statistics.

The following graph shows the search popularity of these terms ("supercomputer" and "HPC") over the last few years. Clearly "HPC" is a more common keyword.

[Plot 1: blue = supercomputer, red = HPC]

But what if we add in that term that is so often considered just a buzzword by seasoned HPC professionals - "cloud computing"?

[Plot 2: blue = supercomputer, red = HPC, orange = cloud computing]

We see that in the last few years "cloud computing" has soared above the traditional names in usage. This means a wider audience - and thus more possibilities for that ambitious opening paragraph of mine.

If we add some more technical terms ("parallel computing", "multicore"), they hardly register in comparative popularity.

[Plot 3: blue = supercomputer, red = HPC, orange = cloud computing, green = parallel computing, purple = multicore]

Interestingly, a domain-specific term ("CFD") tracks the popularity of HPC rather than cloud computing.

[Plot 4: blue = CFD, red = HPC, orange = cloud computing]

You can play the same game with key technologies of the supercomputing world - e.g. [MPI, OpenMP, CUDA, OpenCL, Fortran] - and discover more interesting trends, but as this blog is already getting long - that is for another day.

I'll just leave you with this one, which might be interpreted as speaking volumes to the challenges faced in delivering the promise of my opening paragraph - [computer, software, programmer, algorithm].

[Plot 5: blue = computer, red = software, orange = programmer, green = algorithm]

What interesting related trends can you find and analyze?

When I get to work one of the first things that I do each morning is check out what’s happening on my Twitter timeline. One Thursday, a couple of weeks ago, one particular tweet caught my eye. It led me to a great blog, 'Girls can love computing; someone just needs to show them how', about the Manchester Girl Geeks. They are a group who are trying to encourage more girls and women to be interested in maths, science and technology. Being a girl myself (OK, a woman really), and working for a mathematical software company, I found the article sparked a real interest.

A Girl Geek Tea Party

When I was at school, maths wasn’t my best subject; well, actually, and I’m going to be completely honest with you, it was my worst subject. My fear of all things mathematical started after being made to stand in front of the class reciting times tables. So it’s somewhat ironic that I found myself working at a numerical software company a few years ago, albeit in the marketing department. Had the 'Girl Geeks' been around in the 80s and visited my school, it might have made maths and science a bit more alluring for me. It’s a sad truth that girls are still way in the minority in choosing technical and science options at GCSE, A-level and degree level*.

NAG want to help in some way to reverse this trend. I think I can speak for NAG in saying that we want to see more women achieving prominence in our organisation and in scientific computing in general.

Anyway, back to my reason for blogging. After reading about the great work that Manchester Girl Geeks are involved in we decided to support them by way of sponsorship. I’m writing today to raise awareness of their mission as we feel it’s really worthwhile. We have some ideas for other ways in which NAG can assist their goals in the future.

What other ways can we as an organisation make a positive difference?

*In 2009, around 16% of students on undergraduate computer science degrees were female.

NLLS stands for nonlinear least-squares and SQP is sequential quadratic programming. So essentially this is an optimization problem, and everyone knows that chapter e04 of the NAG Library is the best place to look for optimization solvers. The appropriate NAG routine in our C Library is nag_opt_nlin_lsq (e04unc).

A few weeks ago one of our users contacted NAG and asked for an example program that uses e04unc in Excel. Our NAG and Excel page has quite a few examples and guidelines about using the NAG Library in Excel, but we didn't have this particular one.

I wrote this example and it is now available for download on the Excel page. I encourage readers of this blog to download it and play with it on their own. It wasn't difficult to create, but there was one issue that caused me a nasty headache. Routines with callback functions (such as e04unc) in which a vector or matrix is passed to or from the callback require the use of the Windows API subroutine RtlMoveMemory.

Since the underlying NAG Library is a C library, array arguments are simply declared as the appropriate type. VB6 passes arguments, by default, by reference (ByRef). Hence we have access to a pointer to the array. In the case of input array arguments, the appropriate amount of storage has to be copied to a VB array before it can be used. At the end of the function, output arrays must be copied back to the pointer. For more information please have a look at our Introduction to using NAG C Library in VBA.

Well, I guess it all sounds easy. Nevertheless, I had some problems with RtlMoveMemory - I couldn't pass the data to and from a callback without some sort of memory violation error, and when it didn't crash I kept on getting incorrect results.

The trick is not to use the default declaration of RtlMoveMemory, but actually have two versions of it: one for passing memory to a callback function via a pointer, and the second for copying memory from an array in the callback function back to a pointer. They differ slightly in declarations:

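' Copy cbCopy bytes from the address in hpvSource into a VB variable or array element.
Declare Sub CopyMemFromPtr Lib "kernel32" Alias "RtlMoveMemory" ( _
    ByRef hpvDest As Any, ByVal hpvSource As Long, _
    ByVal cbCopy As Long)

' Copy cbCopy bytes from a VB variable or array element to the address in hpvDest.
Declare Sub CopyMemToPtr Lib "kernel32" Alias "RtlMoveMemory" ( _
    ByVal hpvDest As Long, ByRef hpvSource As Any, _
    ByVal cbCopy As Long)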

OK, but why is this so important in this example? e04unc has 2 callback functions.

Objfun, which returns the value of the objective function and its Jacobian.
Confun, which returns the values of constraint functions and their respective Jacobians.

Both of them take a vector of variables x(n) on input. In practice it means that the actual input is a pointer x_rptr to x. The user then uses CopyMemFromPtr to fill the vector x(n) with values that x_rptr points to. We start copying memory to the first element of x, from pointer x_rptr, and the full amount of memory copied is the length of the vector times the number of bytes required to store a single variable. Here's how it looks in the code:

Call CopyMemFromPtr(x(1), x_rptr, n * Len(x(1)))

At this point the vector x holds the input values, so we can calculate the function values (and optionally the Jacobian). Once that is done, we have a vector f(m) that contains the function values. In order to get these values back to the main function we need to use CopyMemToPtr.

Call CopyMemToPtr(f_rptr, f(1), m * Len(f(1)))

This call starts at the first element of vector f and copies m * Len(f(1)) bytes into the memory that the pointer f_rptr points to. The pointer itself is not changed; only the memory behind it is overwritten. A similar approach applies to the constraint function and the Jacobians.

This is essentially how the RtlMoveMemory Windows API function is used with NAG C Library routines.
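For readers who want to experiment with the two copy directions outside Excel, here is a minimal sketch in Python, assuming the standard ctypes module as a stand-in for RtlMoveMemory (ctypes.memmove has the same destination, source, byte-count argument order). It is not part of the downloadable example. The helper names copy_from_ptr and copy_to_ptr, the buffers x_storage and f_storage, and the two subfunctions are all hypothetical and exist only for illustration.

```python
import ctypes

DOUBLE_SIZE = ctypes.sizeof(ctypes.c_double)  # bytes per element, like Len(x(1))

def copy_from_ptr(ptr, n):
    """Copy n doubles from a raw address into a new local array
    (the role played by CopyMemFromPtr in the VBA code)."""
    local = (ctypes.c_double * n)()
    ctypes.memmove(local, ptr, n * DOUBLE_SIZE)
    return local

def copy_to_ptr(ptr, values):
    """Copy a sequence of doubles into the memory at a raw address
    (the role played by CopyMemToPtr in the VBA code)."""
    n = len(values)
    local = (ctypes.c_double * n)(*values)
    ctypes.memmove(ptr, local, n * DOUBLE_SIZE)

# Hypothetical buffers standing in for the storage owned by the library.
x_storage = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
f_storage = (ctypes.c_double * 2)()
x_rptr = ctypes.addressof(x_storage)  # plain integer address, like a Long in VBA
f_rptr = ctypes.addressof(f_storage)

# Inside the "callback": read x, compute f, write f back.
x = copy_from_ptr(x_rptr, 3)
f = [x[0] + x[1], x[1] * x[2]]  # made-up subfunctions, for illustration only
copy_to_ptr(f_rptr, f)

print(list(x), list(f_storage))
```

In both directions the byte count is the number of elements times the size of one element, and the pointer itself never changes; only the memory it refers to is read or overwritten.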

I decided not to put the whole VBA code here in order to encourage you, dear reader of this blog, to download the mentioned example and check the code on your own. In order to run it you need the 32-bit Windows implementation of NAG C Library. You can obtain a trial licence key for the Library from support@nag.co.uk.

Please let us know if you found the example useful and if you would like us to create example programs for other NAG routines!

Whether we like it or not, at NAG and many other organizations, we live in an "e-mail" culture meaning that e-mails are how we communicate, receive and retain information. For many of us, e-mails also document both what we have done and what we still have to do. If you are like most in this culture, your e-mail inbox is the hub of your work life. I'm going to suggest an inbox "experiment" for you but first, a little fun.

One of my favorite ways to get to know someone is to ask how they use their inbox. It's almost a litmus test for personalities. So, what does yours say about you? Is yours:

The Black Hole: E-mail gets sucked in but never leaves, a filing cabinet with one gigantic drawer and two folders labelled "In" and "Sent". Periodically, either due to an inspired desire to get organized or "intervention" from a systems administrator, the inbox gets purged and the cycle begins anew.

The Formula One Pit: E-mail comes racing in and the pit crew (you) frantically tries to dash off a response that nominally addresses or acknowledges the item. It could be a "holding" response (thinking about it, promise to get back to you later) or a delegating response (passing it on or telling the writer to see someone else). The key feature, like the pit crew, is to get it out as fast as possible. This is a difficult personality to maintain, especially when you aren't connected to the network or need to sleep ;-)

The Swiss Army Knife: This inbox, like its multi-tool namesake, can do most everything. It carries information for later reference, tasks that need to be done, dates/times for meetings, etc. It does it all. Stuff gets thrown out or filed elsewhere occasionally when it has been taken care of but it remains the ultimate "nerve center" of work life.

There are others but I think you get the idea. So, you ask, what's the experiment you want me to try? In essence, I want you to get your inbox emptied at least once each week but I want you to do it in a way that is functionally different than the Formula One Pit. Here's the algorithm:

Set aside an hour at a quieter time each week (early morning, late afternoon, whatever works)

Go through your inbox one message at a time and ask yourself "Is there an action required?"

If the answer is "No", then either delete it as trash, file it elsewhere for later reference or file it in your "great things I'd like to do someday but don't know when" folder. Be ruthless.
If the answer is "Yes", then answer the question "what's the action required?" and one of the following four things happens to it:

If you can take the needed action in 2 minutes, do it now and either delete the e-mail or file it in another folder for later reference.
If somebody else needs to do what's needed, forward it and delegate.
If you need to act on it on a specific day or day and time (e.g., a meeting), put it on your calendar.
If you need to act on it but it's not time or day specific, put it on your task list. By the way, if it's really a project (i.e., has multiple steps), put it on your task list as a project and just note the very next step.

You're done! Everything that was in your inbox is now in the trash, filed for later reference, delegated to someone else, on your calendar or on your task list.
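For the programmers among you, the decision procedure above can be sketched as a small Python function. This is only an illustration under my own assumptions: the Message fields, the destination labels and the order in which the four "Yes" branches are checked are hypothetical, and no mail client works exactly this way.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    """Hypothetical description of one inbox item."""
    subject: str
    action_required: bool
    minutes_needed: int = 0            # rough estimate of the effort involved
    someone_elses_job: bool = False    # should somebody else act on it?
    due: Optional[str] = None          # a specific day/time, if there is one
    keep_for_reference: bool = False   # worth filing for later retrieval?
    someday: bool = False              # a "great thing to do someday"?

def triage(msg: Message) -> str:
    """Return the single place a message ends up after one pass."""
    if not msg.action_required:
        if msg.someday:
            return "someday folder"
        return "reference folder" if msg.keep_for_reference else "trash"
    if msg.minutes_needed <= 2:
        return "do it now"
    if msg.someone_elses_job:
        return "delegate"
    if msg.due is not None:
        return "calendar"
    return "task list"

inbox = [
    Message("Lunch menu", action_required=False),
    Message("Confirm room booking", action_required=True, minutes_needed=1),
    Message("Budget review", action_required=True, minutes_needed=60, due="Friday 10:00"),
    Message("Rewrite the user guide", action_required=True, minutes_needed=600),
]
for msg in inbox:
    print(f"{msg.subject}: {triage(msg)}")
```

The point of the sketch is that every message gets exactly one destination, so nothing is left sitting in the inbox after the pass.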

How do you feel? Can you focus better now on what you need to do without that feeling of dread you get when you look at an inbox with 50 messages (or 150, 250)? Let me know what you think.

Next time: Thinking outside the Inbox and an unattributed quote to puzzle over: "If you worry about everything, then you don't have to worry about anything."

I was pointed to this short but interesting blog today: "What's your upgrade?" by @therichbrooks, which makes the point that customers like it when businesses over-deliver on expectations. It is easy to understand what over-deliver might mean for hotels, airlines, rental cars, etc. - upgrades! - but it is equally important for other businesses to consider.

In the contexts of High Performance Computing (HPC) and of software, upgrades are a part of the routine. This covers both upgrades to newer or more powerful hardware (e.g. see the recent upgrades to the Cray supercomputers at HECToR - the UK's national supercomputing service - run by partners including NAG for CSE Support); and software upgrades for new features etc. However, these are all expected and planned upgrades - whilst they do deliver more to the customers, they are not an "over-delivery". And of course, for the service teams, upgrades mean hard work installing, testing, benchmarking and documenting the updated system.

But the key point of the linked blog was not upgrades, rather it was about managing (and meeting) customer expectations - and about over-delivery.

Over-delivery is often a key concept in supercomputer procurements. Usually a procurement specification will state a minimum performance level that a solution must deliver. This is often close to the real minimum so as to allow the optimum range of responses from the market. In reality, the procurement team hopes that substantially greater performance will be proposed by the bidders, often guiding this with "desirable" elements in the specification. Normally, the winning proposal is one that meets all the minimum requirements and also substantially over-delivers in several aspects.

Interestingly, over-delivery is perhaps more complex in products/services that have "soft" delivery - for example training courses. These trade, in part, on reputation - previous attendees spreading their satisfaction with the course by word of mouth. Thus, customer expectations are naturally high and the trick is to make sure these expectations are met with relevant material, high quality delivery, and knowledgeable tutors (it helps that NAG trainers are almost always active practitioners too - they regularly use their expertise on real projects, not just for teaching).

Which brings us back to the start - since airlines, hotels, etc. trade partly on reputation (as well as other factors such as convenience). Yet, as noted above, they have a ready means of over-delivery through upgrades.

Of course, rather than a means of over-delivery, upgrades are the essence of what NAG HPC Services provide - we take your application code and upgrade it - make it go faster, scale to solve bigger problems, adopt improved algorithms, etc. We can also help you upgrade the contribution of HPC to your business by providing advice and consulting on HPC strategy and implementation.

Perhaps a different but equally useful question is: how can NAG upgrades help you to over-deliver to your customers - e.g. through more powerful application of computational modelling and simulation?

Much of our work at NAG is devoted to creating new implementations of our numerical libraries and attempting to make their algorithms available from as many languages and packages as possible, so that our users have access to them from whichever environment they're working in. Thus, users of packages such as MATLAB® (and similar packages such as Octave), LabVIEW and Maple, and programmers working in languages like Java, Python and Visual Basic (along with, of course, more traditional languages such as C and Fortran) have all been making use of NAG algorithms to enhance their applications and solve numerical problems for a long time.

Microsoft Excel® users can easily access NAG routines from both the NAG Fortran Library and the NAG C Library, because they are distributed as Dynamic Link Libraries (DLLs). For example, my colleague Marcin Krzysztofik has recently described how to solve a nonlinear least-squares problem in Excel using the nag_opt_nlin_lsq (e04unc) routine from the NAG C Library.

Recently, we had a query from an Excel user who wanted to invoke a method from the NAG Library for .NET. One way to do this is to use Visual Studio Tools for Office (VSTO) to create a customized Excel workbook that loads a .NET assembly when it is opened. Events in the workbook (e.g. typing values in cells or clicking buttons) then call assembly methods which can access workbook data and - for example - call NAG methods to process it.

My colleague Sorin Serban and I have illustrated this by producing a demo Excel workbook which fits a surface to a set of points in 3D space. More specifically, it uses the NAG method e01da to compute a bicubic spline through a set of data values on a rectangular grid, and then invokes e02df to calculate the values of the spline on the grid. The user can edit the data values in the workbook, and can also select a subset of the points to be used in the fit.

Screenshot of Excel demo, showing the data points in the table at the top, along with a histogram of the points at bottom left. A subset of the points has been selected by the user, and the NAG methods have been used to fit and calculate the bicubic spline surface which passes through them (displayed bottom right).

The demo is freely available for download as a deployed VSTO solution from this location, whilst more information about its working and installation - including how to obtain a trial licence for the NAG Library for .NET, which is required to run the demo - can be found in this README (a copy of which is also contained in the demo distribution). To obtain a copy of the full solution (including the assembly C# source code which invokes the NAG methods), please contact NAG support.

Worry - To feel uneasy or concerned about something.

In my last post I wrote about managing your e-mail inbox (in the narrow sense) and, more broadly, managing your work and commitments. I ended that last post with this quote: "If you worry about everything, then you don't have to worry about anything." At least one reader suggested that I owed them an explanation and so here it is.

You'll recall that I challenged you (and me) to get our e-mail inboxes empty at least once per week in a systematic way, as follows:
Go through your inbox one message at a time and ask yourself "Is there an action required?"

If the answer is "No", then either delete as trash, file it elsewhere for later reference or file it in your "great things I'd like to do someday but don't know when" folder. Be ruthless.

If the answer is "Yes", then answer the question "What's the very next action required?" and do one of the following four things:

· If you can take the needed action in 2 minutes, do it now and either delete the e-mail or file it in another folder for later reference.

· If somebody else needs to do what's needed, forward it and delegate.

· If you need to act on it on a specific day or day and time (e.g., a meeting) put it on your calendar

· If you need to act on it but it's not time or day specific, put it on your task list. By the way, if it's really a project (i.e., has multiple steps) put it on your task list as a project and just note the very next step.

You're done! Everything that was in your inbox is now in the trash, filed for later reference, delegated to someone else, on your calendar or on your task list.

What you just did was, at least for the contents of your inbox, to “worry” about each item for a short amount of time in a systematic way. You made a conscious decision about each item and put it where it belongs – in the trash, in a folder for later retrieval, on your calendar or on your task list. In fact, you just did something even cleverer – you took things out of your head and put them into a trusted system. For many of us, this simple act frees the mind to focus on the “bird” (task) in hand rather than the dozen or more “in the bush” (still to be done).

The thoughtful skeptics among you might be forgiven if you are thinking “How does taking something out of my inbox and putting it on a task list really free up my mind?” The answer lies in what you do with the contents of your task list and the other things that are on your mind but not in the e-mail inbox. I’ll talk about that next time when I define the “everything” in “worry about everything”.

Giving Credit Where Credit is Due Department: Many of the concepts and ideas mentioned here are things I’ve learned from years of trying to perfect my implementation of a methodology invented by David Allen called “Getting Things Done” or GTD. You can learn more at http://www.davidco.com/.

As a Senior Technical Consultant for NAG, I answer many customer questions covering many topics. I thought I’d write up one such question I recently received from a NAG C Library user, as the answer may be useful to others.

Q: In looking through the C# associated info, I found many examples of InteropService calls from C# to the C Library (CLW3209DA_nag.dll). Have any examples been posted for the "c05" functions, e.g. nag_zero_cont_func_brent_bsrch(c05agc)?

A: I'm glad you asked! By the time you reach the end of this post, there will be one. :-)

In working with the NAG C Library from C#, there are three main factors to which we must attend. The first is how to represent the NAG C Library structure types in C#; this has largely been taken care of for you in NAGCFunctionsAPI.cs.

The second is translating the C library function signature into a C# declaration.

The third is the C# declaration of any required callback functions and the assignment of delegates.

nag_zero_cont_func_brent_bsrch indeed requires a user-supplied callback function “f”, the function for which we want to find a root, whose C prototype is:

double f(double xx, Nag_Comm *comm);

In the containing namespace I declare the corresponding delegate:

public delegate double NAG_C05AGC_FUN(double xx, ref CommStruct comm);

where CommStruct is defined in NAGCFunctionsAPI.cs.

The C prototype for c05agc itself is

void nag_zero_cont_func_brent_bsrch(double *x, double h, double xtol, double ftol, double (*f)(double xx, Nag_Comm *comm), double *a, double *b, Nag_Comm *comm, NagError *fail);

In the relevant class I declare the NAG function thusly:

[DllImport("CLW3209DA_nag")]
public static extern void c05agc(ref double x, double h, double xtol, double ftol, NAG_C05AGC_FUN f, ref double a, ref double b, ref CommStruct user_comm, ref NagError fail);

A C# example console application program for c05agc, similar to the C/C++ program provided by NAG, then might run something like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
using NagCFunctionsAPI;

namespace c05agce
{
    public delegate double NAG_C05AGC_FUN(double xx, ref CommStruct comm);

    class Program
    {
        [DllImport("CLW3209DA_nag")]
        public static extern void c05agc(ref double x, double h, double xtol, double ftol, NAG_C05AGC_FUN f, ref double a, ref double b, ref CommStruct user_comm, ref NagError fail);

        static void Main(string[] args)
        {
            // Wrap the managed callback in a delegate for the native call.
            NAG_C05AGC_FUN F = new NAG_C05AGC_FUN(f);

            NagError fail = new NagError();
            fail.char_array = new char[512];
            CommStruct user_comm = new CommStruct();

            double a = 0;      // on exit: lower end of the bracketing interval
            double b = 0;      // on exit: upper end of the bracketing interval
            double x = 1.0;    // initial guess; on exit: the root
            double h = 0.1;    // initial search step
            double eps = 1e-05;
            double eta = 0.0;

            c05agc(ref x, h, eps, eta, F, ref a, ref b, ref user_comm, ref fail);

            if (fail.code != 0)
            {
                string error_message = new string(fail.char_array);
                Console.WriteLine(error_message);
            }
            else
            {
                Console.WriteLine("Root = {0, 9:f5}", x);
            }
        }

        // The function whose root we seek: f(x) = x - exp(-x).
        public static double f(double x, ref CommStruct user_comm)
        {
            return x - Math.Exp(-x);
        }
    }
}
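As an independent sanity check on the value printed by the program above, the root of x - exp(-x) can be located without the NAG Library at all. The sketch below is written in Java and uses a plain interval-doubling search followed by bisection; it is not Brent's method and not what c05agc does internally, and the names RootCheck, bracket and bisect are hypothetical. It starts from the same x = 1.0 and h = 0.1 as the C# example.

```java
final class RootCheck {
    // Widen [x - h, x + h] by doubling h until f changes sign across it.
    static double[] bracket(java.util.function.DoubleUnaryOperator f,
                            double x, double h, int maxSteps) {
        for (int i = 0; i < maxSteps; i++) {
            double a = x - h, b = x + h;
            if (f.applyAsDouble(a) * f.applyAsDouble(b) <= 0.0) {
                return new double[] {a, b};
            }
            h *= 2.0;
        }
        throw new IllegalStateException("no sign change found");
    }

    // Plain bisection on a bracketing interval [a, b].
    static double bisect(java.util.function.DoubleUnaryOperator f,
                         double a, double b, double xtol) {
        double fa = f.applyAsDouble(a);
        if (fa == 0.0) return a;
        while (b - a > xtol) {
            double m = 0.5 * (a + b);
            double fm = f.applyAsDouble(m);
            if (fm == 0.0) return m;
            if (fa * fm < 0.0) {
                b = m;              // root lies in [a, m]
            } else {
                a = m;              // root lies in [m, b]
                fa = fm;
            }
        }
        return 0.5 * (a + b);
    }

    public static void main(String[] args) {
        java.util.function.DoubleUnaryOperator f = x -> x - Math.exp(-x);
        double[] ab = bracket(f, 1.0, 0.1, 50);
        System.out.printf("Root = %9.5f%n", bisect(f, ab[0], ab[1], 1e-8));
    }
}
```

If the C# program and this check disagree beyond the requested tolerance, the P/Invoke declaration or the delegate signature is the first place to look.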

I have been at NAG for 3 months now, and one of my first tasks here was to look into cloud computing. Customers have been inquiring as to whether they can use the NAG Library on the hundreds of cores available on cloud services like Microsoft's Azure and Amazon's EC2. Below you will find a preliminary report on calling the NAG Library for .NET on Windows Azure.

I began with Microsoft's Cloud Numerics, a .NET analytical library that can easily be scaled out to Windows Azure for large computations. Cloud Numerics provides a library of about 400 mathematical and statistical functions that the user can call (in this case, from C#). Since NAG supplies a library for .NET, I decided this was a good way to start.

Getting an account and all the correct software downloaded can be a challenge. I actually found this example quite useful for installation, setup, and deployment of Cloud Numerics on Azure.

To start calling NAG functions from the MSCloudNumerics example program, just add the NAG .NET dll under references and include the namespace NagLibrary. When you are ready to deploy the application to the cloud, right-click the 'AppConfigure' tab in the Solution Explorer and select 'Set as StartUp Project'. Then put in your Azure account information, create a cluster, and deploy!

The program will be compiled on your local machine to a folder, and then the entire folder will be uploaded to the cloud for processing (you may also need to include the NAG file DTW3206DA.dll in the same place where the program is compiled so that it is uploaded to the cloud). To then access your program, you can go to

https://'your.cloud.name'.cloudapp.net/portal

Just put in your username and password (created when deploying your app) to see a list of jobs you've run on Azure.

For my example program, the data is an array with values ranging from -0.5 to 0.75. Following this is a call to my favorite NAG function, s15ab (the cumulative normal distribution function).
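For readers who want reference values to compare against, here is a rough Java sketch that evaluates the cumulative normal distribution on such a grid. It does not use the NAG Library: it relies on the Abramowitz and Stegun polynomial approximation to erf (formula 7.1.26, absolute error around 1.5e-7), so it only agrees with s15ab to about six or seven decimal places. The grid spacing of 0.25 and the names NormalCdf, erf, cdf and grid are assumptions made for illustration.

```java
final class NormalCdf {
    // Abramowitz & Stegun 7.1.26 approximation to erf(x).
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    // Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    static double cdf(double x) {
        return 0.5 * (1.0 + erf(x / Math.sqrt(2.0)));
    }

    // Evenly spaced points from lo to hi inclusive (step is an assumed spacing).
    static double[] grid(double lo, double hi, double step) {
        int n = (int) Math.round((hi - lo) / step) + 1;
        double[] xs = new double[n];
        for (int i = 0; i < n; i++) xs[i] = lo + i * step;
        return xs;
    }

    public static void main(String[] args) {
        for (double x : grid(-0.5, 0.75, 0.25)) {
            System.out.printf("%6.2f  %.6f%n", x, cdf(x));
        }
    }
}
```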

I have taken a screenshot below of the two outputs from the example program. On the left is the output from Windows Azure and on the right is the same program run on my local machine.

Left: Example run on cloud. Right: Example run on local machine.

Ideally, I'd leave this up on the cloud for everyone to log in and see for themselves, but Azure charges for compute time, storage, and transfer every hour. Happy cloud computing!

Last month, we attended the INFORMS 2012 conference in order to learn more about current activities in the field of business analytics, and to present the results of some of the work we've done in this area. The meeting kicked off with a series of interesting technology workshops run by commercial companies as a means of promulgating their software systems; the main insight I got from them was the importance that the community places on high-quality optimization solvers in areas like prescriptive analytics, in which quantitative methods are employed to help make better decisions in business.

The NAG Library contains a variety of optimization routines (for both local and global minimization) - along with, of course, a wide range of solvers for other types of problems in analytics (such as statistical analysis, correlation and regression modelling and time series analysis) and in a variety of other numerical areas. At the conference, we presented the results of some consultancy work performed for a client who was using NAG routines to solve a large-scale constrained optimization problem arising from activities such as price promotion (see abstract 4 on this list for more details).

Optimization for a Client with Large-Scale Constrained Problems: A Case Study, on display at the INFORMS 2012 poster session.

The remainder of the conference consisted of a couple of plenary talks (from Google and eBay), an entertaining panel discussion on the perennial topic of Big Data, and a collection of contributed talks which were arranged in fifteen parallel sessions on topics like The Analytics Process, Decision Analytics, Analytics Around Us, etc. I found the standard of the presentations to be very high; a few personal highlights were:

Google’s Hal Varian describing the use of their Insights for Search tool, which can be used to identify market trends based on the frequency of searches for specific terms.
LinkedIn’s Scott Nicholson discussing their analysis of personal data entered by its users, and the way that’s used to build improved user experiences. There's a detailed article about this interesting talk here.
End-to-End Analytics’ Colin Kessinger drawing a distinction between analysis and decision-making, emphasizing the importance of explaining to the client why the answer is the correct one, and the importance of multiple iterations in the analytics process.
eBay's Bob Page speaking about eBay's perspective on consumer behaviour, and the increasing importance of mobile technologies. He described how mobile was 'driving engagement' and, as if to provide an illustration of his point, I found myself downloading the eBay app whilst he was talking about it.
4i Inc’s Eugene Roytburg giving his impression of the future of analytics and the way it's being increasingly used to drive business decisions. His discussion of the analyst's technical toolset included a reference to NAG, which was pleasing.
AMPL’s Robert Fourer presenting how the AMPL modelling language works, and describing its advantages and future directions in development.
DemandTec's Suzanne Valentine talking about their techniques for structuring and analyzing large-scale consumer data. DemandTec, who are NAG users, were recently acquired by IBM as part of their so-called Smarter Commerce initiative.

Our poster presentation was well-received, and the meeting gave us many opportunities for making useful contacts in the field. The conference location - where a weather forecast of 'partly cloudy' apparently meant 'a dazzlingly bright blue sky with a tiny little cloud over San Diego' - wasn't too shabby either.

I was recently speaking to a colleague about my first couple of projects here at NAG. The first project was learning to call the Library from Python using ctypes (thanks to Mike Croucher’s blog, which helped immensely). Next was a project using the Java Native Interface (JNI), which I had difficulty using. After hearing this, my colleague recommended I look into Java Native Access (JNA), as it is very similar to ctypes in Python. Thus began a brief love affair! I say ‘love affair’ because my experience with the JNA was a bit of a roller coaster of highs and lows. In the beginning, the JNA and I got along great. As time went on, I was left sitting at the computer screen wondering what to do next, hoping for the JNA to fix things.

Background of JNI

NAG already has a thorough technical report on our website for calling the NAG Library using the Java Native Interface. This includes creating header files, compiling java files, compiling the interface library, and running the program. Seems like lots of work, even for simple functions. I was hoping the JNA would be easier.

First date with the JNA

To start using the JNA, just download the .jar file from https://github.com/twall/jna and move it to the directory you will be working in. Unzip it to create a com folder and you’re done! You can now start using it. Whenever you need to use a package from the JNA, just import it at the top of your Java file.

My first impression of the JNA started off on a high note. I found it extremely easy to start using the interface. Looking at code for calling the Bessel function (routine s17acc):

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Platform;

public class HelloWorldBessel {
    public interface CLibrary extends Library {
        CLibrary INSTANCE = (CLibrary) Native.loadLibrary("/opt/NAG/cll3a09dgl/lib/libnagc_nag.so", CLibrary.class);

        double s17acc(double k, int fail);
    }

    public static void main(String[] args) {
        int k = 0;
        int retCode = 0;
        System.out.println("Y0(" + k + ") is " + CLibrary.INSTANCE.s17acc(k, retCode));
    }
}

Using the JNA takes only a couple of lines! Feel free to compile and run it yourself! No header files, no Javac/Javah code to compile, no JNI in sight! I was excited by how easy the JNA was and very eager to try other functions.

Falling in love again

The second impression of the JNA was very nice as well. The reason I ‘fell in love’ was that it took only a couple more lines of code from the above Bessel example to call the linear equation solver (f04arc). Once variables are initialized and we’ve declared the f04arc method, all we need is a call to

CLibrary.INSTANCE.f04arc(n, a, tda, b, x,fail);

Again, it was nice and easy to use. Note: if the NAG function detects an error in the inputs, the error is automatically printed, and the entire program is terminated. A small inconvenience, but all relationships have some skeletons hidden in the closet.

Never getting Calledback

The JNA and I hit a rough patch with the callbacks. It proved difficult to get the calls right, in addition to mapping C structures to Java. It took me pages of online forums and user comments, but you end up needing to add extra lines when loading the library, creating the callback, and adding a structure class. A brief excerpt:

public interface CLibrary extends Library {
    CLibrary INSTANCE = (CLibrary) Native.loadLibrary("/opt/NAG/cll3a09dgl/lib/libnagc_nag.so", CLibrary.class);

    interface acallback extends Callback {
        double invoke(double x);
    }

    void d01ajc(Callback fun, double a, double b, double epsabs, double epsrel, int max_num_subint, DoubleByReference result, DoubleByReference abserr, Nag_QuadProgress qp, int fail);
}
…
CLibrary.acallback fn = new CLibrary.acallback() {
    public double invoke(double x) {
        return (x * x * x);
    }
};

You’ll note that when calling d01ajc, I needed to pass a Nag_QuadProgress structure into the function. To do this, we need to create a structure with variables, but if you examine the contents of these variables after the call to d01ajc, their information is meaningless (comments on how the call to d01ajc populates these fields are welcome below!). On top of this, the JNA does not handle errors very well. Since it is not a part of Java itself, I would oftentimes get “An error occurred outside Java” and be left wondering which argument I got wrong.

Timings

Averaged over 100 runs, the timings are:

Routine   JNA (s)   JNI (s)
s17acc    0.0015    0.0010
f04arc    0.0082    0.0015
d01ajc    0.0377    0.0388

Back to the JNI

My first impression of the JNA was nice. It was simple and no 'glue code' was required. Upon further investigation I found a couple of drawbacks which include:

Mapping C structures to Java
Slightly slower than JNI
Error Handling

Since we already have most (if not all) the wrappers for the Library using the JNI, I will stick with that for now. Alas, my time with the JNA was fun while it lasted.

ISC'12 - the summer's big international conference for the world of supercomputing - is next month in Hamburg.

I will be attending, along with several of my NAG colleagues. Will you be attending? What will you be looking to learn? I will be listening out for these five key topics.

GPU vs MIC vs Other

As at ISC'11 last year (and SC11), I think there will be a strong fight for attention in the key area of manycore/GPU devices - and a matching search for evidence of real progress. So far the loudest voice has been NVidia and CUDA, especially following NVidia's successful GTC event recently. However, interest in Intel's MIC (Knights Corner) is strong and growing - MIC has often been a big discussion topic in workshops, conferences and meetings over the last year. As the MIC product launch gets closer, people will be making obvious comparisons with NVidia's Kepler announced at the GTC.

What about others - will anyone else develop a strong voice in this manycore world? AMD Fusion? ARM? DSP-based products? Will we talk with the same energy about the software issues of manycore, or just the hardware choices?

What is happening with Exascale?

The quest to attain exascale computing rumbles on. I'd expect exascale to take a big share of the debate and agenda at ISC'12. What is happening with the exascale programs around the world? How are the budgets in the weak global economy affecting the exascale ambition? How are the various national pride (I mean national competitiveness) efforts towards exascale progressing? Will the software challenges get the level of investment needed? What new technologies will emerge to be studied as candidates to solve part of the problems getting to exascale?

Is exascale so old hat - do we need to move to discussion of zettaFLOPS to be trendy now? Will zettaFLOPS be impossible (Sterling) or inevitable (Barr)? Maybe we should spare some discussion for making multi-petaFLOPS work properly first?

Top 500, Top 10, Tens of PetaFLOPS

Will any of the next batch of supercomputers vying for title of most powerful in the world leap from the shadows at ISC'12? Will the Top 500 have a new leader? What about the 10/20/30 petaFLOPS systems in build? Has K grown since SC11? Is Sequoia fully built yet? Is Jaguar still Jaguar or has it morphed into Titan? Will it be called Jagan or Tituar if it has only partly morphed? Mira? Blue Waters? Etc.?

Or will there be other big new entries in the Top 10? I believe there will be at least one new multi-petaFLOPS entry in the Top 10. And I'm sure there will be several new entries in the Top 50.

Finding the advantage in software

We will hear about applications of HPC to a wide range of problems - academic research, industrial research and engineering, etc. But will we learn how software has enabled those successes? Or will we focus only on the big machines that the software ran on? What about the computational software engineering skills needed? Or the pre/post-processing and workflow challenges? Without attention to these broader aspects of HPC, we are left only with a big box of computers and an electricity bill - much less powerful than a balanced portfolio of supercomputers + software + people + ...

Big Data and HPC

Some of you will have heard me say over the last few months that "cloud" is a buzzword of the past now, as was "green computing" before it. Marketing departments need a new catchphrase to use. I believe "Big Data" is that new buzzword. Just as broad swathes of the HPC product space were labelled "cloud solutions" last year, so will they be mandated to be "big data" solutions this year. Am I joking? Partly. Am I cynical? Of course. Am I right? Probably.

There is a serious side to this too - big data is potentially a huge user of HPC technologies - and HPC in turn generates big data. So the cross-over is real. We just need to seek out the nuggets of value and genuine new solutions among the buzzword deluge.

What will you be looking out for at ISC'12? Leave a comment or get in touch for a direct conversation with me or NAG.

One complication in using the NAG C Library from .NET is callback functions which, in C, have arrays in their parameter lists. Take for example the optimization algorithm e04nfc(nag_opt_qp) for quadratic programming problems. This NAG function requires a callback function qphess of the following C prototype:

void qphess(Integer n, Integer jthcol, const double h[], Integer tdh,
            const double x[], double hx[], Nag_Comm *comm);

In C# the corresponding delegate is

public delegate void NAG_E04NFC_QPHESS(int n, int jthcol,
    IntPtr h_ptr, int tdh, IntPtr x_ptr, [In, Out] IntPtr hx,
    ref CommStruct comm);

If you follow the C example program for nag_opt_qp as well as the style of the NAG C# examples you will write something like this for qphess in C#:

static void qphess0(int n, int jthcol, IntPtr h_ptr, int tdh,
                    IntPtr x_ptr, [In, Out] IntPtr hx_ptr,
                    ref CommStruct comm)
{
    double[] xloc = new double[n];
    double[] hloc = new double[n * n];
    double[] hxloc = new double[n];

    // Copy the native arrays into managed memory.
    Marshal.Copy(h_ptr, hloc, 0, n * n);
    Marshal.Copy(x_ptr, xloc, 0, n);

    // hx = H * x
    for (int i = 0; i < n; ++i) hxloc[i] = 0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            hxloc[i] += hloc[i * n + j] * xloc[j];

    // Copy the result back to native memory.
    Marshal.Copy(hxloc, 0, hx_ptr, n);
}
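Stripped of the interop plumbing, the callback computes the matrix-vector product hx = H x on a flat, row-major array. The Java sketch below shows just that kernel, as an illustration rather than NAG code. It indexes with the leading dimension tdh instead of n, on the assumption that tdh is the row stride of h; the C# version above uses i * n + j, which is equivalent only when tdh equals n. The names QpHessKernel and hessTimesX are hypothetical.

```java
final class QpHessKernel {
    // hx = H * x, where H is n-by-n and stored row-major in h with row stride tdh >= n.
    static void hessTimesX(int n, double[] h, int tdh, double[] x, double[] hx) {
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                sum += h[i * tdh + j] * x[j];
            }
            hx[i] = sum;
        }
    }

    public static void main(String[] args) {
        // A 2-by-2 matrix stored with stride 3; the third entry of each row is padding.
        double[] h = {1, 2, 99, 3, 4, 99};
        double[] hx = new double[2];
        hessTimesX(2, h, 3, new double[] {1, 1}, hx);
        System.out.println(java.util.Arrays.toString(hx));
    }
}
```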

Marshaling can be fairly expensive, and in some cases it may be best to avoid it. The cost is trivial for the small example programs NAG provides, but it quickly mounts with increasing problem size. For n of 1000 and this version of qphess, nag_opt_qp takes about 7 seconds to solve a typical problem on my laptop.

How else might we write qphess? Well, if we declare f06pac(dgemv) in the right way:

[DllImport("CLW6I09DA_mkl")]
public static extern void f06pac(MatrixTranspose trans, int m,
    int n, double alpha, IntPtr a, int tda, IntPtr x, int incx,
    double beta, [In, Out] IntPtr y, int incy);

we can avoid Marshaling:

static void qphess1(int n, int jthcol, IntPtr h_ptr, int tdh,
                    IntPtr x_ptr, [In, Out] IntPtr hx_ptr,
                    ref CommStruct comm)
{
    // Pass the native pointers straight through to dgemv: no copies.
    NagFunctions.f06pac(MatrixTranspose.NoTranspose, n, n,
                        1.0, h_ptr, tdh, x_ptr, 1, 0.0, hx_ptr, 1);
}

Written this way, e04nfc takes only 0.62 seconds to solve my problem.

However, when I did this I also avoided the cost of serial C# loops and bounds checking – not to mention destroyed any ease of calling f06pac elsewhere. Alternatively, we could use an unsafe block to avoid Marshaling:

static void qphess2(int n, int jthcol, IntPtr h_ptr, int tdh,
                    IntPtr x_ptr, [In, Out] IntPtr hx_ptr,
                    ref CommStruct comm)
{
    unsafe
    {
        // Work directly on the native memory through raw pointers.
        double* h_ptr_loc = (double*)h_ptr;
        double* hx_ptr_loc = (double*)hx_ptr;
        double* x_ptr_loc = (double*)x_ptr;

        for (int i = 0; i < n; ++i) hx_ptr_loc[i] = 0;
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                hx_ptr_loc[i] += h_ptr_loc[i * n + j] * x_ptr_loc[j];
    }
}

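Written this way, e04nfc takes 3.5 seconds to solve my problem. Compared with the 7 seconds of the first version, this suggests a cost of about 3.5 seconds for Marshaling. nag_opt_qp makes about 1000 calls to qphess to solve my test problem, and each call copies roughly 8 megabytes (the n × n matrix of doubles dominates), so Marshaling here comes with a price of about 0.44 milliseconds per megabyte.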

As a check, we may write qphess a fourth way. With the help of an overload:

public static void f06pac(MatrixTranspose trans, int m, int n,
                          double alpha, double[] a, int tda,
                          double[] x, int incx, double beta,
                          [In, Out] double[] y, int incy)
{
    unsafe
    {
        // Pin the managed arrays so the native routine can use them in place.
        fixed (double* a_ptr = &a[0])
        fixed (double* y_ptr = &y[0])
        fixed (double* x_ptr = &x[0])
        {
            f06pac(trans, m, n, alpha, (IntPtr)a_ptr, tda,
                   (IntPtr)x_ptr, incx,
                   beta, (IntPtr)y_ptr, incy);
        }
    }
}

we may write qphess like this:

static void qphess3(int n, int jthcol, IntPtr h_ptr, int tdh,
                    IntPtr x_ptr, [In, Out] IntPtr hx_ptr,
                    ref CommStruct comm)
{
    double[] xloc = new double[n];
    double[] hloc = new double[n * n];
    double[] hxloc = new double[n];

    Marshal.Copy(h_ptr, hloc, 0, n * n);
    Marshal.Copy(x_ptr, xloc, 0, n);

    NagFunctions.f06pac(MatrixTranspose.NoTranspose, n, n,
                        1.0, hloc, n, xloc, 1, 0.0, hxloc, 1);

    Marshal.Copy(hxloc, 0, hx_ptr, n);
}

We’ve avoided the C# loops but have used Marshaling. In this case, e04nfc solves my problem in about 4.2 seconds, approximately confirming my previous estimate of 0.44 milliseconds per megabyte.

The inverse gives a memory bandwidth of 2.3 GB/s, which is between my laptop’s Passmark memory test scores for Read(Cached) and Write. Microsoft seems to be doing as well as one could reasonably ask here with Marshal.Copy.
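The arithmetic behind these estimates is easy to reproduce. The Java sketch below counts only the n × n matrix of 8-byte doubles as the data copied per call (the two length-n vectors are negligible by comparison) and uses decimal megabytes; both choices are assumptions on my part, as are the names MarshalCost, msPerMb and gbPerSecond.

```java
final class MarshalCost {
    // Marshaling cost in milliseconds per megabyte, from two timings of the same solve.
    static double msPerMb(double secondsWithCopy, double secondsWithoutCopy,
                          int calls, int n) {
        double msPerCall = (secondsWithCopy - secondsWithoutCopy) * 1000.0 / calls;
        double mbPerCall = 8.0 * n * n / 1.0e6;   // n*n doubles, 8 bytes each
        return msPerCall / mbPerCall;
    }

    // One MB per millisecond is one GB per second, so the bandwidth is the inverse.
    static double gbPerSecond(double msPerMb) {
        return 1.0 / msPerMb;
    }

    public static void main(String[] args) {
        double loops = msPerMb(7.0, 3.5, 1000, 1000);    // qphess0 vs qphess2
        double dgemv = msPerMb(4.2, 0.62, 1000, 1000);   // qphess3 vs qphess1
        System.out.printf("C# loops: %.3f ms/MB%n", loops);
        System.out.printf("dgemv:    %.3f ms/MB%n", dgemv);
        System.out.printf("Bandwidth: %.2f GB/s%n", gbPerSecond(loops));
    }
}
```

Both pairs of timings give a figure close to 0.44 milliseconds per megabyte, which is why the second measurement counts as a confirmation of the first.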

Aside: The time to solution using a purely C# version of qphess can be reduced to about 0.93 seconds by parallelizing the outer loop:

static void qphess4(int n, int jthcol, IntPtr h_ptr, int tdh,
                    IntPtr x_ptr, [In, Out] IntPtr hx_ptr,
                    ref CommStruct comm)
{
    unsafe
    {
        double* h_ptr_loc = (double*)h_ptr;
        double* hx_ptr_loc = (double*)hx_ptr;
        double* x_ptr_loc = (double*)x_ptr;

        // Each row of the product is independent, so rows can run in parallel.
        Parallel.For(0, n, i =>
        {
            hx_ptr_loc[i] = 0;
            for (int j = 0; j < n; ++j)
                hx_ptr_loc[i] += h_ptr_loc[i * n + j] * x_ptr_loc[j];
        });
    }
}

This is the latest in a series of blog posts about enhancing LabVIEW applications by using NAG methods and routines; previously, we've described in detail how to invoke methods from the NAG Library for .NET in LabVIEW, and how to call routines from the NAG Fortran and C libraries from within that programming environment. In addition, we supplemented those descriptions with an archive of examples which is available from the NAG LabVIEW page.

The examples we looked at previously were all in the 32 bit environment, but some users have asked whether all this works in the 64 bit world, being keen to take advantage of the larger address space of that architecture. Indeed it does, as we shall show here.

This screenshot shows the block diagram of a demo application, running within the 64 bit version of LabVIEW, in which we call the same routine three times - once from each of the NAG Libraries (this duplication is only used to illustrate that all of the libraries work in the 64 bit environment, and is clearly not necessary in a working application, which would only utilize one of the libraries). More specifically, we use LabVIEW 2011 SP1, version 11.0.1f1 (64 bit) under Windows 7, together with the following NAG Libraries:

NAG Fortran Library [Mark 23] for x86-64 systems, Windows XP/Vista/7 DLL, Intel Fortran for 64-bit applications (FLW6I23DCL)
NAG C Library [Mark 9] for Microsoft Windows XP/Vista/7, Intel C/C++ 64 or Microsoft 64-bit C/C++ (CLW6I09DAL)
NAG Library for .NET [Release 1] for Windows XP/Vista/7, x86-32, x86-64 (DTW3A01DAL). It should be noted that this example uses the 64 bit assembly (NagLibrary64.dll) rather than the 32 bit one (which is also part of the NAG Library for .NET installation).

The NAG routine that we call is s01ba, an extremely simple one which calculates the shifted logarithm of its argument (once again, this was selected only for illustrative purposes: any of the more sophisticated routines in the Libraries could have been used in its place, albeit at the expense of added complexity).
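To see why a dedicated routine for such a simple function is useful, consider what happens for tiny arguments. The Java sketch below takes the shifted logarithm to be ln(1 + x) and uses the standard library's Math.log1p as a stand-in for s01ba (it does not call NAG); the class name ShiftedLog and its helpers are hypothetical. For small x, forming 1 + x first rounds away most of the digits of x, whereas a shifted-logarithm routine keeps them.

```java
final class ShiftedLog {
    // ln(1 + x) computed without forming 1 + x explicitly.
    static double shifted(double x) {
        return Math.log1p(x);
    }

    // The naive formula, which loses accuracy when |x| is tiny.
    static double naive(double x) {
        return Math.log(1.0 + x);
    }

    static double relativeError(double approx, double exact) {
        return Math.abs(approx - exact) / Math.abs(exact);
    }

    public static void main(String[] args) {
        double x = 1e-10;
        double exact = x - 0.5 * x * x;   // leading terms of the Taylor series, ample for tiny x
        System.out.println("log1p relative error: " + relativeError(shifted(x), exact));
        System.out.println("naive relative error: " + relativeError(naive(x), exact));
    }
}
```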

This screenshot shows the front panel of our demo application, with the input value on the left and the output from the three versions of the NAG routine on the right. As is to be expected, the outputs are identical.

This demo application has been incorporated into an upgraded version (1.1) of the archive of examples mentioned earlier. In addition, the examples in that archive that use the NAG C Library have been updated to use Mark 23 of that Library. The archive is freely downloadable from the NAG LabVIEW page; a README file (a copy of which is contained in the archive) provides more information about the examples and their software prerequisites.

NAG recently embarked on a ‘Knowledge Transfer Partnership’ with the University of Manchester to introduce matrix function capabilities into the NAG Library. As part of this collaboration, Nick Higham (University of Manchester), Rui Ralha (University of Minho, Portugal) and I have been investigating how blocking can be used to speed up the computation of matrix square roots.

There is plenty of interesting mathematical theory concerning matrix square roots, but for now we’ll just use the definition that a matrix X is a square root of A if X²=A. Matrix roots have applications in finance and population modelling, where transition matrices are used to describe the evolution of a system over a certain time interval, t. The square root of a transition matrix can be used to describe the evolution for the interval t/2. The matrix square root also forms a key part of the algorithms used to compute other matrix functions.

To find a square root of a matrix, we start by computing a Schur decomposition. The square root U of the resulting upper triangular matrix T can then be found via a simple recurrence over the elements U_ij and T_ij:

U_ii = sqrt(T_ii),    U_ij = (T_ij - sum_{k=i+1}^{j-1} U_ik U_kj) / (U_ii + U_jj)  for i < j.

We call this the ‘point’ method.
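A minimal sketch of the point method in Java, assuming the standard form of the recurrence (written out in the comments) and a real upper triangular T with strictly positive diagonal. It is an illustration only, not the Fortran code timed in this post, and the names TriSqrt, sqrtUpper and multiply are hypothetical.

```java
final class TriSqrt {
    // Point method for the square root U of an upper triangular matrix T:
    //   U[i][i] = sqrt(T[i][i])
    //   U[i][j] = (T[i][j] - sum_{k=i+1}^{j-1} U[i][k] * U[k][j]) / (U[i][i] + U[j][j]),  i < j
    static double[][] sqrtUpper(double[][] t) {
        int n = t.length;
        double[][] u = new double[n][n];
        for (int j = 0; j < n; j++) {               // one column at a time
            if (t[j][j] <= 0.0) {
                throw new IllegalArgumentException("diagonal must be positive");
            }
            u[j][j] = Math.sqrt(t[j][j]);
            for (int i = j - 1; i >= 0; i--) {      // move up the column from the diagonal
                double s = t[i][j];
                for (int k = i + 1; k < j; k++) {
                    s -= u[i][k] * u[k][j];
                }
                u[i][j] = s / (u[i][i] + u[j][j]);
            }
        }
        return u;
    }

    // Plain matrix product, used here only to check that U * U reproduces T.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    public static void main(String[] args) {
        double[][] t = {{4, 5}, {0, 9}};
        double[][] u = sqrtUpper(t);
        System.out.println(java.util.Arrays.deepToString(u));
        System.out.println(java.util.Arrays.deepToString(multiply(u, u)));
    }
}
```

The innermost loop reads along a row of U and down a column of U one element at a time, which is the cache-unfriendly access pattern that the blocked variants discussed next are designed to avoid.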

In many numerical linear algebra routines (such as those in LAPACK and the BLAS), run times can be drastically reduced by grouping operations into blocks to make more efficient use of a computer’s cache memory. We found that run times for the point method could be similarly reduced by solving the recurrence in blocks, and that even greater speed-ups could be obtained by using an alternative blocking scheme in which the matrix is repeatedly split into four blocks and the algorithm calls itself recursively.

The graph below shows run times for the three algorithms, for triangular matrices of various sizes. Recursive blocking is about 10% faster than standard blocking and up to eight times faster than the point algorithm.

For full square matrices, much of the work is done in computing the Schur decomposition but speeding up the triangular phase is nevertheless useful! The graph below shows run times for Matlab’s sqrtm (an implementation of the point algorithm) together with Fortran implementations, called from within Matlab, of the point (fort_point) and recursively blocked (fort_recurse) algorithms. The recursive routine, fort_recurse, is about 2.5 times faster than sqrtm and over twice as fast as fort_point.

We wrote up our findings in much more detail here: http://eprints.ma.man.ac.uk/1775/. However, the plot thickened when we started investigating parallel implementations.

A very fruitful way of taking advantage of multicore architectures is simply to use threaded BLAS. Another approach, used in the NAG Library for SMP and Multicore, is to explicitly parallelise code using OpenMP. Some run times for the triangular phase of the algorithm using the various blocking schemes and parallel approaches are shown in the graph below.

If only threaded BLAS was used then recursive blocking still performed best. However, when OpenMP was used, recursive blocking actually slowed down! This is because the algorithm’s performance was badly hit by synchronization requirements within the recursion. Synchronization and data dependencies are often crucial factors in determining how well an algorithm will perform in parallel.

So the moral of this story is: the best algorithm for serial architectures may not be the best algorithm in parallel! (Of course, users of the NAG Library for SMP & Multicore need not concern themselves with this detail; they can rely on NAG developers providing optimal tuning to enable best runtime performance.)

SC11 diary catch up

Calling NAG routines from R

Coffee and Filters

Question one: Where's my phone?

Self-Improvement

Cloud computing or HPC? Finding trends.

Girls, Geeks, Twitter and Me.

How to solve a NLLS problem using SQP method in Excel?

Self Improvement - An Algorithm for getting to "empty"

Upgrades - hotels, airlines and HPC

Adding functionality to Excel using the NAG Library for .NET

How to worry about everything (and nothing)

How To: Call Brent's Root-Finding Algorithm From C#

NAG on the Cloud

Optimization, statistics, big data and business analytics

An Affair with the Java Native Access (JNA)

ISC'12 Hamburg Preview

So just how expensive is Marshaling?

Using NAG and LabVIEW in a 64 bit environment

The Matrix Square Root, Blocking and Parallelism