Monday 29 March 2010

DTrace - Great if you can get it to work at all

On the face of it I think DTrace is a phenomenal piece of technology from Sun. That on the fly you can turn on instrumentation on all kernel events and system calls, and write your own scripts to count calls, sum elapsed time, or look at which files had the most I/Os to them, and much more. All implemented with no external change to Solaris itself - all existing commands and utilities run the same - and minimal impact on performance when not enabled. Truly amazing.

For the record I should confess that I was a Sun employee when DTrace was initially announced. However I was an Oracle Consultant and had nothing to do with Solaris kernel development or DTrace. My viewpoint was and remains that of a user of Solaris, and I did not gain any inside knowledge about DTrace one way or another.

I did see the demonstrations that are commonly repeated, where they start off a rogue process and then use DTrace scripts to see what is going on, and which process is causing all the extra work on the system. It looks so simple and so direct - a few iterations and you have identified the process in question and got some of the details of what it is doing.

I have been waiting since then to use DTrace in anger on some real life investigations. It has taken some time for Solaris 10 to be fully rolled out by most organizations. And that opportunity just occurred the other day. A chance to use DTrace to see what an application was doing, and in which functions it was spending most of its time.

A quick peruse of the Web finds the DTrace Toolkit, in which I find something called dappprof.d which does something similar to what I want - elapsed time per function call. I write my own D script for what I want, using similar probes and timestamp calculations, and am ready to look inside this application. Problem solved? No!

Unfortunately, DTrace does not work! Or at least that is what seems to be happening when I try and use it. Which is why I am writing this blog post. Sorry to be so negative, but either it does what it is supposed to do or it does not. And unfortunately it simply does not do what it is supposed to. And I've just spent 2 days banging my head against a brick wall trying to get DTrace to do what it should be doing. The good news is that there is an "answer" to the problem I had, which I'll get to later on. But this could all have been avoided if some of the "details" of DTrace had been done properly by Sun - like useful and helpful error messages, and configuration options that made sense.

First, unlike other software products which tell you WHY something went wrong so that you can fix it, DTrace simply says "I'm not going to do that" and stops. Shades of HAL from 2001: A Space Odyssey spring to mind. Generally whenever something goes wrong with Oracle or a programming language compiler I get a specific message - "syntax error at line xxx", "missing keyword expected", "out of memory". With DTrace you get a message that amounts to "I did not like that so I am stopping" one way or another.

My first error message was:
dtrace: processing aborted: Abort due to systemic unresponsiveness

I managed to find a blog entry by Jon Haslam of Sun, that says that there is nothing you can do about this, other than disabling certain protections within DTrace. And generally speaking those protections are there for a reason, so it is not recommended to disable them.

Okay, maybe I was sampling too often, or matched too many probes, or something. As I say, DTrace never tells you WHAT caused the failure. So I cut down my script and instead of 20,000 probes I now match less than 100. Just 33 in fact. Does it run properly? No! Still the same problem:
dtrace: script './gpsmintrace.d' matched 33 probes
dtrace: processing aborted: Abort due to systemic unresponsiveness

The script only has 4 probes as far as I am concerned, even though DTrace ends up matching more than this:
pid$target:a.out::entry
pid$target:a.out::return
profile:::tick-5sec
dtrace:::END

So the entry and exit points of functions in the program file itself (a.out), every 5 seconds, and finally at the end of the script when it stops. How can these be causing "systemic unresponsiveness?".

I try different combinations - a different process, take out the tick-5sec probe, name the functions to match so only "11 probes" matched - and always the same result:
dtrace: processing aborted: Abort due to systemic unresponsiveness

At this point the only conclusion I can come to is that DTrace is useless. What else can I conclude? Every time I run my minimal D script it sits there with no output messages and then aborts. Why do I not see the output from the 5 second ticks? Why does it never work?

As you can imagine I am just about ready to give up on DTrace - forever. I have been through the trouble of designing and writing an initial D script to help investigate my application behaviour, and have requested the appropriate privileges from the system administrators - dtrace_user and dtrace_proc - but have achieved nothing from 2 days of continuous trying. After 2 days I have got zero information out of DTrace, useful or not.

I'm ready to throw in the towel and give up on DTrace as another piece of great but fundamentally flawed Sun technology when I have one more thought. Why are the 5 second tick probes never firing? Why do I never see any output from them? What if the tick probes never fired? Wouldn't my running totals in the aggregations get very large? Could this be causing some kind of overflow and DTrace to simply give up?

A quick Google on the terms "dtrace tick probe privilege" turns up an entry at http://www.mail-archive.com/dtrace-discuss@opensolaris.org/msg02621.html with the same symptoms - the tick probe never fires. And the solution is confirmed in the reply http://www.mail-archive.com/dtrace-discuss@opensolaris.org/msg02622.html - you need the "dtrace_kernel" privilege for the tick probes to fire and output anything. So I make another request to the system administrators for this privilege, and voila! My D script now works as it should do.

So it seems that to do anything useful with DTrace you will need the maximum privilege level of dtrace_kernel, regardless of what the manual tells you differently. Yes, to "use" the tick probe of the profile provider in a D script you need the "dtrace_user" privilege. However, such a defined tick probe will never fire unless you have the "dtrace_kernel" privilege. The "dtrace_user" privilege gives you the "right" to use a tick probe, but not the "access" to that probe at run time. You need the "dtrace_kernel" privilege too for the run time "access" to the tick probe.

I've a feeling that this is extremely poorly designed and documented by Sun. It is not documented in the standard DTrace manuals - I know I looked very closely at them on how to make the tick probe work. So it must be a common problem that people run into time and time again on production systems when trying to work with the least necessary privileges. Yet my Google search only found one explicit reference to this, on a forum posting by someone with the same problem. All the other hits on "dtrace tick probe privilege" were standard descriptions of using probes in DTrace. Nothing from Sun themselves on their own web sites.

1 comment:

ahl said...

John,

It sounds like you've encountered an ugly bug. In the future, I'd suggest you engage with the DTrace community at dtrace-discuss@opensolaris.org as there's a large body of experience to draw on to get problems resolved.