Issue 8054 - Operation duration logging
Summary: Operation duration logging
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: 2.5.0
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-09 03:53 UTC by ebackes@symas.com
Modified: 2020-10-14 21:05 UTC (History)
0 users

See Also:


Attachments
master-operation-duration-logging.patch (8.29 KB, patch)
2015-02-09 03:56 UTC, ebackes@symas.com
Details
RE24-operation-duration-logging.patch (8.72 KB, patch)
2015-02-09 03:56 UTC, ebackes@symas.com
Details

Note You need to log in before you can comment on or make changes to this issue.
Description ebackes@symas.com 2015-02-09 03:53:47 UTC
Full_Name: Emily Backes
Version: master, 2.4
OS: 
URL: 
Submission from: (NULL) (50.113.67.84)


Here is a patch to add duration logging for operations that appears in the
standard result messages of statslog.  It's a reasonably small change with
negligible performance implications except under pathological benchmarking
conditions.  It does change the quasi-machine-parsable statslog format, however,
so it might not be appropriate for general 2.4 release.  Example line:

54d8298c conn=1000 op=1 SEARCH RESULT tag=101 err=0 duration=0.170ms nentries=1
text=

As written, it uses a separate field in struct Operation and is entirely
independent of the o_tincr functionality.  Merging those has further portability
problems, and is out of the scope of this feature.  As a result, the overhead is
two additional gettimeofday() calls per operation.  The change is wrapped in
#ifdef HAVE_GETTIMEOFDAY, so architectures lacking that will see no change.

I have patches versus both the master branch (which likely covers 2.5), and
OPENLDAP_REL_ENG_2_4.  They'll be attached momentarily, outside jitterbug's web
interface.

%ThThese modifications are the sole work of myself (Emily Backes) on behalf of
Symas Corporation based on existing log features of OpenLDAP.  Therefore, these
modifications are:

Copyright 2014-2015 Symas Corporation
Redistribution and use in source and binary forms, with or without modification,
are permitted only as authorized by the OpenLDAP Public License.
Comment 1 ebackes@symas.com 2015-02-09 03:56:17 UTC
Here are the patches.

-- 
Emily Backes
Symas Corporation
ebackes@symas.com
Comment 2 Michael Ströder 2015-02-09 08:26:33 UTC
ebackes@symas.com wrote:
> Here is a patch to add duration logging for operations that appears in the
> standard result messages of statslog.  It's a reasonably small change with
> negligible performance implications except under pathological benchmarking
> conditions.  It does change the quasi-machine-parsable statslog format, however,
> so it might not be appropriate for general 2.4 release.  Example line:
> 
> 54d8298c conn=1000 op=1 SEARCH RESULT tag=101 err=0 duration=0.170ms nentries=1
> text=

This is a very important feature. Highly appreciated!

I'd love to see this in RE24. Yes, I know. No more features. :-(

Would it be possible and make sense to have this configurable per database?

Ciao, Michael.

Comment 3 Howard Chu 2015-02-09 09:59:12 UTC
michael@stroeder.com wrote:
> ebackes@symas.com wrote:
>> Here is a patch to add duration logging for operations that appears in the
>> standard result messages of statslog.  It's a reasonably small change with
>> negligible performance implications except under pathological benchmarking
>> conditions.  It does change the quasi-machine-parsable statslog format, however,
>> so it might not be appropriate for general 2.4 release.  Example line:
>>
>> 54d8298c conn=1000 op=1 SEARCH RESULT tag=101 err=0 duration=0.170ms nentries=1
>> text=
>
> This is a very important feature. Highly appreciated!
>
> I'd love to see this in RE24. Yes, I know. No more features. :-(
>
> Would it be possible and make sense to have this configurable per database?

No it would not make sense. There are ops like Abandon and Unbind which 
have no associated database.

Adding variation here is creating gratuitous complexity for no benefit.
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 4 Michael Ströder 2015-02-09 10:26:19 UTC
hyc@symas.com wrote:
> michael@stroeder.com wrote:
>> ebackes@symas.com wrote:
>>> Here is a patch to add duration logging for operations that appears in the
>>> standard result messages of statslog.  It's a reasonably small change with
>>> negligible performance implications except under pathological benchmarking
>>> conditions.  It does change the quasi-machine-parsable statslog format, however,
>>> so it might not be appropriate for general 2.4 release.  Example line:
>>>
>>> 54d8298c conn=1000 op=1 SEARCH RESULT tag=101 err=0 duration=0.170ms nentries=1
>>> text=
>>
>> This is a very important feature. Highly appreciated!
>>
>> I'd love to see this in RE24. Yes, I know. No more features. :-(
>>
>> Would it be possible and make sense to have this configurable per database?
> 
> No it would not make sense. There are ops like Abandon and Unbind which 
> have no associated database.

Aren't Abandon and Unbind associated by authz-DN?
Or how is this solved in slapo-accesslog?

Ciao, Michael.


Comment 5 Hallvard Furuseth 2015-02-11 11:16:50 UTC
Cool feture, but please use fewer characteres in the already
bloated log.  How about "time=<integer>us" or even omit the "us"?

When duration is of interest, what's usually interesting is probably
which operations stand out as slower, not the actual speed.  When
duration is not of interest, then it's noise to keep small and boring.

And we should document at least the default log format (Statslog).

==

About the code, ignoring the above suggestion for now:


Once upon a time, floating-point was slow. Today, that extra code
to avoid it is a pessimization at least on my Linux/x86.  Try this:

#define TIMEDIFF_MS(a,b) \
     (((a)->tv_sec-(b)->tv_sec)*1000 + ((a)->tv_usec-(b)->tv_usec)/1000.0)

OTOH floating-point printf is compex, you save a bit by doing
msec = (int)t; printf("%d.%03d", msec, (int)((t-msec)*1000));
instead of printf("%4.3f", t);


%d format requires 'int msec;' and (int)(duration.tv_usec % 1000).


Since timersub() is use the same way each time, a macro could
instead compute duration and msec, and maybe declare them too:
     DURATION_DECLARE(duration, msec, op->o_hr_time);

For that matter, maybe we should get rid of the #ifdef-duplicated
Statslog() statements and put the ugliness in macros instead.
Like this, I think - an untested draft:

     DURATION_DECLARE(duration, msec, op->o_hr_time);
     StatslogDuration( LDAP_DEBUG_STATS,
         "%s DISCONNECT tag=%lu err=%d "DURATION_LOG_FMT" text=%s\n",
         op->o_log_prefix, rs->sr_tag, rs->sr_err,
         DURATION_LOG_ARG(duration, msec) /* no comma after this */
         rs->sr_text ? rs->sr_text : "", 0 );

#ifdef HAVE_GETTIMEOFDAY
#define DURATION_DECLARE(..blah..) <the duration code before Statslog>
#define DURATION_LOG_FMT "duration=%d.%03dms "
/* This is passed to StatslogDuration() without a comma before next arg.
  * So it is not an extra arg to StatslogDuration(), but it gives two
  * extra 'int' args to whatever StatslogDuration() passes it args on to.
  */
#define DURATION_LOG_ARG(duration,msec) (msec), 
(int)((duration).tv_usec%1000),
#define StatslogDuration <like Statslog, but using Log7 instead of Log5>
#else
#define DURATION_DECLARE(..blah..)
#define DURATION_LOG_FMT ""
#define DURATION_LOG_ARG(duration, msec)
#define StatslogDuration Statslog
#endif


Comment 6 Michael Ströder 2015-02-11 11:38:50 UTC
h.b.furuseth@usit.uio.no wrote:
> Cool feture, but please use fewer characteres in the already
> bloated log.  How about "time=<integer>us" or even omit the "us"?

+1

But 'time' sounds a bit general.
How about 'tdu' or even 'td' as abbreviation for "time duration"?

Ciao, Michael.

Comment 7 Hallvard Furuseth 2015-02-11 17:33:45 UTC
On 11. feb. 2015 12:38, Michael Ströder wrote:
> h.b.furuseth@usit.uio.no wrote:
>> Cool feture, but please use fewer characteres in the already
>> bloated log.  How about "time=<integer>us" or even omit the "us"?
>
> +1
>
> But 'time' sounds a bit general.
> How about 'tdu' or even 'td' as abbreviation for "time duration"?

I don't think so. You can't see from that what it means unless you
already know, so you have to look it up just to see if you're
interested.  'time' you can decide to look up if you are interested.

-- 
Hallvard

Comment 8 Clément OUDOT 2015-02-24 08:10:55 UTC
2015-02-11 18:33 GMT+01:00  <h.b.furuseth@usit.uio.no>:
> On 11. feb. 2015 12:38, Michael Ströder wrote:
>> h.b.furuseth@usit.uio.no wrote:
>>> Cool feture, but please use fewer characteres in the already
>>> bloated log.  How about "time=<integer>us" or even omit the "us"?
>>
>> +1
>>
>> But 'time' sounds a bit general.
>> How about 'tdu' or even 'td' as abbreviation for "time duration"?
>
> I don't think so. You can't see from that what it means unless you
> already know, so you have to look it up just to see if you're
> interested.  'time' you can decide to look up if you are interested.
>

Seems other LDAP servers have choosen 'etime' (see
http://ludopoitou.com/2015/02/24/about-auditing-ldap-operations/). Why
not try to use the same word?


Clément OUDOT.

Comment 9 Howard Chu 2015-08-05 16:38:44 UTC
clem.oudot@gmail.com wrote:
> 2015-02-11 18:33 GMT+01:00  <h.b.furuseth@usit.uio.no>:
>> On 11. feb. 2015 12:38, Michael Str=C3=B6der wrote:
>>> h.b.furuseth@usit.uio.no wrote:
>>>> Cool feture, but please use fewer characteres in the already
>>>> bloated log.  How about "time=3D<integer>us" or even omit the "us"?
>>>
>>> +1
>>>
>>> But 'time' sounds a bit general.
>>> How about 'tdu' or even 'td' as abbreviation for "time duration"?
>>
>> I don't think so. You can't see from that what it means unless you
>> already know, so you have to look it up just to see if you're
>> interested.  'time' you can decide to look up if you are interested.
>>
>
> Seems other LDAP servers have choosen 'etime' (see
> http://ludopoitou.com/2015/02/24/about-auditing-ldap-operations/). Why
> not try to use the same word?

etime is fine with me. The other problem here is that it's counting duration 
from when slapd received the request (which is fine) but usually the op gets 
queued immediately after. For this etime to be useful we need to know both the 
amount of time spent queued, and the amount of time spent actually executing. 
qtime and etime?

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 10 Hallvard Furuseth 2015-08-10 12:41:55 UTC
Den 05. aug. 2015 18:38, skrev hyc@symas.com:
>clem.oudot@gmail.com wrote:
>> Seems other LDAP servers have choosen 'etime' (see
>> http://ludopoitou.com/2015/02/24/about-auditing-ldap-operations/). Why
>> not try to use the same word?
>
> etime is fine with me. The other problem here is that it's counting duration
> from when slapd received the request (which is fine) but usually the op gets
> queued immediately after. For this etime to be useful we need to know both the
> amount of time spent queued, and the amount of time spent actually executing.
> qtime and etime?

It's total time which is interesting when looking for hot spots
which need attention.  So I suggest that plus either one of the
others, or percentage of the total spent doing one of the others.

-- 
Hallvard

Comment 11 Hallvard Furuseth 2015-08-10 13:19:04 UTC
Den 10. aug. 2015 14:41, skrev Hallvard Breien Furuseth:
> It's total time which is interesting when looking for hot spots
> which need attention.  So I suggest that plus either one of the
> others, or percentage of the total spent doing one of the others.

Sorry - total+one of the other was silly probably, it just
leaves us calculating the missing one when looking for that
one. Total + percentage of one of the others makes sense,
or (etime, qtime) since a high value still will stand out.

-- 
Hallvard

Comment 12 Howard Chu 2015-08-10 16:00:39 UTC
Hallvard Breien Furuseth wrote:
> Den 10. aug. 2015 14:41, skrev Hallvard Breien Furuseth:
>> It's total time which is interesting when looking for hot spots
>> which need attention.  So I suggest that plus either one of the
>> others, or percentage of the total spent doing one of the others.
>
> Sorry - total+one of the other was silly probably, it just
> leaves us calculating the missing one when looking for that
> one. Total + percentage of one of the others makes sense,
> or (etime, qtime) since a high value still will stand out.
>
Yes, I prefer explicit (etime, qtime) - anyone who needs the total can add it 
up themselves.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 13 OpenLDAP project 2015-10-31 11:59:29 UTC
added in master, with cleanups
Comment 14 Howard Chu 2015-10-31 11:59:29 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Enhancements