AMD_performance_monitor
Name
AMD_performance_monitor
Name Strings
GL_AMD_performance_monitor
Contributors
Dan Ginsburg
Aaftab Munshi
Dave Oldcorn
Maurice Ribble
Jonathan Zarge
Contact
Dan Ginsburg (dan.ginsburg 'at' amd.com)
Status
???
Version
Last Modified Date: 11/29/2007
Number
OpenGL Extension #360
OpenGL ES Extension #50
Dependencies
None
Overview
This extension enables the capture and reporting of performance monitors.
Performance monitors contain groups of counters which hold arbitrary counted 
data.  Typically, the counters hold information on performance-related
counters in the underlying hardware.  The extension is general enough to
allow the implementation to choose which counters to expose and pick the
data type and range of the counters.  The extension also allows counting to 
start and end on arbitrary boundaries during rendering.
Issues
1.  Should this be an EGL or OpenGL/OpenGL ES extension?
    Decision - Make this an OpenGL/OpenGL ES extension
    
    Reason - We would like to expose this extension in both OpenGL and 
    OpenGL ES which makes EGL an unsuitable choice.  Further, support for 
    EGL is not a requirement and there are platforms that support OpenGL ES 
    but not EGL, making it difficult to make this an EGL extension.
    
2.  Should the API support multipassing?
    Decision - No.
    
    Reason - Multipassing should really be left to the application to do.  
    This makes the API unnecessarily complicated.  A major issue is that 
    depending on which counters are to be sampled, the # of passes and which 
    counters get selected in each pass can be difficult to determine.  It is 
    much easier to give a list of counters categorized by groups with 
    specific information on the number of counters that can be selected from 
    each group.
3.  Should we define a 64-bit data type for UNSIGNED_INT64_AMD?
    Decision - No.
    Reason - While counters can be returned as 64-bit unsigned integers, the
    data is passed back to the application inside of a void*.  Therefore,
    there is no need in this extension to define a 64-bit data type (e.g.,
    GLuint64).  It will be up the application to declare a native 64-bit
    unsigned integer and cast the returned data to that type.
New Procedures and Functions
void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize, 
                             uint *groups)
void GetPerfMonitorCountersAMD(uint group, int *numCounters, 
                               int *maxActiveCounters, sizei countersSize, 
                               uint *counters)
void GetPerfMonitorGroupStringAMD(uint group, sizei bufSize, sizei *length, 
                                  char *groupString)
void GetPerfMonitorCounterStringAMD(uint group, uint counter, sizei bufSize,
                                    sizei *length, char *counterString)
 
void GetPerfMonitorCounterInfoAMD(uint group, uint counter, 
                                  enum pname, void *data)
void GenPerfMonitorsAMD(sizei n, uint *monitors)
void DeletePerfMonitorsAMD(sizei n, uint *monitors)
void SelectPerfMonitorCountersAMD(uint monitor, boolean enable, 
                                  uint group, int numCounters, 
                                  uint *counterList)
void BeginPerfMonitorAMD(uint monitor)
    
void EndPerfMonitorAMD(uint monitor)
void GetPerfMonitorCounterDataAMD(uint monitor, enum pname, sizei dataSize, 
                                  uint *data, int *bytesWritten)
New Tokens
Accepted by the <pame> parameter of GetPerfMonitorCounterInfoAMD
    COUNTER_TYPE_AMD                           0x8BC0
    COUNTER_RANGE_AMD                          0x8BC1
    
Returned as a valid value in <data> parameter of
GetPerfMonitorCounterInfoAMD if <pname> = COUNTER_TYPE_AMD
    
    UNSIGNED_INT                               0x1405
    FLOAT                                      0x1406
    UNSIGNED_INT64_AMD                         0x8BC2
    PERCENTAGE_AMD                             0x8BC3
    
Accepted by the <pname> parameter of GetPerfMonitorCounterDataAMD
    
    PERFMON_RESULT_AVAILABLE_AMD               0x8BC4
    PERFMON_RESULT_SIZE_AMD                    0x8BC5
    PERFMON_RESULT_AMD                         0x8BC6
Addition to the GL specification
Add a new section called Performance Monitoring
A performance monitor consists of a number of hardware and software counters
that can be sampled by the GPU and reported back to the application.
Performance counters are organized as a single hierarchy where counters are
categorized into groups.  Each group has a list of counters that belong to
the counter and can be sampled, and a maximum number of counters that can be 
sampled.
The command
    void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize, 
                                 uint *groups);
    
returns the number of available groups in <numGroups>, if <numGroups> is
not NULL.  If <groupsSize> is not 0 and <groups> is not NULL, then the list 
of available groups is returned.  The number of entries that will be 
returned in <groups> is determined by <groupsSize>.  If <groupsSize> is 0, 
no information is copied.  Each group is identified by a unique unsigned int 
identifier.
The command
    void GetPerfMonitorCountersAMD(uint group, int *numCounters, 
                                   int *maxActiveCounters, 
                                   sizei countersSize, 
                                   uint *counters);
    
returns the following information.  For each group, it returns the number of 
available counters in <numCounters>, the max number of counters that can be
active at any time in <maxActiveCounters>, and the list of counters in 
<counters>.  The number of entries that can be returned in <counters> is
determined by <countersSize>.  If <countersSize> is 0, no information is
copied. Each counter in a group is identified by a unique unsigned int
identifier.  If <group> does not reference a valid group ID, an 
INVALID_VALUE error is generated.
The command
    void GetPerfMonitorGroupStringAMD(uint group, sizei bufSize, 
                                      sizei *length, char *groupString)
    
returns the string that describes the group name identified by <group> in 
<groupString>.  The actual number of characters written to <groupString>,
excluding the null terminator, is returned in <length>.  If <length> is 
NULL, then no length is returned.  The maximum number of characters that
may be written into <groupString>, including the null terminator, is 
specified by <bufSize>.  If <bufSize> is 0 and <groupString> is NULL, the 
number of characters that would be required to hold the group string,
excluding the null terminator, is returned in <length>.  If <group> 
does not reference a valid group ID, an INVALID_VALUE error is generated.
The command
    void GetPerfMonitorCounterStringAMD(uint group, uint counter, 
                                        sizei bufSize, sizei *length, 
                                        char *counterString);
returns the string that describes the counter name identified by <group> 
and <counter> in <counterString>.  The actual number of characters written 
to <counterString>, excluding the null terminator, is returned in <length>.  
If <length> is NULL, then no length is returned.  The maximum number of 
characters that may be written into <counterString>, including the null 
terminator, is specified by <bufSize>.  If <bufSize> is 0 and 
<counterString> is NULL, the number of characters that would be required to 
hold the counter string, excluding the null terminator, is returned in 
<length>.  If <group> does not reference a valid group ID, or <counter> 
does not reference a valid counter within the group ID, an INVALID_VALUE 
error is generated.
   
The command
    void GetPerfMonitorCounterInfoAMD(uint group, uint counter, 
                                      enum pname, void *data);
    
returns the following information about a counter.  For a <counter> 
belonging to <group>, we can query the counter type and counter range.  If 
<pname> is COUNTER_TYPE_AMD, then <data> returns the type.  Valid type
values returned are UNSIGNED_INT, UNSIGNED_INT64_AMD, PERCENTAGE_AMD, FLOAT.
If type value returned is PERCENTAGE_AMD, then this describes a float
value that is in the range [0.0 .. 100.0].  If <pname> is COUNTER_RANGE_AMD,
<data> returns two values representing a minimum and a maximum. The 
counter's type is used to determine the format in which the range values 
are returned.  If <group> does not reference a valid group ID, or <counter> 
does not reference a valid counter within the group ID, an INVALID_VALUE 
error is generated.
The command
    void GenPerfMonitorsAMD(sizei n, uint *monitors)
    
returns a list of monitors.  These monitors can then be used to select 
groups/counters to be sampled, to start multiple monitoring sessions and to 
return counter information sampled by the GPU.  At creation time, the 
performance monitor object has all counters disabled.  The value of the
PERFMON_RESULT_AVAILABLE_AMD, PERFMON_RESULT_AMD, and 
PERFMON_RESULT_SIZE_AMD queries will all initially be 0.
The command
    void DeletePerfMonitorsAMD(sizei n, uint *monitors)
    
is used to delete the list of monitors created by a previous call to 
GenPerfMonitors.  If a monitor ID in the list <monitors> does not 
reference a previously generated performance monitor, an INVALID_VALUE
error is generated.
The command 
    void SelectPerfMonitorCountersAMD(uint monitor, boolean enable, 
                                      uint group, int numCounters, 
                                      uint *counterList);
    
is used to enable or disable a list of counters from a group to be monitored 
as identified by <monitor>.  The <enable> argument determines whether the
counters should be enabled or disabled.  <group> specifies the group
ID under which counters will be enabled or disabled.  The <numCounters>
argument gives the number of counters to be selected from the list 
<counterList>.  If <monitor> is not a valid monitor created by 
GenPerfMonitorsAMD, then INVALID_VALUE error will be generated.  If <group>
is not a valid group, the INVALID_VALUE error will be generated.  If 
<numCounters> is less than 0, an INVALID_VALUE error will be generated. 
When SelectPerfMonitorCountersAMD is called on a monitor, any outstanding 
results for that monitor become invalidated and the result queries 
PERFMON_RESULT_SIZE_AMD and PERFMON_RESULT_AVAILABLE_AMD are reset to 0.
The command
    void BeginPerfMonitorAMD(uint monitor);
    
is used to start a monitor session.  Note that BeginPerfMonitor calls cannot 
be nested.  In addition, it is quite possible that given the list of groups 
and counters/group enabled for a monitor, it may not be able to sample the 
necessary counters and so the monitor session will fail.  In such a case,
an INVALID_OPERATION error will be generated.
While BeginPerfMonitorAMD does mark the beginning of performance counter
collection, the counters do not begin collecting immediately.  Rather, the
counters begin collection when BeginPerfMonitorAMD is processed by
the hardware.  That is, the API is asynchronous, and performance counter
collection does not begin until the graphics hardware processes the
BeginPerfMonitorAMD command.  
The command
    void EndPerfMonitorAMD(uint monitor);
    
ends a monitor session started by BeginPerfMonitorAMD.  If a performance 
monitor is not currently started, an INVALID_OPERATION error will be 
generated.
Note that there is an implied overhead to collecting performance counters
that may or may not distort performance depending on the implementation.  
For example, some counters may require a pipeline flush thereby causing a
change in the performance of the application.  Further, the frequency at 
which an application samples may distort the accuracy of counters which are 
variant (e.g., non-deterministic based on the input).  While the effects 
of sampling frequency are implementation dependent, general guidance can
be given that sampling at a high frequency may distort both performance
of the application and the accuracy of variant counters.
The command
    void GetPerfMonitorCounterDataAMD(uint monitor, enum pname, 
                                      sizei dataSize, 
                                      uint *data, sizei *bytesWritten);
    
is used to return counter values that have been sampled for a monitor
session.  If <pname> is PERFMON_RESULT_AVAILABLE_AMD, then <data> will
indicate whether the result is available or not.  If <pname> is 
PERFMON_RESULT_SIZE_AMD, <data> will contain actual size of all counter 
results being sampled.  If <pname> is PERFMON_RESULT_AMD, <data> will
contain results.  For each counter of a group that was selected to be 
sampled, the information is returned as group ID, followed by counter ID, 
followed by counter value.  The size of counter value returned will depend 
on the counter value type.  The argument <dataSize> specifies the number of
bytes available in the <data> buffer for writing.  If <bytesWritten> is not 
NULL, it gives the number of bytes written into the <data> buffer.  It is an 
INVALID_OPERATION error for <data> to be NULL.  If <pname> is 
PERFMON_RESULT_AMD and <dataSize> is less than the number of bytes required 
to store the results as reported by a PERFMON_RESULT_SIZE_AMD query, then 
results will be written only up to the number of bytes specified by 
<dataSize>.
If no BeginPerfMonitorAMD/EndPerfMonitorAMD has been issued for a monitor,
then the result of querying for PERFMON_RESULT_AVAILABLE and 
PERFMON_RESULT_SIZE will be 0.  When SelectPerfMonitorCountersAMD is called
on a monitor, the results stored for the monitor become invalidated and
the value of PERFMON_RESULT_AVAILABLE and PERFMON_RESULT_SIZE queries should
behave as if no BeginPerfMonitorAMD/EndPerfMonitorAMD has been issued for
the monitor.
Errors
INVALID_OPERATION error will be generated if BeginPerfMonitorAMD is unable
to begin monitoring with the currently selected counters.  
INVALID_OPERATION error will be generated if BeginPerfMonitorAMD is called
when a performance monitor is already active.
INVALID_OPERATION error will be generated if EndPerfMonitorAMD is called
when a performance monitor is not currently started.
INVALID_VALUE error will be generated if the <group> parameter to 
GetPerfMonitorCountersAMD, GetPerfMonitorCounterStringAMD,
GetPerfMonitorCounterStringAMD, GetPerfMonitorCounterInfoAMD, or
SelectPerfMonitorCountersAMD does not reference a valid group ID.
INVALID_VALUE error will be generated if the <counter> parameter to
GetPerfMonitorCounterInfoAMD does not reference a valid counter ID
in the group specified by <group>.
INVALID_VALUE error will be generated if any of the monitor IDs
in the <monitors> parameter to DeletePerfMonitorsAMD do not reference
a valid generated monitor ID.
   
INVALID_VALUE error will be generated if the <monitor> parameter to
SelectPerfMonitorCountersAMD does not reference a monitor created by
GenPerfMonitorsAMD.
INVALID_VALUE error will be generated if the <numCounters> parameter to
SelectPerfMonitorCountersAMD is less than 0.
New State
Sample Usage
typedef struct 
{
        GLuint       *counterList;
        int         numCounters;
        int         maxActiveCounters;
} CounterInfo;
void
getGroupAndCounterList(GLuint **groupsList, int *numGroups, 
                       CounterInfo **counterInfo)
{
    GLint          n;
    GLuint        *groups;
    CounterInfo   *counters;
    glGetPerfMonitorGroupsAMD(&n, 0, NULL);
    groups = (GLuint*) malloc(n * sizeof(GLuint));
    glGetPerfMonitorGroupsAMD(NULL, n, groups);
    *numGroups = n;
    *groupsList = groups;
    counters = (CounterInfo*) malloc(sizeof(CounterInfo) * n);
    for (int i = 0 ; i < n; i++ )
    {
        glGetPerfMonitorCountersAMD(groups[i], &counters[i].numCounters,
                                 &counters[i].maxActiveCounters, 0, NULL);
        counters[i].counterList = (GLuint*)malloc(counters[i].numCounters * 
                                                  sizeof(int));
        glGetPerfMonitorCountersAMD(groups[i], NULL, NULL,
                                    counters[i].numCounters, 
                                    counters[i].counterList);
    }
    *counterInfo = counters;
}
static int  countersInitialized = 0;
    
int
getCounterByName(char *groupName, char *counterName, GLuint *groupID, 
                 GLuint *counterID)
{
    int          numGroups;
    GLuint       *groups;
    CounterInfo  *counters;
    int          i = 0;
    if (!countersInitialized)
    {
        getGroupAndCounterList(&groups, &numGroups, &counters);
        countersInitialized = 1;
    }
    for ( i = 0; i < numGroups; i++ )
    {
       char curGroupName[256];
       glGetPerfMonitorGroupStringAMD(groups[i], 256, NULL, curGroupName);
       if (strcmp(groupName, curGroupName) == 0)
       {
           *groupID = groups[i];
           break;
       }
    }
    if ( i == numGroups )
        return -1;           // error - could not find the group name
    for ( int j = 0; j < counters[i].numCounters; j++ )
    {
        char curCounterName[256];
        
        glGetPerfMonitorCounterStringAMD(groups[i],
                                         counters[i].counterList[j], 
                                         256, NULL, curCounterName);
        if (strcmp(counterName, curCounterName) == 0)
        {
            *counterID = counters[i].counterList[j];
            return 0;
        }
    }
    return -1;           // error - could not find the counter name
}
void
drawFrameWithCounters(void)
{
    GLuint group[2];
    GLuint counter[2];
    GLuint monitor;
    GLuint *counterData;
    // Get group/counter IDs by name.  Note that normally the
    // counter and group names need to be queried for because
    // each implementation of this extension on different hardware
    // could define different names and groups.  This is just provided
    // to demonstrate the API.
    getCounterByName("HW", "Hardware Busy", &group[0],
                     &counter[0]);
    getCounterByName("API", "Draw Calls", &group[1], 
                     &counter[1]);
            
    // create perf monitor ID
    glGenPerfMonitorsAMD(1, &monitor);
    // enable the counters
    glSelectPerfMonitorCountersAMD(monitor, GL_TRUE, group[0], 1,
                                   &counter[0]);
    glSelectPerfMonitorCountersAMD(monitor, GL_TRUE, group[1], 1, 
                                   &counter[1]);
    glBeginPerfMonitorAMD(monitor);
    // RENDER FRAME HERE
    // ...
    
    glEndPerfMonitorAMD(monitor);
    // read the counters
    GLint resultSize;
    glGetPerfMonitorCounterDataAMD(monitor, GL_PERFMON_RESULT_SIZE_AMD, 
                                   sizeof(GLint), &resultSize, NULL);
    counterData = (GLuint*) malloc(resultSize);
    GLsizei bytesWritten;
    glGetPerfMonitorCounterDataAMD(monitor, GL_PERFMON_RESULT_AMD,  
                                   resultSize, counterData, &bytesWritten);
    // display or log counter info
    GLsizei wordCount = 0;
    while ( (4 * wordCount) < bytesWritten )
    {
        GLuint groupId = counterData[wordCount];
        GLuint counterId = counterData[wordCount + 1];
        // Determine the counter type
        GLuint counterType;
        glGetPerfMonitorCounterInfoAMD(groupId, counterId, 
                                       GL_COUNTER_TYPE_AMD, &counterType);
 
        if ( counterType == GL_UNSIGNED_INT64_AMD )
        {
            unsigned __int64 counterResult = 
                       *(unsigned __int64*)(&counterData[wordCount + 2]);
            // Print counter result
            wordCount += 4;
        }
        else if ( counterType == GL_FLOAT )
        {
            float counterResult = *(float*)(&counterData[wordCount + 2]);
            // Print counter result
            wordCount += 3;
        } 
        // else if ( ... ) check for other counter types 
        //   (GL_UNSIGNED_INT and GL_PERCENTAGE_AMD)
    }
}
Revision History 11/29/2007 - dginsburg + Clarified the default state of a performance monitor object on creation
11/09/2007 - dginsbur
   + Clarify what happens if SelectPerfMonitorCountersAMD is called on
     a monitor with outstanding query results.
   + Rename counterSize to countersSize
   + Remove some ';' typos
06/13/2007 - dginsbur
   + Add language on the asynchronous nature of the API and 
     counter accuracy/performance distortion.
   + Add myself as the contact
   + Remove INVALID_OPERATION error when countersList is NULL
   + Clarify 64-bit issue
   + Make PERCENTAGE_AMD counters float rather than uint
   + Clarify accuracy distortion on variant counters only
   + Tweak to overview language
06/09/2007 - dginsbur
   + Fill in errors section and make many more errors explicit
   + Fix the example code so it compiles
06/08/2007 - dginsbur
   + Modified GetPerfMonitorGroupString and GetPerfMonitorCounterString to
     be more client/server friendly.  
   + Modified example.
   + Renamed parameters/variables to follow GL conventions.
   + Modified several 'int' param types to 'sizei'
   + Modifid counters type from 'int' to 'uint'
   + Renamed argument 'cb' and 'cbret'
   + Better documented GetPerfMonitorCounterData 
   + Add AMD adornment in many places that were missing it
 
06/07/2007 - dginsbur
   + Cleanup formatting, remove tabs, make fit in proper page width
   + Add FLOAT and UNSIGNED_INT to list of COUNTER_TYPEs
   + Fix some bugs in the example code
   + Rewrite introduction
   + Clarified Issue 1 reasoning
   + Added Issue 3 regarding use of 64-bit data types
   + Added revision history
03/21/2007 - Initial version written.  Written by amunshi.