QCOM_shading_rate
Name
QCOM_shading_rate
Name Strings
GL_QCOM_shading_rate
Contributors
Jeff Leger
Robert VanReenen
Contact
Jeff Leger - jleger 'at' qti.qualcomm.com
Status
Complete
Version
Last Modified Date: April 22, 2020
Revision: #2
Number
OpenGL ES Extension #279
Dependencies
OpenGL ES 2.0 is required. This extension is written against OpenGL ES 3.2.
This extension interacts with OVR_Multiview.
This extension interacts with QCOM_framebuffer_foveated and QCOM_texture_foveated.
When this extension is advertised, the implementation must also advertise GLSL
extension "GL_EXT_fragment_invocation_density" (documented separately), which
provides new built-in variables that allow fragment shaders to determine the
effective shading rate used for fragment invocations.
Overview
By default, OpenGL runs a fragment shader once for each pixel covered by a
primitive being rasterized. When using multisampling, the outputs of that
fragment shader are broadcast to each covered sample of the fragment's
pixel. When using multisampling, applications can optionally request that
the fragment shader be run once per color sample (e.g., by using the "sample"
qualifier on one or more active fragment shader inputs), or run a minimum
number of times per pixel using SAMPLE_SHADING enable and the
MinSampleShading frequency value.
This extension allows applications to specify fragment shading rates of less
than 1 invocation per pixel. Instead of invoking the fragment shader
once for each covered pixel, the fragment shader can be run once for a
group of adjacent pixels in the framebuffer. The outputs of that fragment
shader invocation are broadcast to each covered sample for all of the pixels
in the group. The initial version of this extension allows for groups of
1, 2, 4, 8, and 16 pixels.
This can be useful for effects like motion volumetric rendering
where a portion of the scene is processed at full shading rate and a portion can
be processed at a reduced shading rate, saving power and processing resources.
The requested rate can vary from (finest and default) 1 fragment shader
invocation per pixel to (coarsest) one fragment shader invocation for each
4x4 block of pixels. Implementations are given wide latitude to rasterize
at the requested rate or any other rate that is less coarse.
New Tokens
Accepted by the <pname> parameter of GetIntegerv, GetInteger64v,
and GetFloatv:
SHADING_RATE_QCOM 0x96A4
Accepted by the <cap> parameter of Enable, Disable, IsEnabled:
SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM 0x96A5
Allowed in the <rate> parameter in ShadingRateQCOM:
SHADING_RATE_1X1_PIXELS_QCOM 0x96A6
SHADING_RATE_1X2_PIXELS_QCOM 0x96A7
SHADING_RATE_2X1_PIXELS_QCOM 0x96A8
SHADING_RATE_2X2_PIXELS_QCOM 0x96A9
SHADING_RATE_4X2_PIXELS_QCOM 0x96AC
SHADING_RATE_4X4_PIXELS_QCOM 0x96AE
New Procedures and Functions
void ShadingRateQCOM(enum rate);
Modifications to the OpenGL ES 3.2 Specification
Modify Section 8.14.1, Scale Factor and Level of Detail, p. 196
(Modify the function approximating Scale Factor (P), to allow implementations
to scale implicit derivatives based on the shading rate. The scale occurs before
the LOD bias and before LOD clamping).
Modify the definitions of (mu, mv, mw):
              |  du     du  |
    mu = max  | ---- , ---- |
              |  dx     dy  |

              |  dv     dv  |
    mv = max  | ---- , ---- |
              |  dx     dy  |

              |  dw     dw  |
    mw = max  | ---- , ---- |
              |  dx     dy  |

to:

              |  du          du       |
    mu = max  | ---- * sx , ---- * sy |
              |  dx          dy       |

              |  dv          dv       |
    mv = max  | ---- * sx , ---- * sy |
              |  dx          dy       |

              |  dw          dw       |
    mw = max  | ---- * sx , ---- * sy |
              |  dx          dy       |
where (sx, sy) refer to the _effective shading rate_ (w', h') specified in
section 13.X.2.
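As an illustration of the modified definition, the following minimal sketch computes one of the three terms (mu; mv and mw have the same form). The function names are hypothetical, and taking the magnitude of each derivative is an assumption carried over from the base OpenGL ES 3.2 definition of the scale factor.

```c
/* Hypothetical helper: magnitude of a derivative (assumed from the base
 * OpenGL ES 3.2 definition; avoids a dependency on the math library). */
float derivative_magnitude(float v)
{
    return v < 0.0f ? -v : v;
}

/* Largest scaled rate of change of one texture coordinate.
 * d_dx, d_dy: implicit derivatives of the coordinate in x and y.
 * sx, sy:     effective shading rate (w', h') from section 13.X.2. */
float scaled_axis_max(float d_dx, float d_dy, float sx, float sy)
{
    float along_x = derivative_magnitude(d_dx) * sx;
    float along_y = derivative_magnitude(d_dy) * sy;
    return along_x > along_y ? along_x : along_y;
}
```

With an effective rate of 1x1 the result reduces to the unmodified definition; with a coarser rate the larger value feeds into the LOD computation before bias and clamping are applied.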
Modify Section 13.4, Multisampling, p. 353
(add to the end of the section)
When SHADING_RATE_QCOM is set to a value other than SHADING_RATE_1X1_PIXELS_QCOM,
the rasterization will occur at the _effective shading rate_ (Section 13.X) and
will result in fragments covering a <W>x<H> group of pixels.
When multisample rasterization is enabled, the samples of the fragment will consist
of the samples for each of the pixels in the group. The fragment center will be
the center of this group of pixels. Each fragment will include a coverage value
with (W * H * SAMPLES) bits. For example, if GL_SHADING_RATE_QCOM is 2X2 and the
currently bound framebuffer object has SAMPLES equal to 4 (4xMSAA), then the fragment
will consist of 4 pixels and 16 samples. Similarly, each fragment will have
(W * H * SAMPLES) depth values and associated data.
The contents of Section 13.4.1, Sample Shading, p. 355 are moved to the new Section 13.X.3, "Sample Shading".
Add new section 13.X before Section 13.5, Points, p. 355
Section 13.X, Shading Rate
By default, each fragment processed by programmable fragment processing
corresponds to a single pixel with a single (x,y) coordinate. When using
multisampling, implementations are permitted to run separate fragment shader
invocations for each sample, but often only run a single invocation for all
samples of the fragment. We will refer to the density of fragment shader
invocations as the _shading rate_.
Applications can use the shading rate to increase the size of fragments to
cover multiple pixels and reduce the amount of fragment shader work.
Applications can also use the shading rate to explicitly control the minimum
number of fragment shader invocations when multisampling.
Section 13.X.1, Shading Rate Control
The shading rate can be controlled with the command
void ShadingRateQCOM(enum rate);
<rate> specifies the value of SHADING_RATE_QCOM, and defines the
_shading rate_. Valid values for <rate> are described in
table X.1.
Shading Rate Size
---------------------------- -----
SHADING_RATE_1X1_PIXELS_QCOM 1x1
SHADING_RATE_1X2_PIXELS_QCOM 1x2
SHADING_RATE_2X1_PIXELS_QCOM 2x1
SHADING_RATE_2X2_PIXELS_QCOM 2x2
SHADING_RATE_4X2_PIXELS_QCOM 4x2
SHADING_RATE_4X4_PIXELS_QCOM 4x4
Table X.1: Shading rates accepted by ShadingRateQCOM. An
entry of "<W>x<H>" in the "Size" column indicates that the shading
rate requests fragments with a width and height (in pixels) of <W>
and <H>, respectively.
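Table X.1 can be expressed as a small lookup, sketched below. The token values are those listed under New Tokens, written with the GL_ prefix that C headers conventionally add; the `rate_size` type and the `shading_rate_size` function are hypothetical helpers and not part of this extension.

```c
/* Token values from the New Tokens section. */
#define GL_SHADING_RATE_1X1_PIXELS_QCOM 0x96A6
#define GL_SHADING_RATE_1X2_PIXELS_QCOM 0x96A7
#define GL_SHADING_RATE_2X1_PIXELS_QCOM 0x96A8
#define GL_SHADING_RATE_2X2_PIXELS_QCOM 0x96A9
#define GL_SHADING_RATE_4X2_PIXELS_QCOM 0x96AC
#define GL_SHADING_RATE_4X4_PIXELS_QCOM 0x96AE

/* Hypothetical helper type: requested fragment size in pixels. */
typedef struct { unsigned w, h; } rate_size;

/* Map a <rate> token to its <W>x<H> size from Table X.1.
 * Returns {0, 0} for a token that is not in the table. */
rate_size shading_rate_size(unsigned rate)
{
    rate_size s = { 0, 0 };
    switch (rate) {
    case GL_SHADING_RATE_1X1_PIXELS_QCOM: s.w = 1; s.h = 1; break;
    case GL_SHADING_RATE_1X2_PIXELS_QCOM: s.w = 1; s.h = 2; break;
    case GL_SHADING_RATE_2X1_PIXELS_QCOM: s.w = 2; s.h = 1; break;
    case GL_SHADING_RATE_2X2_PIXELS_QCOM: s.w = 2; s.h = 2; break;
    case GL_SHADING_RATE_4X2_PIXELS_QCOM: s.w = 4; s.h = 2; break;
    case GL_SHADING_RATE_4X4_PIXELS_QCOM: s.w = 4; s.h = 4; break;
    default: break;
    }
    return s;
}
```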
If the shading rate is specified with ShadingRateQCOM, it will apply to all
draw buffers. If the shading rate has not been set, the shading rate
will be SHADING_RATE_1X1_PIXELS_QCOM. In either case, the shading rate will
be further adjusted as described in the following sections.
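A minimal sketch of how an application might bracket a draw with a coarser requested rate and then restore the default. It assumes the ShadingRateQCOM entry point has already been obtained (for example through the platform's usual extension-loading mechanism) and is passed in as a function pointer. The typedefs, `draw_coarse`, and the two `fake_*` stand-ins are hypothetical; the stand-ins exist only so the sketch compiles and runs without a GL context.

```c
/* Token values from the New Tokens section. */
#define GL_SHADING_RATE_1X1_PIXELS_QCOM 0x96A6
#define GL_SHADING_RATE_2X2_PIXELS_QCOM 0x96A9

/* Hypothetical local typedefs; a real application would use the
 * types provided by its GL headers. */
typedef void (*shading_rate_fn)(unsigned int rate);
typedef void (*draw_fn)(void);

/* Request a 2x2 rate for one draw, then restore the default 1x1 rate.
 * The rate is only a request; the effective rate may differ. */
void draw_coarse(shading_rate_fn set_rate, draw_fn draw)
{
    set_rate(GL_SHADING_RATE_2X2_PIXELS_QCOM);
    draw();
    set_rate(GL_SHADING_RATE_1X1_PIXELS_QCOM);
}

/* Stand-ins for the real entry point and draw call (illustration only). */
unsigned int last_rate = 0;
unsigned int rate_during_draw = 0;

void fake_shading_rate(unsigned int rate) { last_rate = rate; }
void fake_draw(void) { rate_during_draw = last_rate; }
```

Calling `draw_coarse(fake_shading_rate, fake_draw)` records that the draw was issued while the 2x2 rate was requested and that the 1x1 rate was restored afterwards.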
Section 13.X.2, Effective Shading Rate
The value of SHADING_RATE_QCOM, in combination with other GL state,
is used to derive an adjusted rate or _effective shading rate_, as
described in this section.
Where possible, implementations should provide an _effective shading rate_
equal to the SHADING_RATE_QCOM. When this is not possible, an adjusted
_effective shading rate_ may be used as described in this section. While
there is no API for querying the _effective shading rate_, the value of this
parameter exists, can be queried from the fragment shader built-in gl_FragSizeEXT,
and is referred to in a number of places in the specification. Implementations
may also adjust the shading rate for other reasons not listed here.
Implementations derive the _effective shading rate_ in an implementation-dependent
manner. When rendering to the default framebuffer, the rate may be adjusted
to 1x1. When sample shading (section 13.X.3 Sample Shading) is enabled, the
rate may be adjusted to 1x1. When the fragment shader uses the GLSL built-in
variables gl_SampleMaskIn[] or gl_SampleMask[], or uses variables
declared with "centroid in", the rate may be adjusted to 1x1. When sample coverage
or sample mask operations are enabled (Section 13.8.3 Multisample Fragment
Operations), the rate may be adjusted to 1x1.
The shading rate may be adjusted to limit the number of samples covered by a
fragment. For example, if the implementation supports a maximum of 16 samples
per fragment and if GL_SHADING_RATE_QCOM is 4X4 and the currently bound
framebuffer object has SAMPLES equal to 4 (4xMSAA), then the number of samples
per coarse fragment would be 64. In such an example, an implementation may
adjust the shading rate to a rate with 16 or fewer samples (e.g., 2x2).
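The sample-limit example above can be sketched as follows. The fallback order (trying the table's rates from coarsest to finest) is purely a hypothetical policy chosen for illustration; actual implementations derive the adjusted rate in an implementation-dependent manner. `rate_size` and `clamp_rate_to_sample_budget` are hypothetical names, and `samples` is assumed to be at least 1.

```c
/* Hypothetical helper type: fragment size in pixels. */
typedef struct { unsigned w, h; } rate_size;

/* Pick the coarsest table rate that is no coarser than the request and
 * whose W * H * samples does not exceed max_samples.
 * Falls back to 1x1 when nothing else fits. */
rate_size clamp_rate_to_sample_budget(rate_size requested,
                                      unsigned samples,
                                      unsigned max_samples)
{
    /* Rates from Table X.1, ordered coarsest first (assumed order). */
    static const rate_size candidates[] = {
        { 4, 4 }, { 4, 2 }, { 2, 2 }, { 2, 1 }, { 1, 2 }, { 1, 1 }
    };
    const unsigned count = sizeof candidates / sizeof candidates[0];
    unsigned i;

    for (i = 0; i < count; ++i) {
        rate_size c = candidates[i];
        if (c.w * c.h <= requested.w * requested.h &&
            c.w * c.h * samples <= max_samples) {
            return c;
        }
    }
    return candidates[count - 1];
}
```

For the case in the text (a 4x4 request, 4 samples, a limit of 16 samples per fragment), 4x4 would need 64 samples and 4x2 would need 32, so this sketch settles on 2x2.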
If the active fragment shader uses any inputs that are qualified with
"sample" (unique values per sample), including the built-ins "gl_SampleID"
and "gl_SamplePosition", or the built-in function "interpolateAtSample",
the shader code is written to expect a separate shader invocation for each
shaded sample. For such fragment shaders, the shading rate is adjusted to
1x1.
If the <W>x<H> value of SHADING_RATE_QCOM is expressed as <w, h> then the
adjusted rate may be any <w', h'> as long as (w' * h') <= (w * h). If
SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM is enabled, then the implementation further
guarantees that (w'/h') equals (w/h) or that w'=1 and h'=1.
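The constraint on the adjusted rate can be written as a predicate, sketched below. The function name is hypothetical, and the check covers only the two conditions stated above; it does not model which rates a given implementation actually supports.

```c
/* Returns 1 if (w2, h2) is a permitted effective rate for a requested
 * rate of (w, h), and 0 otherwise.
 * preserve_aspect: nonzero when the aspect-ratio guarantee is enabled. */
int effective_rate_allowed(unsigned w, unsigned h,
                           unsigned w2, unsigned h2,
                           int preserve_aspect)
{
    /* The adjusted rate must not cover more pixels than requested. */
    if (w2 * h2 > w * h)
        return 0;

    if (preserve_aspect) {
        /* w'/h' == w/h, compared by cross-multiplying to avoid division. */
        int same_aspect = (w2 * h == w * h2);
        int is_1x1 = (w2 == 1 && h2 == 1);
        return same_aspect || is_1x1;
    }
    return 1;
}
```

Note that without the aspect-ratio guarantee, a 2x1 request may legally be adjusted to 1x2, because only the product of width and height is bounded.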
Section 13.X.3 Sample Shading
[[The contents from Section 13.4.1, Sample Shading, p. 355 are copied here]]
Modifications to Section 13.8.2, Scissor Test (p. 367)
(add to the end of the section)
When the _effective shading rate_ results in fragments covering more than one pixel,
the scissor tests are performed separately for each pixel in the fragment.
If a pixel covered by a fragment fails the scissor test, that pixel is
treated as though it was not covered by the primitive. If all pixels covered
by a fragment are either not covered by the primitive being rasterized or fail
the scissor test, the fragment is discarded.
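The per-pixel scissor behavior can be sketched for the non-multisampled case as follows. The layout of the coverage mask (one bit per pixel, bit index py * w + px) and the function name are assumptions made for illustration; the scissor box is taken in the (left, bottom, width, height) form used by Scissor.

```c
/* Apply the scissor test separately to each pixel of a w x h fragment
 * whose lower-left pixel is (x0, y0).
 * coverage: one bit per pixel, bit (py * w + px) for pixel (x0+px, y0+py)
 *           (hypothetical layout).
 * Returns the surviving coverage; 0 means the fragment is discarded. */
unsigned scissor_fragment(unsigned coverage,
                          int x0, int y0, int w, int h,
                          int left, int bottom, int width, int height)
{
    unsigned surviving = 0;
    int px, py;

    for (py = 0; py < h; ++py) {
        for (px = 0; px < w; ++px) {
            unsigned bit = 1u << (py * w + px);
            int x = x0 + px;
            int y = y0 + py;
            int inside = x >= left && x < left + width &&
                         y >= bottom && y < bottom + height;
            /* A pixel that fails the test is treated as not covered. */
            if ((coverage & bit) && inside)
                surviving |= bit;
        }
    }
    return surviving;
}
```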
Modifications to Section 13.8.3, Multisample Fragment Operations (p. 368)
(modify the last sentence of the first paragraph to indicate that sample mask operations are performed when a shading rate is used, even if multisampling is not enabled; in that case the shading rate can produce fragments covering more than one pixel, where each pixel is considered a "sample")
Change the following sentence from:
"If the value of SAMPLE_BUFFERS is not one, this step is skipped."
to:
"This step is skipped if SAMPLE_BUFFERS is not one, unless SHADING_RATE_QCOM
is set to a value other than SHADING_RATE_1X1_PIXELS_QCOM."
(add to the end of the section)
When the _effective shading rate_ results in fragments covering more than one pixel,
each fragment will generate a composite coverage mask that includes separate
coverage bits for each sample in each pixel covered by the fragment. This
composite coverage mask will be used by the GLSL built-in input variable
gl_SampleMaskIn[] and updated according to the built-in output variable
gl_SampleMask[]. The number of composite coverage mask bits in the built-in
variables and their mapping to a specific pixel and sample number
within that pixel is implementation-defined.
Modify Section 14.1, Fragment Shader Variables (p. 370)
(modify sixth paragraph, p. 371, specifying that the "centroid" location
for multi-pixel fragments is implementation-dependent, and is allowed to
be outside the primitive)
After the following sentence:
"When interpolating variables declared using "centroid in",
the variable is sampled at a location within the pixel covered
by the primitive generating the fragment."
Add the following sentence:
"When the _effective shading rate_ results in fragments covering more than one
pixel, variables declared using "centroid in" are sampled from an
implementation-dependent location within any one of the covered pixels."
Modify Section 15.1, Per-Fragment Operations (p. 378)
(insert a new paragraph after the first paragraph of the section)
When the _effective shading rate_ results in fragments covering multiple pixels,
the operations described in the section are performed independently for
each pixel covered by the fragment. The set of samples covered by each pixel
is determined by extracting the portion of the fragment's composite coverage
that applies to that pixel, as described in section 13.8.3.
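Extracting one pixel's portion of the composite coverage can be sketched as below. Section 13.8.3 leaves the mapping of bits to pixels and samples implementation-defined, so the layout used here (pixel p occupying bits p * samples through p * samples + samples - 1) is purely an assumption for illustration, as is the function name.

```c
#include <stdint.h>

/* Extract the coverage bits of one pixel from a composite coverage mask.
 * Assumed (hypothetical) layout: pixel p owns `samples` consecutive bits
 * starting at bit p * samples.
 * Precondition: pixel_index * samples < 32 and samples >= 1. */
uint32_t pixel_coverage(uint32_t composite,
                        unsigned pixel_index,
                        unsigned samples)
{
    uint32_t mask = (samples >= 32u) ? 0xFFFFFFFFu
                                     : ((1u << samples) - 1u);
    return (composite >> (pixel_index * samples)) & mask;
}
```

For a 2x2 fragment with 4 samples per pixel, the composite mask has 16 bits and each of the four pixels contributes one 4-bit group under this assumed layout.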
Errors
INVALID_ENUM is generated by ShadingRateQCOM if <rate> is not
a valid shading rate from table X.1.
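The error condition can be sketched as a validation step. The function name is hypothetical; GL_NO_ERROR and GL_INVALID_ENUM use their standard OpenGL ES values. Token values that are not in table X.1, including SHADING_RATE_QCOM itself and values between the listed rates, are rejected.

```c
#define GL_NO_ERROR      0
#define GL_INVALID_ENUM  0x0500

/* Returns the error that ShadingRateQCOM would generate for <rate>:
 * GL_NO_ERROR for a rate in table X.1, GL_INVALID_ENUM otherwise. */
unsigned shading_rate_error(unsigned rate)
{
    switch (rate) {
    case 0x96A6: /* SHADING_RATE_1X1_PIXELS_QCOM */
    case 0x96A7: /* SHADING_RATE_1X2_PIXELS_QCOM */
    case 0x96A8: /* SHADING_RATE_2X1_PIXELS_QCOM */
    case 0x96A9: /* SHADING_RATE_2X2_PIXELS_QCOM */
    case 0x96AC: /* SHADING_RATE_4X2_PIXELS_QCOM */
    case 0x96AE: /* SHADING_RATE_4X4_PIXELS_QCOM */
        return GL_NO_ERROR;
    default:
        return GL_INVALID_ENUM;
    }
}
```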
New State
Add to table 21.7, Rasterization
Get Value                                Type  Get Command  Initial Value                 Description      Sec
---------------------------------------  ----  -----------  ----------------------------  ---------------  ------
SHADING_RATE_QCOM                        E     GetIntegerv  SHADING_RATE_1X1_PIXELS_QCOM  shading rate     13.X.1
SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM  B     IsEnabled    FALSE                         maintain aspect  13.X.2
Interactions with OVR_Multiview
If OVR_Multiview is supported, SHADING_RATE_QCOM applies to all views.
Interactions with QCOM_framebuffer_foveated and QCOM_texture_foveated
QCOM_framebuffer_foveated and QCOM_texture_foveated specify a pixel
density which is exposed as a fragment size via the fragment
shader built-in gl_FragSizeEXT. This extension defines an effective
shading rate which is also exposed as a fragment size via the
same built-in. If either foveation extension is enabled in conjunction with
this extension, then the value of gl_FragSizeEXT is the component-wise product
of both fragment sizes.
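The combined fragment size is a component-wise product, sketched below. The `frag_size` type and function name are hypothetical; the two inputs stand for the fragment size implied by foveation and the effective shading rate of this extension.

```c
/* Hypothetical helper type mirroring the two components of gl_FragSizeEXT. */
typedef struct { int w, h; } frag_size;

/* Component-wise product of the foveation fragment size and the
 * effective shading rate. */
frag_size combined_frag_size(frag_size foveation, frag_size shading_rate)
{
    frag_size result;
    result.w = foveation.w * shading_rate.w;
    result.h = foveation.h * shading_rate.h;
    return result;
}
```

For instance, a foveation fragment size of 2x2 combined with an effective shading rate of 2x1 gives a reported fragment size of 4x2.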
Issues
(1) Should the application-specified rate in ShadingRateQCOM() be a "hint" that can be ignored by the driver, or is the driver required to honor the requested rate?
RESOLVED: The driver should honor the application-specified rate where
possible, but is allowed to use an adjusted rate for
implementation-dependent reasons. The specific rates supported in the hardware and the
specific conditions under which the rate needs to be adjusted can differ across
different Adreno GPU families. This extension gives drivers the flexibility to
expose this extension on early hardware that may have restrictions and oddities
while providing applications some (admittedly limited) control over the adjusted
rate that will be selected. The actual rate is always exposed via the fragment
shader built-in.
(2) If the application-specified rate is only a hint, can developers expect that all the shading rates exposed by this extension are supported natively by the HW?
RESOLVED: The initial version of this extension exposes token values for
shading rates of 1x1, 1x2, 2x1, 2x2, 4x2, and 4x4. Most Adreno GPUs supporting
this extension are expected to support all those rates, although some early HW
may support fewer rates. Note that this extension does not include shading
rates of 1x4, 4x1, nor 2x4 because Adreno GPUs may never support those rates.
Because a future version of this extension could support those rates,
we have reserved the token values (0x96AA, 0x96AB, and 0x96AD) for those rates.
(3) How does this feature work with per-sample shading?
RESOLVED: When using per-sample shading, an application is expecting a
fragment shader to run with a separate invocation per sample. The
shading rate might allow for a "coarsening" that would break such
shaders. Furthermore, some Adreno families may not support this
combination. We've chosen not to explicitly disallow this combination,
while giving implementations the flexibility to use an adjusted 1x1 shading
rate.
(4) How do centroid-sampled variables work with fragments larger than one pixel?
RESOLVED: For single-pixel fragments, attributes declared with
"centroid" are sampled at an implementation-dependent location in the
intersection of the area of the primitive being rasterized and the area
of the pixel that corresponds to the fragment. With multi-pixel
fragments, attributes declared with "centroid" are sampled from an
implementation-dependent location within any of the covered pixels.
This wide allowance for implementation-dependent behavior
enables the extension to be exposed on early Adreno hardware.
(5) How do built-in variables gl_SampleMask[] and gl_SampleMaskIn[] work with fragments larger than one pixel?
RESOLVED: For single-pixel fragments, gl_SampleMaskIn[] and gl_SampleMask[]
specify the input and output coverage bits for a single pixel, where bit 'B'
corresponds to SampleID 'B'. With this extension enabled, these built-ins would
specify the coverage bits for all the samples in all the pixels covered by the
fragment. In this extension, the exact behavior of gl_SampleMaskIn[] and
gl_SampleMask[] is implementation-dependent. For some Adreno GPUs, use of these
built-in variables will cause the driver to use an adjusted 1x1 shading rate.
In other cases, the exact mapping of bits to samples/pixels is
implementation-defined. This wide allowance for implementation-dependent behavior enables the
extension to be exposed on early Adreno hardware.
(6) Are there any restrictions on framebuffer formats used with this feature? For example, are EglImages that may contain multi-plane YUV formats supported?
RESOLVED: It is implementation-dependent whether shading rate is supported for
all formats, or only certain formats. Implementations are allowed to adjust
the _effective shading rate_ based on the format.
(7) Does the value of SHADING_RATE_QCOM affect the built-in variable gl_FragCoord?
RESOLVED: Yes, when the shading rate results in fragments covering multiple pixels,
gl_FragCoord will be the window-relative coordinates (x,y,z,1/w) of the center of
the fragment. For non-multisample cases this may not be at a pixel center. This may
break shaders that assume pixel-center (0.5, 0.5) values for gl_FragCoord.
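The effect on the reported x and y coordinates can be sketched as follows. The function and type names are hypothetical, and the sketch assumes the fragment is identified by the window coordinates of its lower-left pixel.

```c
/* Hypothetical helper type: x and y of the fragment center. */
typedef struct { float x, y; } frag_center;

/* Center of a w x h fragment whose lower-left pixel is (x0, y0).
 * For a 1x1 fragment this is the usual pixel center (x0 + 0.5, y0 + 0.5). */
frag_center fragment_center(int x0, int y0, int w, int h)
{
    frag_center c;
    c.x = (float)x0 + (float)w * 0.5f;
    c.y = (float)y0 + (float)h * 0.5f;
    return c;
}
```

A 2x2 fragment starting at pixel (4, 6) has its center at (5.0, 7.0), which lies on a pixel corner rather than at a pixel center, so shader code that relies on a fractional part of 0.5 would see different values.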
(8) Does the shading rate affect the value of gl_SamplePosition or gl_NumSamples?
RESOLVED: No, neither built-in is affected. If the shader uses gl_SamplePosition, the
shader runs at sample rate, causing the shading rate to be ignored. gl_NumSamples is
the number of samples in the framebuffer object, which is unaffected by the value of
shading rate.
(9) Should shading rate affect screen-space derivatives?
RESOLVED: This extension scales the gradients between adjacent fragments by
the effective shading rate (w', h'). The resulting increase in computed LOD
aligns well with the reduced fragment shader invocations in most use cases;
in other cases the shader author may want to bias the LOD to compensate.
Shader built-in instructions that return gradient values (dFdx, dFdy, and fwidth)
are similarly scaled for the same reason.
Revision History
Rev. Date Author Changes
---- -------- -------- ----------------------------------------------
1 03/17/20 jleger Initial draft.
2 04/22/20 jleger Relaxed the <w', h'> guarantee from "w'<=w and
h'<=h" to "w'*h' <= w*h".