NV_shader_atomic_fp16_vector

Name

NV_shader_atomic_fp16_vector

Name Strings

GL_NV_shader_atomic_fp16_vector

Contact

Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)

Contributors

Pat Brown, NVIDIA
Mathias Heyer, NVIDIA

Status

Shipping

Version

Last Modified Date:         February 4, 2015
NVIDIA Revision:            3

Number

OpenGL Extension #474
OpenGL ES Extension #261

Dependencies

This extension is written against the OpenGL 4.3 (Compatibility Profile)
Specification.

This extension is written against version 4.30 of the OpenGL Shading
Language Specification.

This extension interacts with NV_shader_buffer_store and NV_gpu_shader5.

This extension interacts with NV_gpu_program5, NV_shader_buffer_store, and
NV_gpu_program5_mem_extended.

This extension requires NV_gpu_shader5.

This extension interacts with NV_shader_storage_buffer_object.

This extension interacts with NV_compute_program5.

This extension interacts with NV_image_formats.

This extension interacts with OES_shader_image_atomic.

Overview

This extension provides GLSL built-in functions and assembly opcodes
allowing shaders to perform a limited set of atomic read-modify-write
operations to buffer or texture memory with 16-bit floating point vector
surface formats.

New Procedures and Functions

None.

New Tokens

None.

Additions to the AGL/GLX/WGL Specifications

None.

GLX Protocol

None.

Modifications to the OpenGL Shading Language Specification, Version 4.30

Including the following line in a shader can be used to control the
language features described in this extension:

  #extension GL_NV_shader_atomic_fp16_vector : <behavior>

where <behavior> is as specified in section 3.3.

New preprocessor #defines are added to the OpenGL Shading Language:

  #define GL_NV_shader_atomic_fp16_vector         1

Modify Section 8.11, Atomic Memory Functions (p. 163)

Add before the table of functions:

Some atomic memory operations are supported on two- and four-component
vectors with 16-bit floating-point components.

Add new functions to the table

    // Computes a new value per-component using the specified operation.
    // Atomicity is only guaranteed on a per-component basis.
    f16vec2 atomicAdd(inout f16vec2 mem, f16vec2 data);
    f16vec4 atomicAdd(inout f16vec4 mem, f16vec4 data);
    f16vec2 atomicMin(inout f16vec2 mem, f16vec2 data);
    f16vec4 atomicMin(inout f16vec4 mem, f16vec4 data);
    f16vec2 atomicMax(inout f16vec2 mem, f16vec2 data);
    f16vec4 atomicMax(inout f16vec4 mem, f16vec4 data);
    f16vec2 atomicExchange(inout f16vec2 mem, f16vec2 data);
    f16vec4 atomicExchange(inout f16vec4 mem, f16vec4 data);


Modify Section 8.12, Image Functions (p. 164)

Add before the table of functions:

Some atomic memory operations are supported on two- and four-component
vectors with 16-bit floating-point components, for images with format
qualifiers of <rg16f> and <rgba16f>.

Add new functions to the table:

    // Computes a new value per-component using the specified operation
    // Atomicity is only guaranteed on a per-component basis.
    f16vec2 imageAtomicAdd(IMAGE_PARAMS, f16vec2 data);
    f16vec4 imageAtomicAdd(IMAGE_PARAMS, f16vec4 data);
    f16vec2 imageAtomicMin(IMAGE_PARAMS, f16vec2 data);
    f16vec4 imageAtomicMin(IMAGE_PARAMS, f16vec4 data);
    f16vec2 imageAtomicMax(IMAGE_PARAMS, f16vec2 data);
    f16vec4 imageAtomicMax(IMAGE_PARAMS, f16vec4 data);
    f16vec2 imageAtomicExchange(IMAGE_PARAMS, f16vec2 data);
    f16vec4 imageAtomicExchange(IMAGE_PARAMS, f16vec4 data);

Dependencies on OES_shader_image_atomic

If implemented in OpenGL ES and OES_shader_image_atomic is not
supported, do not introduce additional imageAtomic* functions.

Dependencies on NV_image_formats

If implemented in OpenGL ES and NV_image_formats is not
supported, remove references to two-component images of format
<rg16f>.

Dependencies on NV_shader_buffer_store and NV_gpu_shader5 If NV_shader_buffer_store and NV_gpu_shader5 are supported, the following functions should be added to the "Section 8.Y, Shader Memory Functions" language in the NV_shader_buffer_store specification:

  // Computes a new value per-component using the specified operation
  // Atomicity is only guaranteed on a per-component basis.
  f16vec2 atomicAdd(f16vec2 *address, f16vec2 data);
  f16vec4 atomicAdd(f16vec4 *address, f16vec4 data);
  f16vec2 atomicMin(f16vec2 *address, f16vec2 data);
  f16vec4 atomicMin(f16vec4 *address, f16vec4 data);
  f16vec2 atomicMax(f16vec2 *address, f16vec2 data);
  f16vec4 atomicMax(f16vec4 *address, f16vec4 data);
  f16vec2 atomicExchange(f16vec2 *address, f16vec2 data);
  f16vec4 atomicExchange(f16vec4 *address, f16vec4 data);

Dependencies on NV_gpu_program5, NV_shader_buffer_store, and NV_gpu_program5_mem_extended

If NV_gpu_program5 is supported and "OPTION NV_shader_atomic_fp16_vector"
is specified in an assembly program, "F16X2" and "F16X4" should be allowed
as storage modifiers to the ATOM instruction for the atomic operations
"ADD", "MIN", "MAX" and "EXCH". These operate on each of the two or four
fp16 values independently. Atomicity is only guaranteed on a per-component
basis.

(Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension,
as extended by NV_gpu_program5:)

  + Floating-Point Vector Atomic Operations (NV_shader_atomic_fp16_vector)

  If a program specifies the "NV_shader_atomic_fp16_vector" option, it may
  use the "F16X2" and "F16X4" storage modifiers with the "ATOM" opcodes to
  perform atomic floating-point add or exchange operations.

(Add to the table in "Section 2.X.8.Z, ATOM" in NV_gpu_program5:)

  atomic     storage
  modifier   modifiers            operation
  --------   ------------------   --------------------------------------
   ADD       U32, S32, U64,       compute a sum
             F16X2, F16X4
   MIN       U32, S32,            compute minimum
             F16X2, F16X4
   MAX       U32, S32,            compute maximum
             F16X2, F16X4
   EXCH      U32, S32, F32        exchange memory with operand
             F16X2, F16X4
   ...

Dependencies on EXT_shader_image_load_store and NV_gpu_program5

If EXT_shader_image_load_store and NV_gpu_program5 are supported and
"OPTION NV_shader_atomic_fp16_vector" is specified in an assembly program,
"F16X2" and "F16X4" should be allowed as storage modifiers to the ATOMIM
instruction for the atomic operations "ADD", "MIN", "MAX", and "EXCH".
These operate on each of the two or four fp16 values independently.
Atomicity is only guaranteed on a per-component basis.

(Add to the table in "Section 2.X.8.Z, ATOMIM" in the "Dependencies on
NV_gpu_program5" portion of the EXT_shader_image_load specification)

  atomic     storage
  modifier   modifiers       operation
  --------   -------------   --------------------------------------
   ADD       U32, S32,       compute a sum
             F16X2, F16X4
   MIN       U32, S32,       compute minimum
             F16X2, F16X4
   MAX       U32, S32,       compute maximum
             F16X2, F16X4
   EXCH      U32, S32, F32   exchange memory with operand
             F16X2, F16X4
   ...

Dependencies on NV_compute_program5

If NV_compute_program5 is supported and "OPTION
NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2"
and "F16X4" should be allowed as storage modifiers to the ATOMB instruction
for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on
each of the two or four fp16 values independently. Atomicity is only
guaranteed on a per-component basis.

(Add to the table in "Section 2.X.8.Z, ATOMB" in the "Dependencies on
NV_gpu_program5" portion of the NV_shader_storage_buffer_object
specification)

  atomic     storage
  modifier   modifiers          operation
  --------   -------------      --------------------------------------
   ADD       U32, S32, U64      compute a sum
             F32, F16X2, F16X4
   MIN       U32, S32,          compute minimum
             F16X2, F16X4
   MAX       U32, S32,          compute maximum
             F16X2, F16X4
   EXCH      U32, S32, F32      exchange memory with operand
             F16X2, F16X4
   ...

Dependencies on NV_shader_storage_buffer_object

If NV_shader_storage_buffer_object is supported and "OPTION
NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2"
and "F16X4" should be allowed as storage modifiers to the ATOMS instruction
for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on
each of the two or four fp16 values independently. Atomicity is only
guaranteed on a per-component basis.

(Add to the table in "Section 2.X.8.Z, ATOMS" in the "Dependencies on
NV_gpu_program5" portion of the NV_compute_program5 specification)

  atomic     storage
  modifier   modifiers          operation
  --------   -------------      --------------------------------------
   ADD       U32, S32, U64      compute a sum
             F32, F16X2, F16X4
   MIN       U32, S32,          compute minimum
             F16X2, F16X4
   MAX       U32, S32,          compute maximum
             F16X2, F16X4
   EXCH      U32, S32, F32      exchange memory with operand
             F16X2, F16X4
   ...

Errors

None.

New State

None.

New Implementation Dependent State

None.

Issues

(1) Should we allow "partial" atomics to a f16vec2 or f16vec4, only
modifying some of the components?

RESOLVED: No. If an app really cares to do this, they could inject
"special" values in those components that cause the atomic to have no
effect for that component (e.g. add zero, max with -infinity, etc).  This
would work for atomicAdd, atomicMin, and atomicMax, but not for
atomicExchange.

(2) Are these vector atomics guaranteed to update all components of the
vector atomically?

RESOLVED:  No.  The spec only guarantees that individual components of a
vector be updated atomically.  The initial implementation of this
extension will only atomically update pairs of components.  For many of
the algorithms supported by this extension (computing component-wise sums,
minimums, or maximums of multi-component vectors), it is not necessary to
update all components in a vector as a single unit.

(3) What support should we provide for four-component vectors?

RESOLVED:  All of image, global, buffer, and shared memory atomic
operations will fully support two- and four-component variants.  While one
might emulate some four-component atomic operations using pairs of
two-component operations, we choose to support four-component operations
universally.  Supporting atomics on four-component vectors seems useful,
as it supports computing sums, minimums, or maximums on RGBA color values
and other data with more than two components.

Revision History

Revision 2
- Add OpenGL ES interactions
Revision 1
- Internal revisions.