5 Replies
      Latest reply on Sep 27, 2016 12:10 PM by btipling
      btipling Level 1 (0 points)

        I'm trying to use a simd float2 inside a struct that I send to a shader. It works if the receiving struct in the .metal file declares the field as a packed_float2, but things do not look right when I declare it as a float2. I checked the alignment documentation and noticed that a packed_float2 has 4-byte alignment but its size is 8 bytes. The simd float2 type's size is also 8 bytes. Is the simd float2 type not aligned to 8 bytes? Since the sizes are the same, why does it matter whether the shader reads it 8 bytes at a time or 4 bytes at a time? Why does this generate garbage?

         

        Also, I'm having similar problems with float3 simd types. I have structs containing only float3 fields (not packed_float3), and they work fine: I can send data from my Swift program to the shader without problems. But I have a separate struct with mixed types (float, float2 and float3; it's the same struct mentioned above for the float2 problem), and everything in it needs to be a packed type. I'm trying to follow the documentation as closely as possible, but very little of this is documented. Is there some useful documentation or resource guide to help me through my alignment issues? Thank you.

        • Re: What is the difference between a float2 and a packed_float2?
          mhorga Level 1 (0 points)

          A packed type guarantees that its floats sit next to each other in memory with no padding (unused bytes), and because a packed vector only needs 4-byte alignment, the compiler never has to insert padding in front of it either. The components of an unpacked vector are contiguous too, but the vector as a whole has stricter alignment (8 bytes for float2, 16 for float3), so the compiler may insert padding before it, and a float3 occupies 16 bytes, leaving room for an unused fourth float. If your host code doesn't reproduce that padding, the shader reads fields from the wrong offsets. That makes packed types the safe bet when the host writes fields back to back. Performance-wise, though, it is better to do as few reads/writes as possible, so one aligned 8-byte read is usually preferred to two 4-byte reads.
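To make the size/alignment difference concrete, here is a small sketch in plain C++ (the Metal Shading Language is C++-based). The real Metal types only exist inside a .metal file, so `float2`, `packed_float2`, `float3` and `packed_float3` below are hypothetical stand-ins built with `alignas`, assuming the sizes and alignments discussed in this thread (float2: 8 bytes/8-aligned, packed_float2: 8/4, float3: 16/16, packed_float3: 12/4). It models layout only, nothing else.

```cpp
// Hypothetical stand-ins for Metal's vector types: only size and alignment
// are modeled (the real types exist only in Metal Shading Language).
struct alignas(8) float2 { float x, y; };      // 8 bytes, 8-byte aligned
struct packed_float2 { float x, y; };          // 8 bytes, 4-byte aligned
struct alignas(16) float3 { float x, y, z; };  // 12 bytes of data padded to 16
struct packed_float3 { float x, y, z; };       // 12 bytes, 4-byte aligned

// Same size, different alignment: the components are contiguous either way.
static_assert(sizeof(float2) == 8 && alignof(float2) == 8, "float2");
static_assert(sizeof(packed_float2) == 8 && alignof(packed_float2) == 4, "packed_float2");

// float3 carries one unused trailing float; packed_float3 carries none.
static_assert(sizeof(float3) == 16 && alignof(float3) == 16, "float3");
static_assert(sizeof(packed_float3) == 12 && alignof(packed_float3) == 4, "packed_float3");
```

Compiling it is the whole check: each static_assert fails the build if the modeled layout differs from the figures above.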

          • Re: What is the difference between a float2 and a packed_float2?
            MikeAlpha Level 3 (270 points)

            It seems to me that the Metal compiler follows standard C struct layout rules, so all you need to know is each type's size and alignment. It would be easier if you had posted your exact structs, but my guess is that you're doing something like:

             

            struct Example0 {
                float singleFloat;
                float2 doubleFloats;
            };

            vs

            struct Example1 {
                float singleFloat;
                packed_float2 doubleFloats;
            };

             

            Now, sizeof( float2 ) == sizeof( packed_float2 ) == 8 bytes, but the alignment of float2 is 8, while the alignment of packed_float2 is 4. Therefore, Example0 looks like this in memory:

            offset 0: singleFloat

            offset 4: four padding bytes inserted by compiler because float2's alignment is 8

            offset 8: first float of float2

            offset 12: second float of float2

            And sizeof( Example0 ) will be 16 bytes

             

            Example1 is different:

            offset 0: singleFloat

            offset 4: first float of packed_float2 (no padding needed, because packed_float2 only requires 4-byte alignment)

            offset 8: second float of packed_float2

            In this case, sizeof( Example1 ) will be 12 bytes
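The two layouts above can be checked with a compiler. This sketch uses hypothetical `alignas` stand-ins for the Metal types (size and alignment only, assumed as described in this thread) and `offsetof`/`sizeof` inside `static_assert`s to confirm the offsets listed for Example0 and Example1.

```cpp
#include <cstddef>  // offsetof

// Hypothetical stand-ins for the Metal types (size/alignment only).
struct alignas(8) float2 { float x, y; };  // size 8, align 8
struct packed_float2 { float x, y; };      // size 8, align 4

struct Example0 {
    float singleFloat;
    float2 doubleFloats;        // must start at a multiple of 8
};

struct Example1 {
    float singleFloat;
    packed_float2 doubleFloats; // may start at any multiple of 4
};

// Example0: 4 bytes of padding after singleFloat, 16 bytes in total.
static_assert(offsetof(Example0, doubleFloats) == 8, "float2 starts at 8");
static_assert(sizeof(Example0) == 16, "Example0 is 16 bytes");

// Example1: no padding, 12 bytes in total.
static_assert(offsetof(Example1, doubleFloats) == 4, "packed_float2 starts at 4");
static_assert(sizeof(Example1) == 12, "Example1 is 12 bytes");
```

If the host writes `singleFloat` followed immediately by two floats, it matches Example1 but not Example0, which is the garbage described in the question.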

             

            Hope that helps. If not, post your exact struct layout along with your host-side code that fills the buffer.

            Regards

            Michal

              • Re: What is the difference between a float2 and a packed_float2?
                btipling Level 1 (0 points)

                Hi Michal,

                 

                Thank you for your answer. My struct looks like this:

                 

                struct RenderInfo {
                    float zoom;
                    float near;
                    float far;
                    packed_float2 winResolution;
                    packed_float3 cameraRotation;
                    packed_float3 cameraTranslation;
                    bool useCamera;
                };
                
                

                 

                The code that creates the buffer for it and loads data into it is here:

                 

                    func createRenderInfoBuffer(device: MTLDevice) {
                        // Sizes of the packed shader-side types.
                        let floatSize = MemoryLayout<Float>.size
                        let packedFloat2Size = floatSize * 2
                        let packedFloat3Size = floatSize * 3
                        let boolSize = MemoryLayout<Bool>.size

                        var minBufferSize = floatSize * 3 // zoom, near, far
                        minBufferSize += packedFloat2Size // winResolution
                        minBufferSize += packedFloat3Size * 2 // cameraRotation, cameraTranslation
                        minBufferSize += boolSize // useCamera
                        // alignBufferSize(bufferSize:alignment:) is a helper defined elsewhere (not shown).
                        let bufferSize = alignBufferSize(bufferSize: minBufferSize, alignment: floatSize)

                        renderInfoBuffer_ = device.makeBuffer(length: bufferSize, options: [])
                    }
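The post does not show `alignBufferSize(bufferSize:alignment:)`. Presumably it rounds a size up to a multiple of the alignment; below is a hypothetical C++ version of such a helper (assuming the alignment is a power of two), applied to the packed payload computed above: 3 floats (12) + packed_float2 (8) + 2 packed_float3 (24) + bool (1) = 45 bytes.

```cpp
#include <cstddef>  // std::size_t

// Hypothetical equivalent of the alignBufferSize helper called above:
// rounds `size` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t alignBufferSize(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// 45 bytes of packed data rounded up to a multiple of 4.
static_assert(alignBufferSize(45, 4) == 48, "packed RenderInfo buffer");
```

The bit mask only works for power-of-two alignments, which covers every alignment in this thread (1, 4, 8, 16).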
                

                and

                 

                    func setRenderInfo(frameInfo: FrameInfo) {
                        var renderInfo = RenderInfo(
                                zoom: frameInfo.zoom,
                                near: frameInfo.near,
                                far: frameInfo.far,
                                winResolution: frameInfo.viewDimensions,
                                cameraRotation: frameInfo.cameraRotation,
                                cameraTranslation: frameInfo.cameraTranslation,
                                useCamera: frameInfo.useCamera)
                        if let buffer = renderInfoBuffer_ {
                            let pointer = buffer.contents()

                            // Memory layout for the packed shader types:
                            let floatSize = MemoryLayout<Float>.size
                            let packedFloat2Size = floatSize * 2
                            let packedFloat3Size = floatSize * 3
                            let boolSize = MemoryLayout<Bool>.size

                            // Fields are written back to back, with no padding in between.
                            memcpy(pointer, &renderInfo.zoom, floatSize)
                            var offset = floatSize
                            memcpy(pointer + offset, &renderInfo.near, floatSize)
                            offset += floatSize
                            memcpy(pointer + offset, &renderInfo.far, floatSize)
                            offset += floatSize
                            memcpy(pointer + offset, &renderInfo.winResolution, packedFloat2Size)
                            offset += packedFloat2Size
                            memcpy(pointer + offset, &renderInfo.cameraRotation, packedFloat3Size)
                            offset += packedFloat3Size
                            memcpy(pointer + offset, &renderInfo.cameraTranslation, packedFloat3Size)
                            offset += packedFloat3Size
                            memcpy(pointer + offset, &renderInfo.useCamera, boolSize)
                        }
                    }
                

                The code as written works here since I'm using packed floats, but I wanted to try and get it to work without the packed variants.
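One way to see why the packed version works: every packed member only needs 4-byte alignment, so the compiler lays the shader struct out back to back, exactly like the running `offset += ...` in setRenderInfo. The sketch below models the packed RenderInfo in C++ with hypothetical stand-in types (assuming packed_float2 is 8 bytes and packed_float3 is 12 bytes, both 4-byte aligned, and a 1-byte bool) and checks that its offsets match the memcpy offsets 0, 4, 8, 12, 20, 32 and 44.

```cpp
#include <cstddef>  // offsetof

// Hypothetical stand-ins for Metal's packed types (size/alignment only).
struct packed_float2 { float x, y; };     // size 8, align 4
struct packed_float3 { float x, y, z; };  // size 12, align 4

// Mirror of the packed shader-side struct.
struct RenderInfo {
    float zoom;
    float near;
    float far;
    packed_float2 winResolution;
    packed_float3 cameraRotation;
    packed_float3 cameraTranslation;
    bool useCamera;
};

// Every member is 4-byte aligned, so the fields sit back to back,
// at the same offsets the host-side memcpy calls use.
static_assert(offsetof(RenderInfo, winResolution) == 12, "after 3 floats");
static_assert(offsetof(RenderInfo, cameraRotation) == 20, "12 + 8");
static_assert(offsetof(RenderInfo, cameraTranslation) == 32, "20 + 12");
static_assert(offsetof(RenderInfo, useCamera) == 44, "32 + 12");
// 45 bytes of data plus 3 bytes of tail padding to a multiple of 4.
static_assert(sizeof(RenderInfo) == 48, "total size");
```

The 48-byte total also matches the buffer size from createRenderInfoBuffer once 45 is rounded up to a multiple of 4.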

                 

                Assuming I align everything to the unpacked float3's 16-byte alignment, are you suggesting I add padding between each memcpy, or does Metal already make the alignment adjustments? It's trivial to create a buffer of 16 bytes * 7 and always advance the pointer offset by 16 bytes on each memcpy, but that doesn't work. I've tried creating the buffer at 16 bytes * 7 and advancing the offset by 16 bytes between copies to leave enough padding between the fields, and I've also tried creating the buffer at that size without advancing the offset that way. In every case, switching to the non-packed types produces garbage. I just wish I understood this better.

                 

                Thank you!

                  • Re: What is the difference between a float2 and a packed_float2?
                    MikeAlpha Level 3 (270 points)

                    Hello

                     

                    Your original structure (with float2/float3 instead of the packed types) would give something like this:

                     

                    offset 0: zoom

                    offset 4: near

                    offset 8: far

                    offset 12: 4 bytes of padding inserted by the compiler, because a float2 cannot begin at offset 12; its offset has to be a multiple of 8

                    offset 16: winResolution.x - here the float2 can begin

                    offset 20: winResolution.y

                    offset 24: 8 bytes of padding inserted by the compiler, because a float3 cannot begin at offset 24; its offset has to be a multiple of 16

                    offset 32: cameraRotation.x - here float3 can begin

                    offset 36: cameraRotation.y

                    offset 40: cameraRotation.z

                    offset 44: padding inherent to float3 type

                    offset 48: cameraTranslation.x - no extra padding needed here, 48 is 3 * 16

                    offset 52: cameraTranslation.y

                    offset 56: cameraTranslation.z

                    offset 60: padding inherent to float3 type

                    offset 64: useCamera
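This layout also answers the earlier question about padding between memcpy calls: Metal does not rearrange anything in the buffer, so the host code has to place each field at the offset the shader expects. Here is a sketch of that rule in C++, with hypothetical stand-ins for float2/float3 (size and alignment as described above): round the running offset up to each field's alignment before copying, advance by the field's size afterwards, and round the final size up to the largest member alignment. The static_asserts compare the result with the compiler's own layout.

```cpp
#include <cstddef>  // offsetof, std::size_t

// Hypothetical stand-ins for Metal's unpacked vector types (layout only).
struct alignas(8) float2 { float x, y; };      // size 8, align 8
struct alignas(16) float3 { float x, y, z; };  // size 16, align 16

// Original field order, unpacked types.
struct RenderInfo {
    float zoom;
    float near;
    float far;
    float2 winResolution;
    float3 cameraRotation;
    float3 cameraTranslation;
    bool useCamera;
};

// Round `offset` up to the next multiple of `alignment`.
constexpr std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Host-side rule per memcpy: align the offset, copy, advance by the size.
constexpr std::size_t kNear = alignUp(sizeof(float), alignof(float));              // 4
constexpr std::size_t kFar = alignUp(kNear + sizeof(float), alignof(float));       // 8
constexpr std::size_t kWinResolution =
    alignUp(kFar + sizeof(float), alignof(float2));                                // 12 -> 16
constexpr std::size_t kCameraRotation =
    alignUp(kWinResolution + sizeof(float2), alignof(float3));                     // 24 -> 32
constexpr std::size_t kCameraTranslation =
    alignUp(kCameraRotation + sizeof(float3), alignof(float3));                    // 48
constexpr std::size_t kUseCamera =
    alignUp(kCameraTranslation + sizeof(float3), alignof(bool));                   // 64
// Buffer size: round the end up to the largest member alignment (16).
constexpr std::size_t kBufferSize =
    alignUp(kUseCamera + sizeof(bool), alignof(float3));                           // 65 -> 80

// The hand-computed offsets agree with the compiler's layout.
static_assert(offsetof(RenderInfo, near) == kNear, "near");
static_assert(offsetof(RenderInfo, far) == kFar, "far");
static_assert(offsetof(RenderInfo, winResolution) == kWinResolution, "winResolution");
static_assert(offsetof(RenderInfo, cameraRotation) == kCameraRotation, "cameraRotation");
static_assert(offsetof(RenderInfo, cameraTranslation) == kCameraTranslation, "cameraTranslation");
static_assert(offsetof(RenderInfo, useCamera) == kUseCamera, "useCamera");
static_assert(sizeof(RenderInfo) == kBufferSize, "total size");
```

By contrast, advancing a fixed 16 bytes per field puts `near` at 16 instead of 4 and `far` at 32 instead of 8, so the shader reads the wrong bytes for almost every field.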

                     

                    Lots of padding: the struct ends up 16 bytes larger than it needs to be. The simplest solution is to rearrange your structure a bit, so that the compiler doesn't need to insert as much alignment padding. So I'd do:

                    struct RenderInfo {
                        float near;
                        float far;
                        float2 winResolution;
                        float3 cameraRotation;
                        float3 cameraTranslation;
                        float zoom;
                        bool useCamera;
                    };


                    Now, if I am not mistaken (I have a 6-month-old daughter and am not getting enough sleep, so double-check this), that will look like:

                    offset 0: near

                    offset 4: far

                    offset 8: winResolution.x - no padding needed, as 8 is a multiple of 8

                    offset 12: winResolution.y

                    offset 16: cameraRotation.x - no padding needed, as 16 is a multiple of 16

                    offset 20: cameraRotation.y

                    offset 24: cameraRotation.z

                    offset 28: padding inherent to float3 type - sizeof( float3 ) is 16

                    offset 32: cameraTranslation.x - no padding needed, as 32 is a multiple of 16

                    offset 36: cameraTranslation.y

                    offset 40: cameraTranslation.z

                    offset 44: padding float, just like offset 28

                    offset 48: zoom

                    offset 52: useCamera


                    See: no extra alignment padding, 16 bytes saved. One caveat, though: I've had various problems with bool variables on Intel GPUs and ended up using 4-byte ints for boolean values instead. YMMV, but definitely put that bool at the very end.
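The reordered layout can be checked the same way. This C++ sketch uses the same kind of hypothetical `alignas` stand-ins for float2/float3, confirms the offsets listed above and that the struct shrinks to 64 bytes (53 bytes of data rounded up to the 16-byte struct alignment), and also models the 4-byte integer flag mentioned in the caveat; `RenderInfoIntFlag` is a hypothetical name used only for illustration.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>  // std::uint32_t

// Hypothetical stand-ins for Metal's unpacked vector types (layout only).
struct alignas(8) float2 { float x, y; };      // size 8, align 8
struct alignas(16) float3 { float x, y, z; };  // size 16, align 16

// Reordered fields: the smaller members fill the gaps before the vectors.
struct RenderInfo {
    float near;
    float far;
    float2 winResolution;
    float3 cameraRotation;
    float3 cameraTranslation;
    float zoom;
    bool useCamera;
};

static_assert(offsetof(RenderInfo, winResolution) == 8, "no padding before float2");
static_assert(offsetof(RenderInfo, cameraRotation) == 16, "no padding before float3");
static_assert(offsetof(RenderInfo, cameraTranslation) == 32, "after inherent padding");
static_assert(offsetof(RenderInfo, zoom) == 48, "zoom");
static_assert(offsetof(RenderInfo, useCamera) == 52, "useCamera");
static_assert(sizeof(RenderInfo) == 64, "53 bytes rounded up to 16");

// Same layout with a 4-byte integer flag instead of bool (hypothetical name).
struct RenderInfoIntFlag {
    float near;
    float far;
    float2 winResolution;
    float3 cameraRotation;
    float3 cameraTranslation;
    float zoom;
    std::uint32_t useCamera;  // 0 = false, nonzero = true
};

static_assert(offsetof(RenderInfoIntFlag, useCamera) == 52, "still at 52");
static_assert(sizeof(RenderInfoIntFlag) == 64, "still 64 bytes");
```

Swapping bool for a 4-byte integer costs nothing here, because the flag still fits inside the tail padding the struct already carries.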


                    Hope that clears it up a bit. If not, get any C manual (AFAIK the classic K&R has a nice explanation of this) and read up on struct layout and alignment.

                    Regards

                    Michal