-
Re: What is the difference between a float2 and a packed_float2?
mhorga Sep 26, 2016 12:15 AM (in response to btipling)
A packed type guarantees that all of its floats sit next to each other in adjacent memory locations, so the result is always the one you would expect. Storing one float at a time (not packed), on the other hand, may or may not leave them in adjacent locations, so the packed types are a safe bet because they never contain padding (unused bytes) between two stored values. Padding usually appears when a type does not fill its aligned size (e.g. float3 leaves unused space for an extra float). Also, performance-wise, it is better to do as few reads/writes as possible, so one 8-byte read is usually preferred to two 4-byte reads from memory.
-
Re: What is the difference between a float2 and a packed_float2?
MikeAlpha Sep 26, 2016 1:33 AM (in response to btipling)
It seems to me that the Metal compiler follows standard C structure layout rules, and therefore all one needs to know is each type's size and alignment. It would be easier if you had posted your exact structs, but my guess is that you're doing something like:
struct Example0 {
    float singleFloat;
    float2 doubleFloats;
};
vs
struct Example1 {
    float single;
    packed_float2 doubleFloats;
};
Now we have sizeof( float2 ) == sizeof( packed_float2 ) == 8 bytes. But the alignment of float2 is 8, while the alignment of packed_float2 is 4. Therefore, Example0 in memory looks like this:
offset 0: singleFloat
offset 4: four padding bytes inserted by compiler because float2's alignment is 8
offset 8: first float of float2
offset 12: second float of float2
And sizeof( Example0 ) will be 16 bytes
Example1 is different:
offset 0: single
offset 4: first float of packed_float2 (because it is 4-aligned)
offset 8: second float of packed_float2
In this case, sizeof( Example1 ) will be 12 bytes
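The two layouts above can be checked on the host with a small C++ sketch. The `float2` and `packed_float2` structs here are stand-ins that only mimic the sizes and alignments quoted above (an assumption for illustration, not the real Metal types), and the checks assume a 4-byte `float`.

```cpp
#include <cassert>
#include <cstddef>

// Host-side stand-ins (assumption: they only mimic the size and alignment
// quoted in the post; they are not the real Metal types).
struct alignas(8) float2 { float x, y; };   // size 8, alignment 8
struct packed_float2 { float x, y; };       // size 8, alignment 4

struct Example0 {
    float singleFloat;
    float2 doubleFloats;         // must start at a multiple of 8
};

struct Example1 {
    float single;
    packed_float2 doubleFloats;  // may start at any multiple of 4
};

static_assert(sizeof(float2) == sizeof(packed_float2), "same size");
static_assert(alignof(float2) == 8 && alignof(packed_float2) == 4, "different alignment");
static_assert(offsetof(Example0, doubleFloats) == 8, "4 padding bytes after singleFloat");
static_assert(offsetof(Example1, doubleFloats) == 4, "no padding");
static_assert(sizeof(Example0) == 16 && sizeof(Example1) == 12, "sizes from the post");
```

If any of the `static_assert` lines fails to compile, the host compiler disagrees with the layout described in the post.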
Hope that helps. If not, post your exact struct layout as well as the host-language code that writes the data.
Regards
Michal
-
Re: What is the difference between a float2 and a packed_float2?
btipling Sep 26, 2016 2:12 PM (in response to MikeAlpha)
Hi Michal,
Thank you for your answer. My struct looks like this:
struct RenderInfo {
    float zoom;
    float near;
    float far;
    packed_float2 winResolution;
    packed_float3 cameraRotation;
    packed_float3 cameraTranslation;
    bool useCamera;
};
The code that creates the buffer for it and loads data into it is here:
func createRenderInfoBuffer(device: MTLDevice) {
    // Setup memory layout.
    let floatSize = MemoryLayout<Float>.size
    let packedFloat2Size = floatSize * 2
    let packedFloat3Size = floatSize * 3
    let boolSize = MemoryLayout<Bool>.size
    var minBufferSize = floatSize * 3 // zoom, near, far
    minBufferSize += packedFloat2Size // winResolution
    minBufferSize += packedFloat3Size * 2 // cameraRotation, cameraTranslation
    minBufferSize += boolSize // useCamera
    let bufferSize = alignBufferSize(bufferSize: minBufferSize, alignment: floatSize)
    renderInfoBuffer_ = device.makeBuffer(length: bufferSize, options: [])
}
and
func setRenderInfo(frameInfo: FrameInfo) {
    var renderInfo = RenderInfo(
        zoom: frameInfo.zoom,
        near: frameInfo.near,
        far: frameInfo.far,
        winResolution: frameInfo.viewDimensions,
        cameraRotation: frameInfo.cameraRotation,
        cameraTranslation: frameInfo.cameraTranslation,
        useCamera: frameInfo.useCamera)
    if (renderInfoBuffer_ != nil) {
        let pointer = renderInfoBuffer_!.contents()
        // Memory layout for shader types:
        let floatSize = MemoryLayout<Float>.size
        let packedFloat2Size = floatSize * 2
        let packedFloat3Size = floatSize * 3
        let boolSize = MemoryLayout<Bool>.size
        // Copy each field in declaration order, advancing by its packed size.
        memcpy(pointer, &renderInfo.zoom, floatSize)
        var offset = floatSize
        memcpy(pointer + offset, &renderInfo.near, floatSize)
        offset += floatSize
        memcpy(pointer + offset, &renderInfo.far, floatSize)
        offset += floatSize
        memcpy(pointer + offset, &renderInfo.winResolution, packedFloat2Size)
        offset += packedFloat2Size
        memcpy(pointer + offset, &renderInfo.cameraRotation, packedFloat3Size)
        offset += packedFloat3Size
        memcpy(pointer + offset, &renderInfo.cameraTranslation, packedFloat3Size)
        offset += packedFloat3Size
        memcpy(pointer + offset, &renderInfo.useCamera, boolSize)
    }
}
The code as written works here since I'm using packed floats, but I wanted to try and get it to work without the packed variants.
Assuming I align everything to the unpacked float3, which has a 16-byte alignment, it sounds like you're suggesting I add padding between each memcpy. Or does Metal already make the alignment adjustments? It's trivial to create a buffer of 16 bytes * 7 and to always advance the pointer offset by 16 bytes on each memcpy, but that doesn't work. I've tried sizing the buffer at 16 bytes * 7 and advancing the offset by 16 bytes between each copy to leave enough padding between the types. I've also tried creating the buffer at that size without changing how the offset advances. In every case, not using the packed float variants produces garbage. I just wish I understood this better.
Thank you!
-
Re: What is the difference between a float2 and a packed_float2?
MikeAlpha Sep 27, 2016 4:09 AM (in response to btipling)
Hello
Your original structure (with float2/float3 instead of the packed types) would give something like:
offset 0: zoom
offset 4: near
offset 8: far
offset 12: padding 4 bytes inserted by compiler because float2 cannot begin at offset 12 - it has to be multiple of 8
offset 16: winResolution.x - here float2 can begin
offset 20: winResolution.y
offset 24: padding 8 bytes inserted by compiler because float3 cannot begin at offset 24 - it has to be multiple of 16
offset 32: cameraRotation.x - here float3 can begin
offset 36: cameraRotation.y
offset 40: cameraRotation.z
offset 44: padding inherent to float3 type
offset 48: cameraTranslation.x - no extra padding needed here, 48 is 3 * 16
offset 52: cameraTranslation.y
offset 56: cameraTranslation.z
offset 60: padding inherent to float3 type
offset 64: useCamera
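The offsets listed above can be reproduced on the host with a C++ sketch. The `float2` and `float3` structs are assumed stand-ins that only carry the sizes and alignments quoted in this thread (8/8 and 16/16), not the real Metal types, and the checks assume a 4-byte `float`.

```cpp
#include <cassert>
#include <cstddef>

// Assumed host-side stand-ins with the sizes and alignments quoted in the thread.
struct alignas(8)  float2 { float x, y; };      // size 8,  alignment 8
struct alignas(16) float3 { float x, y, z; };   // size 16, alignment 16

// Original member order, but with the unpacked types.
struct RenderInfo {
    float  zoom;               // offset 0
    float  near;               // offset 4
    float  far;                // offset 8, then 4 bytes of padding
    float2 winResolution;      // offset 16, then 8 bytes of padding
    float3 cameraRotation;     // offset 32
    float3 cameraTranslation;  // offset 48
    bool   useCamera;          // offset 64
};

static_assert(offsetof(RenderInfo, winResolution) == 16, "float2 needs a multiple of 8");
static_assert(offsetof(RenderInfo, cameraRotation) == 32, "float3 needs a multiple of 16");
static_assert(offsetof(RenderInfo, cameraTranslation) == 48, "48 is already a multiple of 16");
static_assert(offsetof(RenderInfo, useCamera) == 64, "bool follows the second float3");
// The whole struct is rounded up to its largest member alignment (16).
static_assert(sizeof(RenderInfo) == 80, "65 bytes rounded up to a multiple of 16");
```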
Lots of padding - 16 bytes more than is required. The simplest solution is to rearrange your structure a bit, so that the compiler won't generate that much padding. So I'd do:
struct RenderInfo {
    float near;
    float far;
    float2 winResolution;
    float3 cameraRotation;
    float3 cameraTranslation;
    float zoom;
    bool useCamera;
};
Now, if I am not mistaken (got 6 month old daughter, not getting enough sleep, so be careful), that will look like:
offset 0: near
offset 4: far
offset 8: winResolution.x - no padding needed, as the offset is a multiple of 8
offset 12: winResolution.y
offset 16: cameraRotation.x - no padding needed, as the offset is a multiple of 16
offset 20: cameraRotation.y
offset 24: cameraRotation.z
offset 28: padding inherent to float3 type - sizeof( float3 ) is 16
offset 32: cameraTranslation.x - no padding needed, as the offset is a multiple of 16
offset 36: cameraTranslation.y
offset 40: cameraTranslation.z
offset 44: padding float, just like offset 28
offset 48: zoom
offset 52: useCamera
See: no extra padding, 16 bytes saved. One caveat though - I've had various problems with bool variables on Intel GPUs, ended up using 4 byte ints for boolean values instead. YMMV, but definitely put that bool at the very end.
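A host-side C++ sketch of the reordered struct, again with assumed stand-ins for `float2` and `float3` that only mimic the quoted sizes and alignments. The second struct, `RenderInfoIntFlag`, is a hypothetical name used here to illustrate the suggestion of a 4-byte int in place of the bool; it is not from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed host-side stand-ins with the sizes and alignments quoted in the thread.
struct alignas(8)  float2 { float x, y; };      // size 8,  alignment 8
struct alignas(16) float3 { float x, y, z; };   // size 16, alignment 16

// Reordered so that every member already lands on a suitable offset.
struct RenderInfo {
    float  near;               // offset 0
    float  far;                // offset 4
    float2 winResolution;      // offset 8
    float3 cameraRotation;     // offset 16
    float3 cameraTranslation;  // offset 32
    float  zoom;               // offset 48
    bool   useCamera;          // offset 52
};

// Hypothetical variant: a 4-byte int flag instead of bool.
struct RenderInfoIntFlag {
    float  near;
    float  far;
    float2 winResolution;
    float3 cameraRotation;
    float3 cameraTranslation;
    float  zoom;
    std::int32_t useCamera;    // offset 52, occupies 4 bytes
};

static_assert(offsetof(RenderInfo, winResolution) == 8, "no padding before float2");
static_assert(offsetof(RenderInfo, cameraRotation) == 16, "no padding before float3");
static_assert(offsetof(RenderInfo, cameraTranslation) == 32, "no padding before float3");
static_assert(offsetof(RenderInfo, useCamera) == 52, "flag follows zoom");
static_assert(sizeof(RenderInfo) == 64, "16 bytes smaller than the original order");
static_assert(sizeof(RenderInfoIntFlag) == 64, "the int flag fits in the tail padding");
```

Under these assumptions, swapping the bool for a 4-byte int costs nothing in size, because the struct is rounded up to 64 bytes either way.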
Hope that clears it up a bit. If not, get any C manual (AFAIK classic K&R has nice explanation of this) and read up on struct layout/alignment.
Regards
Michal
-
Re: What is the difference between a float2 and a packed_float2?
btipling Sep 27, 2016 12:10 PM (in response to MikeAlpha)
Thank you Michal! Now I understand. I was misunderstanding alignment: I assumed it added padding after a type. I didn't realize that alignment means the offset at which a type starts, relative to the beginning of the struct, has to be a multiple of its alignment. Huge breakthrough for me. Thank you so much.
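For the field-by-field memcpy approach used earlier in the thread, the same rule can be applied by rounding the write offset up to each field's alignment before copying. The following C++ sketch shows the idea on a plain byte buffer; `alignUp`, `appendField`, and `layoutExample` are hypothetical helper names, the field values are made up, and the alignments (8 for float2, 16 for float3) are the ones quoted above. Real code would write into the Metal buffer's contents pointer instead, and would also round the total size up to the struct's alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Round offset up to the next multiple of alignment (alignment must be > 0).
constexpr std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: copies `size` bytes to the next offset that satisfies
// `alignment`, growing the buffer as needed, and returns where the field landed.
std::size_t appendField(std::vector<unsigned char>& buffer, const void* data,
                        std::size_t size, std::size_t alignment) {
    const std::size_t offset = alignUp(buffer.size(), alignment);
    buffer.resize(offset + size, 0);  // any padding bytes are zero-filled
    std::memcpy(buffer.data() + offset, data, size);
    return offset;
}

// Writes zoom, near, far, one float2 and one float3, returning each offset.
std::vector<std::size_t> layoutExample(std::vector<unsigned char>& buffer) {
    const float scalars[3] = {1.0f, 0.1f, 100.0f};   // zoom, near, far (made-up values)
    const float resolution[2] = {800.0f, 600.0f};    // float2: 8 bytes, alignment 8
    const float rotation[3] = {0.0f, 0.0f, 0.0f};    // float3: 12 data bytes, alignment 16
    std::vector<std::size_t> offsets;
    for (const float& s : scalars) {
        offsets.push_back(appendField(buffer, &s, sizeof(float), alignof(float)));
    }
    offsets.push_back(appendField(buffer, resolution, sizeof(resolution), 8));
    offsets.push_back(appendField(buffer, rotation, sizeof(rotation), 16));
    return offsets;
}
```

Calling `layoutExample` on an empty buffer places the float2 at offset 16 and the float3 at offset 32, matching the first layout listed in the thread.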