Some further information:
The graph's default storage format is set to Float32. I adjusted the batch size (everything else held constant) and set the training style to CPU so I could capture the gradients of my top-most layer. Below are the results, printing the first 10 coefficients; (a, b, ...) indicates the re-runs (first backward pass for each).
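Roughly, the check amounts to something like the following PyTorch sketch (not the actual framework or code used here; the layer shape and loss are assumptions, chosen only so the flattened gradient has 1568 entries like the output below):

```python
# Minimal sketch of the check: fixed seed, one forward/backward pass,
# then print the length and first 10 entries of the layer's gradient.
import torch

torch.manual_seed(0)                       # fixed seed so re-runs are comparable
model = torch.nn.Sequential(
    torch.nn.Linear(784, 2),               # assumed shape: 784 * 2 = 1568 weights
    torch.nn.LogSoftmax(dim=1),
)
loss_fn = torch.nn.NLLLoss()               # averages over the batch by default

batch_size = 4
x = torch.randn(batch_size, 784)
y = torch.randint(0, 2, (batch_size,))

loss = loss_fn(model(x), y)
loss.backward()                            # first backward pass

grad = model[0].weight.grad.flatten()
print(len(grad), grad[:10].tolist())       # length and first 10 coefficients
```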
BATCH SIZE = 4
Gradient weights l1 (a) ... 1568 ... [0.0032182545, 0.0018722187, 0.004452133, 0.0027766703, 0.004814127, 0.002290076, 0.0005896213, 0.002064481, 0.0019948026, 0.0055566807, 0.003961149]
Gradient weights l1 (b) ... 1568 ... [0.0032182545, 0.0018722187, 0.004452133, 0.0027766703, 0.004814127, 0.002290076, 0.0005896213, 0.002064481, 0.0019948026, 0.0055566807, 0.003961149]
Gradient weights l1 (c)... 1568 ... [0.0032182545, 0.0018722187, 0.004452133, 0.0027766703, 0.004814127, 0.002290076, 0.0005896213, 0.002064481, 0.0019948026, 0.0055566807, 0.003961149]
BATCH SIZE = 8
Gradient weights l1 (a) ... 1568 ... [-0.35463914, 0.58976394, -0.59485054, 0.22903103, -0.51804817, 0.59701616, 0.5051392, 0.074297816, 0.4284085, -0.8984931, -0.10788263]
Gradient weights l1 (b) ... 1568 ... [-0.8611915, 0.12668955, -0.20884266, -0.102241494, -0.6502063, -0.23424746, -0.4674223, -0.6518867, -0.23104043, -0.40736914, -0.31194344]
BATCH SIZE = 16
Gradient weights l1 (a) ... 1568 ... [1.26359e+35, 5.4729107e+35, 3.3159668e+35, 5.214483e+35, 3.2493971e+35, 9.169122e+35, 9.311691e+35, 2.1583421e+35, 3.952557e+35, 2.3942557e+35, 3.6645236e+35]
Gradient weights l1 (b) ... 1568 ... [0.09119261, 0.05756697, 0.07213145, 0.014482293, 0.09319483, 0.038098965, 0.06368228, 0.09818763, 0.034319896, 0.032822747, 0.011597654]
Gradient weights l1 (c) ... 1568 ... [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
BATCH SIZE = 32
Gradient weights l1 (a) ... 1568 ... [1.2068136e+35, -2.3001325e+34, 2.1084688e+35, -2.9456847e+35, 9.786839e+33, -6.9434864e+35, -1.4935384e+35, -1.0668826e+35, -1.9871346e+35, 7.397618e+34, -2.4444336e+35]
Gradient weights l1 (b)... 1568 ... [-1.3880644e+35, -2.4221317e+34, -1.1778572e+35, -1.7336298e+35, -1.8964465e+35, -2.3253935e+35, -4.467901e+35, -2.1361668e+35, -8.294703e+34, -1.3844599e+35, -2.800067e+35]
Gradient weights l1 (c)... 1568 ... [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
I would have thought the gradients would be averaged over the batch? Regardless, the gradients appear to become unstable as the batch size increases, whereas a batch size of 4 shows what you would expect: a consistent gradient across re-runs (everything else held constant, dropout removed).
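For what it's worth, whether the gradient magnitude grows with batch size usually comes down to whether the loss averages or sums over the batch. A quick PyTorch-style illustration (again, not the framework used here; shapes are arbitrary):

```python
# With reduction='mean' the gradient magnitude stays roughly constant as the
# batch grows; with reduction='sum' it scales linearly with batch size.
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(784, 2)
x = torch.randn(32, 784)
y = torch.randint(0, 2, (32,))

for reduction in ("mean", "sum"):
    layer.zero_grad()
    loss = torch.nn.functional.cross_entropy(layer(x), y, reduction=reduction)
    loss.backward()
    print(reduction, layer.weight.grad.abs().mean().item())
# the 'sum' gradients come out 32x larger than the 'mean' ones here
```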
Is something overflowing inside the computation, or could this be a memory issue?
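The magnitudes around 1e+35 are already close to the Float32 maximum of roughly 3.4e+38, so an intermediate value blowing past that limit would become inf, and the usual follow-up operations then turn inf into the -nan values seen above. A small numpy illustration of that failure mode (purely to show the numerics, not the actual computation here):

```python
# Float32 tops out around 3.4e38: exceeding it gives inf, and common
# follow-up operations turn inf into nan.
import numpy as np

big = np.float32(3e35)
print(big * np.float32(1e4))                     # 3e39 overflows float32 -> inf
print(np.float32(np.inf) - np.float32(np.inf))   # inf - inf -> nan
print(np.exp(np.float32(100.0)))                 # exp of a large logit -> inf
```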