I've never worked with SceneKit, but I have a working implementation in OpenGL ES 2.0 (presumably a 3.0 context works as well), so it is definitely possible. I based it on the Metal example and used this Core Video function instead (I've included some other OpenGL calls to give you an idea of how it works):
CVPixelBufferRef pixelBuffer = frame.capturedImage;
///// ...
// Plane 1 of the captured image holds the interleaved CbCr (chroma) samples.
int width1 = (int)CVPixelBufferGetWidthOfPlane(pixelBuffer, 1);
int height1 = (int)CVPixelBufferGetHeightOfPlane(pixelBuffer, 1);
glActiveTexture(GL_TEXTURE1);
// Wrap plane 1 in a two-channel (RG) GL texture via the Core Video texture cache.
CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache, pixelBuffer, NULL, GL_TEXTURE_2D, GL_RG_EXT, width1, height1, GL_RG_EXT, GL_UNSIGNED_BYTE, 1, &outTexture1);
glBindTexture(CVOpenGLESTextureGetTarget(outTexture1), CVOpenGLESTextureGetName(outTexture1));
// Clamp at the edges so sampling never wraps around the camera image.
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
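The calls above only get the chroma plane into a texture; the fragment shader still has to combine it with the luma plane to produce RGB. As a rough CPU-side sketch of that per-pixel math in plain C, assuming full-range BT.601 coefficients (the values I believe the Metal example's shader uses, so treat them as an assumption and check them against your copy of the sample). The names `RGB8`, `clamp01` and `ycbcr_to_rgb` are made up for illustration and are not part of ARKit or Core Video.

```c
#include <stdint.h>

/* Hypothetical helper type: one 8-bit RGB pixel. */
typedef struct { uint8_t r, g, b; } RGB8;

/* Clamp a value to the [0, 1] range. */
static float clamp01(float v) {
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Convert one full-range YCbCr sample to RGB.
 * Assumption: BT.601 full-range coefficients. A fragment shader would do
 * the same arithmetic on the values sampled from the Y and CbCr textures. */
static RGB8 ycbcr_to_rgb(uint8_t y8, uint8_t cb8, uint8_t cr8) {
    float y  = y8  / 255.0f;
    float cb = cb8 / 255.0f - 0.5f;  /* chroma is centered on 0.5 */
    float cr = cr8 / 255.0f - 0.5f;

    float r = y + 1.4020f * cr;
    float g = y - 0.3441f * cb - 0.7141f * cr;
    float b = y + 1.7720f * cb;

    RGB8 out;
    out.r = (uint8_t)(clamp01(r) * 255.0f + 0.5f);  /* round to nearest */
    out.g = (uint8_t)(clamp01(g) * 255.0f + 0.5f);
    out.b = (uint8_t)(clamp01(b) * 255.0f + 0.5f);
    return out;
}
```

For example, `ycbcr_to_rgb(76, 85, 255)` comes out close to pure red. In the GL version the same three lines of arithmetic would live in the fragment shader, reading Y from one texture unit and CbCr from the `.rg` channels of the texture bound above.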
To be honest, though, working with ARKit in this way seems super dodgy right now. The view and projection matrices I'm getting are weird, and I need to apply extra transforms to them to get the correct output (mostly found by guess and check). For this reason the hitTest functions are totally broken.
I can get it all to work, but expect some ***/fudge comments.