Is your book a physical book that "comes to life" when a user points their phone at a page through an app? If so, take a look at Apple's sample code for "Detecting Images in an AR Experience" (https://developer.apple.com/documentation/arkit/detecting_images_in_an_ar_experience), which shows how to detect an image and augment it with 3D content. In your case, you would provide a reference image of each page that you want the app to recognize. ARKit then scans for those images and, when it finds one, adds an anchor that you can "attach" your 3D content to. The animations themselves, even if they are video files, could be attached the same way. You may also want to look into Reality Composer, which might let you prototype this idea more effectively than relying on code alone.
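To make the image-detection idea a bit more concrete, here is a minimal sketch, not a full implementation. It assumes you have added your page images to an AR Resource Group in the asset catalog; the group name "BookPages" and the class name BookPageViewController are made up for illustration. It simply lays a translucent plane over any detected page.

```swift
import ARKit
import SceneKit
import UIKit

// Hypothetical view controller; the name is an assumption for this sketch.
final class BookPageViewController: UIViewController, ARSCNViewDelegate {
    private let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        sceneView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        sceneView.delegate = self
        view.addSubview(sceneView)
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        // "BookPages" is an assumed AR Resource Group holding one
        // reference image per page, each with its physical size set.
        guard let pages = ARReferenceImage.referenceImages(inGroupNamed: "BookPages",
                                                           bundle: nil) else {
            return
        }

        let configuration = ARWorldTrackingConfiguration()
        configuration.detectionImages = pages
        sceneView.session.run(configuration,
                              options: [.resetTracking, .removeExistingAnchors])
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        sceneView.session.pause()
    }

    // ARKit calls this when it adds an anchor, including one for a detected page.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let imageAnchor = anchor as? ARImageAnchor else { return }

        // Build a plane matching the printed page's physical size.
        let size = imageAnchor.referenceImage.physicalSize
        let plane = SCNPlane(width: size.width, height: size.height)
        plane.firstMaterial?.diffuse.contents = UIColor.systemBlue.withAlphaComponent(0.5)

        // SCNPlane is vertical by default; rotate it to lie flat on the page.
        let overlay = SCNNode(geometry: plane)
        overlay.eulerAngles.x = -.pi / 2
        node.addChildNode(overlay)
    }
}
```

You can read imageAnchor.referenceImage.name to decide which page was found and swap the placeholder plane for your own content. As far as I know, a material's diffuse contents can also be set to an AVPlayer, which would be one way to show a video on the page, but I would verify that against the SceneKit documentation for your deployment target.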
As for your questions, it's tough to say whether this is something you could or should build using ARKit. ARKit is a framework that does the heavy processing of augmented reality for you; it's not a program to build AR experiences in. If that is what you are looking for, Reality Composer would be your best bet. Keep in mind that ARKit is for Apple devices only and will not run on Android. You could look into third-party libraries for cross-platform support (a few do exist), but you would lose much of Apple's tight software and hardware integration, and your users may get a sub-par experience as a result.
As for your last question, I agree that the animations would likely be too large to bundle with the app. You could store them on any server, as long as you have some sort of API that lets your app contact that server and download the media (something you would implement using URLSession). "On-Demand Resources" is another technology Apple provides that could prove useful here. In most scenarios, though, a developer would store the animation resources or videos on a server. The app would either ship with a list of the necessary animation files or download that list, then download each file locally for use, all while showing the user a progress bar or otherwise informing them of what's happening.
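The download flow described above could be sketched roughly like this. Everything about the server side is an assumption on my part: the manifest is imagined as a JSON array of page numbers and file URLs, and the names AnimationAsset, decodeManifest, localURL, and downloadAssets are hypothetical helpers, not Apple API. It uses the async URLSession download method, so it needs iOS 15 or later, and progress is reported per finished file rather than per byte.

```swift
import Foundation

// Assumed manifest entry: which page the animation belongs to and where it lives.
struct AnimationAsset: Codable, Equatable {
    let page: Int
    let url: URL
}

// Decodes the (assumed) JSON manifest, e.g.
// [{"page": 3, "url": "https://example.com/media/page3.mp4"}]
func decodeManifest(_ data: Data) throws -> [AnimationAsset] {
    try JSONDecoder().decode([AnimationAsset].self, from: data)
}

// Local cache location for a given asset inside the chosen directory.
func localURL(for asset: AnimationAsset, in directory: URL) -> URL {
    directory.appendingPathComponent("page-\(asset.page)-\(asset.url.lastPathComponent)")
}

// Downloads any assets that are not cached yet, one at a time, and calls
// `progress` with the fraction of files handled so far (0.0 to 1.0).
func downloadAssets(_ assets: [AnimationAsset],
                    to directory: URL,
                    progress: (Double) -> Void) async throws {
    let fileManager = FileManager.default
    try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)

    for (index, asset) in assets.enumerated() {
        let destination = localURL(for: asset, in: directory)

        // Skip files that were already downloaded on a previous launch.
        if !fileManager.fileExists(atPath: destination.path) {
            let (temporaryURL, response) = try await URLSession.shared.download(from: asset.url)
            guard let http = response as? HTTPURLResponse,
                  (200..<300).contains(http.statusCode) else {
                throw URLError(.badServerResponse)
            }
            // The temporary file is removed by the system, so move it right away.
            try fileManager.moveItem(at: temporaryURL, to: destination)
        }

        progress(Double(index + 1) / Double(assets.count))
    }
}
```

In an app you would call downloadAssets from a Task and feed the progress value into a UIProgressView or a SwiftUI ProgressView on the main thread. If you need byte-level progress for large videos, a URLSessionDownloadDelegate would be the usual route instead.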