I have a digital subscription to a textbook, but it’s super annoying to have to use the website to access the book. I’d like to scrape the ebook and dump the contents into a PDF. I have downloaded proprietary PDFs from websites before using downloader browser plugins and predictable URLs, but this site is pretty locked down, with randomly generated URL tokens and a combination of XML and image data.
Has anyone managed to scrape a digital textbook like this? Any ideas where I should begin?
I had a few books like that, hosted directly on a scummy academic publisher’s website. No PDF or usable files. I’m currently far from home, so I can’t tell you exactly what program I used. But I noticed that every page was downloaded into my temporary files as image data (a cached copy of each page). So I had to flip through a few pages manually, download them one by one, and name them correctly. I’ll look on my PC to try to find the program that did that when I’m back.
Sounds like you could also use an image downloader browser extension for that.
Sounds promising! Please let me know what you find.
It was MZCacheView, but the same author made one for Chrome and a general one. But theoware is probably right, a browser extension could also do it!
Looks like this particular publisher has anticipated cache sniffing. No dice.