Help required for downloading and converting a Medical Book

Recently we had a conference of the Association of Physicians of India (APICON 2011). The Scientific Proceedings are available on the web site
http://www.apicon2011.com/
as a link:
"Medicine Update Vol 21, 2011. Proceedings of Scientific sessions, Apicon 2011"

This link is directed to a Book which opens at http://www.apicon2011.com/Book.aspx (the page title is "A simple frameset document").
This Book is in .aspx format.

Can you please suggest software for downloading or converting this book to another format which can be read offline?
Please help.
Thanks in advance.

I can’t see how
I can download all the pages with DownThemAll and they are all viewable in a browser or in an office suite, but each page has to be opened individually.
But the images are missing, of course.

LibreOffice reports it was produced with PageMaker,
and the original document is:

"C:\Documents and Settings\Server\Desktop\MEDICINE UPDATE 2011\02. Recent Insights in Coronary Artery Disease in Women.pmd"

Thank you so much. I found a way to read them offline: I right-clicked each page and downloaded them all.
Thanks once again.

On 03/21/2011 06:06 AM, caf4926 wrote:
>
> I can’t see how
>
> “C:\Documents and Settings\Server\Desktop\MEDICINE UPDATE 2011\02. Recent Insights in Coronary Artery Disease in Women.pmd”

given the way the document is presented on the web site, i do not
believe there is a way except this labor-intensive way of downloading
each page as a complete web page file, numbered to match the actual
page number:

  1. make a new folder (say /home/[you]/Documents/Recent Insights in
    Coronary Artery Disease in Women)

  2. in your browser go to page one of that book on the web site and (in
    Firefox–others are probably the same, but maybe not) do File > Save
    Page As, and
    a. in the opening Save As dialogue, navigate to the new “Recent
    Insights…” directory just made
    b. make sure to select save/filter “Web Page, complete” [not
    "…HTML only", because then you miss the images]
    c. CHANGE the name of the page from “Book.aspx.htm” to something
    like “Book01.htm”
    d. click save

  3. in the browser click to page 2 of the book and click “Save As”
    a. check that the save location (your “Recent Insights etc”
    directory) and the “Web Page, complete” filter are still set (they
    are in my Firefox, they might not be in your browser)
    b. CHANGE the name of the page to save from “Book.aspx.htm” to
    something like “Book02.htm”
    c. click save

once you see your browser is keeping the save location and the “Web
Page, complete” setting constant, then it is just

  1. click to browse to next page
  2. change save as file name to match the number of the actual page
  3. click save

over and over to fetch and save EACH page into that one
directory…when you are finished you can (in konqueror and firefox for
sure–i just checked) browse the book using its internal links, and
all images should be there…

there is probably a way to automate the process, but i don’t know
how…anyway, i guess once you are down to the last 1-2-3 and get a
rhythm going you can do a page every 5 to 15 seconds…

it would of course be a LOT easier for you if the web site designer
set a link to allow you to download the entire book as a pdf document…

hope this helps, even if time consuming…


DenverD
CAVEAT: http://is.gd/bpoMD
[NNTP posted w/openSUSE 11.3, KDE4.5.5, Thunderbird3.1.8, nVidia
173.14.28 3D, Athlon 64 3000+]
“It is far easier to read, understand and follow the instructions than
to undo the problems caused by not.” DD 23 Jan 11

Thanks a lot. Now I can read this book… :) :) :)

(I was using the rekonq browser till now. It gave me no option of saving this book's pages. Now I understand why people use Mozilla Firefox. This may just be another reason for using Firefox, which has more features than rekonq.)
Thanks once again.

On 03/21/2011 01:06 PM, babloo75 wrote:
>
> Thanks a lot.

no, i did so very little…

i have a friend in Nepal working to help ‘unclean’ women find
jobs…and a son who pays his own way to Uganda to volunteer in a
medical clinic…

i thank all three of you…and ask you to accept my thanks for your
willingness to help women with cardiovascular disease in India…


DenverD

ASPX is .NET ASP. Unless you download/save the pages individually (which doesn't preserve the internal site links, so you can't click from page to page), the problem for many apps is that ASPX pages typically contain plenty of server-side code that constructs each page on the fly (which is then cached in RAM).

Did you try something like WebHTTrack, which is a website copier? You might lose some of the pretty page layout formatting, but you should get all the text and images, and it rewrites the links so you can click from page to page in your book offline.
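
For what it's worth, the command-line httrack tool behind WebHTTrack could probably be pointed at the book along these lines (only a sketch: the output folder and the filter pattern are my guesses and may need tuning for this site):

# mirror the book into ~/apicon-book, staying within www.apicon2011.com
httrack "http://www.apicon2011.com/Book.aspx" -O ~/apicon-book "+www.apicon2011.com/*"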

Tony

On 03/21/2011 06:06 PM, tsu2 wrote:
>
> Did you try something like WebHTTrack which is a website copier?

the method i gave works perfectly…including live links, images and
exact page layout…


DenverD

Boy,
If that works, that's a new one for me, but unless I can't follow instructions correctly (I tried! I really tried, but I may be dense!), it doesn't work for me, assuming you're talking about clicking on the hypertext links and not simply opening the individual files (like in a File Manager).

Are you sure you did your testing offline, without a working network connection (absolutely required to test for offline viewing)?

For one thing, it would seem to me that if the hypertext links were to work offline, they <must> be re-written to point to the offline location and I can’t see how a simple “SaveAs” does that.

Another way to check is to inspect the page source code. For example, if you downloaded three pages which should link to each other, renamed as Book1.htm, Book2.htm and Book3.htm, open up one of the pages and search for the name of another. With pages renamed this way:

  • Open Book2.htm with your text editor (e.g. KWrite).
  • Do a “Find…” and enter the text strings “Book1” or “Book3” – since these pages are flat HTML, any link <must> include the name of the other page in the hypertext link code and be clearly visible in the text editor. (A quick grep equivalent follows this list.)
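
A quick terminal equivalent of that check, assuming GNU grep and that the renamed Book1.htm, Book2.htm, Book3.htm files sit in the current directory:

# list every Book*.htm filename referenced inside Book2.htm
grep -o 'Book[0-9]*\.htm' Book2.htm | sort -u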

Am I missing something?

Dumb and Dumber,
Tony :)

On 03/21/2011 11:36 PM, tsu2 wrote:
> Are you sure you did your testing offline without a working network
> connection (absolutely required to test for offline viewing)?

no, actually i didn’t try that…

and, yes i was speaking of the links at the top of the page…
so, if they don't work while offline then it is not 'perfect' as i had
claimed…

> For one thing, it would seem to me that if the hypertext links were to
> work offline, they <must> be re-written to point to the offline location
> and I can’t see how a simple “SaveAs” does that.

you may be right…frankly, i don’t want to unhook from the web to
find out…if you say FF is too dumb to make those changes, i believe
you…

i know there was an OS/2 program which would fetch htm/html code and
automatically rework the links to be off-line readable…

and, (i'm having flashbacks…have you and i discussed this before?) i
think wget can be set with command line switches to do the same
thing…

BUT, i just looked at the source and i very much doubt that wget has
been 'taught' how to rewire the links in that perverted code (save any
page and look in the new directory named Book1_files, or however you
elect to name the .htm)…

what a mess…

i suspect my Indian doctor will come back and complain that it didn’t
work so well at home (off-line), after all…

> Am I missing something?

nope…you nailed it.


DenverD

On 2011-03-22 11:33, DenverD wrote:
> On 03/21/2011 11:36 PM, tsu2 wrote:

>> For one thing, it would seem to me that if the hypertext links were to
> work offline, they <must> be re-written to point to the offline location
>> and I can’t see how a simple “SaveAs” does that.
>
> you may be right…frankly, i don’t want to unhook from the web to find
> out…if you say FF is too dumb to make those changes, i believe you…

wget does, though I don't remember which cli option.
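
For the record, the link-rewriting switch is --convert-links (alias -k); a minimal, untested sketch against a single page would be:

# fetch one page plus its images/CSS and rewrite the links for local viewing
wget --convert-links --page-requisites http://www.apicon2011.com/Book.aspx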


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

Carlos E. R. wrote:
> On 2011-03-22 11:33, DenverD wrote:
>> On 03/21/2011 11:36 PM, tsu2 wrote:
>
>>> For one thing, it would seem to me that if the hypertext links were to
>>> work offline, they <must> be re-written to point to the offline location
>>> and I can’t see how a simple “SaveAs” does that.

You didn't look closely enough then :P
Firefox does save the images and change the links in the content pages.

If you save the Header.aspx frame, though, it still contains the
original links, so you would need to edit that file to change those
links.
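
If the links in that frame follow a simple pattern, a one-liner could do that edit. This is purely hypothetical (I have not looked at the real markup, and the PageNo parameter name is invented), so inspect the saved Header.aspx before trying anything like:

# hypothetical: rewrite links such as Book.aspx?PageNo=7 to Book7.htm in the saved frame file
sed -i 's|Book\.aspx?PageNo=\([0-9]*\)|Book\1.htm|g' Header.aspx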

>> you may be right…frankly, i don’t want to unhook from the web to find
>> out…if you say FF is too dumb to make those changes, i believe you…
>
> wget does, with I don’t remember what cli option.

wget is a good way to do the whole thing in one go

I concur … wget can download a whole website locally, inclusive of all links, subfolders, and files, but as their site says, "IT depts are likely to get real angry if you take their entire site … use with caution."

Thanks for the comments (above) and for helping me out in the present circumstances. I am able to read the book pages offline too. They open in Konqueror and I can read the pages, but I would love to download all of them in a single go.
And I just saw in YaST2 that my SUSE has wget installed.
Can you please guide me on how to download all of these pages in a single go?

Thanks once again.

I tried the following, but got nothing.

babloo@linux-3npg:~> wget http://www.apicon2011.com/Book.aspx
asking libproxy about url ‘http://www.apicon2011.com/Book.aspx’
libproxy suggest to use ‘direct://’
--2011-03-23 19:45:34--  http://www.apicon2011.com/Book.aspx
Resolving www.apicon2011.com… 67.18.185.98
Connecting to www.apicon2011.com|67.18.185.98|:80… connected.
HTTP request sent, awaiting response… 200 OK
Length: 360 [text/html]
Saving to: “Book.aspx”

100%[=================================================================>] 360         --.-K/s   in 0s

2011-03-23 19:45:55 (31.1 MB/s) - “Book.aspx” saved [360/360]

babloo@linux-3npg:~>

Please guide.

  1. in the cli navigate where you want your download to be stored.
  2. Enter the following, making sure not to specify an exact page, or that is all you'll get. For example, www.somesite.com/somepath will download a whole section of a site, but if you entered … .com/somefile.aspx you would only get that single file.
    wget --wait=20 -U Mozilla <base web address>

--wait: a short pause between page downloads, to prevent the web server from thinking you're a robot downloader.
-U Mozilla: identifies you as a standard browser.
<base web address>: the website you want to scavenge (a worked example follows).
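
Putting those switches together for this particular site, one possible sketch (untested; it borrows the recursion and link-rewriting options mentioned elsewhere in this thread, so adjust the filters as needed):

# polite recursive fetch that stays on apicon2011.com and rewrites links for offline reading
wget --wait=20 -U Mozilla --recursive --no-parent --convert-links --page-requisites http://www.apicon2011.com/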

On 03/23/2011 03:36 PM, babloo75 wrote:

> Please guide.

the manual for wget is rather long and somewhat complex…
but no more so than the medical books being read…

you said you have the ability to read the entire document now, so i
can only assume you want to learn how to download something similar at
some future date…i therefore (especially since it is almost soup
time here) suggest you dig through the manual…it is easily available
if you put this into the address line of konqueror


#wget

the final command you use will have to include some or all of those switches (and more)

when you have developed the list of command line switches which you
think should work, but it does not, then ask here for clarification
and guidance…

wait: i find in my long file of notes this:


download an entire web site:

wget --recursive --no-clobber --page-requisites --html-extension \
  --convert-links --domains website.org --no-parent \
  www.website.org/tutorials/html --restrict-file-names=windows

but, you don't want an entire site, you only want one part of it, but
that should be easily solved by correctly giving the URL, maybe…so,
the above is only something to think about…especially since i don't
remember if it actually worked as expected, or not… i do remember
that more than once i had to unplug the 28.8 modem before i downloaded
the entire internet ;)
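
adapted to the book in question, that note would look roughly like
this…untested, and it is only a guess on my part that a recursive
crawl starting from Book.aspx can actually reach every page of that
frameset:

# guess: mirror just the book, rewriting links for offline use
wget --recursive --no-clobber --page-requisites --html-extension \
  --convert-links --no-parent --domains apicon2011.com \
  --restrict-file-names=windows http://www.apicon2011.com/Book.aspx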


DenverD

On 03/23/2011 04:36 PM, techwiz03 wrote:
>
> Enter the following, making sure not to specify an exact page, or that
> is all you'll get. For example, www.somesite.com/somepath will download
> a whole section of a site, but if you entered … .com/somefile.aspx you
> would only get that single file.

so, in this case the URL would be http://www.apicon2011.com/ and
therefore the entire site would have to be downloaded in order to get
the book…


DenverD

wget is very powerful but not foolproof! You need to do some investigation as to whether you need certain pages and certain folders or not. If you don't take some of the responsibility for control, then yes, you could end up with the whole world on your hard drive.

On 03/23/2011 11:06 PM, techwiz03 wrote:
> wget is very powerful but not foolproof! You need to do some
> investigation as to whether you need certain pages and certain folders
> or not. If you don't take some of the responsibility for control, then
> yes, you could end up with the whole world on your hard drive.

sir, you are not paying attention! the targeted download is at
http://www.apicon2011.com/Book.aspx

i investigated VERY well and gave the OP the precise URL required…

and tried to give the OP the benefit of my experience–ten or 12 years
ago with wget ported to OS/2…and, that was not nearly as
irresponsible an act as you might imagine, because i was watching it
download and when i saw it fetching stuff i didn't think i had asked
for, i just cut it off…and did some more man reading…and trying…


DenverD