Software or method to copy Yahoo group message history.

I’m looking for a way to copy and archive 10 years’ worth of message history from a Yahoo Group. With changes to the Yahoo Groups format coming, my group is concerned about losing it. I can copy one page at a time, but that is not practical. Any ideas on available software or how to go about it? I’m nearly clueless and have been told that it can’t be done.

Hello difficilus,

I’ve got no experience with Yahoo Groups, so I’m not sure how it works.
Could you give a link to an example of such a group?

Maybe you can download the messages with wget.
Start reading the man pages of wget.

Best of luck!:wink:

Here is a link to a public access group.
divingantiquescuba : Diving Antique Scuba - Take those relics off the wall, lets dive them.

If you scroll halfway down you will see a matrix under “message history” that shows the year and month. Reviewing the archived messages is point and click: pick a month to get a list of its messages, then open individual messages the same way. I can copy each page individually, but there is no joy in that. I’ll check out wget.

Hello difficilus,

Thanks for the link.
I tried a lot of different wget arguments and ended up with this command:

wget --recursive --no-parent --page-requisites --html-extension --convert-links http://groups.yahoo.com/group/divingantiquescuba/

This command downloads all the pages under http://groups.yahoo.com/group/divingantiquescuba/ and their requisites (images, attachments, etc.).
It also converts the links in the downloaded files to point to your local filesystem, so the copy will be completely independent of Yahoo! Groups.

The problems I encountered:

  • It also downloads lots of useless stuff, such as pages from login.yahoo.com.
  • There’s a folder called messages, but as far as I could see it just points to a list of messages, which you’ll download anyway.
  • The converted links point to a directory on my root filesystem (/group/divingantiquescuba/msearch_adv, for example, when it should be /<path>/<to>/<download>/group/divingantiquescuba/msearch_adv).

But eventually you get the messages in html format with all the images (and adverts!) stored.

If you want I could tune this command to only download the messages, but that would take a while.
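As a starting point for that tuning, here is a minimal sketch of how the command could be narrowed. It is untested against Yahoo! Groups. It uses wget's --exclude-domains option to keep the recursion away from login.yahoo.com and --directory-prefix to choose where the files land. The names mirror_group, GROUP, DEST and DRY_RUN, and the download directory, are made up for this example. Whether --exclude-domains also suppresses login pages that are reached through a redirect is an assumption I have not verified.

```shell
#!/bin/sh
# Sketch only: prints the wget command by default instead of running it.

GROUP="${GROUP:-divingantiquescuba}"   # group name from the link above
DEST="${DEST:-$HOME/yahoo-archive}"    # hypothetical download directory
DRY_RUN="${DRY_RUN:-1}"                # 1 = print the command, 0 = run it

mirror_group() {
    # $1 = group name, $2 = destination directory
    group=$1
    dest=$2
    # Build the command as the positional parameters so it can be
    # either printed or executed unchanged.
    set -- wget \
        --recursive --no-parent \
        --page-requisites --html-extension --convert-links \
        --wait=1 \
        --exclude-domains=login.yahoo.com \
        --directory-prefix="$dest" \
        "http://groups.yahoo.com/group/$group/"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

mirror_group "$GROUP" "$DEST"
```

Run it once as-is to check the printed command, then run it again with DRY_RUN=0 to start the download.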

Good luck!:wink:

Thanks Edward Iii. This might do exactly what’s needed. I’ll explore it and try to tweak it a bit so that it can access a private group that requires a username and password, and to clean out some of the junk links and miscellaneous downloads. I’ll post back to share any success, or to ask more questions once I’ve learned something new.

I’ve been working on this off and on since the original post and can’t get past the redirect. I need to log on with a username and password to access the site I’m working on. I have used several variations of this (verbose):

wget --recursive --page-requisites --no-parent --page-requisites --html-extension --convert-links --timestamping --continue --user-agent=Mozilla --wait=1 user=[username] password=[password] http://groups.yahoo.com/group/[groupname]/messages

I get this when I run wget with the above options:

@linux:~> wgetyahoo
–2010-11-23 13:43:44-- http://user=[username]@yahoo.com/
Resolving yahoo.com… 67.195.160.76, 69.147.125.65, 72.30.2.43, …
Connecting to yahoo.com|67.195.160.76|:80… connected.
HTTP request sent, awaiting response… 301 Redirect
Location: http://www.yahoo.com/ [following]
–2010-11-23 13:43:46-- http://www.yahoo.com/
Resolving www.yahoo.com… 209.191.122.70
Connecting to www.yahoo.com|209.191.122.70|:80… connected.
HTTP request sent, awaiting response… 200 OK
Length: unspecified [text/html]
Last-modified header missing – time-stamps turned off.
–2010-11-23 13:43:47-- http://www.yahoo.com/
Connecting to www.yahoo.com|209.191.122.70|:80… connected.
HTTP request sent, awaiting response… 200 OK
Length: unspecified [text/html]
Saving to: `Yahoo!

result:

FINISHED --2010-11-23 13:53:48–
Downloaded: 1 files, 38K in 0.02s (1.53 MB/s)
Converting Yahoo… 0-2
Converted 1 files in 0.01 seconds.

I’m missing something somewhere with the login and redirect. If I could get into the group and make this work, I think I could then tinker with downloading just the messages. Any thoughts or advice?

Hello difficilus,

This could become very tricky.
It could simply be a username and password.
I noticed in your command:

wget --recursive --page-requisites --no-parent --page-requisites --html-extension --convert-links --timestamping --continue --user-agent=Mozilla --wait=1 user=[username] password=[password] http://groups.yahoo.com/group/[groupname]/messages

This should be as follows. Note the leading -- on user and password: without it, wget treats user=[username] as another URL to fetch, which is why your log shows a request to yahoo.com.

wget --recursive --no-parent --page-requisites --html-extension --convert-links --timestamping --continue --user-agent=Mozilla --wait=1 --user=[username] --password=[password] http://groups.yahoo.com/group/[groupname]/messages

But it could also require a session (a PHP session, for example).
That means that when you log in, a cookie is stored so that the site knows it’s you.

In this case you should first retrieve the cookies and use them to retrieve the data.
For you I hope this isn’t the case, because finding the right commands for this is very difficult.
The only example I could find is this one: Can’t log in with wget | drupal.org
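For what it’s worth, the general two-step pattern with wget looks roughly like the sketch below: post the login form once and save the cookies, then load those cookies for the real download. The options --save-cookies, --keep-session-cookies, --post-data, --load-cookies and --delete-after are real wget options. Everything else is an assumption for illustration: the login URL is a placeholder, the form field names login and passwd are guesses, and the helper names run, login and fetch_messages are made up. A real login form often has extra hidden fields, so this may well not be enough for Yahoo.

```shell
#!/bin/sh
# Sketch only: prints the wget commands by default instead of running them.

COOKIES="${COOKIES:-cookies.txt}"                              # cookie jar file
LOGIN_URL="${LOGIN_URL:-https://login.example.invalid/login}"  # placeholder URL
DRY_RUN="${DRY_RUN:-1}"                                        # 1 = print only

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

login() {
    # $1 = username, $2 = password
    # Step 1: submit the login form and keep the session cookies.
    # The field names "login" and "passwd" are assumptions.
    run wget --save-cookies="$COOKIES" --keep-session-cookies \
        --post-data="login=$1&passwd=$2" \
        --delete-after "$LOGIN_URL"
}

fetch_messages() {
    # $1 = group name
    # Step 2: reuse the saved cookies for the recursive download.
    run wget --load-cookies="$COOKIES" \
        --recursive --no-parent --page-requisites \
        --html-extension --convert-links --wait=1 \
        "http://groups.yahoo.com/group/$1/messages"
}

login "[username]" "[password]"
fetch_messages "[groupname]"
```

To find the real login URL and field names you would have to look at the source of the login page. Also keep in mind that a password given on the command line is visible to other users on the same machine while wget runs.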

Best of luck!:wink:

I noted the correction to the command, but it didn’t seem to help. After reading the link you provided, I think you are right about the need for cookies. I have not been able to work on this much lately, but I did find this:

Products/yahoo2mbox - TT-Solutions

With the possibility of changes to Yahoo and it being several years old it may no longer work, but it appears to be written explicitly to get yahoo group messages. I tried to run it a few times with no success and suspect that my usage / options parameters are the likely cause at this point. I have not given up on this yet! Ten years of message history. I really want this…heh.

Hello difficilus,

I’m afraid you’re right that due to changes yahoo2mbox won’t work.
I looked in the source and indeed it uses session cookies.

After some searching I stumbled upon this: Yahoo Group archiver (on SourceForge.net)
I don’t know if it works but it’s worth a try.

Note that the last code update was only 12 hours ago!

Good luck!:wink:

Checked it out. I feel like I’m closing in on a solution. Thanks for the help!

Or you could try PG Offline, which downloads Yahoo Groups messages, files and photos.

Version 3 outputs to an Access database

Version 4 (beta) outputs to an SQLite database

PGOLite V3 is free (up to 3 groups)

PGO V4 (beta) is free but only lasts 30 days.