Squid / Sarg errors and .xz

All my Squid logs are in xz format. I’m unclear whether this is now a Squid default or an openSUSE default:


PRETTY_NAME="openSUSE 13.2 (Harlequin) (x86_64)"
Squid Cache: Version 3.4.4
SARG Version: 2.3.6 Arp-21-2013

This would be fine except that Sarg seems to choke on .xz files. Even as of “SARG Version: 2.3.9 Sep-21-2014”, the documentation makes no mention of reading .xz/lzma2 files inline the way it does for “.gz, .bz2 or .Z”.

This gives me gibberish:

sarg -o ./web-report -d "01/08/2016-01/09/2016" -l access.log-20160801.xz

If I decompress the files with xz -d and run sarg on them, I get:


SARG: No records found
SARG: End

I have 7 GB of uncompressed logs for the month.

I don’t run these reports often but this report generator has been working pretty well for me since 2010 with minor tweaks as versions increment (once variables are set):


ls ./access.log* -1|sed '/.*/ i\-l'|xargs time sarg -o /var/www/sarg/${SDATESTAMP}-${EDATESTAMP} -d "${START_D}/${START_M}/${START_Y}-${END_D}/${END_M}/${END_Y}"

Yet even with a single uncompressed file hardcoded, this finds no records, in what appears to me to be the same access.log format, with Unix timestamps that are within range:


sarg -o ./web-report -d "01/08/2016-01/09/2016" -l access.log-20160802

Format/output:


     logformat squid      %ts.%03tu %6tr %>A %Ss/%03>Hs %<st %rm %ru %un %Sh/%<a %mt
     1470024013.792    210 ip.or.FQDN.com TCP_MISS/200 3748 CONNECT urs.microsoft.com:443 - HIER_DIRECT/65.52.108.163 -
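
One quick way to confirm that a timestamp really falls inside the -d range is to convert it to the DD/MM/YYYY form used on the sarg command line. This is only a sketch: epoch_to_sarg_date is a made-up helper name, it assumes GNU date (for "-d @SECONDS"), and it prints UTC, so entries close to midnight may land on a different day in local time.

```shell
# Convert a Squid epoch timestamp (e.g. 1470024013.792) to DD/MM/YYYY in UTC.
# Assumption: GNU date, which accepts "-d @SECONDS".
epoch_to_sarg_date() {
    secs=${1%%.*}                  # drop the millisecond part
    date -u -d "@$secs" +%d/%m/%Y
}

epoch_to_sarg_date 1470024013.792   # prints 01/08/2016
```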

I guess I have 2 questions:

  1. Does anyone have suggestions as to what I may be doing wrong here? Suggestions for troubleshooting?
  2. Does anyone know whether Sarg has any intention of supporting .xz natively?

Thank you.

From your description, it sounds like the log format has likely changed.
You can verify whether that happened by opening an old and a new log file in a text editor (or running head or cat on the first lines) and comparing the fields in the first few lines (I don’t know whether the first line is headers or data).
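
A minimal sketch of that comparison, assuming the logs are whitespace-separated text. log_shape is a hypothetical helper, not part of Squid or Sarg: it reads log lines on stdin and reports, for the first few lines, how many fields each has and how many blanks precede the first field.

```shell
# Summarise the first N lines (default 3) of a log read from stdin:
# number of whitespace-separated fields and count of leading blanks.
log_shape() {
    head -n "${1:-3}" |
        awk '{ s = $0; sub(/^[ \t]+/, "", s)
               printf "fields=%d leading_ws=%d\n", NF, length($0) - length(s) }'
}

# Usage (file names are examples only):
#   log_shape < access.log-old
#   xzcat access.log-20160801.xz | log_shape
```

If the two logs report different field counts or different leading_ws values, the layout has changed between them.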

TSU

This sounded quite reasonable. I dug up some archives, grepped the same URL in both and did a diff:


< 1360796163.047   1573 ip.or.fqdn.com TCP_MISS/200 5819 CONNECT urs.microsoft.com:443 - DIRECT/157.56.51.125 -
< 1360796163.301   1747 ip.or.fqdn.com TCP_MISS/200 5819 CONNECT urs.microsoft.com:443 - DIRECT/157.56.51.125 -
< 1360796194.561  10562 ip.or.fqdn.com TCP_MISS/200 5403 CONNECT urs.microsoft.com:443 - DIRECT/157.56.51.125 -
---
>      1469937636.608    267 ip.or.fqdn.com TCP_MISS/200 3748 CONNECT urs.microsoft.com:443 - HIER_DIRECT/64.4.54.165 -
>      1469937636.608    267 ip.or.fqdn.com TCP_MISS/200 3748 CONNECT urs.microsoft.com:443 - HIER_DIRECT/64.4.54.165 -
>      1469937639.922    262 ip.or.fqdn.com TCP_MISS/200 3748 CONNECT urs.microsoft.com:443 - HIER_DIRECT/64.4.54.165 -


The big difference is the leading spaces, which is solved here:
https://sourceforge.net/p/sarg/discussion/363374/thread/ba00e625/

This means that the following script works:

ls -1 | xargs xzcat | sed -e 's/^[ \t]*//' | sarg -o /root/web-report -z -x -d "01/08/2016-10/09/2016" -
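
For a directory that mixes compressed and already-decompressed logs, the same idea can be wrapped in a small function. This is a sketch under assumptions: normalize_logs is a made-up name, it decides by file extension only, and it relies on a sed that understands the [[:space:]] character class.

```shell
# Write every log named on the command line to stdout, decompressing
# .xz files on the fly and stripping leading whitespace from each line.
normalize_logs() {
    for f in "$@"; do
        case "$f" in
            *.xz) xzcat -- "$f" ;;
            *)    cat -- "$f" ;;
        esac
    done | sed -e 's/^[[:space:]]*//'
}

# Usage, with the same sarg options as the command above:
#   normalize_logs access.log-201608* | sarg -o /root/web-report -z -x -d "01/08/2016-10/09/2016" -
```

Passing the file names through a shell glob rather than ls | xargs also avoids trouble with unusual characters in file names.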

I actually think the problem was a sed failure in the old version more than anything to do with xz/squid/sarg. I was focused on the wrong part. Thank you.