#!/bin/oops ... metadata adventures and misadventures ...


Long time *nix dude, long time GNOME/Cinnamon user. GNOME usage started around 1.X … like the 1990s … like REALLY long time user.

A bit over a year ago, I was searching for a file in my Videos folder (yeah, I do actually store my videos in Videos … go figure) and got an unexpected result. To my surprise, it showed up because my search term matched metadata in the video, in spite of not matching any part of the file name. WOW! TOO COOL! I didn’t know nautilus did that! Hmmmmm? That got me thinking, never a good sign …

I have been obsessed with multimedia and metadata ever since! Like many folk, I have a ridiculously large collection of videos, audio and images. Terabytes! How can anyone manage that much information? Short answer: they don’t; they can’t! Long answer: As museum curators of the 19th century figured out, one can somewhat manage with good metadata.

And the adventure begins!

So I start a thread … I would like to discuss how to best manage large collections of video, audio, and images …

My tools of choice are tracker, tracker-sparql, and gthumb. What are yours?

I have also played with recoll, which is really, really, really slick, except that it doesn’t suck data from matroska video (mkv). (I plan to write an mkv video data sucker for it at some time in the not so distant future … looks fairly easy from my initial research … ) Nonetheless, it does do a decent job with mp4!

Matroska (mkv) is superior to mp4 in every single aspect I can imagine, but even if I assume it sucks in every respect but one, that one feature trumps all others. With mkv, it is easy to Add, Delete, and Update video metadata WITHOUT a remux! This is a HUGE benefit, as the discussion I hope to have will make clear. Remuxing a 4GB file just to update keywords=“drame,waar,historic” to keywords=“drama,war,historic” is a slow pain in the ___, even on a crazy fast nvme drive. With mkv, this can be accomplished in a fraction of a second, even on a crappy slow 5400 rpm hard drive in that 8 year old laptop you can’t bring yourself to dump.
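To make the keyword fix above concrete, here is a minimal sketch of one way to do it. It assumes the mkvpropedit tool from MKVToolNix is installed (I have not named my editing tool above, so treat that choice as an assumption), and the function names are made up for illustration. As far as I understand it, `--tags global:FILE` replaces the existing global tags with the ones in FILE, so the dictionary should carry every tag you want to keep.

```python
import os
import subprocess
import tempfile
import xml.etree.ElementTree as ET


def build_tags_xml(tags):
    """Build a Matroska tags XML document holding one set of global tags."""
    root = ET.Element("Tags")
    tag = ET.SubElement(root, "Tag")
    ET.SubElement(tag, "Targets")  # empty Targets: the tags apply to the whole file
    for name, value in tags.items():
        simple = ET.SubElement(tag, "Simple")
        ET.SubElement(simple, "Name").text = name
        ET.SubElement(simple, "String").text = value
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_mkvpropedit_command(mkv_path, tags_xml_path):
    """Return the argument list that edits the global tags in place (no remux)."""
    return ["mkvpropedit", mkv_path, "--tags", f"global:{tags_xml_path}"]


def apply_tags(mkv_path, tags):
    """Write the tags to a temporary XML file and run mkvpropedit on the video."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".xml", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(build_tags_xml(tags))
        xml_path = handle.name
    try:
        subprocess.run(build_mkvpropedit_command(mkv_path, xml_path), check=True)
    finally:
        os.unlink(xml_path)  # clean up the temporary tag file
```

Something like `apply_tags("movie.mkv", {"KEYWORDS": "drama,war,historic"})` would then correct the typo; the file name here is only a placeholder.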

Things I would really like to discuss here:
  • metadata acquisition and editing
  • indexing: baloo, tracker, xapian, recoll
  • search: sql, sparql, OTHER
  • integration: dolphin, nautilus, nemo, thunar, gnome-shell, OTHERS

I am currently in the slow process of converting all my mp4 video to mkv. Technically, I could automate it easily enough. It is slow only because I am checking, adding, correcting metadata. From the small (5%, plus or minus) bit I have thus far converted, WOW! It is just so much easier to view and find and slice and dice. …

I don’t really know if this is the right place to initiate this discussion, but I guess I will know soon enough …

If you were a KDE Plasma user, then I would suggest digiKam to manage Videos and (photographic) Images.

  • But, you’re not – therefore darktable to manage your images.

Being a KDE Plasma user, I’ve never noticed Tracker, but it seems that the Tracker project acknowledges Baloo as being the equivalent KDE Plasma tool.

Hmmm – at the end of the day, if one wants to manage collections of media, one has to accept metadata management – which means, a database has to be used.

  • Which is no different to what physical libraries have been doing for more than a few years now – before computers, they were using physical card systems to document where the physical books or whatever were physically located within the given library …

Hello @dcurtisfra,

Your library card catalog is perfect for the idea I was trying to convey with my less obvious museum curator analogy.

baloo and tracker are comparable, but in my VERY LIMITED testing, baloo does not appear to index video metadata. Recoll does index mp4 video metadata, but not mkv. Recoll, once more from rather limited testing, seems to be more powerful than tracker for things like pdf documents, various office document formats, and perhaps even email. It is also available on Windows, and the project documentation implies that it is superior to Windows indexing, which has actually become pretty good, if not very good. Once more … limited testing and experience. It also has a very nice graphical front end to its search facility. I have read that it integrates nicely with dolphin, but I haven’t tried it. The only way to effectively query the tracker database is to write sparql (similar to sql) from the cli. Tracker does provide a cli program called tracker-search that less technical users should be able to learn easily enough, but it doesn’t even begin to match the power and ease of use provided by the Recoll gui front end.
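For anyone who has not seen sparql, here is a rough sketch of the kind of query I mean, built as a string in Python. It assumes the Nepomuk-style names `nfo:Video` and `nie:title` and the `fn:contains` / `fn:lower-case` functions are available in your tracker version; the helper names are invented for this example, and how you pass the query to the tracker cli differs between versions, so check your local man page.

```python
def sparql_string_literal(text):
    """Quote text as a SPARQL string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_video_title_query(term):
    """Return a query for videos whose title contains term (case-insensitive)."""
    literal = sparql_string_literal(term.lower())
    return (
        "SELECT ?video ?title WHERE { "
        "?video a nfo:Video ; nie:title ?title . "
        f"FILTER(fn:contains(fn:lower-case(?title), {literal})) "
        "}"
    )


if __name__ == "__main__":
    # Print a query that could be pasted into the sparql cli tool.
    print(build_video_title_query("war"))
```

The point is less the exact query than the shape: pick a class (videos), pull a property (title), filter on it. That is simple for a programmer, but a lot to ask of a casual user compared with a search box.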

The tools required to automatically create the “card catalog” exist and are mostly robust. Not that long ago, I actually contemplated implementing my own media db in something like SQLite and collecting data with perl scripts and ffprobe. I’m glad I didn’t! That work has already been completed by folk who are much better programmers than me :).
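For the curious, the home-grown approach I contemplated would have looked roughly like the sketch below, here in Python rather than perl. It assumes the JSON that `ffprobe -v quiet -print_format json -show_format FILE` prints, with tags under `format.tags`; the table layout and function names are my own invention, not anything tracker or recoll actually use.

```python
import json
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) a tiny catalog with one row per file and tag."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS media_tags ("
        "filename TEXT NOT NULL, tag TEXT NOT NULL, value TEXT NOT NULL, "
        "PRIMARY KEY (filename, tag))"
    )
    return conn


def index_ffprobe_json(conn, ffprobe_json):
    """Store the container-level tags from one ffprobe JSON report."""
    fmt = json.loads(ffprobe_json).get("format", {})
    filename = fmt.get("filename", "")
    # Tag names are lower-cased so that KEYWORDS and keywords match.
    rows = [(filename, key.lower(), str(value))
            for key, value in fmt.get("tags", {}).items()]
    conn.executemany("INSERT OR REPLACE INTO media_tags VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def find_files(conn, tag, term):
    """Return the files whose given tag contains term."""
    cur = conn.execute(
        "SELECT filename FROM media_tags "
        "WHERE tag = ? AND value LIKE ? ORDER BY filename",
        (tag.lower(), f"%{term}%"),
    )
    return [row[0] for row in cur]
```

Feed it one ffprobe report per file and `find_files(conn, "keywords", "war")` gives you the card catalog lookup. It works, but it has no file watching, no thumbnails and no desktop integration, which is exactly why I am happy to leave the job to the existing indexers.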