Nepomuk

There was a brief period during which nepomuk actually sort of worked for me (a bit). But that is now over, and once again I am wasting my life trying to get this appalling mess of an interconnected bugfest to do something useful. When the new problems began, I asked on the KDE forums, hoping that the folks there would have more expertise and experience with fault-finding and debugging, but despite the valiant efforts of one helpful and patient individual, nepomuk still refuses to do anything to earn its HDD space and CPU cycles.

I won’t paste the entire conversation from KDE forums, but to save passing viewers having to go there I shall paraphrase it. For the whole thing:
http://forum.kde.org/viewtopic.php?f=154&t=117408

I got a warning that my /home was full. I found that my soprano-virtuoso.db was 20Gb of a 65Gb /home dir. I deleted it, but it came back, and is now at 30Gb.
If I do a htop I see a number of processes of the form:

4791 ?        SNl  248:40 /usr/bin/virtuoso-t +foreground +configfile /tmp/virtuoso_Cj4345.ini +wait

Nepomuk is only set to index /home (not hidden files) and a 100Gb media store containing music and video (I have given up with kmail as it was broken). What is causing this db to bloat uncontrollably? Why are the instances of virtuoso-t hogging resources and hanging around for so long?

(The above had been running for 4 hours)

How do I stop this happening?
Is there an alternative to the whole akonadi/strigi/nepomuk/virtuoso mess? ie a simple file indexer that can be relied upon?

I use opensuse 12.3 with KDE 4.11
http://i127.photobucket.com/albums/p145/wakou/th_htop_zps34d124c4.png

In reply to this I was asked to delete the whole data directory, which I duly did.

Just a FYI, I deleted the db and the dir in which it resides, as above, at approx 10:30.
It has gradually grown since then, and is now 740Mb. (12:00)

Also this morning I did a zypper up, and some virtuoso/nepomuk etc components were upgraded, so I am going to reboot and hope that (!) something has improved.

I am not sure that I understood bcooksley’s point re. htop, but here is another…

http://i127.photobucket.com/albums/p145/wakou/th_htop2_zps560b26f6.png

edit/ps soprano-virtuoso.db now at 826Mb, grown that much whilst I was writing this…
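For anyone who wants to watch the growth without checking by hand, something like the following could log the size at intervals. It is only a sketch: db_size_mib is a made-up helper name, and the path in the usage comment is modelled on the one quoted further down this thread, so adjust it to your own setup.

```shell
# Print the size of a file in whole MiB (rounded down).
# db_size_mib is a hypothetical helper name, not part of Nepomuk.
db_size_mib() {
    bytes=$(wc -c < "$1") || return 1
    echo $(( bytes / 1048576 ))
}

# Example use (path is an assumption based on this thread, adjust to taste):
# db="$HOME/.kde4/share/apps/nepomuk/repository/main/data/virtuosobackend/soprano-virtuoso.db"
# while sleep 600; do echo "$(date +%H:%M) $(db_size_mib "$db") MiB"; done
```

Logging a timestamp next to each reading makes it easier to see whether the file is levelling off or still climbing.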

bcooksley advised me to check for symlinks which might be recursing, which I duly did, with no untoward results. I re-enabled nepomuk, but told it ONLY to ‘look’ at my ‘media store’
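In case it helps anyone doing the same symlink check, here is a rough sketch of one way to look for links that point back at one of their own parent directories, which is the kind that could make an indexer go round in circles. The function name is invented for illustration, it assumes a readlink that supports -f (GNU coreutils does), and it does not catch every possible loop (for example a link pointing at / itself, or loops through several links).

```shell
# List symlinks under $1 whose resolved target is the directory holding
# the link, or one of that directory's ancestors.
# find_ancestor_symlinks is a hypothetical name, not an existing tool.
find_ancestor_symlinks() {
    find "$1" -type l | while IFS= read -r link; do
        target=$(readlink -f "$link") || continue
        [ -d "$target" ] || continue
        # Physical path of the directory that contains the link.
        parent=$(cd "$(dirname "$link")" && pwd -P) || continue
        case "$parent/" in
            "$target"/*) printf '%s\n' "$link" ;;
        esac
    done
}

# Example: find_ancestor_symlinks "$HOME"
```

No output means no such links were found, which matches the "no untoward results" above.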

But now the behaviour changed; the db stopped growing, BUT it ONLY indexes NEW files, and refuses to index the files which are there already. Does anyone have any ideas? I did ask on #KDE on IRC, the advice there was unanimous, everyone there reported that they had nepomuk switched off. I would really really like a way to search my HDD’s quickly and easily. There is a bug reported here:

https://bugs.kde.org/show_bug.cgi?id=321796

Which advises that a workaround is simply to disable and re-enable the directories to be indexed. I have of course tried this a number of times, but my .db still only contains files added since the .db was deleted and restarted.

I remember having nepomuk hang with virtuoso, switching to soprano-backend-redland fixed it IIRC.
But this was a long time ago (4.3 or 4.4), and I have since completely removed nepomuk.

Hi wakou.

I honestly wouldn’t know any other suggestions than the usual ones:

  • check whether all KDE packages come from one repo, if uncertain, post output of
zypper lr -d
  • create a new user, see if the problem exists for that new user. If not
    – delete nepomuk* from both ~/.kde4/share/config and ~/.kde4/share/apps whilst not being logged in.
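A cautious variant of the second suggestion, sketched below, renames the nepomuk* entries instead of deleting them, so they can be put back if it makes no difference. Note that this swaps renaming in for the deletion suggested above. The function name and the .bak-TIMESTAMP suffix are made up for illustration, and the default directory is assumed to be ~/.kde4 as in this thread. As above, it should be run while not logged in to KDE.

```shell
# Rename nepomuk* entries under share/config and share/apps of a KDE
# home directory so that they can be restored later.
# set_aside_nepomuk and the .bak-* suffix are hypothetical choices.
set_aside_nepomuk() {
    kdehome=${1:-$HOME/.kde4}
    stamp=$(date +%Y%m%d%H%M%S)
    for dir in "$kdehome/share/config" "$kdehome/share/apps"; do
        for item in "$dir"/nepomuk*; do
            [ -e "$item" ] || continue          # glob matched nothing
            case "$item" in
                *.bak-*) continue ;;            # already set aside earlier
            esac
            mv -- "$item" "$item.bak-$stamp" || return 1
        done
    done
}

# Example: set_aside_nepomuk "$HOME/.kde4"
```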

EDIT: I don’t see the issue on either of my machines; the db’s are ~140MB, with nepomuk running with defaults.

I had another bash/bleat on the kde mailing list recently. I would say the answer is not to use it and frankly it looks like things may get worse. I run kdepim3 and so far no problems at all. Desktop indexing is also turned off but I get good search times.

Why might it get worse - well even the digital clock is going to get tied in.

Couple of links

Kontact-Nepomuk Integration: Why data from akonadi is indexed in nepomuk | Finding New Ways…

Akonadi - KDE UserBase Wiki

The indexing link-up is so that someone can, say, right click on a date and set an appointment. Great idea maybe, but as someone pointed out, is it worth dragging the entire indexing shooting match in just to do that?

Some years ago from comments on the forum some did wonder about bringing back a version of Kmail 1, KDE4 uses Kmail 2. Looks like it will never happen. On the other hand I wonder if kdepim3 is still about for enterprise users.

rotfl! There is a video on YouTube entitled “why linux sucks”. It’s a linux fest presentation video so not what it might seem. Interestingly, the use of Unity is mentioned. That uses OpenBox as the default desktop. That one is a C rewrite of something that went C++ and is very basic as is Unity. There are also some kernel people who want to put the entire C library in the kernel. I get the impression that there may be a revolt against the sort of coding / implementation that KDE and others represent at some point and good luck to them. On the other hand there is a lot of excitement about the use of Qt so it may all be around for a long time.

People who don’t appreciate Open Source’s problems should watch the video. There is also another one “why linux doesn’t suck”.

The comments about SuSe are interesting.

Should add I have nothing against C++ etc but there are clear indications that some aspects are being misused.

John

Yet again Knurpht is the hero of the hour! I sacked my long-term admin assistant, Natasha McTest, and employed her younger sister Tessa McTest. With an empty profile and just defaults in /home/tessa, I set nepomuk to index /home/tessa and /mediastore (which is in fstab, a HDD with music and radio etc). Nepomuk went off and did its work, silently (1) and without putting too much drain on either CPU or HDD. The /mediastore is about 100Gb. After a few hours, the

/home/tessa/.kde4/share/apps/nepomuk/repository/main/data/virtuosobackend/soprano-virtuoso.db

file is steady at approx 170Mb. Search in Dolphin ACTUALLY WORKS!! (which led to a potentially embarrassing moment, when I clicked on ‘videos’ :shame::shame: How on earth did THAT get in there?? delete, delete… )

Now I have somehow to find which file in my own /home directory is causing the problems. I did not know that nepo put files in ~/.kde4/share/config as well as in ~/.kde4/share/nepo/foo/foo/etc so I shall begin by renaming those one at a time.

Or maybe just wait until 13.1 goes gold, and start again with a new /home altogether… (lazy option)

(1) A question, which would not be needed if I 100% trusted that everything was perfect. I would rather that it was NOT silent, is there something I could place in the panel to show when nepo is working, and what % of CPU etc it is using, just to keep an eye for future problems? There used to be such a thing (?) I suppose, if I had a functioning brain-cell, I could write a wee plasmoidwidget or whatever they are called to do just that. #notgoingtohappen

:slight_smile:

I just went back to my own user and looked at the soprano-virtuoso.db. It has somehow got to 1.9Gb, despite being broken so that it ONLY indexes new files, so that it has only indexed 20 .mp3 files, 8 small jpegs/pngs and about 50 small .txt files. So is it worth me trying to preserve/document this state to present a bug report somewhere? If so, how best to go about that? Or should I just be selfish and delete it all and try to solve it?

Whilst not being able to offer any immediate solutions…

From personal experience I’ve found that ~/.kde4/share/config/nepomukstrigirc is a good contender to start with. (Obviously you’ll need to reselect directories for indexing.)

There are also some additional config entries listed on the Nepomuk/FileIndexer page of the KDE UserBase Wiki, to further control indexing and startup scan behaviour, or to enable debug mode.

On my own system nepomuk is actually working quite well :open_mouth:

I just had a quick look at that file… and well well!



[General]
exclude filters=autom4te,*.rcore,CTestTestfile.cmake,*.o,*.omf,.hg,*.m4,*.orig,moc_*.cpp,conftest,.xsession-errors*,CMakeTmpQmake,*.tmp,po,.svn,.histfile.*,lzo,.bzr,.git,litmain.sh,cmake_install.cmake,CMakeFiles,*.pc,*.nvram,*.elc,*.la,CMakeCache.txt,confdefs.h,*.gmo,*.csproj,*.rej,config.status,lost+found,confstat,*.pyc,_darcs,CVS,*.part,libtool,*.aux,*.po,CMakeTmp,Makefile.am,*.lo,*.loT,*~,*.moc,*.vm*,*.class,core-dumps
exclude filters version=2
exclude folders[$e]=
exclude mimetypes=text/css,text/x-c++src,text/x-c++hdr,text/x-csrc,text/x-chdr,text/x-python,text/x-assembly,text/x-java,text/x-objsrc,text/x-ruby,text/x-scheme,text/x-pascal,text/x-yacc,text/x-sed,text/x-haskell,text/asp,application/x-awk,application/x-cgi,application/x-csh,application/x-java,application/x-javascript,application/x-perl,application/x-php,application/x-python,application/x-sh,application/x-tex
first run=false
folders[$e]=/Store,$HOME
***index hidden folders=true***
strigiVersion=unknown

[RemovableMedia]
ask user=false
index newly mounted=false

[general]
legacyCleaning=false

This despite the fact that hidden folders are set to NOT be indexed from the configure desktop page. I suspect that this may well be the source of the grief! (Or at least some of it!)
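To check the value without opening the file by eye each time, a small lookup along these lines might do. It is a sketch only: rc_value is a made-up name, and it assumes the file is a plain key=value file with [Group] headers as in the paste above (the asterisks in the paste are just forum emphasis, not part of the file). The group comparison is case-sensitive on purpose, since the file above has both [General] and [general].

```shell
# Print the value of a key inside a given [group] of a KDE-style rc file.
# Usage: rc_value FILE GROUP KEY
# rc_value is a hypothetical helper, not a KDE tool.
rc_value() {
    awk -v group="[$2]" -v key="$3" '
        /^\[/ { in_group = ($0 == group); next }   # track the current group
        in_group {
            pos = index($0, "=")
            if (pos > 0 && substr($0, 1, pos - 1) == key) {
                print substr($0, pos + 1)
                exit
            }
        }
    ' "$1"
}

# Example:
# rc_value ~/.kde4/share/config/nepomukstrigirc General "index hidden folders"
```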

Although I was under the impression that strigi was no longer used?

That’s correct. Nepomuk doesn’t use strigi anymore, but has its own file indexers now (since 4.10).

But that config file wasn’t renamed, for compatibility reasons I think.

Thank you Wolfie. I have gone ahead and logged on to a console as root. Then I deleted:
/home/stephen/.kde4/share/apps/nepomuk
and all beneath. I then renamed:
/home/stephen/.kde4/share/config/nepomukstrigirc
and rebooted and started KDE as my user (stephen). I then went to configure desktop>desktop search and set nepomuk to index:
/home/stephen and
/Store.
I then looked at:
/home/stephen/.kde4/share/config/nepomukstrigirc
This newly generated file contained the line:
index hidden folders=true

So log out and as root set that value to ‘false’ and start kde again.
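For the record, the edit itself can be done from the console with something like the sketch below rather than in an editor. The function name is made up, and it assumes the line appears exactly as "index hidden folders=..." at the start of a line. It writes the result back through cat so that the file keeps its owner and permissions even if run as root. Nepomuk should not be running while this is done.

```shell
# Force "index hidden folders" to false in the given rc file.
# disable_hidden_indexing is a hypothetical helper name.
disable_hidden_indexing() {
    rc=$1
    tmp=$(mktemp) || return 1
    if sed 's/^index hidden folders=.*/index hidden folders=false/' "$rc" > "$tmp"; then
        cat "$tmp" > "$rc"      # overwrite in place, keeping owner and mode
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Example:
# disable_hidden_indexing /home/stephen/.kde4/share/config/nepomukstrigirc
```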

Now as this is my main user profile, I expected to have a larger index than that generated for the new user (‘Tessa’ as mentioned above) As I wrote there, her
“/home/tessa/.kde4/share/apps/nepomuk/repository/main/data/virtuosobackend/soprano-virtuoso.db”
file settled, seemingly stable at approx 169Mb. The vast bulk of indexing here (by size) will be in the media store,
/Store

And even though this is an active user profile, the /home/foo files should not be massively different; the computer is not used to generate vast office docs or used in a workplace environment etc. However after three or four hours, the .db stands now at 561Mb and growing.
Any suggestions please, on how to proceed? I am going of course to let it run for a day or so, just to see if it does top out, or whether it continues to grow out of control. It is NOT using excessive CPU at this point. And nepomuk appears to be working, at least as far as using dolphin to show me files of certain types.

As a niggly ps, despite the ontological wonderment, semantic superness and general futuristic fantastic capabilities, it cannot tell the difference between an mp4 audio file and a video. To my eternal shame I have a ‘Johnny Hallyday’ album lurking in my /Store, recorded for some reason as .mp4. nepomuk reports these tracks as videos. I am pretty sure that MSWin XP could do this more quickly in 2001. Twelve years, by Moore’s Law, is a very long time in computing, and the five years or so since nepomuk was foisted upon us as a non-working bugfest of interwoven horror should have been plenty of time for it to have been made to work, or dropped and replaced with something which does.

Why as root? Those are your user’s files… But well, doesn’t matter. :wink:

I then went to configure desktop>desktop search and set nepomuk to index:
/home/stephen and
/Store.
I then looked at:
/home/stephen/.kde4/share/config/nepomukstrigirc
This newly generated file contained the line:
index hidden folders=true

Hm, I can’t reproduce this here.
I always get “index hidden folders=false” in the newly created file if I delete it.

Are you sure nepomuk was not running while you changed it? (Yes, logging out of KDE may not be enough. I have seen nepomuk keep running for at least some time afterwards, so try deleting those files right after a reboot, before logging into KDE.)

As a niggly ps, despite the ontological wonderment, semantic superness and general futuristic fantastic capabilities, it cannot tell the difference between an mp4 audio file and a video. To my eternal shame I have a ‘Johnny Hallyday’ album lurking in my /Store, recorded for some reason as .mp4. nepomuk reports these tracks as videos. I am pretty sure that MSWin XP could do this more quickly in 2001. Twelve years, by Moore’s Law, is a very long time in computing, and the five years or so since nepomuk was foisted upon us as a non-working bugfest of interwoven horror should have been plenty of time for it to have been made to work, or dropped and replaced with something which does.

You do know that Nepomuk is not there to tell you what content type your files have? It doesn’t detect that itself anyway, but uses the system’s mimetype database for that like the rest of KDE (in fact it uses kdelibs to detect that).
So what does “file filename” report about those files? And “kmimetypefinder filename”?

You can at least change the matching extensions for each filetype in “Configure Desktop”->“File Associations”.

And to your comment regarding Windows: the reason is that Windows uses only the file’s extension (the part after ‘.’ in the filename) for determining the type. In Linux it’s a bit more complex, here the actual content is checked as well. This may yield differences of course, but has nothing to do with Nepomuk or even KDE.

Because I wanted to be sure I was not logged in as user, to be sure that nepomuk was not active at the time. So what I did was, logout of KDE, then at log in screen, select console log in, then logged in as root, then did the file management, then rebooted.

Hm, I can’t reproduce this here.
I always get “index hidden folders=false” in the newly created file if I delete it.

Are you sure nepomuk was not running while you changed it? (Yes, logging out of KDE may not be enough. I have seen nepomuk keep running for at least some time afterwards, so try deleting those files right after a reboot, before logging into KDE.)

Hmm, I will try that if the .db continues to grow…

You do know that Nepomuk is not there to tell you what content type your files have? It doesn’t detect that itself anyway, but uses the system’s mimetype database for that like the rest of KDE (in fact it uses kdelibs to detect that).
So what does “file filename” report about those files? And “kmimetypefinder filename”?

I did not know that. We were told that nepomuk was supposed to be able to know everything about every file it looked at including what size of bra was favoured by the grandmother of an email correspondent.
I must confess that I still do not know what it is supposed to do, and as it has never actually done ANYTHING worthwhile on any of my systems, from 11.x onwards, I have never had a chance to explore its “capabilities” at all.

file Johnny\ Hallyday\ -\ Jesus\ Christ.mp4 
Johnny Hallyday - Jesus Christ.mp4: ISO Media, MPEG v4 system, version 2
kmimetypefinder Johnny\ Hallyday\ -\ Jesus\ Christ.mp4 
video/mp4
(accuracy 100)

You can at least change the matching extensions for each filetype in “Configure Desktop”->“File Associations”.

Then surely those .mp4 that DO have video would report themselves as audio. It is not important… I cannot imagine ever wanting to watch a video of Johnny, and being disappointed to find that I only have audio!

And to your comment regarding Windows: the reason is that Windows uses only the file’s extension (the part after ‘.’ in the filename) for determining the type. In Linux it’s a bit more complex, *here the actual content* is checked as well. This may yield differences of course, but has nothing to do with Nepomuk or even KDE.

My italics… Obviously “the actual content” is not, in this instance, checked. There is no video in those files. Anyway, as I say, it is not important, a minor peeve in comparison to my ever-growing .db (647Mb as of now, 11:02).

When you login as user in the text console, nepomuk will not get started anyway.
But I really think in your case it was still running, because otherwise that “index hidden folders” option should have been set to “false”.

I did not know that. We were told that nepomuk was supposed to be able to know everything about every file it looked at including what size of bra was favoured by the grandmother of an email correspondent.
I must confess that I still do not know what it is supposed to do, and as it has never actually done ANYTHING worthwhile on any of my systems, from 11.x onwards, I have never had a chance to explore its “capabilities” at all.

Well, nepomuks “capability” is to store information (including the mime type) about the files in a database, which can then be searched more quickly.
But it uses the system’s means to determine the mime type in the first place. Just like it uses existing libraries (libpng f.e.) to extract that information out of the files.

kmimetypefinder Johnny\ Hallyday\ -\ Jesus\ Christ.mp4 
video/mp4
(accuracy 100)

Right, as suspected. The file is detected as video/mp4 by KDE, which just uses the system’s mimetype database (in /usr/share/mime/), i.e. it should be detected the same in GNOME f.e.
I had a look at my system, and indeed the video/mp4 filetype contains the pattern “*.mp4” whereas audio/mp4 only has “*.aac”, “*.f4a” and “*.m4a”.
So either rename the files (to .m4a f.e.), or add “*.mp4” to the patterns for audio/mp4 in “Configure Desktop”->“Filetype Associations” (you may have to remove it from video/mp4 as well, I’m not sure). Then those files should be detected correctly.
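To make the pattern business concrete, here is a toy model of name-based matching using only the patterns mentioned above. It is not how KDE or the mimetype database actually does the lookup (the real thing has many more patterns, priorities and content checks); the function name and the "unknown" fallback are made up for illustration.

```shell
# Toy model: guess a type from the file name alone, using only the
# glob patterns quoted in this post. Purely illustrative.
guess_type_by_name() {
    case "$1" in
        *.mp4)             echo "video/mp4" ;;
        *.aac|*.f4a|*.m4a) echo "audio/mp4" ;;
        *)                 echo "unknown" ;;
    esac
}

# Example: guess_type_by_name "Johnny Hallyday - Jesus Christ.mp4"
```

With patterns like these an audio-only .mp4 can only ever come out as video/mp4, which is why renaming to .m4a or moving the pattern changes the result.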

My italics… Obviously “the actual content” is not, in this instance checked.

Yeah, a content check is not configured for every mimetype (you can’t even correctly distinguish all mimetypes by looking at the content alone; think of all those text files). But Windows only ever looks at the filename, whereas in Linux this is more flexible.

Thank you for the valuable info :slight_smile:

Yawn, sigh, etc etc. soprano-virtuoso.db is now 9.1Gb. A possibly related symptom is that Dolphin is now quite annoyingly ‘laggy’; click on a folder and wait two or three seconds until it opens.
I have never known a piece of software as awful as this shambles. It behaves like a virus, looks like a virus, is more difficult to get rid of than a virus. It can break your install.

Hm, sorry, I’m really running out of ideas here.
What’s the information you get when clicking on “Details…” in “Configure Desktop”->“Desktop Search”?

It behaves like a virus, looks like a virus, is more difficult to get rid of than a virus.

No it’s not. You can just turn it off.

And maybe you should at least turn off the file indexing part if it doesn’t behave on your system.

For what it’s worth this is what I’ve found whilst playing around with nepomuk.

The setting labelled “Show hidden folders” (System Settings > Desktop Search > Indexing > Customize Folders) toggles the “index hidden folders= true/false” in nepomukstrigirc.

By default nepomukindexer will index /home and all sub-directories below, unless explicitly told otherwise.

The method of exclusion is rather convoluted to say the least, with several options:

  1. Specify the indexing from “Customize Folders” - Sub-directories inherit the indexing status of their immediate parent, unless explicitly altered on a per-directory basis.

  2. Use the check boxes (System Settings > Desktop Search > Indexing) for particular types. (Newly added option)

  3. Specify the explicit Mime type to exclude (System Settings > Desktop Search > Indexing > Advanced) - If at (2) for example Index Videos is deselected then “video/*” is added to the Mime type exclude list.

  4. Specify file name patterns (System Settings > Desktop Search > Indexing > Advanced).

Initially I had problems with directories being included that I thought I’d excluded, and vice versa, especially when adding new directories. On top of that there was the badly labelled “Show hidden folders”, which I had taken to mean “show in the directory tree”. I was on the verge of “I don’t want this freakin’ nepomuk stuff!” - I think I had got so bogged down that I couldn’t see the wood for the trees.

Personally I found that explicitly including/excluding directories (1), together with file name patterns (4) was the most reliable, albeit time consuming way.
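As a way of sanity-checking file name patterns (4) before typing them into the dialog, a rough sketch like this can tell you whether a given name would be caught by a comma-separated list in the style of the "exclude filters" line in nepomukstrigirc. It assumes the filters behave like ordinary shell globs applied to the bare file or directory name, which I have not verified against nepomuk itself. The function name is made up.

```shell
# Succeed (exit status 0) if NAME matches any glob in the comma-separated
# FILTERS list. Runs in a subshell so IFS and "set -f" do not leak out.
# matches_exclude_filter is a hypothetical helper name.
matches_exclude_filter() (
    name=$1
    IFS=,
    set -f                      # stop patterns expanding against the cwd
    for pat in $2; do
        case "$name" in
            $pat) exit 0 ;;
        esac
    done
    exit 1
)

# Example:
# matches_exclude_filter "main.o" "*.o,*.tmp,CVS" && echo "would be excluded"
```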

I know it’s a new database, but, what happens if you run “nepomukcleaner” ?

Also “nepomukshow” (use --help for options) is quite useful to see just what information has been indexed for any particular file.

http://wstaw.org/m/2013/10/18/plasma-desktophu3000.png

http://wstaw.org/m/2013/10/18/plasma-desktopWW3000.png