Is anyone else seeing high memory usage for the waagent service (provided by the python-azure-agent package) on instances hosted on Azure? On a machine with 4GB RAM, the process python -u /usr/sbin/waagent -run-exthandlers starts at around 20MB and grows to 1.5GB over a couple of months. I have tried installing the package directly from Azure’s GitHub repo https://github.com/Azure/WALinuxAgent but that has the same issue. We also use Ubuntu instances on Azure and aren’t seeing the issue on them.
I don’t know if it’s a typo in the documentation, but one of the listed dependencies is something called “ip-route”.
On openSUSE, we have “ip route” as part of the ip tools package, but no “ip-route”.
Have you tried to determine what is in your memory? Have you run top (or I’d recommend htop)?
Have you inspected and compared your logfiles?
What makes you think the memory issue (sounds like a memory leak, where resources are not freed) is related to the system and not the running application (generally far more likely)? In fact, what kind of app is running on your system, and do you know enough about how that app was written to evaluate it for resource usage and possible memory leaks? For that matter, although not a definitive test, you could try deploying a copy in your own LAN, running some load tests and seeing if you can replicate the memory issue. It’s not exactly the same as the Azure cloud, but if you know how to test, a lot can be learned by running in a controlled environment.
Have you inspected your waagent logfile? Maybe you’ll find something, maybe not… Or you might find something perhaps not definitive but possibly contributory… After all, waagent is supposed to collect and send performance metrics.
top shows the process ‘python’ running with the high usage. ps shows the process to be ‘python -u /usr/sbin/waagent -run-exthandlers’.
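If it helps to put numbers on the growth rather than checking top by hand, here is a minimal sketch that samples the resident memory (VmRSS) of a process from /proc/<pid>/status and appends it to a log file. The function names, the log file name waagent_rss.log, and the hourly interval are my own assumptions for illustration; none of this comes from waagent itself, and it only works on Linux.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size (kB) of a running process."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vmrss_kb(f.read())


def log_rss(pid, interval_s=3600, samples=24, out_path="waagent_rss.log"):
    """Append 'timestamp rss_kb' lines to out_path (hypothetical file name)."""
    for _ in range(samples):
        rss = read_rss_kb(pid)
        with open(out_path, "a") as out:
            out.write("%s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), rss))
        time.sleep(interval_s)
```

You would call something like log_rss(1234), where 1234 stands in for the PID that ps reports for the waagent -run-exthandlers process. Running the same script on an Ubuntu instance and an openSUSE instance would give two logs that can be compared side by side.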
Have you inspected and compared your logfiles?
There is not much in the waagent log files and no observable difference between Ubuntu and openSUSE (after removing the single preinstalled extension).
What makes you think the memory issue (sounds like a memory leak, where resources are not freed) is related to the system and not the running application (generally far more likely)? In fact, what kind of app is running on your system, and do you know enough about how that app was written to evaluate it for resource usage and possible memory leaks? For that matter, although not a definitive test, you could try deploying a copy in your own LAN, running some load tests and seeing if you can replicate the memory issue. It’s not exactly the same as the Azure cloud, but if you know how to test, a lot can be learned by running in a controlled environment.
I agree that the problem is likely to be with the application and not the OS. We have 12 instances running openSUSE, all with the same issue, and yet our Microsoft Support Engineer claims that he can’t reproduce it! Our instances are running different applications (mongodb/redis/tomcat/nodejs) and don’t have any packages in common other than the base system.
We don’t have a local machine that’s up all the time to test the memory leak.