Help diagnosing a crash report

Ever since updating from Leap 15.4 to 15.5, the system journal has been showing a lot of crash reports. Several components are involved: fail2ban, python, iptables, and the kernel.

A typical crash report is shown below.
I have tried to duplicate it without success. The fail2ban version did not change with the update. I presume python (maybe) and iptables were updated.

The failure surfaces in iptables. But is iptables itself at fault? Is it bad input? Or a kernel failure?

Should I report this on Bugzilla?

2023-07-05T15:14:34-0700 sma-server3 kernel: iptables: vmalloc error: size 0, page order 9, failed to allocate pages, mode:0x400cc0(GFP_KERNEL_ACCOUNT), nodemask=(null),cpuset=/,mems_allowed=0
2023-07-05T15:14:34-0700 sma-server3 kernel: CPU: 0 PID: 12133 Comm: iptables Tainted: G                  N 5.14.21-150500.53-default #1 SLE15-SP5 3b90198179ad2dbddc570cfe6efd7895c9be3e4a
2023-07-05T15:14:34-0700 sma-server3 kernel: Hardware name: System manufacturer System Product Name/M3A78-EM, BIOS 1602    03/27/2009
2023-07-05T15:14:34-0700 sma-server3 kernel: Call Trace:
2023-07-05T15:14:34-0700 sma-server3 kernel:  <TASK>
2023-07-05T15:14:34-0700 sma-server3 kernel:  dump_stack_lvl+0x45/0x5b
2023-07-05T15:14:34-0700 sma-server3 kernel:  warn_alloc+0x116/0x180
2023-07-05T15:14:34-0700 sma-server3 kernel:  __vmalloc_node_range+0x390/0x4a0
2023-07-05T15:14:34-0700 sma-server3 kernel:  __vmalloc_node+0x57/0x70
2023-07-05T15:14:34-0700 sma-server3 kernel:  ? xt_alloc_table_info+0x26/0x70 [x_tables e20979056ab8b8537ed985ce4d87d9ec0f6393cb]
2023-07-05T15:14:34-0700 sma-server3 kernel:  xt_alloc_table_info+0x26/0x70 [x_tables e20979056ab8b8537ed985ce4d87d9ec0f6393cb]
2023-07-05T15:14:34-0700 sma-server3 kernel:  do_ipt_set_ctl+0x191/0x3bf [ip_tables fc299e32f3942b3711f7eeaf1a8aa3a911438972]
2023-07-05T15:14:34-0700 sma-server3 kernel:  nf_setsockopt+0x57/0x80
2023-07-05T15:14:34-0700 sma-server3 kernel:  ip_setsockopt+0x2cf/0x12f0
2023-07-05T15:14:34-0700 sma-server3 kernel:  __sys_setsockopt+0xf3/0x1e0
2023-07-05T15:14:34-0700 sma-server3 kernel:  __x64_sys_setsockopt+0x20/0x30
2023-07-05T15:14:34-0700 sma-server3 kernel:  do_syscall_64+0x5b/0x80
2023-07-05T15:14:34-0700 sma-server3 kernel:  ? handle_edge_irq+0x7e/0x1a0
2023-07-05T15:14:34-0700 sma-server3 kernel:  ? exc_page_fault+0x67/0x150
2023-07-05T15:14:34-0700 sma-server3 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
2023-07-05T15:14:34-0700 sma-server3 kernel: RIP: 0033:0x7f16b407b07a
2023-07-05T15:14:34-0700 sma-server3 kernel: Code: ff ff ff c3 48 8b 15 15 fe 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 0f 1f 80 00 00 00 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e6 fd 0c 00 f7 d8 64 89 01 48
2023-07-05T15:14:34-0700 sma-server3 kernel: RSP: 002b:00007fffabbfadf8 EFLAGS: 00000202 ORIG_RAX: 0000000000000036
2023-07-05T15:14:34-0700 sma-server3 kernel: RAX: ffffffffffffffda RBX: 000055566350ae80 RCX: 00007f16b407b07a
2023-07-05T15:14:34-0700 sma-server3 kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000005
2023-07-05T15:14:34-0700 sma-server3 kernel: RBP: 000055566350ae88 R08: 0000000000216018 R09: 00007f16b3b3fe50
2023-07-05T15:14:34-0700 sma-server3 kernel: R10: 00007f16b392a010 R11: 0000000000000202 R12: 000055566350ae88
2023-07-05T15:14:34-0700 sma-server3 kernel: R13: 0000000000215fb8 R14: 00007f16b392a070 R15: 00007f16b392a010
2023-07-05T15:14:34-0700 sma-server3 kernel:  </TASK>
2023-07-05T15:14:34-0700 sma-server3 kernel: Mem-Info:
2023-07-05T15:14:34-0700 sma-server3 kernel: active_anon:71703 inactive_anon:852670 isolated_anon:0
                                              active_file:656158 inactive_file:161408 isolated_file:0
                                              unevictable:20 dirty:188 writeback:0
                                              slab_reclaimable:87299 slab_unreclaimable:43879
                                              mapped:42446 shmem:1503 pagetables:18254 bounce:0
                                              free:90249 free_pcp:63 free_cma:0
2023-07-05T15:14:34-0700 sma-server3 kernel: Node 0 active_anon:286812kB inactive_anon:3410680kB active_file:2624632kB inactive_file:645632kB unevictable:80kB isolated(anon):0kB isolated(file):0kB mapped:169784kB dirty:752kB writeback:0kB shmem:6012kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2144256kB writeback_tmp:0kB kernel_stack:18592kB pagetables:73016kB all_unreclaimable? no
2023-07-05T15:14:34-0700 sma-server3 kernel: Node 0 DMA free:14304kB boost:0kB min:128kB low:160kB high:192kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
2023-07-05T15:14:34-0700 sma-server3 kernel: lowmem_reserve[]: 0 2969 7819 7819 7819
2023-07-05T15:14:34-0700 sma-server3 kernel: Node 0 DMA32 free:269528kB boost:0kB min:25612kB low:32012kB high:38412kB reserved_highatomic:0KB active_anon:36876kB inactive_anon:1889568kB active_file:552264kB inactive_file:51988kB unevictable:0kB writepending:736kB present:3259904kB managed:3093632kB mlocked:0kB bounce:0kB free_pcp:4kB local_pcp:4kB free_cma:0kB
2023-07-05T15:14:34-0700 sma-server3 kernel: lowmem_reserve[]: 0 0 4850 4850 4850
2023-07-05T15:14:34-0700 sma-server3 kernel: Node 0 Normal free:77164kB boost:0kB min:41840kB low:52300kB high:62760kB reserved_highatomic:2048KB active_anon:249936kB inactive_anon:1521140kB active_file:2072628kB inactive_file:593540kB unevictable:80kB writepending:16kB present:5111808kB managed:4967188kB mlocked:80kB bounce:0kB free_pcp:248kB local_pcp:248kB free_cma:0kB
2023-07-05T15:14:34-0700 sma-server3 kernel: lowmem_reserve[]: 0 0 0 0 0
2023-07-05T15:14:34-0700 sma-server3 kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 14304kB
2023-07-05T15:14:34-0700 sma-server3 kernel: Node 0 DMA32: 5128*4kB (UME) 3978*8kB (UME) 2308*16kB (UME) 1746*32kB (UME) 940*64kB (UME) 338*128kB (UME) 71*256kB (UME) 7*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 270320kB
2023-07-05T15:14:34-0700 sma-server3 kernel: Node 0 Normal: 4959*4kB (UMEH) 1207*8kB (UMEH) 620*16kB (UMEH) 553*32kB (UMEH) 188*64kB (UMEH) 40*128kB (UMEH) 7*256kB (UEH) 4*512kB (E) 0*1024kB 0*2048kB 0*4096kB = 78100kB
2023-07-05T15:14:34-0700 sma-server3 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
2023-07-05T15:14:34-0700 sma-server3 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
2023-07-05T15:14:34-0700 sma-server3 kernel: 846128 total pagecache pages
2023-07-05T15:14:34-0700 sma-server3 kernel: 27059 pages in swap cache
2023-07-05T15:14:34-0700 sma-server3 kernel: Swap cache stats: add 247635, delete 220636, find 759798/762651
2023-07-05T15:14:34-0700 sma-server3 kernel: Free swap  = 7432188kB
2023-07-05T15:14:34-0700 sma-server3 kernel: Total swap = 8384508kB
2023-07-05T15:14:34-0700 sma-server3 kernel: 2096925 pages RAM
2023-07-05T15:14:34-0700 sma-server3 kernel: 0 pages HighMem/MovableOnly
2023-07-05T15:14:34-0700 sma-server3 kernel: 77880 pages reserved
2023-07-05T15:14:34-0700 sma-server3 kernel: 0 pages cma reserved
2023-07-05T15:14:34-0700 sma-server3 kernel: 0 pages hwpoisoned

If you’ve never reported an openSUSE bug before, read "openSUSE:Submitting bug reports" and "openSUSE:How to Write a Good Bugreport" on the openSUSE Wiki to determine whether you can write a good report, or should file a report at all.

To me, repeated vmalloc and call trace messages mean something needs to be fixed, and that rarely happens unless somebody files a report with a reproduction scenario that works when a developer tries it.

These are just informational messages. Calling every kernel message a “crash” is very misleading.

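The kernel was asked to allocate 2MiB of contiguous memory (page order 9, i.e. 512 pages of 4KiB) and could not satisfy the request because no free contiguous area of that size was available.

According to the per-zone free lists in the log (the "Node 0 DMA32:" and "Node 0 Normal:" lines), the largest free contiguous area in the DMA32 and Normal zones is 512KiB. Even though the total amount of free memory is sufficient, it is too fragmented.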
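The free-list lines in the log can be checked mechanically. The following is only a sketch: the three strings are copied from the journal above with the timestamps stripped, and the names `LOG_LINES` and `largest_free_block_kb` are made up for this illustration. It assumes 4kB pages, which is what the "N*4kB" columns suggest.

```python
import re

# Buddy-allocator summary lines copied from the journal (timestamps stripped).
LOG_LINES = [
    "Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 14304kB",
    "Node 0 DMA32: 5128*4kB (UME) 3978*8kB (UME) 2308*16kB (UME) 1746*32kB (UME) 940*64kB (UME) 338*128kB (UME) 71*256kB (UME) 7*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 270320kB",
    "Node 0 Normal: 4959*4kB (UMEH) 1207*8kB (UMEH) 620*16kB (UMEH) 553*32kB (UMEH) 188*64kB (UMEH) 40*128kB (UMEH) 7*256kB (UEH) 4*512kB (E) 0*1024kB 0*2048kB 0*4096kB = 78100kB",
]

PAGE_KB = 4                            # assumed page size
REQUEST_ORDER = 9                      # "page order 9" from the vmalloc error line
REQUEST_KB = PAGE_KB << REQUEST_ORDER  # 4kB * 2**9 = 2048kB


def largest_free_block_kb(line):
    """Return (zone name, size in kB of the largest block with a non-zero count)."""
    zone = re.match(r"Node \d+ (\w+):", line).group(1)
    # Each "count*sizekB" pair is one free list; the "= total" part has no "*".
    blocks = [(int(n), int(size)) for n, size in re.findall(r"(\d+)\*(\d+)kB", line)]
    largest = max((size for n, size in blocks if n > 0), default=0)
    return zone, largest


for line in LOG_LINES:
    zone, largest = largest_free_block_kb(line)
    verdict = "can" if largest >= REQUEST_KB else "cannot"
    print(f"{zone}: largest free block {largest}kB, {verdict} hold a {REQUEST_KB}kB request")
```

For DMA32 and Normal this reports 512kB as the largest free block. The DMA zone does list a few 2048kB and 4096kB blocks, but it is the small legacy zone (about 15MB here) and, as far as I can tell, the lowmem_reserve values keep ordinary allocations out of it.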

iptables loads its rules as a single chunk and has been known to cause similar memory allocation errors. You are using fail2ban, which may produce a lot of rules that need a lot of memory.
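The register dump in the trace gives a rough idea of how big that chunk was. This is a sketch under assumptions, not a definitive reading: it assumes the usual x86-64 syscall convention (arguments in rdi, rsi, rdx, r10, r8) and the values 54 for setsockopt, 0 for SOL_IP and 64 for IPT_SO_SET_REPLACE, which come from the kernel headers rather than from the log.

```python
# Register values copied from the trace in the journal.
regs = {
    "ORIG_RAX": 0x36,
    "RDI": 0x5,
    "RSI": 0x0,
    "RDX": 0x40,
    "R10": 0x00007F16B392A010,
    "R08": 0x216018,
}

# Assumed constants (Linux x86-64 ABI and uapi headers, not taken from the log).
SYS_SETSOCKOPT = 54        # syscall number of setsockopt on x86-64
SOL_IP = 0                 # socket level for IPv4 options
IPT_SO_SET_REPLACE = 64    # "replace the whole table" option of legacy iptables

# setsockopt(fd, level, optname, optval, optlen) arrives in rdi, rsi, rdx, r10, r8.
fd, level, optname, optval, optlen = (
    regs[name] for name in ("RDI", "RSI", "RDX", "R10", "R08")
)

is_table_replace = (
    regs["ORIG_RAX"] == SYS_SETSOCKOPT
    and level == SOL_IP
    and optname == IPT_SO_SET_REPLACE
)

print(f"whole-table replace: {is_table_replace}")
print(f"optlen = {optlen} bytes = {optlen / 2**20:.2f} MiB")
```

Read this way, the call handed the kernel a table of 2187288 bytes, about 2.09 MiB. Legacy iptables resubmits the whole table on every change, so even a ban or unban of one address can involve a buffer of that size.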

From the information provided so far, this is not a bug but a lack of resources for your specific workload. The Leap upgrade may be a red herring: it is possible that something changed in your environment, that you are getting more requests that you block, and that the iptables rule set grew as a result.
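If you want to watch fragmentation on the running system rather than wait for the next journal entry, /proc/buddyinfo lists the free block counts per zone and order. The sketch below counts free blocks of order 9 (2MiB with 4kB pages) and above. The function name `free_blocks_at_or_above` is made up for this example, and `SAMPLE` is not real output from your machine: it is rebuilt from the counts in the journal so the example runs anywhere.

```python
from pathlib import Path

# Sample in /proc/buddyinfo layout, reconstructed from the counts in the journal.
# Columns after the zone name are free block counts for order 0, 1, 2, ... 10.
SAMPLE = """\
Node 0, zone      DMA      0      0      0      1      1      1      1      1      1      2      2
Node 0, zone    DMA32   5128   3978   2308   1746    940    338     71      7      0      0      0
Node 0, zone   Normal   4959   1207    620    553    188     40      7      4      0      0      0
"""


def free_blocks_at_or_above(text, min_order=9):
    """Return {"nodeN/zone": number of free blocks of order >= min_order}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: Node <n>, zone <name> <count order 0> <count order 1> ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = fields[1].rstrip(",")
        counts = [int(c) for c in fields[4:]]
        result[f"node{node}/{fields[3]}"] = sum(counts[min_order:])
    return result


print(free_blocks_at_or_above(SAMPLE))

# On a live Linux system, read the real file instead of the sample.
buddyinfo = Path("/proc/buddyinfo")
if buddyinfo.exists():
    print(free_blocks_at_or_above(buddyinfo.read_text()))
```

Logging this every few minutes alongside the fail2ban ban and unban times would show whether the errors line up with moments when DMA32 and Normal have no order 9 blocks left.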

You could try switching to nftables. If fail2ban does not support it natively, you can install iptables-backend-nft, which provides an implementation of the iptables command on top of nftables.

If there wasn’t a Core Dump, it ain’t a crash.

 > coredumpctl 
Hint: You are currently not seeing messages from other users and the system.
      Users in the 'systemd-journal' group can see all messages. Pass -q to
      turn off this notice.
TIME                          PID  UID GID SIG     COREFILE EXE                                SIZE
Mon 2023-06-05 11:19:49 CEST 6035 1000 100 SIGSEGV missing  /usr/bin/kontact                    n/a
Sat 2023-06-24 18:27:29 CEST 2684 1000 100 SIGSEGV missing  /usr/bin/plasmashell                n/a
Wed 2023-07-05 13:29:26 CEST 4059 1000 100 SIGSEGV present  /usr/bin/akonadi_icaldir_resource 25.0M
 >

BTW, most «REAL» crashes are caused by memory exceptions – code attempts to access memory which doesn’t exist …

AFAICT nothing regarding the configuration of fail2ban changed. It is the same version before and after the update. The memory errors started occurring immediately after the reboot that followed the update, when RAM should be at its most contiguous. The errors occur randomly, during both ban and unban operations; most ban/unban operations produce no errors. Manually banning or unbanning produced no errors.

A single ban or unban operation involves a single IP address. Needing 2MB of contiguous memory seems extreme, especially for unbanning. No errors occur during f2b’s startup, when it initially loads iptables.

Hmm. Hence my request for assistance.
Currently the number of blocked IPs is much lower than usual, about half of what has been blocked recently. Under 15.4 there was no problem with that load.

I (inadvertently) discovered that iptables allows a maximum of 65536 blocked IPs. At no time has the number of IPs set by f2b ever exceeded about 20000.

 $ lsmem --output SIZE,STATE,REMOVABLE,BLOCK,NODE,ZONES
 SIZE  STATE REMOVABLE BLOCK NODE  ZONES
 128M online       yes     0    0   None
   3G online       yes  1-24    0  DMA32
 4.9G online       yes 32-70    0 Normal

Memory block size:       128M
Total online memory:       8G
Total offline memory:      0B