Opened 6 months ago

Closed 6 months ago

Last modified 2 months ago

#864 closed defect (notadefect)

VFS crashes on ia64

Reported by: Jakub Jermář
Owned by:
Priority: major
Milestone: 0.14.1
Component: helenos/srv/vfs
Version: mainline
Keywords: ia64
Cc:
Blocker for:
Depends on:
See also:

Description

After the toolchain upgrade from GCC 8.2 to 13.2, VFS started to crash during startup on ia64/ski:

[init:vfs(6)] vfs: Accepting connections
[init:ext4fs(8)] ext4fs: Accepting connections
Task init:vfs (6) killed due to an exception at program counter 0x400000000001d4c0.
ar.bsp=0xe00000000416c148       ar.bspstore=0x60000000002f4058
ar.rnat=0x0     ar.rsc=0xc
ar.ifs=0x8000000000000288       ar.pfs=0xc000000000000288
cr.isr=0x400000000      cr.ipsr=0x1013080a6010
cr.iip=0x400000000001d4c0, #0   (<unknown>)
cr.iipa=0x400000000001f560      (<unknown>)
cr.ifa=0x400    (<unknown>)
Kill message: Page fault: 0x0000000000000400.

Change History (6)

comment:1 by Jakub Jermář, 6 months ago

Upon closer inspection, the above error is triggered by a call to fibril_mutex_unlock(0x400) from vfs_files_init, which is inlined into _vfs_fd_alloc. Adding debug prints to vfs_files_init masks the bug, and the system is then usually able to boot normally.

comment:2 by Jakub Jermář, 6 months ago

The respective piece of code looks like this:

4000000000001a00 <_vfs_fd_alloc>:
4000000000001a00:       08 48 39 18 80 05       [MMI]       alloc r41=ar.pfs,14,12,0
4000000000001a06:       c0 02 80 00 42 00                   mov r44=r32
4000000000001a0c:       05 00 c4 00                         mov r40=b0
4000000000001a10:       09 38 01 41 00 21       [MMI]       adds r39=64,r32
4000000000001a16:       a0 02 04 00 42 40                   mov r42=r1
4000000000001a1c:       04 10 41 00                         zxt1 r34=r34;;
4000000000001a20:       11 28 fd 01 00 24       [MIB]       mov r37=127
4000000000001a26:       b0 02 04 65 00 00                   mov.i r43=ar.lc
4000000000001a2c:       e8 b4 01 50                         br.call.sptk.many b0=400000000001cf00 <fibril_mutex_lock>;;
4000000000001a30:       08 60 01 40 00 21       [MMI]       mov r44=r32
4000000000001a36:       e0 00 9c 30 20 20                   ld8 r14=[r39]
4000000000001a3c:       00 50 01 84                         mov r1=r42
4000000000001a40:       0a 68 05 00 00 24       [MMI]       mov r45=1;;
4000000000001a46:       c0 02 00 10 48 e0                   mov r44=1024
4000000000001a4c:       00 70 18 e4                         cmp.eq p7,p6=0,r14
4000000000001a50:       16 00 00 00 00 c8       [BBB]       nop.b 0x0
4000000000001a56:       01 f0 01 80 21 00             (p07) br.cond.dpnt.few 4000000000001e30 <_vfs_fd_alloc+0x430>
4000000000001a5c:       10 00 00 40                         br.few 4000000000001a60 <_vfs_fd_alloc+0x60>
4000000000001a60:       11 00 00 00 01 00       [MIB]       nop.m 0x0
4000000000001a66:       00 00 00 02 00 00                   nop.i 0x0
4000000000001a6c:       28 ba 01 50                         br.call.sptk.many b0=400000000001d480 <fibril_mutex_unlock>;;    <==== !!! HERE 0x400 is passed
4000000000001a70:       08 60 01 40 00 21       [MMI]       mov r44=r32

comment:3 by Jakub Jermář, 6 months ago

I think this is a compiler bug. Look at how r44 (i.e. out0) is used. At address 1a06 it is initialized from r32 (i.e. in0), which holds the address of the mutex, and it is passed unaltered to fibril_mutex_lock at address 1a2c. After the mutex is taken, out0 is reloaded from r32 at address 1a30. The assembly between addresses 1a36 and 1a56 corresponds to the if (!vfs_data->files) check. Here, however, out0 is already being prepared for the possible call to malloc and is overwritten at address 1a46 with 0x400 (1024), which is actually the size to be allocated. If the branch is taken, out0 is fixed up later (not shown in the snippet) to contain the original in0 again. If it is not taken, however, out0 still holds the allocation size when fibril_mutex_unlock is called at address 1a6c.
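To make the register flow easier to follow, here is a minimal sketch in C that models what happens to out0 on both sides of the branch. It is not HelenOS code: out0_at_unlock, check_model and the example mutex address are hypothetical names and values introduced only for illustration, and the malloc path simply assumes, as stated above, that out0 is restored from in0 in code not shown in the snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model, not HelenOS code: returns the value that ends up in
 * out0 (r44) when fibril_mutex_unlock is called at 1a6c.
 */
uint64_t out0_at_unlock(uint64_t in0, bool files_is_null)
{
	uint64_t out0;

	out0 = in0;	/* 1a06: mov r44=r32, argument for fibril_mutex_lock */
	out0 = in0;	/* 1a30: mov r44=r32, reloaded after the lock call */
	out0 = 0x400;	/* 1a46: mov r44=1024, malloc size set before the branch */

	if (files_is_null) {
		/*
		 * Branch at 1a56 taken: malloc path. Assumed to restore
		 * out0 from in0 afterwards (that code is not in the snippet).
		 */
		out0 = in0;
	}

	/* Branch not taken: nothing rewrites out0 before the call at 1a6c. */
	return out0;
}

/* Checks both paths; mutex_addr is an arbitrary example value. */
void check_model(void)
{
	const uint64_t mutex_addr = 0x2f4000;

	assert(out0_at_unlock(mutex_addr, true) == mutex_addr);
	assert(out0_at_unlock(mutex_addr, false) == 0x400);
}
```

In this model, the not-taken path yields 0x400, which is the same value as the faulting address reported in the kill message (cr.ifa=0x400).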

comment:4 by Jakub Jermář, 6 months ago

Filed a GCC bug.

comment:5 by Jakub Jermář, 6 months ago

Resolution: notadefect
Status: new → closed

A workaround has been pushed in commit:

commit e8a6279ff3841e7471155ab4bf21d5249a85c4e6 (HEAD -> master, origin/master)
Author: Jakub Jermář <jakub@jermar.eu>
Date:   Sat Nov 18 16:37:51 2023 +0100

    Work around GCC bug 112604
    
    Turn off the optimization which seems to be responsible for the issue
    on ia64.

comment:6 by Jiri Svoboda, 2 months ago

Milestone: 0.13.1 → 0.14.1

Milestone renamed
