Opened 13 years ago

Closed 13 years ago

#290 closed defect (fixed)

Infrequent hang of loaders while reading from fat during startup

Reported by: Jakub Jermář Owned by: Jakub Jermář
Priority: major Milestone: 0.4.3
Component: helenos/fs/fat Version:
Keywords: Cc:
Blocker for: Depends on:
See also:

Description

In about 1/10 boots of default mainline,844 on ia32, I experience a hang when loading of the bdsh tasks on all vc's gets stuck in the loader phase. The loader tasks seem to be waiting for VFS / FAT to answer a read() request.

Looking at the FAT task, I can see that there is one dispatched pair of VFS_OUT_READ and IPC_M_DATA_READ. The other reads are still in a non-dispatched state.

For some reason, FAT seems to be unable to make enough forward progress to answer the read request.

Change History (10)

comment:1 by Jakub Jermář, 13 years ago

Status: newaccepted

comment:2 by Jakub Jermář, 13 years ago

After yesterday's luck, this issue was much harder to reproduce today (about 1/100). I noticed that the fat thread is in the ready state when observed from kconsole, which suggests some kind of infinite loop. Using the btrace kconsole command, I can see that the stack trace remains one of the following:

hash_table_find() / cache_compare()
hash_table_remove()
block_get()
_fat_block_get()
fat_block_get()
fat_read()

comment:3 by Jakub Jermář, 13 years ago

It looks like we are dealing with a corrupted collision chain of the cache block hash table. By adding some printfs into hash_table_remove(), hash_table_remove() operates on a hash table which looks like this:

0x38608 is the head link_t

Then there are two entries that form a closed cycle via the next pointer:

0x1e42c
  next = 0x38628
0x38628
  next = 0x1e42c

Note that the second entry is actually only 32 bytes away from the head link. But this is normally impossible, because the list is formed of block_t structures that are 88 bytes long and the hash_link itself is located at offset 76.

comment:4 by Jakub Jermář, 13 years ago

From another instance of the bug:

head=0x385c8
  next=0x1ed1c
  prev=0x1ed1c
0x1ed1c
  next=0x38618
  prev=0x38618
0x38618
  next=0x1ed1c
  prev=0x1ed1c

So the list's prev and next pointers are consistent, only the last record should not be there. This time, it is 80 bytes offset from the head of the list, meaning that there is not enough space between the head and the record for the record to be from a valid allocated memory at the same time as the head of the list.

Has the memory holding the head of the list been freed?

comment:5 by Jakub Jermář, 13 years ago

I managed to reproduce this even when I added some extra debugging facilities:

  • free() sets the freed area with a special pattern so that use-after-free is detected
  • list operations verify that the link is a link and head is a head

These techniques helped me to discover bugs in other code (already fixed in mainline) but unfortunately did not help with debugging of this issue.

comment:6 by Jakub Jermář, 13 years ago

I think I have just had an epiphany: the linked list looks like the middle record was first inserted to the list with head in the first record and later it was inserted into another list with head in the third record without first being removed from the first list. This would suggest an logic error in maintaining the block cache.

in reply to:  6 comment:7 by Jakub Jermář, 13 years ago

Replying to jermar:

I think I have just had an epiphany: the linked list looks like the middle record was first inserted to the list with head in the first record and later it was inserted into another list with head in the third record without first being removed from the first list. This would suggest an logic error in maintaining the block cache.

And given that the linked lists in question form a hash table suggests that the middle record was inserted twice, each time with a key hashed to a different bucket.

comment:8 by Jakub Jermář, 13 years ago

I managed to catch the FAT server when it was corrupting the hash table. The stack shows:

fat_read() → fat_block_get() → _fat_block_get() → block_get(),

where we detect an attempt to initialize a hash table link, which is still hashed in.

We have this piece of assembly:

    7ee7:       8b 53 58                mov    0x58(%ebx),%edx
    7eea:       8b 43 54                mov    0x54(%ebx),%eax
    7eed:       c7 43 5c e0 af dc ba    movl   $0xbadcafe0,0x5c(%ebx)
    7ef4:       c7 43 60 01 00 00 00    movl   $0x1,0x60(%ebx)
    7efb:       89 02                   mov    %eax,(%edx)
    7efd:       8b 43 54                mov    0x54(%ebx),%eax
    7f00:       c7 43 54 00 00 00 00    movl   $0x0,0x54(%ebx)
    7f07:       89 50 04                mov    %edx,0x4(%eax)
    7f0a:       8b 43 40                mov    0x40(%ebx),%eax
    7f0d:       89 45 e0                mov    %eax,-0x20(%ebp)
    7f10:       8d 45 e0                lea    -0x20(%ebp),%eax
    7f13:       c7 43 58 00 00 00 00    movl   $0x0,0x58(%ebx)
    7f1a:       52                      push   %edx
    7f1b:       6a 01                   push   $0x1
    7f1d:       50                      push   %eax
    7f1e:       8b 45 c8                mov    -0x38(%ebp),%eax
    7f21:       50                      push   %eax
    7f22:       e8 69 e9 00 00          call   16890 <hash_table_remove>
    7f27:       83 c4 10                add    $0x10,%esp
    7f2a:       83 ec 0c                sub    $0xc,%esp
    7f2d:       53                      push   %ebx
    7f2e:       e8 2d 58 00 00          call   d760 <fibril_mutex_initialize>
    7f33:       c7 43 18 01 00 00 00    movl   $0x1,0x18(%ebx)
    7f3a:       c6 43 1c 00             movb   $0x0,0x1c(%ebx)
    7f3e:       c6 43 1d 00             movb   $0x0,0x1d(%ebx)
    7f42:       8d 43 20                lea    0x20(%ebx),%eax
    7f45:       89 04 24                mov    %eax,(%esp)
    7f48:       e8 93 5b 00 00          call   dae0 <fibril_rwlock_initialize>
    7f4d:       83 c4 10                add    $0x10,%esp
    7f50:       81 7b 5c e0 af dc ba    cmpl   $0xbadcafe0,0x5c(%ebx)
    7f57:       0f 84 1d 02 00 00       je     817a <block_get+0x56a>
    7f5d:       c7 43 54 00 00 00 00    movl   $0x0,0x54(%ebx)
    7f64:       c7 43 58 00 00 00 00    movl   $0x0,0x58(%ebx)
    7f6b:       c7 43 5c e0 af dc ba    movl   $0xbadcafe0,0x5c(%ebx)
    7f72:       c7 43 60 01 00 00 00    movl   $0x1,0x60(%ebx)
    7f79:       81 7b 6c e0 af dc ba    cmpl   $0xbadcafe0,0x6c(%ebx)
    7f80:       0f 84 db 01 00 00       je     8161 <block_get+0x551>
    8161:       83 7b 70 02             cmpl   $0x2,0x70(%ebx)
    8165:       0f 85 1b fe ff ff       jne    7f86 <block_get+0x376>
    816b:       c7 05 00 00 00 00 01    movl   $0x1,0x0    <==== CRASHED HERE
    8172:       00 00 00 
    8175:       e9 0c fe ff ff          jmp    7f86 <block_get+0x376>

The above roughly corresponds to this piece of block_get():

                       fibril_mutex_unlock(&b->lock);

                        /*
                         * Unlink the block from the free list and the hash
                         * table.
                         */
                        list_remove(&b->free_link);
                        temp_key = b->lba;
                        hash_table_remove(&cache->block_hash, &temp_key, 1);
                }

                block_initialize(b); <=== CRASHED HERE 
                b->devmap_handle = devmap_handle;
                b->size = cache->lblock_size;
                b->lba = ba;
                b->pba = ba_ltop(devcon, b->lba);
                hash_table_insert(&cache->block_hash, &key, &b->hash_link);

In block_initialize(), we do:

        fibril_mutex_initialize(&b->lock);
        b->refcnt = 1;
        b->dirty = false;
        b->toxic = false;
        fibril_rwlock_initialize(&b->contents_lock);
        link_initialize(&b->free_link);
        link_initialize(&b->hash_link); <=== CRASHED HERE

comment:9 by Jakub Jermář, 13 years ago

I think what happens is this:

  1. block_get(ba1)
  2. block_get(ba2), ba2 == ba1

Thanks to a race condition, block_get() does not detect that it has just instantiated a block_t for ba1 and instantiates it once again for ba2. So that we end up with two block_t's in the hash table having the same key.

  1. block_put(ba2)
  2. block_get(ba3), ba3 != ba1

In block_get() we now decide to recycle ba2 for instantiating ba3. So we remove ba2 from the free list and want to hash it out from the hash table. So we call hash_table_remove() and pass ba2 as the key. The problem is that this will find and unhash the block_t for ba1 because it was the first block_t with this key added to the hash bucket list. ba2 will remain in the hash table. Then we hash ba2's block_t for the second time, corrupting thus the hash table.

comment:10 by Jakub Jermář, 13 years ago

Resolution: fixed
Status: acceptedclosed
Note: See TracTickets for help on using tickets.