Opened 7 years ago

Closed 6 years ago

Last modified 6 years ago

#701 closed defect (fixed)

hdaudio crash in hda_corb_fini

Reported by: Jakub Jermář Owned by: Jiri Svoboda
Priority: major Milestone: 0.7.2
Component: helenos/drv/hdaudio Version: mainline
Keywords: Cc:
Blocker for: Depends on:
See also:

Description

As of mainline,2846, hdaudio crashes during startup with the following stack trace:

Task hdaudio (31) killed due to an exception at program counter 0x0000000000002a0f.
cs =0x0000000000000023	rip=0x0000000000002a0f	rfl=0x0000000000210206	err=0x0000000000000004
ss =0x000000000000001b
rax=0x000000000002c150	rbx=0x000000000002c4a0	rcx=0x000000000001daad	rdx=0x0000000000000000
rsi=0x0000000000000000	rdi=0x000000000002c150	rbp=0x000000000012dd70	rsp=0x000000000012dd50
r8 =0x0000000000000000	r9 =0x0000000000000000	r10=0x0000000000000000	r11=0x0000000000200216
r12=0xffffffff81dd9000	r13=0x0000000000000000	r14=0x0000000000000000	r15=0x0000000000000000
0x000000000012dd70: 0x0000000000002a0f()
0x000000000012ddd0: 0x00000000000038a0()
0x000000000012deb0: 0x00000000000041bd()
0x000000000012df00: 0x0000000000005b8b()
0x000000000012df80: 0x0000000000006073()
0x000000000012dfd0: 0x00000000000184ba()
0x000000000012dff0: 0x000000000000f8df()
Kill message: Page fault: 0x000000000002c198.
taskmon: Task 31 fault in thread 0xffffffff82588000.
taskmon: Executing /app/taskdump -t 31
Task Dump Utility
Dumping task 'hdaudio' (task ID 31).
failed opening file
failed opening file
Loaded symbol table from /drv/hdaudio/hdaudio

Threads:
 [1] hash: 0xffffffff82588000
Thread 0xffffffff82588000: PC = 0x0000000000002a0f (hda_corb_fini+16). FP = 0x000000000012dd70
  0x000000000012dd70: 0x0000000000002a0f (hda_corb_fini+16)
  0x000000000012ddd0: 0x00000000000038a0 (hda_ctl_init+1309)
  0x000000000012deb0: 0x00000000000041bd (hda_dev_add+1432)
  0x000000000012df00: 0x0000000000005b8b (driver_dev_add+297)
  0x000000000012df80: 0x0000000000006073 (driver_connection_devman+120)
  0x000000000012dfd0: 0x00000000000184ba (connection_fibril+295)
  0x000000000012dff0: 0x000000000000f8df (fibril_main+42)

Address space areas:
 [1] flags: R-XC base: 0x0000000000001000 size: 139264
 [2] flags: RW-C base: 0x0000000000023000 size: 8192
 [3] flags: RW-C base: 0x0000000000025000 size: 8192
 [4] flags: RW-C base: 0x000000000002e000 size: 1048576
 [5] flags: RW-- base: 0x000000000012f000 size: 16384
 [6] flags: R--C base: 0x0000000000133000 size: 4096
 [7] flags: RW-- base: 0x0000000000134000 size: 4096
 [8] flags: RW-- base: 0x0000000000135000 size: 4096
 [9] flags: RW-C base: 0x0000000000136000 size: 16384
 [10] flags: RW-C base: 0x000000000013b000 size: 4096
 [11] flags: R-XC base: 0x0000000070001000 size: 77824
 [12] flags: RW-C base: 0x0000000070014000 size: 12288
 [13] flags: RW-C base: 0x0000000070017000 size: 8192
 [14] flags: RW-C base: 0x000000007001a000 size: 4096
 [15] flags: RW-C base: 0x000000007001c000 size: 4096
 [16] flags: RW-C base: 0x000000007001e000 size: 1048576
 [17] flags: RW-C base: 0x00007ffffff00000 size: 1048576

Fibril 0x000000000002bfa0:
Failed dumping fibrils.

I am going to attach the full console log and the hdaudio binary.
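
As a cross-check of the dump, the fault address from the kill message (0x2c198) can be compared against the listed address space areas. The following is a minimal sketch, not part of HelenOS: area_t and area_lookup() are hypothetical names, and the table is simply areas [1] to [17] copied from the dump above.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the "Address space areas" list from the task dump. */
typedef struct {
	uint64_t base;
	uint64_t size;
} area_t;

/* Areas [1]..[17], copied verbatim from the dump. */
static const area_t areas[] = {
	{ 0x0000000000001000, 139264 },
	{ 0x0000000000023000, 8192 },
	{ 0x0000000000025000, 8192 },
	{ 0x000000000002e000, 1048576 },
	{ 0x000000000012f000, 16384 },
	{ 0x0000000000133000, 4096 },
	{ 0x0000000000134000, 4096 },
	{ 0x0000000000135000, 4096 },
	{ 0x0000000000136000, 16384 },
	{ 0x000000000013b000, 4096 },
	{ 0x0000000070001000, 77824 },
	{ 0x0000000070014000, 12288 },
	{ 0x0000000070017000, 8192 },
	{ 0x000000007001a000, 4096 },
	{ 0x000000007001c000, 4096 },
	{ 0x000000007001e000, 1048576 },
	{ 0x00007ffffff00000, 1048576 },
};

/* Return the 1-based index of the area containing addr, or 0 if none does. */
static int area_lookup(uint64_t addr)
{
	for (size_t i = 0; i < sizeof(areas) / sizeof(areas[0]); i++) {
		if (addr >= areas[i].base && addr - areas[i].base < areas[i].size)
			return (int) i + 1;
	}
	return 0;
}
```

With this table, area_lookup(0x2c198) returns 0: area [3] ends at 0x27000 and area [4] starts at 0x2e000, so the faulting address lies in a gap between them. That is consistent with the page no longer being mapped at the time of the dump, although the dump alone does not show why.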

Attachments (3)

console.log (15.5 KB ) - added by Jakub Jermář 7 years ago.
hdaudio.gz (60.9 KB ) - added by Jakub Jermář 7 years ago.
hdaudio binary that crashed
console2.log (14.5 KB ) - added by Jakub Jermář 7 years ago.
Another crash, this time happening in hda_ctl_init()


Change History (10)

by Jakub Jermář, 7 years ago

Attachment: console.log added

by Jakub Jermář, 7 years ago

Attachment: hdaudio.gz added

hdaudio binary that crashed

comment:1 by Jakub Jermář, 7 years ago

Component: helenos/unspecified → helenos/drv/hdaudio
Owner: set to Jiri Svoboda

comment:2 by Jiri Svoboda, 7 years ago

The sequence of events that occurred:

  • during hda_codec_init() the driver stopped getting responses from the HDA controller
  • the driver returned failure from hda_codec_init()
  • hda_ctl_init() dropped into error recovery path and tried to uninit the controller
  • hda_ctl_init() called hda_corb_fini()
  • we got a page fault while accessing an address that should be valid

I cannot reproduce the communication failure that started the problem. Please provide more information on how to reproduce it. I tried the mainline amd64 profile with Qemu 2.10.1, gcc 7.1.0 and binutils 2.28, run with ew.py.

If I make hda_codec_init() return failure, I can reproduce the second part (the crash due to a page fault). It looks like the finalization code of the controller had never been run before and does not work as expected. I am not sure what the problem is yet; it's quite puzzling.
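
The reproduction approach described here (forcing the codec step to fail so that the rarely taken cleanup path runs) can be sketched in plain C. This is only an illustration under stated assumptions: demo_ctl_t, demo_ctl_init() and the demo_*_fini() helpers are hypothetical stand-ins, malloc() stands in for the DMA buffer setup, and none of it is the actual hdaudio code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the controller state; not the HelenOS type. */
typedef struct {
	void *corb;	/* NULL while the CORB buffer is not set up */
	void *rirb;	/* NULL while the RIRB buffer is not set up */
} demo_ctl_t;

/* Each fini tolerates a buffer that was never set up (free(NULL) is a no-op). */
static void demo_corb_fini(demo_ctl_t *ctl)
{
	free(ctl->corb);
	ctl->corb = NULL;
}

static void demo_rirb_fini(demo_ctl_t *ctl)
{
	free(ctl->rirb);
	ctl->rirb = NULL;
}

/* codec_ok injects the failure: false simulates a failing codec init. */
static int demo_ctl_init(demo_ctl_t *ctl, bool codec_ok)
{
	ctl->corb = NULL;
	ctl->rirb = NULL;

	ctl->corb = malloc(64);
	if (ctl->corb == NULL)
		goto error;

	ctl->rirb = malloc(64);
	if (ctl->rirb == NULL)
		goto error;

	if (!codec_ok)
		goto error;

	return 0;
error:
	/* Undo in reverse order of setup. */
	demo_rirb_fini(ctl);
	demo_corb_fini(ctl);
	return -1;
}
```

Calling demo_ctl_init(&ctl, false) exercises the error path deterministically, which is the same idea as making hda_codec_init() return failure instead of waiting for the controller to stop responding.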

comment:3 by Jakub Jermář, 7 years ago

This required many, many reboots to happen (I got this while reproducing #700). This was mainline amd64 but with an altered optimization level (think -O0), the latest toolchain, QEMU 2.10.0 and binutils 2.28.

I also got another one, which crashes in hda_ctl_init. See attachments for the console log. This one was created with, IIRC, -O3.

by Jakub Jermář, 7 years ago

Attachment: console2.log added

Another crash, this time happening in hda_ctl_init()

comment:4 by Jakub Jermář, 6 years ago

Milestone: 0.7.1

comment:5 by Jakub Jermář, 6 years ago

I fixed a bug in hda_corb_init() in commit d2c5159dca2974a0e2e4741ff2b4d8235af62f8b. The code ignored the return value from dmamem_map_anonymous() and also left hda->ctl->corb_virt set to AS_AREA_ANY (-1) in case of error.
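
The pattern behind this kind of fix can be sketched as follows. This is a hedged illustration, not the committed change: demo_map() is a hypothetical stand-in for the mapping call (its signature is invented for the example), DEMO_AREA_ANY mimics the -1 sentinel mentioned above, and malloc() replaces the real DMA mapping.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sentinel meaning "map anywhere"; mimics the -1 value described above. */
#define DEMO_AREA_ANY ((void *) (uintptr_t) -1)

typedef struct {
	void *corb_virt;
} demo_corb_t;

/*
 * Hypothetical mapping call. On failure it returns nonzero and leaves
 * *virt exactly as the caller set it, i.e. still the sentinel.
 */
static int demo_map(size_t size, int fail, void **virt)
{
	if (fail)
		return -1;
	*virt = malloc(size);
	return (*virt != NULL) ? 0 : -1;
}

static int demo_corb_init(demo_corb_t *c, int fail)
{
	c->corb_virt = DEMO_AREA_ANY;

	/* Check the return value instead of ignoring it. */
	int rc = demo_map(4096, fail, &c->corb_virt);
	if (rc != 0) {
		/* Do not leave the sentinel behind for the cleanup code. */
		c->corb_virt = NULL;
		return rc;
	}
	return 0;
}

static void demo_corb_fini(demo_corb_t *c)
{
	if (c->corb_virt != NULL) {
		free(c->corb_virt);
		c->corb_virt = NULL;
	}
}
```

Resetting the field to NULL on failure lets the cleanup code tell "never mapped" apart from a valid mapping, so it does not try to release the sentinel value.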

comment:6 by Jakub Jermář, 6 years ago

Resolution: fixed
Status: new → closed

Fixed in commit 13db20447e9ad45e946906ed3a8fb2e7b7de7f23

The problem was that hda, &hda->ctl->corb_virt and &hda->ctl->rirb_virt occupied the same page. So when we erroneously DMA unmapped &hda->ctl->rirb_virt instead of hda->ctl->rirb_virt, we got the page fault in hda_corb_fini when we tried to access hda.
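
The difference between the two expressions can be shown with a small self-contained sketch. The names here are hypothetical: demo_ctl_t stands in for the controller structure and demo_unmap() only records the address it is given instead of unmapping anything.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the controller structure. */
typedef struct {
	void *corb_virt;
	void *rirb_virt;
} demo_ctl_t;

/* Hypothetical stand-in for a DMA unmap call: only records the address. */
static void *demo_last_unmapped;

static void demo_unmap(void *virt)
{
	demo_last_unmapped = virt;
}

/* True if p points into the control structure itself. */
static bool demo_points_into_ctl(const demo_ctl_t *ctl, const void *p)
{
	uintptr_t lo = (uintptr_t) ctl;
	uintptr_t a = (uintptr_t) p;
	return a >= lo && a - lo < sizeof(*ctl);
}

/* Buggy form: passes the address of the pointer field, not the buffer. */
static void demo_rirb_fini_buggy(demo_ctl_t *ctl)
{
	demo_unmap(&ctl->rirb_virt);
}

/* Fixed form: passes the buffer address stored in the field. */
static void demo_rirb_fini_fixed(demo_ctl_t *ctl)
{
	demo_unmap(ctl->rirb_virt);
}
```

Both forms compile without a diagnostic because any object pointer, including a void **, converts implicitly to void * in C. In the buggy form the address handed to the unmap call lies inside the control structure, so releasing its page takes the structure and everything else on that page with it.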

comment:7 by Jakub Jermář, 6 years ago

Milestone: 0.7.1 → 0.7.2