Fedora 16 kernel bugzilla status report from 2012-02-10 to 2012-02-17

494 bugs open in total. (up from 378 last week)
66 bugs closed.


Interesting closures:

Another couple dozen wireless driver dupes. The most commonly reported bug is still 768639: WARNING: at /builddir/build/BUILD/kernel-3.1.fc17/compat-wireless-2011-12-01/drivers/net/wireless/ath/ath9k/rc.c:697 ath_rc_get_highest_rix+0x158/0x1f0 [ath9k]()

Second most common source of dupe bugs this week was the i915 driver, which grew several new problems.
772886: WARNING: at drivers/gpu/drm/i915/intel_display.c:953 intel_disable_pipe+0x120/0x150 [i915]()
790701: WARNING: at drivers/gpu/drm/i915/intel_dp.c:344 intel_dp_check_edp+0x65/0xb0 [i915]()
790702: WARNING: at drivers/gpu/drm/i915/intel_dp.c:1006 ironlake_edp_panel_vdd_on+0x197/0x1a0 [i915]()

739499: kernel-3.1.0-0.rc6.git0.3.fc16.x86_64 won’t boot on EC2
Fix picked up in the 3.2 rebase.

We saw a few more dupes of the sysfs link remove warning.
This is queued up for the next update already.

In last week’s f16 report, under ‘totally weird shit’, I pointed at bug 787527: kernel BUG at mm/mmap.c:2378!. At the time I had no idea what was happening. Over the week, we got several more reports. Hugh Dickins chased this down to a locking bug in the transparent huge pages code. (upstream thread).
We’ll pull in the final fix for that in the next update.

We’ve had a number of bugs reported from the soft lockup detector firing. When this happens, the traces in a lot of cases don’t make a lot of sense.
The common thing seems to be that they are all using some form of virtualisation. Here’s one from vmware for example (though our first f17 kernel bug is the same problem, but in qemu). For now, booting guests with nosoftlockup is probably the best we can do. There is some work ongoing upstream to better handle this situation.
TODO: Go through all the rest of the soft lockup bugs and see if any of them are the same problem. (likely).

772649: Frequency not scaling on demand – Sandy Bridge
We’ve seen all kinds of power management disasters on sandybridge systems.
From the ongoing i915 rc6 fiasco, to BIOS bonghits that take away P-states when things get too hot.
I suspect it’s all related.

790097: your kernel is tainted by flags
abrt was automatically filing so many bug reports from tainted kernels, which we don’t care about, that we had the abrt guys put in a dialog explaining to users that it wasn’t going to file them.
So naturally, users have started filing them by hand. Derp.

Just like in F15, we got a bunch more reports of the sd_revalidate_disk bug.
The fix for it is going to be in next week’s update.


91 still-open bugs got filed, or changed in some way. Of those, here are some of the more interesting ones.

428555: Soft lockup while doing load_policy
A very old SELinux bug, where loading the policy takes a really long time.
A cond_resched would silence the soft lockup detector, but I’m really curious why it’s taking 22 seconds to load a policy.
Something really doesn’t add up here. AFAIK, this is the only report we’ve ever had of a policy load taking this long.

593035: mount.nfs: page allocation failure. order:4, mode:0xc0d0
The new NFS idmapper code should fix this problem, but is only just getting tested in f17.
Once it’s proven itself there, we’ll look at backporting whatever is necessary to 16. (f15 is likely to be EOL at that point).
This will require userspace updates, which is another reason it won’t be happening in f15.

There were a bunch more irqpoll bugs reported. Still no resolution on the automatic fallback-to-polling idea upstream.

assorted wireless:
746744: Can not connect to PEAP using Intel Corporation WiFi Link 5100
755370: ath9k stability issues
767855: Wifi performance issues (Tx aggregation enabled on ra=MAC)
768639: WARNING: at /builddir/build/BUILD/kernel-3.1.fc17/compat-wireless-2011-12-01/drivers/net/wireless/ath/ath9k/rc.c:697 ath_rc_get_highest_rix+0x158/0x1f0 [ath9k]()
770484: WARNING: at /builddir/build/BUILD/kernel-3.1.fc16/compat-wireless-3.2-rc6-3/drivers/net/wireless/iwlwifi/iwl-trans-pcie-tx.c:739 iwl_enqueue_hcmd+0x5c8/0x5f0 [iwlwifi]()
770595: WARNING: at /builddir/build/BUILD/kernel-3.1.fc16/compat-wireless-3.2-rc6-3/drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c:461 iwl_irq_tasklet+0x3bd/0x7c0 [iwlwifi]()
773513: Wifi wireless network connection abruptly stop working
773652: [ath9k] randomly disconnects wireless[AR9285] — lenovo g475
785422: Wireless fails after kernel update to kernel 3.2.2.1 pae
785561: 3.2.5-3.fc16.x86_64/X53S/K53SV iwlwifi runs like a sloth compared to ath9k
785913: WARNING: at /builddir/build/BUILD/kernel-3.2.fc16/compat-wireless-3.3-rc1-2/drivers/net/wireless/iwlwifi/iwl-agn-tx.c:396 iwlagn_tx_skb+0x98d/0xa10 [iwlwifi]()
786609: WARNING: at /builddir/build/BUILD/kernel-3.2.fc16/compat-wireless-3.3-rc1-2/include/net/mac80211.h:3618 rate_control_send_low+0x23e/0x250 [mac80211]()
787649: WARNING: at /builddir/build/BUILD/kernel-3.2.fc16/compat-wireless-3.3-rc1-2/drivers/net/wireless/brcm80211/brcmsmac/main.c:7998 brcms_c_wait_for_tx_completion+0x99/0xb0 [brcmsmac]()
788012: WARNING: at /builddir/build/BUILD/kernel-3.2.fc16/compat-wireless-3.3-rc1-2/net/mac80211/driver-ops.h:10 ieee80211_bss_info_change_notify+0x28a/0x290 [mac80211]()
789605: rtl8192cu: After 5~6 minutes, wireless usb lancard doesn’t work (cannot connect internet).
789159: network connection failure
790810: ath5k port gets “hard blocked” when Wireless is disabled via NetworkManager
794710: ath9k: Cannot enable WiFi in gnome-shell (toggle button switches back to Off state)
790275: BUG: unable to handle kernel NULL pointer dereference at 0000000000000060 rtl92ce_get_desc()

ethernet:
625776: e1000e crashes with Intel 82574L
720207: Realtek rtl8188ce works slow: speed is around 1Mb/s
794788: Wake-On-LAN stopped working after upgrade from FC15 to FC16
781217: crash after unplugging DSL cable (atl1c?)

suspend/hibernate:
783032: Can’t suspend to RAM when IR dongle is allowed to wake
788433: Core i7 cannot pm-hibernate/pm-suspend/thaw properly
789699: suspend fails by instant resume
791149: System can’t be suspended with kernel 3.2.x
791267: System reboots immediately after hibernating with 3.2 kernels
794525: 3.2.6-3.fc16.x86_64 doesnt suspend
789708: Hibernating fails all time
767084: kernel crash after back from sleep

boot failures:
789536: Can’t boot into new kernel
789679: kernel 3.2.3-2, 3.2.5-3 won’t boot encrypted setup
791133: Fedora 16 doesn’t boot with 3.2.6-3.fc16.x86_64 kernel on my notebook

Misc oopses/warn_on’s/scary shit:
721127: Heavy disk I/O (MD RAID?) crashes or freezes Fedora 15
787862: WARNING: at fs/sysfs/inode.c:323 sysfs_hash_and_remove+0xa9/0xb0()
788706: WARNING: at block/genhd.c:1568 disk_clear_events+0x106/0x110()
791277: WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x9b/0xa6()
794692: WARNING: at fs/dcache.c:2485 prepend_path+0x18c/0x1a0()
789990: BUG: unable to handle kernel paging request at 000000011b02e000 vmap_page_range_noflush()
790013: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 path_lookupat()
794531: BUG: unable to handle kernel NULL pointer dereference at (null) sock_init_data()
769576: WARNING: at lib/list_debug.c:56 __list_del_entry+0x82/0xd0()
770581: WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0x7a/0xa0()
771794: kernel: general protection fault: 0000 [#1] SMP
772738: system crash
755334: Kernel freezes, loops audio, under moderate cpu load
788981: map_vm_area() BUG: unable to handle kernel paging request at 000000011b02e000
789993: BUG: Bad page state in process tar pfn:27d36
794639: BUG: Bad page map in process firefox pte:02126065 pmd:19f00067

btrfs:
789632: WARNING: at fs/btrfs/extent-tree.c:5985 btrfs_alloc_free_block+0x354/0x360 [btrfs]()
790297: kernel BUG at fs/btrfs/transaction.c:1337!
790232: untarring incredibly slow over NFS even worse with BTRFS export

Brightness changing seems broken:
702352: Brightness adjustment FN keys doesn’t work
784532: kernel-3.2.1-3.fc16.x86_64 seems to break settings-screen- brightness slider control so disappears
788675: acpi_video_device_lcd_get_level_current() BUG: unable to handle kernel NULL pointer dereference at 0000000000000009
789962: Cannot adjust the brightness of the display in my laptop by pressing the function key

Fedora 15 kernel bugzilla status report from 2012-02-10 to 2012-02-17

The 3.2 rebase for F15 got pushed to updates. Not a huge amount of post-update activity.

378 bugs open in total. (down from 389 last week)
9 bugs closed.


Interesting closures:
445757: name_count maxed, losing inode data messages in dmesg
Purely cosmetic, but very annoying for people who saw it. audit periodically caused these messages to be spewed:

audit: name_count maxed, losing inode data: dev=00:07, inode=735975

Audit was using a fixed length array to store inodes. The message above got printed whenever something caused enough filesystem activity to create 20 or more inodes in a single syscall. (Something like loading a module could create a whole bunch of sysfs nodes, for example.)
The fix was to dynamically allocate them when the array fills up. (The array was also shrunk to 5, which should be the common case). Pretty straight-forward, but for a multitude of reasons this took nearly two years to get merged upstream. Finally fixed in 3.3-rc1. This doesn’t cleanly apply to 3.2, and as it’s purely cosmetic, we’ll wait until we rebase to pick this up in f15/f16.

790982: Kernel crash when unplugging a USB key
Another dupe of the SCSI crash in sd_revalidate_disk that I mentioned last week.
This got mentioned upstream after other distros also started seeing it. Jun’ichi Nomura & Tejun Heo seem to have arrived at a conclusion, so that problem should be put to bed soon. We’ll backport this to F15/16 if it doesn’t appear in -stable first.


20 still-open bugs got filed, or changed in some way.
Of those, here’s some of the more interesting ones.

717211, 718886 & 715137: WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0xe2/0x147
We continue to see these watchdog timeouts being reported for r8169/atl1c/e1000e. No progress.

720005: possible threading issue on s390x
After last week’s f15 status report, Simon Farnsworth noticed this bug, and thought it may have been related to a threading bug in python’s os.fork. That may be a problem, but it seems there is at least one non-python related deadlock too. There may be more than one bug here being confused as a single problem.

729460: Periodically typed zeros to every text-field
This is one of those “wtf is even going on here?” bugs. No idea yet.
(Personally I’ve had nothing but problems every time I’ve tried one of those wireless keyboards. I don’t know why people persist)

735380: WARNING: at fs/inode.c:901 unlock_new_inode+0x2e/0x4a()
The F15 variant of the unlock_new_inode bug. This has been driving us crazy for months now. It seems that for some users, after they resume from hibernate, they hit a condition where we try to unlock an inode which was never locked.
It smells like memory corruption of some kind, but tracking it down has proved to be very difficult.
It’s made worse by the fact that none of the developers chasing it can reproduce it.
We’re going to throw this patch in the next build to see if it makes any difference at all, but it feels like shooting in the dark.

787054: DMA: Out of SW-IOMMU space for 16 bytes
Reported against 3.1. Not reproduced on 3.2 yet. User has hit an ath9k WARN, which may or may not be related. Too early to say.

789080: WARNING: at lib/kref.c:34
Some reference count got corrupted after resume from suspend.

789659: limited network bandwidth
User reported that his download speeds were chopped in half in 3.2.
Being worked out upstream on the netdev list.
What’s interesting about this bug is that this user was the only one who noticed.

Fedora rawhide bugzilla status report from 2012-02-03 to 2012-02-10

The rawhide kernel is continuing to rebase towards 3.3. (currently at -rc3)
There are 149 open bugs right now.

Things are pretty quiet in rawhide right now. Perhaps things will pick up after the F17 alpha and beta start seeing more use, but at the moment there’s considerably more bug activity going on in the releases than the development branch.

In the last week, 6 got closed.

  • 772772: rt2860 should now be working fine.
  • 703118: Dell ST2220T touch screens should work
  • 787373: PowerPC compile failure fixed
  • 626026: Some rcu_dereference_check warnings should now be fixed.
  • 788125: usrmove related fallout.
  • 785295: Watchdog overflow no longer seems to be occurring.

16 bugs got filed/changed in the last week.

  • 783211: Cache inconsistency when reading from a partition vs the parent block_device
  • 647429: GFS2: [RFE] Implement trimfs ioctl
  • 636287: GFS2: [RFE] Make GFS2 handle errors more gracefully
  • 785939: iwlwifi is spewing garbage in the logs, sometimes hangs
  • 785772: Logs filled by ICMPv6 RA: ndisc_router_discovery() failed to add default route
  • 696219: No sound input using snd-hda-intel
  • 735641: Oops in kernel-3.1.0-0.rc4.git0.0.fc16.i686 while plugging Sony digital camera
  • 645877: possible circular locking at dquot_commit
  • 788064: reproducible kernel BUG at fs/btrfs/inode.c:1668!
  • 748159: SD card reader not detected on ASUS notebook
  • 759213: Truncated core dumps generated when piped through custom hook
  • 789017: [abrt] kernel: BUG: MAX_LOCKDEP_ENTRIES too low!
  • 787281: [abrt] kernel: WARNING: at fs/sysfs/inode.c:323 sysfs_hash_and_remove+0xa9/0xb0()
  • 784089: (AutoFS) [abrt] kernel: [ INFO: possible recursive locking detected ] lock(&(&dentry->d_lock)->rlock);
  • 787319: [abrt] kernel: [ INFO: possible recursive locking detected ] snd_pcm_action_group() lock(&(&substream->self_group.lock)->rlock/1)
  • 769747: [sdb] Asking for cache data failed

Not really enough samples to start seeing patterns. Going through the older bugs and closing out stale entries is something that needs doing.

Fedora 16 bugzilla status report from 2012-02-03 to 2012-02-10

Fedora 16 is currently shipping 3.2.3, the latest stable kernel.
We currently have 472 bugs open against it.

In the last week, we closed out 63 bugs, and saw changes to 91 bugs that are still open.


Interesting closures:
Looking over the list, a lot of duplicates jump out. Many of these were filed by different people, but in some cases, abrt is just being dumb.

Most commonly reported bugs in the last week:
768639: [abrt] kernel: WARNING: at /builddir/build/BUILD/kernel-3.2.fc16/compat-wireless-3.3-rc1-2/drivers/net/wireless/ath/ath9k/rc.c:697 ath_rc_get_highest_rix+0x158/0x1f0 [ath9k]()

784692: hso: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x247/0x250()
Another bunch of reports from the net/sched/sch_generic WARN_ON where the transmit queue locks up.
possibly related: 785806: e1000e Detected Hardware Unit Hang

Looking at the 26 closed bugs that weren’t duplicates, there’s a few wireless bugs that got fixed in an update, a handful of things that turned out to be bad hardware or user misconceptions, and generally pretty boring stuff.


The list of still open bugs is more interesting, especially from a ‘pattern spotting’ perspective.

Wireless:
We get a ton of wireless bug reports these days.
It seems like we get a lot more since we switched to using compat-wireless snapshots.
785409: After update to kernel 3.2.2-1 Atheros wireless fails to connect if using WEP Security
746744: Can not connect to PEAP using Intel Corporation WiFi Link 5100
785413: New kernel release has compatibility issues with my wireless card
784824: no radio devices for kernel 3.2.1
716988: Ralink RT2573 USB dongle randomly overheats
786566: Rapid decline in wireless performance with Atheros AR9002WB-1NG
785721: Unable to connect to WLAN using Broadcom kernel driver when WPA key is too long
767855: Wifi performance issues (Tx aggregation enabled on ra=MAC)
773513: Wifi wireless network connection abruptly stop working
758543: Wireless disconnects under load on Acer Aspire One 150Aw (Atheros AR242x / AR542x using ath5k)
785422: Wireless fails after kernel update to kernel 3.2.2.1 pae
770595: WARNING: at /builddir/build/BUILD/kernel-3.1.fc16/compat-wireless-3.2-rc6-3/drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c:461 iwl_irq_tasklet+0x3bd/0x7c0 [iwlwifi]()
773205: WARNING: at /builddir/build/BUILD/kernel-3.1.fc16/compat-wireless-3.2-rc6-3/include/net/mac80211.h:3574 rs_get_rate+0x1c1/0x1d0 [iwlwifi]()
783822: WARNING: at /builddir/build/BUILD/kernel-3.1.fc16/compat-wireless-3.2-rc6-3/net/wireless/mlme.c:366 cfg80211_send_assoc_timeout+0xb8/0x150 [cfg80211]()
768639: WARNING: at /builddir/build/BUILD/kernel-3.1.fc17/compat-wireless-2011-12-01/drivers/net/wireless/ath/ath9k/rc.c:697 ath_rc_get_highest_rix+0x158/0x1f0 [ath9k]()
787649: WARNING: at /builddir/build/BUILD/kernel-3.2.fc16/compat-wireless-3.3-rc1-2/drivers/net/wireless/brcm80211/brcmsmac/main.c:7998 brcms_c_wait_for_tx_completion+0x99/0xb0 [brcmsmac]()
786609: WARNING: at /builddir/build/BUILD/kernel-3.2.fc16/compat-wireless-3.3-rc1-2/include/net/mac80211.h:3618 rate_control_send_low+0x23e/0x250 [mac80211]()
788012: WARNING: at /builddir/build/BUILD/kernel-3.2.fc16/compat-wireless-3.3-rc1-2/net/mac80211/driver-ops.h:10 ieee80211_bss_info_change_notify+0x28a/0x290 [mac80211]()
789235: WARNING: at /builddir/build/BUILD/kernel-3.2.fc16/compat-wireless-3.3-rc1-2/include/net/mac80211.h:3618 rate_control_send_low+0x23e/0x250 [mac80211]()
785913: WARNING: at /builddir/build/BUILD/kernel-3.2.fc16/compat-wireless-3.3-rc1-2/drivers/net/wireless/iwlwifi/iwl-agn-tx.c:396 iwlagn_tx_skb+0x98d/0xa10 [iwlwifi]()
788271: [ath9k] wireless connection drops off with newer kernels
773652: [ath9k] randomly disconnects wireless — lenovo g475
789159: network connection failure

There are a number of bugs that all look related, taking the form “irq happened, nobody cared”. Many of these seem to be using the ASM108x chipset.
Searching for irqpoll turns up a lot of these.
There’s some work going on upstream to fall back to polling automatically. We’ll pick this up and backport it.

Suspend/Resume:
788433: Core i7 cannot pm-hibernate/pm-suspend/thaw properly
785384: hibernate hangs since kernel version 3.2.2-1.fc16.i686
754043: Thinkpad R61 sometimes fails to suspend (regression)
787467: Virt is very slow after suspend / resume

Sound:
785417: snd_hda_intel – no sound
786243: Subwoofer in ASUS G73 does not work
757375: microphone does not work in Samsung N110 netbook (regression from 2.6.38.6-26)
785329: setup_bdle() BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
788978: snd_vortex_dev_free() BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0

USB:
746914: bug in ehci_hcd module
789066: WARNING: at drivers/usb/host/xhci-ring.c:472 xhci_find_new_dequeue_state+0xb5/0x1f0()
787533: WARNING: at drivers/usb/host/ehci-hcd.c:1178 ehci_endpoint_reset+0xee/0x100()
This last one is interesting. We also saw this on F15.
From discussion with Alan Stern upstream, this appears to be hplip doing something wrong, and always trying to clear halt status on an endpoint, even if it hasn’t been set.

SCSI:
754518: oops in sd_revalidate_disk
SCSI patch goes unapplied upstream. Film at 11. The number of hotplug related crashes in SCSI continues to amaze. Things really started going to shit after circa 2.6.37

Soft lockups:
789002: BUG: soft lockup – CPU#0 stuck for 23s! [kworker/u:0:2600]
Looks like the ath9k chip locked up, and the ioread never completed. Nasty.
756542: BUG: soft lockup – CPU#2 stuck for 23s! [btrfs-endio-0:440]
788279: BUG: soft lockup – CPU#3 stuck for 23s! [flush-254:0:1124]
IO related stalls.
788938: BUG: soft lockup – CPU#3 stuck for 22s! [systemd-stdout-:563]
This one is weird. Systemd tried to open something, it blocked for 22 seconds.
788620: BUG: soft lockup – CPU#1 stuck for 30s! [kworker/1:1:9884]
atl1c locked up while checking link status. This seems to be a common thing for network drivers.

Networking:
759063: WARNING: at net/ipv4/tcp.c:1485 tcp_recvmsg+0x1bb/0x7f1()
Back traces and warnings from the network layer always give me the creeps.
Not entirely sure what’s happening here yet, or even if it’s reproducible.

Totally weird shit:
788981: map_vm_area() BUG: unable to handle kernel paging request at 000000011b02e000
788770: get_from_free_list() BUG: unable to handle kernel paging request at 01010154
743589: general protection fault: 0000 [#1] SMP : TAINTED -------D
783335: kernel BUG at mm/huge_memory.c:1921!
787527: kernel BUG at mm/mmap.c:2378!
752175: WARNING: at block/genhd.c:1560 disk_clear_events+0xc6/0xfa()
788280: WARNING: at lib/list_debug.c:53 __list_del_entry+0xa1/0xd0()
787044: WARNING: at lib/list_debug.c:47 __list_del_entry+0x63/0xd0()
787171: WARNING: at fs/dcache.c:2458 prepend_path+0x18c/0x1a0()
788706: WARNING: at block/genhd.c:1568 disk_clear_events+0x106/0x110()

Power management:
675433: Thinkpad X201 overheats and shuts down; fan does not spin up to maximum although capable
786665: Sandybridge laptop seeing lots of wakeups
772730: Kernel panic after ACPI power event with x86_64 kernel
787333: On netbook Acer Aspire One 521 battery indicator is absent.

Virtualisation:
739499: kernel-3.1.0-0.rc6.git0.3.fc16.x86_64 won’t boot on EC2
769300: Strange xen kernel code when installing a vm
787403: WARNING: at arch/x86/xen/mmu.c:475 xen_make_pte+0x67/0xa0()

Misc:
789215: led_trigger_unregister() BUG: unable to handle kernel NULL pointer dereference at (null)
This looks like the first variant of the bug we saw in f15 bug 726983.

787655: Automatic switching from graphical environment to a virtual screen without requested
768153: Bluetooth usage freezes system.
787468: boot fails by timeout while activating RAIDs with many HDDs
788509: Can’t change brightness on Acer Aspire One 521.
786164: CD/DVD-Rom stop working after a while
758789: cgroup controller test failed on F16 PPC64
787483: Desktop and applications really slow when switching between users
770704: External hdd can’t be accessed via esata (/dev/sdb disappears on access)
751060: F16 won’t power off after shutdown
789282: In the time of high % iowait all users interface are not responding also not responding tty console.
736752: INFO: suspicious rcu_dereference_check() usage in IPoIB code
788316: Kernel breaks working modem
788626: kernel BUG: unable to handle kernel NULL pointer dereference at (null) nfs4 code
785033: Kernel panic (rcu_bh_state detected stall)
784532: kernel-3.2.1-3.fc16.x86_64 seems to break settings-screen- brightness slider control so disappears
754859: Load moduale be2net failed
788748: Webcam recognized but not working on sony vaio sa3
783561: pti_exit() BUG: unable to handle kernel NULL pointer dereference at (null)
788675: acpi_video_device_lcd_get_level_current() BUG: unable to handle kernel NULL pointer dereference at 0000000000000009
789068: xc4000_sleep() BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
747927: [RV710] Xenta Wireless 2.4G mouse looses connection

Fedora 15 bugzilla status report from 2012-02-03 to 2012-02-10

The Fedora 15 kernel recently got rebased to 3.2 (named 2.6.42 for legacy reasons). This still hasn’t made it to an official update (it needs more karma). When that happens I suspect quite a few of the existing open bugs will get closed, and no doubt we’ll gain some new ones to deal with.
I expect next week’s f15 bug activity to increase because of this.

386 bugs open in total.


7 bugs closed. Interesting closures:

783955: mdadm: sending ioctl 1261 to a partition.
Just informational. This showed up in a number of ways, all benign so far. upstream thread.

784624: Clock loses 5 hours on every boot.
This isn’t a kernel bug. This happens when daemons run in chroots and can’t access /etc/localtime.

728607: Elantech Touchpad Wrongly Detected as Logitech Wheel Mouse
Finally this got some closure. Needed the new elantech driver backported to older releases.

706574: module wmi when unloaded causes Oops and system locks up
Just needed backported to stable.

786984: Use of kernel perf support by PAPI causes crash
Fix inherited when we rebased to 3.2

707403 & 767401: WARNING: at lib/list_debug.c:26 __list_add
This is one of many reports we’ve had over the last six months which manifests as an oops or similar when someone removes a USB memory stick.
There are still some bugs lurking here (As seen by still-open bugs on other releases), but this variant should (touch wood) be fixed now.


24 still-open bugs got filed, or changed in some way. Of those, here are some of the more interesting ones.

689127: abysmal performance using btrfs for VM storage
In short, btrfs was really slow at doing small io’s, such as those qemu was generating.
Josef Bacik has been working on this for a while; hopefully his current fixes, which are in the kernel and need testing, have resolved it.

717211, 718886 & 715137: WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0xe2/0x147
This is a warning that has been around for a while. We’ve seen reports of it on multiple chips (r8169, ipheth, atl1c, e1000e, hso).
Some of the upstream network driver developers have been poking at these, but we’re still seeing them frequently.
The actual WARN() that it’s hitting is a transmit queue timeout.

788260: CVE-2011-4086 kernel: jbd2: unmapped buffer with _Unwritten or _Delay flags set can lead to DoS
kernel security bug of the week. Not too exciting.

638943 & 749909: couple ath9k bugs.
Possibly fixed in newer upstreams. Again, waiting for the update to go live.

720005: Possible threading issue on s390x.
Interesting here that it seems to now be reported against ARMv7 and possibly x86_64 too. No ideas yet on what’s happening.

789156, 787299 & 789080: Hey, suspend/resume still sucks. Who knew?

787607: setserial /dev/ttyACM0 fails with “Cannot get serial info: Invalid argument”
Includes a pointer to a patch to cdc-acm that never made it upstream, and possibly fell through the cracks.
I threw this into an f15 build for the user to try. If it works out, I’ll ping the original author & revive it upstream.

713687: BUG: soft lockup – CPU#0 stuck for 67s! [modprobe:499] in ath5k_pci_eeprom_read

for (timeout = AR5K_TUNE_REGISTER_TIMEOUT; timeout > 0; timeout--) {
        status = ath5k_hw_reg_read(ah, AR5K_EEPROM_STATUS);
        if (status & AR5K_EEPROM_STAT_RDDONE) {
                if (status & AR5K_EEPROM_STAT_RDERR)
                        return false;
                *data = (u16)(ath5k_hw_reg_read(ah, AR5K_EEPROM_DATA) &
                              0xffff);
                return true;
        }
        usleep_range(15, 20);
}

AR5K_TUNE_REGISTER_TIMEOUT is 20000, so we can spin here in kernel space for some time if the hardware stops responding for some reason (which, it seems, it does).

726983: WARNING: at lib/list_debug.c:47 __list_del_entry+0xa3/0xb0()
Reported against 2.6.38, and 2.6.40 (3.1). First a use-after-free in the battery code, and then in the later kernel an oops instead.

722723: Another linked list bug.
Another use-after-free, this time in VM code. Creepy.

Fedora kernel bug status reports.

Going to start trying a new thing.

Every Friday, I’ll take a look over the bugs reported/changed/closed over the last week, and write up a report about the more interesting ones.
(Because of the sheer volume of bug reports we get, it’s not feasible to really cover everything).

Whether I’ll be able to keep it up depends on how much work it turns out to be, and how much interest there is in me doing it.
I’ll post this week’s F15 report shortly, as it was a quick one to write. I suspect 16 and rawhide will take longer, but hopefully I’ll get those done by the end of the day too.

Hiring for a position on the Fedora kernel team.

NOTE: This position has now been filled. The rest of this post is left for posterity only. If you’re interested in working at Red Hat in some other role, check out jobs.redhat.com and/or send me an email, and I’ll forward it to the right people.

We’re hiring for a position on the Fedora kernel team again.
If you are interested, email me a CV (davej at redhat) and we’ll see where things go. (we don’t have a posting on jobs.redhat.com yet)

Some of the tasks of the job include (but are not limited to):
* diagnosing incoming bugs from users
* backporting fixes from Linus’ latest tree to Fedora
* for bugs unfixed in Linus’ tree, work on fixing them, with the upstream maintainers.
* interacting with the -stable team to make sure those same fixes go there.
* helping keep Fedora on the cutting edge of kernel development by pushing new releases

Hiring distribution kernel maintainers is never easy. It is a job that requires knowledge of the various parts of the kernel. A generalist as opposed to a specialist. The bugs you’re staring at one day could be completely different the next.

With Fedora things are even harder due to the rapid rate of change upstream. By contrast, a RHEL kernel maintainer could spend time working on a bug for a month without the code underneath him changing.

It’s a demanding job, with an unrelenting torrent of new things that need fixing / working on. If this doesn’t dissuade you, you could be just what we’re looking for!

(oh, and the job doesn’t involve you having to move to Boston. Though if you wanted to I’m sure we could help make that happen).

Red Star kernel.

Over the long weekend, I downloaded a copy of Red Star Linux, the official operating system of North Korea.
Because license violations seem to be part of the Juche idea, there’s no known source code online, so proper analysis of what goes into it is difficult.

The rpm headers alone reveal quite a lot of interesting information though.
It seems that Red Star is forked from a version of Fedora somewhere around the Fedora 10 or 11 timeframe.

Here’s the changelog embedded in the kernel rpm..

* Fri Mar 27 14:00:00 2009 Jong Song Jin
- patched linux-2.6.25-drivers-video.patch(for video capture saa7134 onboard driver)

* Mon Mar 16 14:00:00 2009 An Jin
- change machanism for pci device information.

* Thu Feb 19 13:00:00 2009 Kim Yong Gwang
- change system halt to poweroff for x86 architecture

* Wed Jan 7 13:00:00 2009 Kim Jong Chol
- fixed the 8250 serial driver for modem.

* Fri Nov 28 13:00:00 2008 Kim Se Hyok
- apply tuxonice hibernate patch for Software Suspend 2

* Mon Nov 10 13:00:00 2008 Kim Chol Guk
- apply jipsam algorithm

* Mon Nov 10 13:00:00 2008 Kim Yong Gwang
- change 16 from 8 max count of the loop device

* Sat Aug 2 14:00:00 2008 Kim Yong Gwang
- implement the usb filtering through user authentification

* Wed Jul 23 14:00:00 2008 Kim Chol Guk
- Implement koreanize
- sata harddriver
- apply bootsplash

* Wed Apr 30 14:00:00 2008 Dave Airlie 2.6.25-14
- fix radeon fast-user-switch oops + i915 breadcrumb oops

Some interesting things here.
- All changelog timestamps are on the hour, suggesting they’ve been sanitised or generated from another source. All 374 changelog entries had been munged in this way, including the ones from the original Fedora release.
- No email addresses for the changelog entries (no surprise)
- The actual changelogs are quite cryptic. “change machanism for pci device information.” why?
- ‘fixed the 8250 serial driver for modem.’ wtf ?
- They decided tuxonice is the way forward for hibernation. Perhaps it works better on Dear Leader’s laptop.
- ‘apply jipsam algorithm’. This is a crypto module that isn’t in mainline (and apparently doesn’t exist outside North Korea). I bet it’s good though. No backdoor master keys or anything similar.
- ‘implement the usb filtering through user authentification’.
What does that even mean ?

Browsing through the rest of the distribution, a lot of packages are renamed. OpenOffice became UriOffice. Gimp became ImageProcessor. Wine became CrossWin2.0.

It also comes with AntiVirus 2.0, which includes an rtscan.ko kernel module that, judging by the symbols it uses, does some magic with jprobes to hook various functions for on-open scanning.

Another curious thing is that throughout the distro, whenever you do see an email address or hostname, it has a .kp TLD, and these never seem to resolve. I’m assuming that the DNS servers in North Korea show different results depending on whether you’re in North Korea or not.

10 years of x86info.

On Mon Feb 26 2001, I committed the first version of x86info to CVS on sourceforge. The actual coding of that first version had happened during the week or so prior.

The project began after seeing the program ‘cpuid’ by Phil Karn. I had sent a few patches to Phil but never got any reply, and no new release of cpuid appeared until months later (not including my patches). As I ended up forking more and more of the code, I decided to push it out as ‘x86info’ for the first time. Phil did a few more releases, up until 2002, when the program was abandoned.

Over the years, x86info has seen 739 commits from at least 19 contributors (some patches written by others were committed in the CVS days without correct attribution). The bulk of the commits were unsurprisingly mine.

The commit rate year by year is interesting.

2001 197
2002 119
2003 88
2004 17
2005 37
2006 46
2007 34
2008 74
2009 65
2010 27
2011 54

2003 was when I got hired at Red Hat, and the Fedora kernel pretty much became my life. 2004 I was doing RHEL4 too. The drop-off around that time is significant, and doesn’t recover until several years later.
This year seems to be off to a good start :-)

x86info has been ported to Solaris and FreeBSD, and at one time, even to Microsoft Windows. That support was ultimately removed due to the number of invasive ifdefs present. The various ports sometimes get broken due to lack of testing during development time, but afaik they are available in ‘ports’ with whatever fixes need to be present.

Aside from the Linux kernel, this is the oldest project I’m involved in that is still seeing development.