Handling regressions
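"We don't cause regressions" is the "first rule of Linux kernel development"; this document describes what it means in practice for developers. It complements "Reporting regressions", which covers the topic from a user's point of view; if you never read that text, go and at least skim over it before continuing here.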
The important bits (aka "TL;DR")
Ensure subscribers of the regressions mailing list (regressions@lists.linux.dev) quickly become aware of any new regression report:
When receiving a mailed report that did not CC the list, bring it into the loop by immediately sending at least a brief "Reply-all" with the list CCed.
Forward or bounce any reports submitted in bug trackers to the list.
Make the Linux kernel regression tracking bot "regzbot" track the issue (this is optional, but recommended):
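For mailed reports, check if the reporter included a line like
#regzbot introduced: v5.13..v5.14-rc1
If not, send a reply (with the regressions list in CC) containing a paragraph like the following, which tells regzbot when the issue started to happen:
#regzbot ^introduced: 1f2e3d4c5b6a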
When forwarding reports from a bug tracker to the regressions list (see above), include a paragraph like the following:
#regzbot introduced: v5.13..v5.14-rc1
#regzbot from: Some N. Ice Human <some.human@example.com>
#regzbot monitor: http://some.bugtracker.example.com/ticket?id=123456789
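When submitting fixes for regressions, add "Closes:" tags to the patch description pointing to all places where the issue was reported, as mandated by "Submitting patches: the essential guide to getting your code into the kernel" and "Documentation/process/5.Posting.rst". If you are only fixing part of the issue that caused the regression, you may use "Link:" tags instead. regzbot currently makes no distinction between the two.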
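Try to fix regressions quickly once the culprit has been identified; fixes for most regressions should be merged within two weeks, but some need to be resolved within two or three days.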
All the details on Linux kernel regressions relevant for developers
The important basics in more detail
What to do when receiving regression reports
Ensure the Linux kernel's regression tracker and other subscribers of the regressions mailing list (regressions@lists.linux.dev) become aware of any newly reported regression:
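When you receive a report by mail that did not CC the list, immediately bring it into the loop by sending at least a brief "Reply-all" with the list CCed; if a reply to your reply dropped the list again, try to ensure it gets CCed again.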
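If a report submitted in a bug tracker hits your inbox, forward or bounce it to the list. Consider checking the list archives beforehand, in case the reporter already forwarded the report as instructed by "Reporting issues".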
When doing either, consider making the Linux kernel regression tracking bot "regzbot" immediately start tracking the issue:
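For mailed reports, check if the reporter included a "regzbot command" like
#regzbot introduced: 1f2e3d4c5b6a
If not, send a reply (with the regressions list in CC) containing a paragraph like the following:
#regzbot ^introduced: v5.13..v5.14-rc1
This tells regzbot the version range in which the issue started to happen; you can specify a range using commit-ids as well, or state a single commit-id in case the reporter bisected the culprit.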
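Note the caret (^) before the "introduced": it tells regzbot to treat the parent mail (the one you reply to) as the initial report for the regression you want to see tracked; that's important, as regzbot will later look out for patches with "Closes:" tags pointing to the report in the archives on lore.kernel.org.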
When forwarding a report from a bug tracker to the regressions list, include a paragraph with these regzbot commands:
#regzbot introduced: 1f2e3d4c5b6a
#regzbot from: Some N. Ice Human <some.human@example.com>
#regzbot monitor: http://some.bugtracker.example.com/ticket?id=123456789
Regzbot will then automatically associate patches with the report if they contain "Closes:" tags pointing to your mail or the mentioned ticket.
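The "Reply-all" step and the regzbot paragraph can be combined in one mail. The following is a minimal sketch using only Python's standard `email` package; the function name `build_regzbot_reply`, its parameters, and all example addresses are hypothetical, and the sketch only builds the message without sending it. It keeps everyone from the report's To and Cc headers, adds the regressions list if the reporter left it out, and puts the command into a paragraph of its own.

```python
from email.message import EmailMessage
from email.utils import formataddr, getaddresses

REGRESSIONS_LIST = "regressions@lists.linux.dev"


def build_regzbot_reply(report: EmailMessage, me: str, text: str,
                        introduced: str) -> EmailMessage:
    """Build (but do not send) a Reply-all to a mailed regression report."""
    reporter = getaddresses([str(report["From"])])[0][1].lower()
    myself = getaddresses([me])[0][1].lower()

    # Everyone from the original To/Cc stays on the thread, except the
    # reporter (who goes into To) and ourselves.
    others = [str(v) for v in (report.get_all("To") or []) + (report.get_all("Cc") or [])]
    seen = {reporter, myself}
    cc = []
    for name, addr in getaddresses(others):
        if addr and addr.lower() not in seen:
            seen.add(addr.lower())
            cc.append(formataddr((name, addr)))
    # Bring the regressions list into the loop if it is missing.
    if REGRESSIONS_LIST not in seen:
        cc.append(REGRESSIONS_LIST)

    reply = EmailMessage()
    reply["From"] = me
    reply["To"] = str(report["From"])
    if cc:
        reply["Cc"] = ", ".join(cc)
    subject = str(report["Subject"] or "")
    reply["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"
    if report["Message-ID"]:
        reply["In-Reply-To"] = str(report["Message-ID"])
        reply["References"] = str(report["Message-ID"])
    # regzbot only honors commands that form a paragraph of their own,
    # so the command is separated from the prose by a blank line.
    reply.set_content(f"{text.strip()}\n\n#regzbot ^introduced: {introduced}\n")
    return reply
```

For a bug tracker report, the same approach works with the `introduced`, `from`, and `monitor` lines shown above placed together in that separate paragraph.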
The important bits when fixing regressions
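You don't need to do anything special when submitting fixes for regressions, just remember to do what "Submitting patches: the essential guide to getting your code into the kernel", "Documentation/process/5.Posting.rst", and "Everything you ever wanted to know about Linux -stable releases" already explain in more detail: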
Point to all places where the issue was reported using "Closes:" tags:
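Closes: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=1234567890
If you are only fixing part of the issue, you may use "Link:" instead, as described in the first document mentioned above. regzbot currently treats both of these equivalently and considers the linked reports as resolved.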
Add a "Fixes:" tag to specify the commit causing the regression.
If the culprit was merged in an earlier development cycle, explicitly mark the fix for backporting using the
Cc: stable@vger.kernel.org
tag.
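All this is expected from you and important when it comes to regressions, as these tags are of great value for everyone (you included) who might be looking into the issue weeks, months, or years later. These tags are also crucial for tools and scripts used by other kernel developers or Linux distributions; one of these tools is regzbot, which heavily relies on the "Closes:" tags to associate regression reports with the changes resolving them.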
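These tags are easy to check mechanically before sending a fix. Below is a minimal sketch, assuming the trailer formats shown in this document and the `Fixes: <at least 12 hex digits> ("<subject>")` form described in "Submitting patches"; the function name and its `culprit_in_earlier_cycle` parameter are hypothetical, and it is no substitute for the kernel's own scripts/checkpatch.pl.

```python
import re

# "Closes:" or "Link:" followed by a URL, at the start of a line.
CLOSES_OR_LINK = re.compile(r"^(?:Closes|Link): \S+", re.MULTILINE)
# "Fixes: <at least 12 hex digits of the commit id> ("<subject line>")"
FIXES = re.compile(r'^Fixes: [0-9a-f]{12,40} \(".+"\)$', re.MULTILINE)
CC_STABLE = re.compile(r"^Cc: stable@vger\.kernel\.org\b", re.MULTILINE)


def check_regression_fix(description: str, culprit_in_earlier_cycle: bool) -> list[str]:
    """Return a list of problems; an empty list means the tags look complete."""
    problems = []
    if not CLOSES_OR_LINK.search(description):
        problems.append("no Closes:/Link: tag pointing to the report(s)")
    if not FIXES.search(description):
        problems.append("no Fixes: tag naming the culprit")
    if culprit_in_earlier_cycle and not CC_STABLE.search(description):
        problems.append("culprit predates this cycle, but Cc: stable@vger.kernel.org is missing")
    return problems
```

A patch description carrying all three tags yields an empty list; one without any tags yields three problems when the culprit is from an earlier cycle.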
Expectations and best practices for fixing regressions
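As a Linux kernel developer, you are expected to give your best to prevent situations where a regression caused by a recent change of yours leaves users only these options: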
Run a kernel with a regression that impacts usage.
Switch to an older or newer kernel series.
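Continue running an outdated and thus potentially insecure kernel for more than three weeks after the regression's culprit was identified. Ideally it should be less than two weeks. And it ought to be just a few days if the issue is severe or affects many users, either in general or in prevalent environments.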
How to realize that in practice depends on various factors. Use the following rules of thumb as a guide.
In general:
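Prioritize work on regressions over all other Linux kernel work, unless the latter concerns a severe issue (e.g. acute security vulnerability, data loss, bricked hardware, ...).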
Expedite fixing regressions that recently made it into a proper mainline, stable, or longterm release (either directly or via backport).
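Do not consider regressions from the current cycle as something that can wait till the end of the cycle, as the issue might discourage or prevent users and CI systems from testing mainline now or generally.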
Work with the required care to avoid additional or bigger damage, even if resolving an issue then might take longer than outlined below.
On timing once the culprit of a regression is known:
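Aim to mainline a solution within two or three days if the issue is severe or bothering many users, either in general or in prevalent conditions like a particular hardware environment, distribution, or stable/longterm series.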
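Aim to mainline a fix by the Sunday after next if the culprit made it into a recent mainline, stable, or longterm release (either directly or via backport); if the culprit became known early during a week and is simple to resolve, try to mainline the fix within the same week.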
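For other regressions, aim to mainline fixes before the last Sunday within the next three weeks. One or two Sundays later are acceptable if the regression is something people can live with easily for a while, like a mild performance regression.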
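It's strongly discouraged to delay mainlining regression fixes till the next merge window, except when the fix is extraordinarily risky or when the culprit was mainlined more than a year ago.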
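The timing goals above can be turned into concrete dates. The sketch below assumes one reading of the wording: "two or three days" is taken as at most three days, "the Sunday after next" as the second Sunday strictly after the day the culprit became known, "the last Sunday within the next three weeks" as the last Sunday no more than 21 days away, and "more than a year" as more than 365 days. The category names and function names are hypothetical, and the sketch ignores the allowed slack (the same-week goal for simple fixes, one or two Sundays later for mild regressions).

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday() counts Monday as 0 and Sunday as 6


def next_sunday(d: date) -> date:
    """Return the first Sunday strictly after d."""
    return d + timedelta(days=(SUNDAY - d.weekday()) % 7 or 7)


def mainline_target(culprit_known: date, category: str) -> date:
    """Latest date by which a fix should be in mainline (one reading of the rules)."""
    if category == "severe":    # severe or bothering many users
        return culprit_known + timedelta(days=3)
    if category == "released":  # culprit in a recent mainline/stable/longterm release
        return next_sunday(next_sunday(culprit_known))
    if category == "other":     # last Sunday at most three weeks away
        limit = culprit_known + timedelta(days=21)
        return limit - timedelta(days=(limit.weekday() - SUNDAY) % 7)
    raise ValueError(f"unknown category: {category!r}")


def may_defer_to_next_merge_window(culprit_mainlined: date, today: date,
                                   extraordinarily_risky: bool) -> bool:
    """Deferring is strongly discouraged unless the fix is extraordinarily
    risky or the culprit was mainlined more than a year ago."""
    return extraordinarily_risky or today - culprit_mainlined > timedelta(days=365)
```

For a culprit identified on Monday, 3 June 2024, this yields 6 June, 16 June, and 23 June 2024 for the three categories respectively.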
On procedure:
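Always consider reverting the culprit, as it's often the quickest and least dangerous way to fix a regression. Don't worry about mainlining a fixed variant later: that should be straightforward, as most of the code went through review once already.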
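Try to resolve any regressions introduced in mainline during the past twelve months before the current development cycle ends: Linus wants such regressions to be handled like those from the current cycle, unless fixing bears unusual risks.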
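Consider CCing Linus on discussions or patch review if a regression seems tangled. Do the same in precarious or urgent cases, especially if the subsystem maintainer might be unavailable. Also CC the stable team when you know such a regression made it into a mainline, stable, or longterm release.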
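For urgent regressions, consider asking Linus to pick up the fix straight from the mailing list: he is totally fine with that for uncontroversial fixes. Ideally though, such requests should happen in accordance with the subsystem maintainers or come directly from them.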
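In case you are unsure if a fix is worth the risk of applying it just days before a new mainline release, send Linus a mail with the usual lists and people in CC; in it, summarize the situation while asking him to consider picking up the fix straight from the list. He then can make the call himself and, when needed, even postpone the release. Such requests again should ideally happen in accordance with the subsystem maintainers or come directly from them.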
Regarding stable and longterm kernels:
You are free to leave regressions to the stable team if they at no point in time occurred with mainline or were fixed there already.
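If a regression made it into a proper mainline release during the past twelve months, ensure to tag the fix with "Cc: stable@vger.kernel.org", as a "Fixes:" tag alone does not guarantee a backport. Please add the same tag in case you know the culprit was backported to stable or longterm kernels.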
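When receiving reports about regressions in recent stable or longterm kernel series, please evaluate at least briefly if the issue might happen in current mainline as well; if that seems likely, take hold of the report. If in doubt, ask the reporter to check mainline.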
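Whenever you want to swiftly resolve a regression that recently also made it into a proper mainline, stable, or longterm release, fix it quickly in mainline; when appropriate, involve Linus to fast-track the fix (see above). That's because the stable team normally neither reverts nor fixes any changes that cause the same problems in mainline.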
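In case of urgent regression fixes, you might want to ensure prompt backporting by dropping the stable team a note once the fix was mainlined; this is especially advisable during merge windows and shortly thereafter, as the fix otherwise might land at the end of a huge patch queue.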
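The stable-related rules above boil down to two small decisions. The following sketch is a hypothetical model: the `Regression` type and its field names are assumptions chosen for illustration, and it only encodes the cases where the text asks for the stable tag (culprit from an earlier cycle, or culprit known to be backported) and the case where the regression may be left to the stable team.

```python
from dataclasses import dataclass


@dataclass
class Regression:
    culprit_in_earlier_cycle: bool  # culprit was merged before the current cycle
    culprit_backported: bool        # culprit known to be in a stable/longterm series
    ever_in_mainline: bool          # the regression occurred in mainline at some point
    fixed_in_mainline: bool         # mainline already carries a fix


def needs_cc_stable(r: Regression) -> bool:
    """Should the fix carry 'Cc: stable@vger.kernel.org'? A Fixes: tag
    alone does not guarantee a backport, so the tag is asked for explicitly."""
    return r.culprit_in_earlier_cycle or r.culprit_backported


def can_leave_to_stable_team(r: Regression) -> bool:
    """Regressions that never occurred in mainline, or are already fixed
    there, may be left to the stable team."""
    return not r.ever_in_mainline or r.fixed_in_mainline
```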
On patch flow:
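Developers, when trying to reach the time periods mentioned above, remember to account for the time it takes to get fixes tested, reviewed, and merged by Linus, ideally with them being in linux-next at least briefly. Hence, if a fix is urgent, make it obvious to ensure others handle it appropriately.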
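Reviewers, you are kindly asked to assist developers in reaching the time periods mentioned above by reviewing regression fixes in a timely manner.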
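Subsystem maintainers, you likewise are encouraged to expedite the handling of regression fixes. Thus, evaluate if skipping linux-next is an option for the particular fix. Also consider sending git pull requests more often than usual when needed. And try to avoid holding onto regression fixes over weekends, especially when the fix is marked for backporting.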
More aspects regarding regressions developers should be aware of
How to deal with changes where a risk of regression is known
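Evaluate how big the risk of regressions is, for example by performing a code search in Linux distributions and Git forges. Also consider asking other developers or projects likely to be affected to evaluate or even test the proposed change; if problems surface, maybe some solution acceptable for all can be found.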
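If the risk of regressions in the end seems to be relatively small, go ahead with the change, but let all involved parties know about the risk. Hence, make sure your patch description makes this aspect obvious. Once the change is merged, tell the Linux kernel's regression tracker and the regressions mailing list about the risk, so everyone has the change on the radar in case reports trickle in. Depending on the risk, you also might want to ask the subsystem maintainer to mention the issue in their mainline pull request.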
What else is there to know about regressions?
Check out "Reporting regressions", which covers a lot of other aspects you might want to be aware of:
the purpose of the "no regressions" rule
what issues actually qualify as a regression
who's in charge of finding the root cause of a regression
how to handle tricky situations, e.g. when a regression is caused by a security fix or when fixing a regression might cause another one
Whom to ask for advice when it comes to regressions
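Send a mail to the regressions mailing list (regressions@lists.linux.dev) while CCing the Linux kernel's regression tracker (regressions@leemhuis.info); if the issue might better be dealt with in private, feel free to omit the list.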
More about regression tracking and regzbot
Why does the Linux kernel have a regression tracker, and why is regzbot used?
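Rules like "no regressions" need someone to ensure they are followed, otherwise they are broken either accidentally or on purpose. History has shown this to be true for the Linux kernel as well. That's why Thorsten Leemhuis volunteered to keep an eye on things as the Linux kernel's regression tracker, occasionally helped by other people. None of them are paid to do this, which is why regression tracking is done on a best-effort basis.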
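Earlier attempts to manually track regressions have shown it's exhausting and frustrating work, which is why they were abandoned after a while. To prevent this from happening again, Thorsten developed regzbot to facilitate the work, with the long-term goal of automating regression tracking as much as possible for everyone involved.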
How does regression tracking work with regzbot?
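The bot watches for replies to reports of tracked regressions. Additionally, it looks out for posted or committed patches referencing such reports with "Closes:" tags; replies to such patch postings are tracked as well. Combined, this data provides good insight into the current state of the fixing process.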
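The association via tags can be illustrated with a small sketch. This is not regzbot's actual code; it is a hypothetical illustration that normalizes lore.kernel.org links to the Message-ID they contain (so /r/, /all/, and list-specific URLs of the same mail match), treats "Closes:" and "Link:" alike as regzbot currently does, and looks the result up in a hypothetical table of tracked reports. URLs with extra path components, such as thread views, are not handled.

```python
import re

# "Closes:" and "Link:" trailers, one per line.
TAG = re.compile(r"^(?:Closes|Link):[ \t]*(\S+)[ \t]*$", re.MULTILINE)
# https://lore.kernel.org/<r|all|listname>/<message-id>/ (list part optional)
LORE = re.compile(r"^https?://lore\.kernel\.org/(?:[^/@]+/)?([^/]+@[^/]+)/?$")


def report_key(url: str) -> str:
    """Map a report URL to a comparable key: lore links become their Message-ID."""
    m = LORE.match(url)
    return "msgid:" + m.group(1) if m else url.rstrip("/")


def resolved_regressions(patch: str, tracked: dict[str, str]) -> list[str]:
    """Names of tracked regressions whose report a patch links to.

    `tracked` maps report_key(<report URL>) to a name for the regression."""
    names = []
    for url in TAG.findall(patch):
        name = tracked.get(report_key(url))
        if name is not None and name not in names:
            names.append(name)
    return names
```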
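Regzbot tries to do its job with as little overhead as possible for both reporters and developers. In fact, only reporters are burdened with an extra duty: they need to tell regzbot about the regression report using the #regzbot introduced command outlined above; if they don't do that, someone else can take care of that using #regzbot ^introduced.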
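For developers there normally is no extra work involved; they just need to make sure to do something that was expected long before regzbot came to light: add links to the patch description pointing to all reports about the issue fixed.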
Do I have to use regzbot?
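It's in the interest of everyone if you do, as kernel maintainers like Linus Torvalds partly rely on regzbot's tracking in their work, for example when deciding to release a new version or extend the development phase. For this they need to be aware of all unfixed regressions; to do that, Linus looks into the weekly reports sent by regzbot.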
Do I have to tell regzbot about every regression I stumble upon?
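Ideally yes: we are all humans and easily forget problems when something more important unexpectedly comes up, for example a bigger problem in the Linux kernel or something in real life that's keeping us away from keyboards for a while. Hence, it's best to tell regzbot about every regression, except when you immediately write a fix and commit it to a tree regularly merged into the affected kernel series.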
How to see which regressions regzbot currently tracks?
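Check regzbot's web interface for the latest info; alternatively, search for the latest regression report, which regzbot normally sends out once a week on Sunday evening (UTC), a few hours before Linus usually publishes new (pre-)releases.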
What places is regzbot monitoring?
Regzbot is watching the most important Linux mailing lists as well as the git repositories of linux-next, mainline, and stable/longterm.
What kind of issues are supposed to be tracked by regzbot?
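The bot is meant to track regressions, hence please don't involve regzbot for regular issues. But it's okay for the Linux kernel's regression tracker if you use regzbot to track severe issues, like reports about hangs, corrupted data, or internal errors (Panic, Oops, BUG(), warning, ...).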
Can I add regressions found by CI systems to regzbot's tracking?
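Feel free to do so if the particular regression likely has impact on practical use cases and thus might be noticed by users; hence, please don't involve regzbot for theoretical regressions unlikely to show themselves in real-world usage.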
How to interact with regzbot?
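By using a "regzbot command" in a direct or indirect reply to the mail with the regression report. These commands need to be in their own paragraph (IOW: they need to be separated from the rest of the mail using blank lines).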
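One such command is #regzbot introduced: <version or commit>, which makes regzbot consider your mail as a regression report added to the tracking, as already described above; #regzbot ^introduced: <version or commit> is another such command, which makes regzbot consider the parent mail as a report for a regression which it starts to track.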
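Once one of those two commands has been used, other regzbot commands can be used in direct or indirect replies to the report. You can write them below one of the introduced commands, in replies to the mail that used one of them, or in replies to a mail that itself is a reply to that mail: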
Set or update the title:
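#regzbot title: foo
Monitor a discussion or bugzilla.kernel.org ticket where additional aspects of the issue or a fix are discussed, for example the posting of a patch fixing the regression: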
#regzbot monitor: https://lore.kernel.org/all/30th.anniversary.repost@klaava.Helsinki.FI/
Monitoring only works for lore.kernel.org and bugzilla.kernel.org; regzbot will consider all messages in that thread or ticket as related to the fixing process.
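Point to a place with further details of interest, like a mailing list post or a ticket in a bug tracker that is slightly related, but about a different topic: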
#regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=123456789
Mark a regression as fixed by a commit that is heading upstream or already landed:
#regzbot fix: 1f2e3d4c5d
Mark a regression as a duplicate of another one already tracked by regzbot:
#regzbot dup-of: https://lore.kernel.org/all/30th.anniversary.repost@klaava.Helsinki.FI/
Mark a regression as invalid:
#regzbot invalid: wasn't a regression, problem has always existed
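The paragraph rule can be illustrated with a small parser. This is a hypothetical sketch, not regzbot's implementation: the `Command` type and `parse_regzbot_commands` are illustrative names, a paragraph is only accepted if every non-empty line in it is a command, the caret form is merely recorded, and command names and arguments are not validated (regzbot's reference documentation is authoritative on those).

```python
import re
from dataclasses import dataclass

# "#regzbot <name>: <argument>", optionally with a caret before the name.
COMMAND = re.compile(r"^#regzbot (\^?)([a-z-]+):?[ \t]*(.*)$")


@dataclass
class Command:
    name: str
    argument: str
    caret: bool  # '^' form: the parent mail is the report (see "^introduced")


def parse_regzbot_commands(body: str) -> list[Command]:
    """Extract regzbot commands from a mail body, honoring the rule that
    commands must form a paragraph of their own."""
    commands = []
    for paragraph in re.split(r"\n[ \t]*\n", body):
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        matches = [COMMAND.match(line) for line in lines]
        # Skip empty paragraphs, quoted text, and prose that merely
        # mentions a command.
        if not matches or not all(matches):
            continue
        for m in matches:
            caret, name, argument = m.groups()
            commands.append(Command(name, argument.strip(), caret == "^"))
    return commands
```

In a mail containing prose, a quoted "> #regzbot introduced:" line, and a separate paragraph with "#regzbot ^introduced: v5.13..v5.14-rc1" and "#regzbot title: foo", only the two commands from the separate paragraph are returned.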
Is there more to tell about regzbot and its commands?
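More detailed and up-to-date information about the Linux kernel's regression tracking bot can be found on its project page, which among other things contains a getting started guide and reference documentation, both of which cover more details than the section above.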
Quotes from Linus about regressions
Find below a few real-life examples of how Linus Torvalds expects regressions to be handled:
If you break existing user space setups THAT IS A REGRESSION. It's not ok to say "but we'll fix the user space setup". Really. NOT OK. [...] The first rule is: - we don't cause regressions and the corollary is that when regressions *do* occur, we admit to them and fix them, instead of blaming user space. The fact that you have apparently been denying the regression now for three weeks means that I will revert, and I will stop pulling apparmor requests until the people involved understand how kernel development is done. People should basically always feel like they can update their kernel and simply not have to worry about it. I refuse to introduce "you can only update the kernel if you also update that other program" kind of limitations. If the kernel used to work for you, the rule is that it continues to work for you. There have been exceptions, but they are few and far between, and they generally have some major and fundamental reasons for having happened, that were basically entirely unavoidable, and people _tried_hard_ to avoid them. Maybe we can't practically support the hardware any more after it is decades old and nobody uses it with modern kernels any more. Maybe there's a serious security issue with how we did things, and people actually depended on that fundamentally broken model. Maybe there was some fundamental other breakage that just _had_ to have a flag day for very core and fundamental reasons. And notice that this is very much about *breaking* peoples environments. Behavioral changes happen, and maybe we don't even support some feature any more. There's a number of fields in /proc/<pid>/stat that are printed out as zeroes, simply because they don't even *exist* in the kernel any more, or because showing them was a mistake (typically an information leak). But the numbers got replaced by zeroes, so that the code that used to parse the fields still works. 
The user might not see everything they used to see, and so behavior is clearly different, but things still _work_, even if they might no longer show sensitive (or no longer relevant) information. But if something actually breaks, then the change must get fixed or reverted. And it gets fixed in the *kernel*. Not by saying "well, fix your user space then". It was a kernel change that exposed the problem, it needs to be the kernel that corrects for it, because we have a "upgrade in place" model. We don't have a "upgrade with new user space". And I seriously will refuse to take code from people who do not understand and honor this very simple rule. This rule is also not going to change. And yes, I realize that the kernel is "special" in this respect. I'm proud of it. I have seen, and can point to, lots of projects that go "We need to break that use case in order to make progress" or "you relied on undocumented behavior, it sucks to be you" or "there's a better way to do what you want to do, and you have to change to that new better way", and I simply don't think that's acceptable outside of very early alpha releases that have experimental users that know what they signed up for. The kernel hasn't been in that situation for the last two decades. We do API breakage _inside_ the kernel all the time. We will fix internal problems by saying "you now need to do XYZ", but then it's about internal kernel API's, and the people who do that then also obviously have to fix up all the in-kernel users of that API. Nobody can say "I now broke the API you used, and now _you_ need to fix it up". Whoever broke something gets to fix it too. And we simply do not break user space.
From 2020-05-21:
The rules about regressions have never been about any kind of documented behavior, or where the code lives. The rules about regressions are always about "breaks user workflow". Users are literally the _only_ thing that matters. No amount of "you shouldn't have used this" or "that behavior was undefined, it's your own fault your app broke" or "that used to work simply because of a kernel bug" is at all relevant. Now, reality is never entirely black-and-white. So we've had things like "serious security issue" etc that just forces us to make changes that may break user space. But even then the rule is that we don't really have other options that would allow things to continue. And obviously, if users take years to even notice that something broke, or if we have sane ways to work around the breakage that doesn't make for too much trouble for users (ie "ok, there are a handful of users, and they can use a kernel command line to work around it" kind of things) we've also been a bit less strict. But no, "that was documented to be broken" (whether it's because the code was in staging or because the man-page said something else) is irrelevant. If staging code is so useful that people end up using it, that means that it's basically regular kernel code with a flag saying "please clean this up". The other side of the coin is that people who talk about "API stability" are entirely wrong. API's don't matter either. You can make any changes to an API you like - as long as nobody notices. Again, the regression rule is not about documentation, not about API's, and not about the phase of the moon. It's entirely about "we caused problems for user space that used to work".
From 2017-11-05:
And our regression rule has never been "behavior doesn't change". That would mean that we could never make any changes at all. For example, we do things like add new error handling etc all the time, which we then sometimes even add tests for in our kselftest directory. So clearly behavior changes all the time and we don't consider that a regression per se. The rule for a regression for the kernel is that some real user workflow breaks. Not some test. Not a "look, I used to be able to do X, now I can't".
From 2018-08-03:
YOU ARE MISSING THE #1 KERNEL RULE. We do not regress, and we do not regress exactly because your are 100% wrong. And the reason you state for your opinion is in fact exactly *WHY* you are wrong. Your "good reasons" are pure and utter garbage. The whole point of "we do not regress" is so that people can upgrade the kernel and never have to worry about it. > Kernel had a bug which has been fixed That is *ENTIRELY* immaterial. Guys, whether something was buggy or not DOES NOT MATTER. Why? Bugs happen. That's a fact of life. Arguing that "we had to break something because we were fixing a bug" is completely insane. We fix tens of bugs every single day, thinking that "fixing a bug" means that we can break something is simply NOT TRUE. So bugs simply aren't even relevant to the discussion. They happen, they get found, they get fixed, and it has nothing to do with "we break users". Because the only thing that matters IS THE USER. How hard is that to understand? Anybody who uses "but it was buggy" as an argument is entirely missing the point. As far as the USER was concerned, it wasn't buggy - it worked for him/her. Maybe it worked *because* the user had taken the bug into account, maybe it worked because the user didn't notice - again, it doesn't matter. It worked for the user. Breaking a user workflow for a "bug" is absolutely the WORST reason for breakage you can imagine. It's basically saying "I took something that worked, and I broke it, but now it's better". Do you not see how f*cking insane that statement is? And without users, your program is not a program, it's a pointless piece of code that you might as well throw away. Seriously. This is *why* the #1 rule for kernel development is "we don't break users". Because "I fixed a bug" is absolutely NOT AN ARGUMENT if that bug fix broke a user setup. You actually introduced a MUCH BIGGER bug by "fixing" something that the user clearly didn't even care about. 
And dammit, we upgrade the kernel ALL THE TIME without upgrading any other programs at all. It is absolutely required, because flag-days and dependencies are horribly bad. And it is also required simply because I as a kernel developer do not upgrade random other tools that I don't even care about as I develop the kernel, and I want any of my users to feel safe doing the same time. So no. Your rule is COMPLETELY wrong. If you cannot upgrade a kernel without upgrading some other random binary, then we have a problem.
From 2021-06-05:
THERE ARE NO VALID ARGUMENTS FOR REGRESSIONS. Honestly, security people need to understand that "not working" is not a success case of security. It's a failure case. Yes, "not working" may be secure. But security in that case is *pointless*.
Binary compatibility is more important. And if binaries don't use the interface to parse the format (or just parse it wrongly - see the fairly recent example of adding uuid's to /proc/self/mountinfo), then it's a regression. And regressions get reverted, unless there are security issues or similar that makes us go "Oh Gods, we really have to break things". I don't understand why this simple logic is so hard for some kernel developers to understand. Reality matters. Your personal wishes matter NOT AT ALL. If you made an interface that can be used without parsing the interface description, then we're stuck with the interface. Theory simply doesn't matter. You could help fix the tools, and try to avoid the compatibility issues that way. There aren't that many of them.
it's clearly NOT an internal tracepoint. By definition. It's being used by powertop.
We have programs that use that ABI and thus it's a regression if they break.
From 2012-07-06:
> Now this got me wondering if Debian _unstable_ actually qualifies as a
> standard distro userspace.
Oh, if the kernel breaks some standard user space, that counts. Tons of people run Debian unstable
From 2019-09-15:
One _particularly_ last-minute revert is the top-most commit (ignoring the version change itself) done just before the release, and while it's very annoying, it's perhaps also instructive. What's instructive about it is that I reverted a commit that wasn't actually buggy. In fact, it was doing exactly what it set out to do, and did it very well. In fact it did it _so_ well that the much improved IO patterns it caused then ended up revealing a user-visible regression due to a real bug in a completely unrelated area. The actual details of that regression are not the reason I point that revert out as instructive, though. It's more that it's an instructive example of what counts as a regression, and what the whole "no regressions" kernel rule means. The reverted commit didn't change any API's, and it didn't introduce any new bugs. But it ended up exposing another problem, and as such caused a kernel upgrade to fail for a user. So it got reverted. The point here being that we revert based on user-reported _behavior_, not based on some "it changes the ABI" or "it caused a bug" concept. The problem was really pre-existing, and it just didn't happen to trigger before. The better IO patterns introduced by the change just happened to expose an old bug, and people had grown to depend on the previously benign behavior of that old issue. And never fear, we'll re-introduce the fix that improved on the IO patterns once we've decided just how to handle the fact that we had a bad interaction with an interface that people had then just happened to rely on incidental behavior for before. It's just that we'll have to hash through how to do that (there are no less than three different patches by three different developers being discussed, and there might be more coming...). 
In the meantime, I reverted the thing that exposed the problem to users for this release, even if I hope it will be re-introduced (perhaps even backported as a stable patch) once we have consensus about the issue it exposed. Take-away from the whole thing: it's not about whether you change the kernel-userspace ABI, or fix a bug, or about whether the old code "should never have worked in the first place". It's about whether something breaks existing users' workflow. Anyway, that was my little aside on the whole regression thing. Since it's that "first rule of kernel programming", I felt it is perhaps worth just bringing it up every once in a while