Tuesday, 29 December 2015

GRUB, os-prober and Red Hat / Oracle Linux

I've been successfully using VirtualBox for some time to run test environments for Oracle Linux and Oracle Database, but there are limitations to what you can do. So I decided to install Oracle Linux onto another disk partition on my PC, so that I could dual boot into it for some more advanced Oracle Database tests. The Oracle Linux installation itself went ahead trouble free. I just had to remember to disable installation of the GRUB boot loader, because GRUB was already installed by my other Linux (Arch Linux). But then I ran into problems getting this newly installed Linux added properly to the main GRUB boot menu I was using. This post describes why this happened (after much digging around on my part), and a very quick and simple solution for it.

On Arch Linux, other Linux installations on other disk partitions are detected and added to the GRUB boot menu by installing the "os-prober" package, which adds some extra scripts used by "grub-mkconfig". The specific problem was that while "os-prober" did detect the Oracle Linux kernels, they were added to the GRUB menu in the wrong order. It looked like an alphabetical ordering rather than a numeric ordering by kernel version number. This meant that the first Oracle Linux kernel listed in the GRUB menu was not the latest one installed, and was in fact more likely to be the oldest one or a rescue kernel.

To cut a long story short, the problem is due to a combination of the code in the "/usr/lib/linux-boot-probes/mounted/40grub2" detection script and the contents of the "/boot/grub2/grub.cfg" file in the Oracle Linux installation. The "grub.cfg" file in the Oracle Linux installation uses some keywords that the "40grub2" script in Arch Linux does not recognise, so it finds no bootable Linux kernels there, and their order in the source "grub.cfg" file is lost. Instead it is the "90fallback" script, run afterwards, that detects the bootable Linux kernels. (Strictly speaking, both scripts are run by "linux-boot-prober", which comes in the same package, rather than by "os-prober" itself.) "90fallback" does a direct listing of the Linux kernel files in the "/boot" directory of the other Linux disk partition, and each of these is added to the local GRUB configuration file. The result is that the other Linux kernels are detected and added in alphabetical order.
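To see why an alphabetical listing goes wrong, here is a small shell sketch that sorts some kernel file names in two ways: byte-wise, which is what a plain directory listing gives in the C locale, and by version number using "sort -V" (this assumes GNU coreutils, as shipped by both Arch Linux and Oracle Linux). The file names are made-up examples in the Oracle Linux style, not read from a real "/boot" directory.

```shell
#!/bin/sh
# Compare alphabetical ordering of kernel file names (what a plain directory
# listing gives in the C locale) with ordering by version number.
# The file names are hypothetical examples, not read from a real /boot.

kernels='vmlinuz-3.8.13-118.el7uek.x86_64
vmlinuz-0-rescue-0123456789abcdef
vmlinuz-3.10.0-327.el7.x86_64'

# Byte-wise alphabetical order: "3.10" sorts before "3.8" because '1' < '8',
# and the rescue kernel ("0-rescue") sorts before everything else
alphabetical() {
    printf '%s\n' "$kernels" | LC_ALL=C sort
}

# Version order, newest first: GNU sort -V compares runs of digits as numbers
newest_first() {
    printf '%s\n' "$kernels" | LC_ALL=C sort -V -r
}

echo "Alphabetical order:"
alphabetical
echo "Newest version first:"
newest_first
```

In the alphabetical list the rescue kernel comes first and "3.10" lands ahead of "3.8" purely because of the characters involved, which is the kind of ordering described above.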

Details on the Problem

The "40grub2" script works by opening the "/boot/grub2/grub.cfg" file from the other Linux installation and looking for the entries for bootable Linux kernels. The idea is that "40grub2" will find the Linux kernels in the same order as they appear in the "grub.cfg" of the other Linux installation, and they will be added to the local "grub.cfg" file in that same order. The benefit of this method is that the first Linux kernel listed for the other installation in the main GRUB boot menu will be the same one listed first by the other Linux installation itself. This in turn means that if the other installation sorts its Linux kernels in any way, or puts a specific Linux kernel first as the default in its "grub.cfg" configuration file, this is also reflected in the local GRUB configuration file of my main Linux installation.
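The order-preserving idea can be illustrated with a much simplified shell sketch (not the real "40grub2" code) that prints the "menuentry" titles of a "grub.cfg" file in the order they appear in it. The helper name "list_titles" and the mount point in the usage comment are hypothetical, and the sketch only handles titles in single or double quotes.

```shell
#!/bin/sh
# Simplified illustration, NOT the real "40grub2" script: print the titles of
# the menu entries in a grub.cfg file, in the order they appear in that file.
# "list_titles" is a hypothetical helper name used only for this example.

list_titles() {
    # First expression: titles in single quotes, e.g. menuentry 'Title' ... {
    # Second expression: titles in double quotes, e.g. menuentry "Title" ... {
    sed -n \
        -e "s/^[[:space:]]*menuentry[[:space:]]*'\([^']*\)'.*/\1/p" \
        -e 's/^[[:space:]]*menuentry[[:space:]]*"\([^"]*\)".*/\1/p' \
        "$1"
}

# Usage, assuming the Oracle Linux root partition is mounted on /mnt/oracle
# (a hypothetical mount point):
#   list_titles /mnt/oracle/boot/grub2/grub.cfg
```

Because "sed" reads the file from top to bottom, the first title printed is the first entry in the other installation's menu, which is exactly the property that makes this approach attractive.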

The "40grub2" script reads each line of the other installation's "/boot/grub2/grub.cfg" in turn, looking for ones that begin with "menuentry", "linux" or "initrd". I believe these are the "standard" keywords that GRUB should be using. Unfortunately Oracle Linux uses the keywords "linux16" and "initrd16" instead, so these lines are not matched by the "40grub2" script and no bootable Linux kernels are found at all. It seems that Red Hat, on which Oracle Linux is based, uses these keywords for historical reasons: they are valid GRUB 2 commands on x86 BIOS systems that load the kernel and initrd using the older 16-bit Linux boot protocol. Either way, they are used, and they do not match what "40grub2" is looking for.
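The matching problem can be reproduced with a simplified shell sketch (again not the real "40grub2" script) that tests the first word of each line in a "case" block, once with the original patterns and once with the "16" variants added, as in the fix in the Solutions section. The function names and the sample "grub.cfg" lines are hypothetical.

```shell
#!/bin/sh
# Simplified illustration, NOT the real "40grub2" script: test the first word
# of each grub.cfg line in a "case" block, before and after the change.
# "classify_original" and "classify_patched" are hypothetical helper names.

# Original patterns: only "linux" and "initrd" are recognised
classify_original() {
    case "$1" in
        menuentry) echo entry ;;
        linux)     echo kernel ;;
        initrd)    echo initrd ;;
        *)         echo ignored ;;
    esac
}

# Patched patterns: the "16" variants are accepted as well
classify_patched() {
    case "$1" in
        menuentry)          echo entry ;;
        linux | linux16 )   echo kernel ;;
        initrd | initrd16 ) echo initrd ;;
        *)                  echo ignored ;;
    esac
}

# Hypothetical lines in the style of a Red Hat / Oracle Linux grub.cfg entry
sample_cfg="menuentry 'Oracle Linux Server' {
    linux16 /vmlinuz-3.10.0-327.el7.x86_64 root=/dev/mapper/ol-root ro
    initrd16 /initramfs-3.10.0-327.el7.x86_64.img
}"

# read strips the leading whitespace and splits off the first word;
# the rest of the line is not needed for this comparison
printf '%s\n' "$sample_cfg" | while read -r first rest; do
    printf '%-10s original=%-8s patched=%s\n' \
        "$first" "$(classify_original "$first")" "$(classify_patched "$first")"
done
```

With the original patterns the "linux16" and "initrd16" lines fall through to the catch-all branch, so only the "menuentry" line is recognised and no kernel is ever recorded.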

Instead the bootable Linux kernels are detected by the "90fallback" script when it runs afterwards, and they are detected in alphabetical naming order as mentioned before.

Solutions

There is a quick, easy and good enough solution you can do yourself, and then there is a more official solution.

First, you can just manually edit your local "40grub2" file and change two lines in it: add a "16" variant to the patterns in the "case" block that test for "linux" and "initrd". Here is the output from "diff" showing the before (<) and after (>) versions of the two lines I changed.

67c67
<    linux)
---
>    linux | linux16 )
80c80
<    initrd)
---
>    initrd | initrd16 )

Once it is edited, run "grub-mkconfig" again to regenerate your "grub.cfg" file (on Arch Linux, "grub-mkconfig -o /boot/grub/grub.cfg"), and it should now correctly pick up the entries from the other Linux installation in their original order.
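If you would rather not edit the file by hand, the same two-line change can be made with "sed". This is a minimal sketch that assumes each pattern sits on a line of its own, exactly "linux)" or "initrd)" after the indentation, as in the diff above; "patch_40grub2" is a hypothetical helper name, and the original is kept as "40grub2.orig".

```shell
#!/bin/sh
# Minimal sketch: make the same two-line change as the diff with sed,
# keeping a backup copy of the original script as "<file>.orig".
# Assumes each pattern is on a line of its own ("linux)" / "initrd)").
# Run it only once, or the backup is overwritten with the patched version.
# "patch_40grub2" is a hypothetical helper name used only for this example.

patch_40grub2() {
    script="$1"
    cp -p "$script" "$script.orig" || return 1
    # Redirecting into the existing file keeps its permissions,
    # so the script stays executable
    sed -e 's/^\([[:space:]]*\)linux)[[:space:]]*$/\1linux | linux16 )/' \
        -e 's/^\([[:space:]]*\)initrd)[[:space:]]*$/\1initrd | initrd16 )/' \
        "$script.orig" > "$script"
}

# Usage, as root:
#   patch_40grub2 /usr/lib/linux-boot-probes/mounted/40grub2
#   grub-mkconfig -o /boot/grub/grub.cfg
```

Anchoring the patterns to whole lines means other lines that merely mention "linux" or "initrd" are left alone.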

Second, it does not look like there is actually an official solution, which is often the case with open source software. I found some bug reports about this problem, but there was some finger-pointing going on in both directions between the GRUB people and the Red Hat people. The GRUB people seemed to feel that the official keywords were "linux" and "initrd", so it was a Red Hat problem to solve, while the Red Hat people felt that "linux16" and "initrd16" were valid in a GRUB configuration file and did work, so it was a GRUB problem with the "40grub2" script.

One person did raise the question of how the main Linux, which adds these entries to its local "grub.cfg" file, should treat the entries with the "16" suffix from the other Linux. Should it ignore the suffix and just use the normal keywords in its own "grub.cfg" file, or should it use exactly the same keywords? The latter is a problem because the keywords found in the other "grub.cfg" file are NOT passed back to the "os-prober" script, i.e. it is assumed they are only ever "linux" and "initrd". Making "40grub2" return these extra keywords as extra data fields would need a lot of changes in other places: "40grub2" and "os-prober" at least, and possibly others too if there is a common format used for passing around information on bootable Linux kernels.
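To show why there is nowhere to carry the extra keyword, here is a sketch that splits one line of "linux-boot-prober" output into its fields. As far as I can tell the format is six colon-separated fields (root partition, boot partition, title, kernel, initrd, kernel parameters), and GRUB's "30_os-prober" script splits it with "cut -d ':'" in much the same way before writing plain "linux" and "initrd" lines. The helper names and the sample line are hypothetical.

```shell
#!/bin/sh
# Sketch: split one line of linux-boot-prober output into its six fields.
# Format assumed here: root:boot:title:kernel:initrd:parameters
# There is no field for the GRUB command ("linux" vs "linux16"), so whatever
# consumes this output has to assume plain "linux" and "initrd".
# "probe_field" and "parse_probe_line" are hypothetical helper names.

# Print field number(s) $2 of the colon-separated line $1
probe_field() {
    printf '%s\n' "$1" | cut -d ':' -f "$2"
}

parse_probe_line() {
    printf 'root=%s\n'   "$(probe_field "$1" 1)"
    printf 'boot=%s\n'   "$(probe_field "$1" 2)"
    printf 'title=%s\n'  "$(probe_field "$1" 3)"
    printf 'kernel=%s\n' "$(probe_field "$1" 4)"
    printf 'initrd=%s\n' "$(probe_field "$1" 5)"
    # Everything from field 6 onwards is the kernel command line
    printf 'params=%s\n' "$(probe_field "$1" 6-)"
}

# Hypothetical example line for an Oracle Linux kernel on /dev/sda3
parse_probe_line '/dev/sda3:/dev/sda3:Oracle Linux Server:/boot/vmlinuz-3.10.0-327.el7.x86_64:/boot/initramfs-3.10.0-327.el7.x86_64.img:root=/dev/sda3 ro'
```

Adding a seventh field for the command name is easy in isolation, but every producer and consumer of this format would have to agree on it, which is the knock-on effect described above.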

So you can see how something that looks simple can grow into something much bigger, and could require significant changes to something as critical as GRUB, which runs at system boot time. No "obvious solution" should be rushed through without a lot of extra thought and testing. Unfortunately I don't know when we will get any kind of "official solution" to this.
