Wayland vs. Xorg: How Are They Similar and How Are They Different?
A closer look at Wayland and Xorg, the two most popular display servers in the wild. Join security analyst Shivam Sengar to learn about their differences.
Earlier this year, Ubuntu 18.04 LTS was released. When I first heard that Ubuntu 18.04 would use Xorg by default, it came as a surprise. After all, Xorg has some security concerns, and the decision to continue with it was hard to swallow. When I started to dig into this, I realised why the community was going with Xorg instead of Wayland as the default. In this article, I will explain the issues I found during my analysis.
Before going over the issues with Xorg, it is vital to understand the working mechanisms and architecture of both Xorg and Wayland. How are they similar, and how are they different?
Display Server and Stack
A display server is the basic component of a GUI that sits between the graphical interface and the kernel. Its primary task is to coordinate the input and output of its clients (programs and applications with a graphical interface) to and from the rest of the OS, the hardware, and each other. It communicates with its clients over a display server protocol, which may be network-transparent or simply network-capable. Commonly known display server protocols include X11, Wayland, and Mir.
X11 is the protocol implemented by the X Window System, while Wayland is the protocol implemented by Wayland compositors. In simple terms, both determine how a program's display changes in response to your actions, such as ticking a checkbox, moving a window, or clicking a button. The X Window System is a client/server network protocol that has been in use for a long time. The X.Org Server is the free and open-source implementation of the display server for the X Window System, stewarded by the X.Org Foundation. Wayland is a protocol that specifies the communication between a display server (called a Wayland compositor) and its clients.
Xorg
Xorg is based on a client/server model and thus allows clients to run either locally or remotely on a different machine. Client applications use a protocol library such as libX11 to send commands to and receive events from the X server. X toolkit libraries are also used to draw and operate widgets like buttons and scroll bars. The X server receives graphics requests from the client programs to be displayed to the user, and it sends back user input from devices such as keyboards, mice, and touchscreens.
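To make the "locally or remotely" part concrete: an X client finds its server through the DISPLAY environment variable, which has the form [host]:display[.screen]. The following is a minimal sketch of how such a string maps to a connection endpoint, assuming the conventional Unix socket path /tmp/.X11-unix/X<n> for local displays and TCP port 6000 + n for remote ones. The names XDisplay, parse_display and endpoint are made up for this illustration, and less common forms (IPv6 hosts, explicit transport prefixes) are deliberately ignored.

```python
import re
from typing import NamedTuple


class XDisplay(NamedTuple):
    host: str      # empty string means "this machine"
    display: int   # which X server on that host
    screen: int    # which screen of that server (defaults to 0)


def parse_display(value: str) -> XDisplay:
    """Split a DISPLAY string of the form [host]:display[.screen]."""
    match = re.fullmatch(
        r"(?P<host>[^:]*):(?P<display>\d+)(?:\.(?P<screen>\d+))?", value
    )
    if match is None:
        raise ValueError(f"not a DISPLAY string: {value!r}")
    return XDisplay(
        match["host"], int(match["display"]), int(match["screen"] or 0)
    )


def endpoint(d: XDisplay) -> str:
    """Describe where a client would connect for this display."""
    if d.host == "":
        # Local server: conventional Unix domain socket path.
        return f"unix:/tmp/.X11-unix/X{d.display}"
    # Remote (or forwarded) server: conventional TCP port 6000 + display.
    return f"tcp:{d.host}:{6000 + d.display}"
```

For example, endpoint(parse_display(":0")) describes the local socket most desktop sessions use, while a value such as "localhost:10.0", typical of SSH X11 forwarding, resolves to a TCP endpoint.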
The numbers in the image show the flow of events and data across the different components of the Xorg stack. To understand more about the workflow, go through this page.
As can be seen, the X server does not have the information needed to decide which window should receive an event, nor can it transform screen coordinates into window-local coordinates, because the on-screen position of each window is controlled by the compositing manager. And even though X has handed responsibility for the final painting of the screen to the compositing manager, X still controls the front buffer and modesetting. The X server acts as a middleman that introduces an extra step between applications and the compositor and an extra step between the compositor and the hardware.
Wayland
In Wayland, the compositor is the display server. Control of KMS (kernel mode setting) and evdev (the kernel's input event interface) is transferred to the compositor. The Wayland protocol lets the compositor send input events directly to the clients and lets the clients send damage events directly to the compositor.
The compositor looks through its scenegraph to determine which window should receive the event.
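The scenegraph lookup can be pictured with a small sketch. It is not how any particular compositor is implemented: the Window class and pick_window function are hypothetical, windows are plain rectangles, and the only transformation is a translation, whereas a real compositor may also scale or rotate surfaces.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Window:
    name: str
    x: int        # position of the top-left corner on screen
    y: int
    width: int
    height: int

    def contains(self, sx: int, sy: int) -> bool:
        """True if the screen point (sx, sy) lies inside this window."""
        return (self.x <= sx < self.x + self.width
                and self.y <= sy < self.y + self.height)


def pick_window(scene: List[Window], sx: int,
                sy: int) -> Optional[Tuple[Window, int, int]]:
    """Return the topmost window under a screen point, plus the point
    translated into that window's local coordinates.

    `scene` is ordered bottom to top, so it is searched in reverse.
    """
    for window in reversed(scene):
        if window.contains(sx, sy):
            return window, sx - window.x, sy - window.y
    return None  # the point hit the bare background


scene = [
    Window("editor", 0, 0, 800, 600),        # bottom of the stack
    Window("terminal", 100, 100, 400, 300),  # drawn on top
]
hit = pick_window(scene, 150, 120)
```

Because the compositor owns the scene list, it can answer both questions (which window, and where inside it) without asking anyone else.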
As in the X case, when the client receives the event, it updates the UI in response. In Wayland, however, rendering happens in the client, and the client just sends a request to the compositor to indicate the region that was updated.
With direct rendering, the client and the server share a video memory buffer. The client links to a rendering library such as OpenGL that knows how to program the hardware and renders directly into the buffer.
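The division of labour described here (the client draws into a shared buffer and reports only the damaged region, the compositor repaints just that region) can be modelled in a few lines. This is a toy model under obvious assumptions: the buffer is a plain Python list of lists rather than video memory, and SharedBuffer, client_draw and compositor_repaint are invented names, not Wayland API calls.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


class SharedBuffer:
    """Stand-in for a buffer that client and compositor can both access."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        self.pixels: List[List[int]] = [[0] * width for _ in range(height)]


def client_draw(buffer: SharedBuffer, rect: Rect, colour: int) -> Rect:
    """Client side: render into the buffer and return the damaged region."""
    x, y, w, h = rect
    for row in range(y, y + h):
        for col in range(x, x + w):
            buffer.pixels[row][col] = colour
    return rect  # this is all the compositor needs to be told


def compositor_repaint(screen: List[List[int]], buffer: SharedBuffer,
                       damage: Rect) -> int:
    """Compositor side: copy only the damaged region to the screen.

    Returns the number of pixels copied.
    """
    x, y, w, h = damage
    copied = 0
    for row in range(y, y + h):
        for col in range(x, x + w):
            screen[row][col] = buffer.pixels[row][col]
            copied += 1
    return copied


buffer = SharedBuffer(8, 4)
screen = [[0] * 8 for _ in range(4)]
damage = client_draw(buffer, (2, 1, 3, 2), colour=7)
copied = compositor_repaint(screen, buffer, damage)
```

Only the pixels inside the reported rectangle are touched on the compositor side; the rest of the screen is left as it was.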
Issues with Xorg
As discussed above, the design of Xorg does not give applications GUI-level isolation. If the system has multiple GUI applications running, such as a word processor or your favourite editor, nothing isolates them from one another. It is easy to log the keystrokes of all processes of the same user with the xinput command.
The command logs every key press. No isolation is possible, even with SELinux, since any keystroke passed to the X server is available to any arbitrary program connected to it.
To see the issue in action, use the xinput command, available on systems running Xorg. First, run the following command to list the input devices on your system:
xinput list
As can be seen in the above image, the keyboard has the id 14. Now run the following command to capture the key presses:
xinput test id
Replace id with the id of the device; in our case it is 14. After running the command, open another terminal and start typing commands there. You will see the key presses logged in the first terminal, as shown in the image below.
Try running a command with sudo, or logging in as root using su: you will observe that the key presses for the password you type are shown as well.
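The two manual steps above (reading the keyboard id off the device list, then reading key presses off the log) are easy to automate, which is exactly what makes the problem serious. The sketch below only parses text; the two sample strings are illustrative stand-ins in the shape xinput prints, not output captured from a real machine, and the function names are made up. Note that the log contains keycodes, not characters; turning them into letters depends on the keyboard layout.

```python
import re
from typing import List, Tuple

# Illustrative text in the shape `xinput list` prints (not real output).
SAMPLE_LIST = """\
⎡ Virtual core pointer                    \tid=2\t[master pointer  (3)]
⎜   ↳ Virtual core XTEST pointer          \tid=4\t[slave  pointer  (2)]
⎣ Virtual core keyboard                   \tid=3\t[master keyboard (2)]
    ↳ Virtual core XTEST keyboard         \tid=5\t[slave  keyboard (3)]
    ↳ AT Translated Set 2 keyboard        \tid=14\t[slave  keyboard (3)]
"""

# Illustrative text in the shape `xinput test <id>` prints (not real output).
SAMPLE_TEST = "key press   38\nkey release 38\nkey press   39\nkey release 39\n"

# Skip leading tree-drawing characters, then capture the name and the id.
_KEYBOARD = re.compile(r"^\W*(.+?)\s+id=(\d+)\s+\[slave\s+keyboard")
_KEY_EVENT = re.compile(r"^key (press|release)\s+(\d+)$")


def slave_keyboards(listing: str) -> List[Tuple[int, str]]:
    """Return (id, name) for every slave keyboard in a device listing."""
    found = []
    for line in listing.splitlines():
        match = _KEYBOARD.search(line)
        if match:
            found.append((int(match.group(2)), match.group(1)))
    return found


def pressed_keycodes(log: str) -> List[int]:
    """Return the keycodes of all 'key press' events, in order."""
    codes = []
    for line in log.splitlines():
        match = _KEY_EVENT.match(line.strip())
        if match and match.group(1) == "press":
            codes.append(int(match.group(2)))
    return codes
```

On a real Xorg session, the same functions could be fed the output of the two xinput commands, for instance via subprocess.run(["xinput", "list"], capture_output=True, text=True).stdout. No special privileges are involved at any point.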
Why still Xorg?
Even though Wayland eliminates most of the design flaws of Xorg, it has its own issues. To communicate with the display server, the programs running on the system, which act as its clients, must know how to talk to it. Xorg, being older than Wayland, is more mature and has better extensibility. This is why some applications or programs might not run under Wayland; redshift, for example, does not work there.
Being relatively new, Wayland is not as stable as Xorg, and stability was the primary concern for Ubuntu 18.04 LTS. Even though the Wayland project has been around for almost ten years, things are not yet 100% stable. As a result, Xorg became the default in Ubuntu 18.04, but Wayland is installed as well, allowing users to switch if desired.
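Since both display servers can be installed side by side, it is handy to check which one the current session is actually using. A minimal sketch, assuming the session sets the usual environment variables: XDG_SESSION_TYPE (set by the login manager on systemd-based systems), WAYLAND_DISPLAY and DISPLAY. The function name session_type is made up, and the fallback order is a heuristic rather than an official rule.

```python
import os
from typing import Mapping


def session_type(env: Mapping[str, str] = os.environ) -> str:
    """Guess whether the current session runs on X11 or Wayland."""
    declared = env.get("XDG_SESSION_TYPE", "").lower()
    if declared in ("x11", "wayland"):
        return declared
    # Fall back to the display variables. WAYLAND_DISPLAY is checked
    # first because Wayland sessions usually set DISPLAY as well, for
    # X11 applications running through XWayland.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"
```

Calling session_type() with no arguments inspects the real environment; passing a dictionary makes the function easy to try out without logging in to a different session.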