Summary
- Appgate Threat Advisory Services discovered a stack overflow vulnerability in the TIPC module of the Linux kernel
- Local or remote exploitation can lead to denial of service and code execution
- Exploitation requires the TIPC module to be loaded on the target
- Affected versions include 4.8 through 5.17-rc3
- Patch released on Feb. 10, 2022
Introduction
In November 2021, SentinelLabs publicly disclosed a remote heap overflow they found in the Linux kernel networking module for the Transparent Inter-Process Communication (TIPC) protocol (CVE-2021-43267).
This was a pretty neat bug, being a modern remote heap overflow in the Linux kernel. It wasn’t long after public disclosure before proof-of-concepts for local privilege escalation were released by researchers (e.g. this writeup by @bl4sty).
However, there hadn’t been anything on leveraging the vulnerability for remote code execution. Naturally, this warranted checking out. You can imagine my surprise when in doing so I discovered a remote stack overflow.
In this post, I’ll give a whistle-stop tour on TIPC to provide some necessary context before diving into the vulnerability itself, remediation, patching and our disclosure timeline.
TL;DR on TIPC
Transparent Inter-Process Communication (TIPC) is an IPC mechanism designed for intra-cluster communication. Cluster topology is managed around the concept of nodes and the links between these nodes.
TIPC communications are done over a “bearer,” which is a TIPC abstraction of a network interface. A “media” is a bearer type, of which there are four currently supported: Ethernet, Infiniband, UDP/IPv4 and UDP/IPv6.
Take this example from the TIPC Getting Started guide:
$ tipc bearer enable media eth dev eth0
Here we are configuring our node (aka our computer) to use a bearer with the Ethernet media type on our eth0 interface. Now TIPC knows it can use eth0 for communicating over Ethernet.
It’s worth noting here, from an exploitation context, that a remote attacker is restricted to the TIPC media types the target has already set up. Locally, if the module is loaded, an attacker can use the underlying netlink communications to configure a bearer (credit to @bl4sty for his work on CVE-2021-43267). They won’t, however, have permissions to send raw Ethernet frames, leaving a UDP bearer as the likely option.
We have nodes, bearers and media types covered. The last integral part for our topology is a “link.” After we have established a bearer for our TIPC communications, our node will begin to broadcast discovery packets and look for other nodes.
The intention here is to establish a link with other nodes. A link defines a communication channel between a pair of nodes. This link has various properties surrounding tolerance and delivery guarantees, as well as being supervised.
Okay, fear not, that should be enough general TIPC context for now! Time to jump into the vulnerability itself.
The Vulnerability
One of the many features of the TIPC module is its monitoring framework. Introduced into the kernel in June 2016, the framework uses a distributed “Overlapping Ring Supervision Algorithm” to monitor neighboring nodes in the same domain.
tl;dr nodes communicate to each other and each node tracks its peer’s view of the domain topology—e.g., are my peers seeing the same number of nodes on the domain as me, and are the same number of peers alive?
/* struct tipc_peer: state of a peer node and its domain
 * @addr: tipc node identity of peer
 * @head_map: shows which other nodes currently consider peer 'up'
 * @domain: most recent domain record from peer
 * @hash: position in hashed lookup list
 * @list: position in linked list, in circular ascending order by 'addr'
 * @applied: number of reported domain members applied on this monitor list
 * @is_up: peer is up as seen from this node
 * @is_head: peer is assigned domain head as seen from this node
 * @is_local: peer is in local domain and should be continuously monitored
 * @down_cnt: number of other peers which have reported this peer lost
 */
struct tipc_peer {
	u32 addr;
	struct tipc_mon_domain *domain;
	struct hlist_node hash;
	struct list_head list;
	u8 applied;
	u8 down_cnt;
	bool is_up;
	bool is_head;
	bool is_local;
};
As we can see, among other things, we keep a reference to a struct tipc_mon_domain.
This struct represents a domain record used to define a view of the TIPC topology, such as how many members are known. See the definition below:
#define MAX_MON_DOMAIN 64
...
/* struct tipc_mon_domain: domain record to be transferred between peers
 * @len: actual size of domain record
 * @gen: current generation of sender's domain
 * @ack_gen: most recent generation of self's domain acked by peer
 * @member_cnt: number of domain member nodes described in this record
 * @up_map: bit map indicating which of the members the sender considers up
 * @members: identity of the domain members
 */
struct tipc_mon_domain {
	u16 len;
	u16 gen;
	u16 ack_gen;
	u16 member_cnt;
	u64 up_map;
	u32 members[MAX_MON_DOMAIN];
};
Copies of these domain records are transferred between peers to let each other know their respective views of the topology. Each node then keeps a copy of the most up-to-date domain record received from each of its peers via the tipc_peer->domain field.
In TIPC, messages between nodes are categorized via header fields into overarching ‘message users’ (i.e., which part of the TIPC stack uses the message) and then further divided into ‘message types.’
These domain records are communicated between links using the LINK_PROTOCOL TIPC message user and the STATE_MSG message type. Such a message contains a TIPC header (with general and message-type-specific fields) and an optional body containing a copy of the sender’s struct tipc_mon_domain.
When a STATE_MSG is received, if it passes header validation and a few other checks, the message body is passed on to the tipc_mon_rcv function. The role of this function is to update the domain records of any peers, in the event they include a domain record within a STATE_MSG.
That all sounds fairly straightforward, right? Let’s take a deeper dive into the code to see where the issue crops up:
/* tipc_mon_rcv - process monitor domain event message
 * @data: STATE_MSG body
 * @dlen: STATE_MSG body size (taken from TIPC header)
 */
void tipc_mon_rcv(struct net *net, void *data, u16 dlen, u32 addr,
		  struct tipc_mon_state *state, int bearer_id)
{
	...
	struct tipc_mon_domain *arrv_dom = data;
	struct tipc_mon_domain dom_bef;                           [0]
	...
	/* Sanity check received domain record */                 [1]
	if (dlen < dom_rec_len(arrv_dom, 0))                      [2]
		return;
	if (dlen != dom_rec_len(arrv_dom, new_member_cnt))        [3]
		return;
	if (dlen < new_dlen || arrv_dlen != new_dlen)             [4]
		return;
	...
	/* Drop duplicate unless we are waiting for a probe response */
	if (!more(new_gen, state->peer_gen) && !probing)          [5]
		return;
	...
	/* Cache current domain record for later use */
	dom_bef.member_cnt = 0;
	dom = peer->domain;
	if (dom)                                                  [6]
		memcpy(&dom_bef, dom, dom->len);                  [7]

	/* Transform and store received domain record */
	if (!dom || (dom->len < new_dlen)) {
		kfree(dom);
		dom = kmalloc(new_dlen, GFP_ATOMIC);              [8]
		peer->domain = dom;                               [9]
		if (!dom)
			goto exit;
	}
	...
First things first, the function does some basic sanity checks [1] to make sure that a) the message body contains a domain record and b) it contains a valid struct tipc_mon_domain.
This sanitization involves checking that dlen (the length of the STATE_MSG body as defined in the header) is large enough to contain an empty domain record with zero members [2]. Then it checks that dlen matches the expected length of a domain record with member_cnt members, where member_cnt is taken from the inbound domain record [3]. Finally, it checks that the length provided in the header, dlen, matches the len field supplied in the new domain record [4].
After some more checks, we fetch the sending peer’s struct tipc_peer to see if we’ve already received a domain record from them [6]. If we have, we want to temporarily cache a copy of the old record to do a comparison later [7].
Then, satisfied it’s a new and valid record, we’ll update the peer->domain field with the new info [9]. If it’s the first domain record, we’ll make a new kmalloc allocation for it; if it’s larger than the last one, we’ll free the old record and reallocate more space [8].
Putting the Pieces Together
Okay, so where’s this all going? Well, some of you might have noticed I explicitly included the #define for MAX_MON_DOMAIN earlier, which sizes the u32 members[] field as a 64-element array, i.e. each record tracks up to 64 domain members.
However, if we look back at the record sanitization [1], the function fails to validate that new_member_cnt is less than or equal to MAX_MON_DOMAIN; we check for the minimum size requirements [2] but not the maximum.
Knowing this, we can set up a link with the target node and submit a struct tipc_mon_domain with an arbitrary member_cnt and members[] field, so long as the dlen, len and member_cnt fields add up correctly.
The size is in fact constrained only by the MTU of the media used for communication; e.g., the typical maximum Ethernet frame (ignoring jumbo frames) is 1,518 bytes.
For example, acting as an attacker, we could respond to one of the broadcast packets and establish a link, pretending to be a peer node. Next, we would be able to send a 1,072-byte domain record (with 264 members) that passes the validation.
We have nothing to cache [6], as it’s the first domain record submitted from our malicious node, but the target has now allocated space for our 1,072-byte record without issue [8], and our peer struct now references it [9].
See where this is going? On the second throw, we send a “newer” domain record [5] to the target node. So long as we pass the same sanitization checks, we’ll hit [6] where we need to cache the domain record we sent prior to this, with a call to memcpy [7]:
memcpy(&dom_bef, dom, dom->len)
Let’s remind ourselves what the record we just sent looks like:
dom = {
	.len = 1072,
	.gen = 3,
	.ack_gen = 3,
	.member_cnt = 264,
	.up_map = 0xffffffffffffffff,
	.members = { 0x1337, ... }	/* 264 entries */
};
What’s &dom_bef again? That’s a local struct [0], and because it expects members to be a 64-element array, it’s allocated as a 272-byte buffer on the stack. And we’re about to copy 1,072 bytes into it!
The only constraints on our struct tipc_mon_domain payload are:
- len = dlen = sizeof(len, gen, ack_gen, member_cnt, up_map) + member_cnt * sizeof(u32)
- gen needs to be higher than the last
- the record must fit in the MTU of the bearer media
As members sits at the end of the struct, we are now free to overwrite whatever follows dom_bef on the stack with a largely arbitrary payload and pretty generous size limits. We’ll touch on how feasible this is to exploit in a later post.
Vulnerability Overview
To recap everything we’ve covered:
- CVE-2022-0435 allows a local or remote attacker to trigger a stack overflow in the TIPC networking subsystem
- The size of the overflow is limited by the MTU of the bearer media enabled (Ethernet/Infiniband/UDP)
- Very light restrictions on the actual content of the payload, the majority being arbitrary
By copying more than 272 bytes into the stack buffer dom_bef, we’re able to overwrite whatever comes after it on the stack, which likely includes a stack canary as well as the saved base pointer and return address, potentially leading to control flow hijacking.
However, it is worth noting that, with modern mitigations in place, it is not trivial to push exploitation beyond denial of service. These mitigations include CONFIG_FORTIFY_SOURCE=y (a hard mitigation against this particular control flow hijack), CONFIG_STACK_PROTECTOR=y (stack canaries increase the information required to achieve control flow hijacking) and, of course, KASLR.
The latter two particularly impact the ease of remote exploitation. However, none of them mitigate the ability to cause a remote kernel panic. Furthermore, with the right information leak, arbitrary code execution becomes trivial.
Remediation
This vulnerability has been present since the monitoring framework was first introduced in June 2016, impacting kernel versions 4.8 through 5.17-rc3.
The patch below was introduced in commit 9aa422ad3266 and has been merged into the stable branches; updating your system to include this patch is the best way to mitigate CVE-2022-0435.
The TIPC module must be loaded for the system to be vulnerable. In addition, for the system to be targeted remotely, it needs to have a TIPC bearer enabled. If you don’t need to use TIPC, or are unsure whether you are using it, you can take the following steps:
- $ lsmod | grep tipc
  will let you know if the module is currently loaded
- $ modprobe -r tipc
  may allow you to unload the module if it is loaded; however, you may need to reboot your system
- $ echo "install tipc /bin/true" >> /etc/modprobe.d/disable-tipc.conf
  will prevent the module from being loaded
If you need to use TIPC and can’t immediately patch your system, look to enforce any configurations that prevent or limit the ability for attackers to imitate nodes in your cluster. Options include TIPC protocol level encryption, IPSec/MACSec and network separation.
Patch
As part of the initial disclosure, I included a suggested fix for the vulnerability. In the following discussion, another issue, a u16 overflow, was spotted by Eric Dumazet; a fix for it is also included in the final patch by Jon Maloy:
 net/tipc/link.c    | 9 +++++++--
 net/tipc/monitor.c | 2 ++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 8d9e09f48f4c..1e14d7f8f28f 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -2200,7 +2200,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
 	struct tipc_msg *hdr = buf_msg(skb);
 	struct tipc_gap_ack_blks *ga = NULL;
 	bool reply = msg_probe(hdr), retransmitted = false;
-	u16 dlen = msg_data_sz(hdr), glen = 0;
+	u32 dlen = msg_data_sz(hdr), glen = 0;
 	u16 peers_snd_nxt = msg_next_sent(hdr);
 	u16 peers_tol = msg_link_tolerance(hdr);
 	u16 peers_prio = msg_linkprio(hdr);
@@ -2214,6 +2214,10 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
 	void *data;
 
 	trace_tipc_proto_rcv(skb, false, l->name);
+
+	if (dlen > U16_MAX)
+		goto exit;
+
 	if (tipc_link_is_blocked(l) || !xmitq)
 		goto exit;
@@ -2309,7 +2313,8 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
 
 	/* Receive Gap ACK blocks from peer if any */
 	glen = tipc_get_gap_ack_blks(&ga, l, hdr, true);
-
+	if (glen > dlen)
+		break;
 	tipc_mon_rcv(l->net, data + glen, dlen - glen, l->addr,
 		     &l->mon_state, l->bearer_id);

diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c
index 407619697292..2f4d23238a7e 100644
--- a/net/tipc/monitor.c
+++ b/net/tipc/monitor.c
@@ -496,6 +496,8 @@ void tipc_mon_rcv(struct net *net, void *data, u16 dlen, u32 addr,
 	state->probing = false;
 
 	/* Sanity check received domain record */
+	if (new_member_cnt > MAX_MON_DOMAIN)
+		return;
 	if (dlen < dom_rec_len(arrv_dom, 0))
 		return;
 	if (dlen != dom_rec_len(arrv_dom, new_member_cnt))
Acknowledgments
I’d like to thank the maintainers of the TIPC module, as well as the members of the security@kernel.org and linux-distros@vs.openwall.org lists, for their help and work throughout the disclosure process.
I would also like to mention the vulnerability research done on TIPC previously by SentinelLabs with their work on CVE-2021-43267 and other security researchers, such as @bl4sty, for publishing their findings on TIPC exploitation.
Disclosure Timeline
- Jan. 27, 2022: Appgate Threat Advisory Services sent initial report to linux-distros and kernel security team
- Feb. 5, 2022: The patch is finalized
- Feb. 10, 2022: Coordinated release date (14:00 GMT)