Skip to content

Support dynamic linking #244

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

DrXiao
Copy link
Collaborator

@DrXiao DrXiao commented Aug 13, 2025

The proposed changes enhance shecc to generate dynamically linked executables. When the --dynlink is specified, shecc produce sections such as .plt and .got for the compiled programs, allowing the executables to leverage the ELF interpreter and the GNU C library to run.

This pull request is still a work in progress due to the following incomplete tasks:

  • Fix the potential issues. (The bootstrapping process still fails for dynamically linked shecc.)
  • Improve code quality and commit messages.
  • Implement for RISC-V output
  • Make the dynamically linked shecc run the test suites.
  • Improve README.md to describe dynamic linking.
  • Enhance GitHub workflows to verify the dynamically linked shecc.
  • Validate the proposed changes on an Arm machine such as a BeagleBone Black or Raspberry Pi.

Updated usage:

# Add '--dynlink' to generate dynamically linked executable.
$ shecc [-o output] [+m] [--dump-ir] [--no-libc] [--dynlink] <input.c>

# Run the generated executable by given the elf interpreter prefix.
$ qemu-arm -L /usr/arm-linux-gnueabi/ <executable>

@DrXiao
Copy link
Collaborator Author

DrXiao commented Aug 13, 2025

Currently, only the stage 0 compiler and stage 1 compiler can be generated, and the stage 1 compiler will encounter a Segmentation fault when running.

However, the stage 0 compiler can still compile a simple program and run the executable via QEMU:

/* test.c */
int main(void)
{
    printf("%x %x %x\n", 1, 2, 3);
    printf("%x %x %x %x\n", 1, 2, 3, 4);
    printf("%x %x %x %x %x\n", 1, 2, 3, 4, 5);
    return 0;
}
$ out/shecc --dynlink -o test test.c

Then, we can use arm-linux-gnueabi-readelf or arm-linux-gnueabi-objdump to check the executable. For example, check the relocation information:

$ arm-linux-gnueabi-readelf --relocs test

Relocation section '.rel.plt' at offset 0x260 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
000102a8  00000116 R_ARM_JUMP_SLOT   00000000   __libc_start_main
000102ac  00000216 R_ARM_JUMP_SLOT   00000000   printf

However, I ran the test via qemu-arm and found that while the program can execute the main function, the result of certain printf() calls are incorrect.

$ qemu-arm -L /usr/arm-linux-gnueabi/ test
1 2 3
1 2 3 40830000
1 2 3 40830000 10224

Notice that the second and third printf() calls have more than four arguments, certain arguments will be pushed to the stack due to the Arm calling convention.

I think this is a potential issue that shecc pushes wrong values to the stack to make (glibc's) printf() calls produce incorrect results.

@DrXiao
Copy link
Collaborator Author

DrXiao commented Aug 13, 2025

FWIW, I disassemble the test executable:

test.asm
$ arm-linux-gnueabi-objdump -d test

test:     file format elf32-littlearm


Disassembly of section .text:

000100b4 <.text>:
   100b4:	e3a0b000 	mov	fp, #0
   100b8:	e3a0e000 	mov	lr, #0
   100bc:	e49d1004 	pop	{r1}		@ (ldr r1, [sp], #4)
   100c0:	e1a0200d 	mov	r2, sp
   100c4:	e52d2004 	push	{r2}		@ (str r2, [sp, #-4]!)
   100c8:	e52d0004 	push	{r0}		@ (str r0, [sp, #-4]!)
   100cc:	e3a0c000 	mov	ip, #0
   100d0:	e52dc004 	push	{ip}		@ (str ip, [sp, #-4]!)
   100d4:	e30000ec 	movw	r0, #236	@ 0xec
   100d8:	e3400001 	movt	r0, #1
   100dc:	e3a03000 	mov	r3, #0
   100e0:	eb000067 	bl	0x10284
   100e4:	e3a0007f 	mov	r0, #127	@ 0x7f
   100e8:	eb000005 	bl	0x10104
   100ec:	e1a09000 	mov	r9, r0
   100f0:	e1a0a001 	mov	sl, r1
   100f4:	e3008004 	movw	r8, #4
   100f8:	e3408000 	movt	r8, #0
   100fc:	e04dd008 	sub	sp, sp, r8
   10100:	e1a0c00d 	mov	ip, sp
   10104:	eb000005 	bl	0x10120
   10108:	e3008004 	movw	r8, #4
   1010c:	e3408000 	movt	r8, #0
   10110:	e08dd008 	add	sp, sp, r8
   10114:	e1a00000 	nop			@ (mov r0, r0)
   10118:	e3a07001 	mov	r7, #1
   1011c:	ef000000 	svc	0x00000000
   10120:	e1a00009 	mov	r0, r9
   10124:	e1a0100a 	mov	r1, sl
   10128:	eaffffff 	b	0x1012c
   1012c:	e50de004 	str	lr, [sp, #-4]
   10130:	e3008044 	movw	r8, #68	@ 0x44
   10134:	e3408000 	movt	r8, #0
   10138:	e04dd008 	sub	sp, sp, r8
   1013c:	e3000224 	movw	r0, #548	@ 0x224
   10140:	e3400001 	movt	r0, #1
   10144:	e3a01001 	mov	r1, #1
   10148:	e3a02002 	mov	r2, #2
   1014c:	e3a03003 	mov	r3, #3
   10150:	e58d0004 	str	r0, [sp, #4]
   10154:	e58d1008 	str	r1, [sp, #8]
   10158:	e58d200c 	str	r2, [sp, #12]
   1015c:	e58d3010 	str	r3, [sp, #16]
   10160:	e59d0004 	ldr	r0, [sp, #4]
   10164:	e59d1008 	ldr	r1, [sp, #8]
   10168:	e59d200c 	ldr	r2, [sp, #12]
   1016c:	e59d3010 	ldr	r3, [sp, #16]
+  10170:	eb000046 	bl	0x10290
   10174:	e300022e 	movw	r0, #558	@ 0x22e
   10178:	e3400001 	movt	r0, #1
   1017c:	e3a01001 	mov	r1, #1
   10180:	e3a02002 	mov	r2, #2
   10184:	e3a03003 	mov	r3, #3
   10188:	e3a04004 	mov	r4, #4
   1018c:	e58d0014 	str	r0, [sp, #20]
   10190:	e58d1018 	str	r1, [sp, #24]
   10194:	e58d201c 	str	r2, [sp, #28]
   10198:	e58d3020 	str	r3, [sp, #32]
   1019c:	e58d4024 	str	r4, [sp, #36]	@ 0x24
   101a0:	e59d0014 	ldr	r0, [sp, #20]
   101a4:	e59d1018 	ldr	r1, [sp, #24]
   101a8:	e59d201c 	ldr	r2, [sp, #28]
   101ac:	e59d3020 	ldr	r3, [sp, #32]
   101b0:	e59d4024 	ldr	r4, [sp, #36]	@ 0x24
+  101b4:	eb000035 	bl	0x10290
   101b8:	e300023b 	movw	r0, #571	@ 0x23b
   101bc:	e3400001 	movt	r0, #1
   101c0:	e3a01001 	mov	r1, #1
   101c4:	e3a02002 	mov	r2, #2
   101c8:	e3a03003 	mov	r3, #3
   101cc:	e3a04004 	mov	r4, #4
   101d0:	e3a05005 	mov	r5, #5
   101d4:	e58d0028 	str	r0, [sp, #40]	@ 0x28
   101d8:	e58d102c 	str	r1, [sp, #44]	@ 0x2c
   101dc:	e58d2030 	str	r2, [sp, #48]	@ 0x30
   101e0:	e58d3034 	str	r3, [sp, #52]	@ 0x34
   101e4:	e58d4038 	str	r4, [sp, #56]	@ 0x38
   101e8:	e58d503c 	str	r5, [sp, #60]	@ 0x3c
   101ec:	e59d0028 	ldr	r0, [sp, #40]	@ 0x28
   101f0:	e59d102c 	ldr	r1, [sp, #44]	@ 0x2c
   101f4:	e59d2030 	ldr	r2, [sp, #48]	@ 0x30
   101f8:	e59d3034 	ldr	r3, [sp, #52]	@ 0x34
   101fc:	e59d4038 	ldr	r4, [sp, #56]	@ 0x38
   10200:	e59d503c 	ldr	r5, [sp, #60]	@ 0x3c
+  10204:	eb000021 	bl	0x10290
   10208:	e3a00000 	mov	r0, #0
   1020c:	e1a00000 	nop			@ (mov r0, r0)
   10210:	e3008044 	movw	r8, #68	@ 0x44
   10214:	e3408000 	movt	r8, #0
   10218:	e08dd008 	add	sp, sp, r8
   1021c:	e51de004 	ldr	lr, [sp, #-4]
   10220:	e12fff3e 	blx	lr

Disassembly of section .plt:

00010270 <.plt>:
   10270:	e52de004 	push	{lr}		@ (str lr, [sp, #-4]!)
   10274:	e300a2a4 	movw	sl, #676	@ 0x2a4
   10278:	e340a001 	movt	sl, #1
   1027c:	e1a0e00a 	mov	lr, sl
   10280:	e59ef000 	ldr	pc, [lr]
   10284:	e300c2a8 	movw	ip, #680	@ 0x2a8
   10288:	e340c001 	movt	ip, #1
   1028c:	e59cf000 	ldr	pc, [ip]
+  10290:	e300c2ac 	movw	ip, #684	@ 0x2ac
   10294:	e340c001 	movt	ip, #1
   10298:	e59cf000 	ldr	pc, [ip]

0x10290 is the starting address of printf@plt. However, in the text section, there are three places using bl instruction to call printf(), and each of these places has several str and ldr instructions to manipulate the stack beforehand.

@jserv jserv requested review from ChAoSUnItY and fennecJ August 14, 2025 03:11
@sysprog21 sysprog21 deleted a comment from bito-code-review bot Aug 14, 2025
STAGE0_FLAGS ?= --dump-ir
STAGE1_FLAGS ?=
ifeq ($(DYNLINK),1)
BUILTIN_LIBC := dyn-c.c
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you avoid dyn-c.c?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rename it to c.h to reflect that it is a header file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants