Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An issue with global objects #1650

Closed
Frankenween opened this issue Feb 7, 2025 · 11 comments
Closed

An issue with global objects #1650

Frankenween opened this issue Feb 7, 2025 · 11 comments

Comments

@Frankenween
Copy link

Hi, thanks for this great tool. I'm trying to do some LLVM IR instrumentation and I hit an issue, which looks like a bug. I have this sample code:

struct interesting {
	int dummy1;
	void (*f1)();
	void (*f2)();
	int dummy2;
};

struct nested_ptr {
	int dummy1;
	struct interesting* ptr;
	int dummy2;
};

extern struct interesting interesting_stub;
extern struct nested_ptr nested_ptr_stub;

void interesting_1_fstub() {
	interesting_stub.f1();
}

void interesting_2_fstub() {
	interesting_stub.f2();
}

void store_down() {
	interesting_stub = *nested_ptr_stub.ptr;
}

void store_up() {
	nested_ptr_stub.ptr = &interesting_stub;
	interesting_stub.f1 = interesting_1_fstub;
	interesting_stub.f2 = interesting_2_fstub;
}

void f1();
void f2();

void consume_nested_ptr(struct nested_ptr* v) {
	nested_ptr_stub = *v;
}

struct interesting i1 = {
	.f1 = f1,
	.f2 = f2
};

struct nested_ptr n1 = {
	.ptr = &i1
};

void test_ptr() {
    consume_nested_ptr(&n1);
}

I compile with clang-14 -O0 and run wpa -ander -dump-callgraph -ff-eq-base(SVF is the latest one).
I expect interesting_stub.f1 to point to interesting_1_fstub and f1, however interesting_1_fstub -> f1 is not present in the call graph.

If I understand it correctly, n1.ptr points to i1. Later the pointer is copied, so nested_ptr_stub.ptr should point to i1 or interesting_stub. And we end up with interesting_stub = *nested_ptr_stub.ptr, so interesting_stub should have the same values as i1 + _fstub pointers.

When I change nested_ptr.ptr type to struct interesting, everything works fine.

If it is an expected behavior, could you please suggest any ways to make it work as I described?
example.zip

@yuleisui
Copy link
Collaborator

yuleisui commented Feb 7, 2025

What do you mean by "When I change nested_ptr.ptr type to struct interesting, everything works fine.".

Could I ask you to do the below so that we can debug and have a look?

(1) simplify the example to leave only necessary variables and functions
(2) provide another code that you made a change that "everything works fine"

Thanks,

@Frankenween
Copy link
Author

What do you mean by "When I change nested_ptr.ptr type to struct interesting, everything works fine.".
Code is changed in this way:

struct interesting {
	int dummy1;
	void (*f1)();
	void (*f2)();
	int dummy2;
};

struct nested_ptr {
	int dummy1;
	struct interesting ptr;
	int dummy2;
};
...
struct nested_ptr n1 = {
	.ptr = {
		.f1 = f1,
		.f2 = f2
	}
};

I reduced the code size and left important things only

example.tar.gz

@yuleisui
Copy link
Collaborator

yuleisui commented Feb 8, 2025

Thanks and what is the minimum change to the below code that does not work? Would be good to paste it here too.

What do you mean by "When I change nested_ptr.ptr type to struct interesting, everything works fine.".
Code is changed in this way:

struct interesting {
int dummy1;
void (*f1)();
void (*f2)();
int dummy2;
};

struct nested_ptr {
int dummy1;
struct interesting ptr;
int dummy2;
};
...
struct nested_ptr n1 = {
.ptr = {
.f1 = f1,
.f2 = f2
}
};
I reduced the code size and left important things only

example.tar.gz

@Frankenween
Copy link
Author

I pasted the part of the code. To be more clear: I change the second field type in struct nested_ptr, so it is now a structure, not a pointer to it.
Working code:

struct interesting {
	int dummy1;
	void (*f1)();
	void (*f2)();
	int dummy2;
};

struct nested_ptr {
	int dummy1;
	struct interesting ptr;
	int dummy2;
};

extern struct interesting interesting_stub;
extern struct nested_ptr nested_ptr_stub;

void interesting_1_fstub() {
	interesting_stub.f1();
}

void interesting_2_fstub() {
	interesting_stub.f2();
}

void exchange_value() {
	interesting_stub = nested_ptr_stub.ptr;
	nested_ptr_stub.ptr = interesting_stub;
	interesting_stub.f1 = interesting_1_fstub;
	interesting_stub.f2 = interesting_2_fstub;
}

void f1();
void f2();

struct nested_ptr n1 = {
	.ptr = {
		.f1 = f1,
		.f2 = f2
	}
};

void test_ptr() {
	nested_ptr_stub = n1;
}

Not working one:

struct interesting {
	int dummy1;
	void (*f1)();
	void (*f2)();
	int dummy2;
};

struct nested_ptr {
	int dummy1;
	struct interesting* ptr;
	int dummy2;
};

extern struct interesting interesting_stub;
extern struct nested_ptr nested_ptr_stub;

void interesting_1_fstub() {
	interesting_stub.f1();
}

void interesting_2_fstub() {
	interesting_stub.f2();
}

void exchange_value() {
	interesting_stub = *nested_ptr_stub.ptr;
	nested_ptr_stub.ptr = &interesting_stub;
	interesting_stub.f1 = interesting_1_fstub;
	interesting_stub.f2 = interesting_2_fstub;
}

void f1();
void f2();

struct interesting i1 = {
	.f1 = f1,
	.f2 = f2
};

struct nested_ptr n1 = {
	.ptr = &i1
};

void test_ptr() {
	nested_ptr_stub = n1;
}

@yuleisui
Copy link
Collaborator

yuleisui commented Feb 9, 2025

I have further simplified your two programs.

/// First program with direct struct copy in `test_ptr` via llvm.memcpy
struct interesting {
        void (*f1)();
        void (*f2)();
};
struct nested_ptr {
        struct interesting ptr;
};
void f1();
void f2();
struct nested_ptr n1 = {
        .ptr = {
                .f1 = f1,
                .f2 = f2
        }
};
void test_ptr() {
        struct interesting interesting_stub;
        interesting_stub = n1.ptr;
        interesting_stub.f2();
}
///  Second program with a pointer dereference for a struct copy in `test_ptr` via a load instruction and then an llvm.memcpy
struct interesting {
        void (*f1)();
        void (*f2)();
};
struct nested_ptr {
        struct interesting* ptr;
};
void f1();
void f2();
struct interesting i1 = {
        .f1 = f1,
        .f2 = f2
};
struct nested_ptr n1 = {
        .ptr = &i1
};
void test_ptr() {
        struct interesting interesting_stub;
        interesting_stub = *n1.ptr;
        interesting_stub.f1();
}

For the first program, SVF works correctly as an LLVM memcpy will be introduced for interesting_stub = n1.ptr;. SVF will copy each field of n1 and its nested fields to interesting_stub, hence the function pointer and indirect call to f1 works correctly in the call graph

define void @test_ptr() #1 {
  %1 = alloca %struct.interesting, align 8
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %1, ptr align 8 @n1, i64 16, i1 false)
  %2 = getelementptr inbounds %struct.interesting, ptr %1, i32 0, i32 1
  %3 = load ptr, ptr %2, align 8
  call void %3()
  ret void
}

For the second program, LLVM IR translates interesting_stub = *n1.ptr; into two separate instructions: first, a load instruction to retrieve the content of n1, and then an LLVM memcpy. When loading n1, the intended semantics are to load the entire object, as n1 has only one field. However, SVF models each field separately, even when there is only one field. This leads to a mismatch: (1) SVF stores &i1 into n1.ptr, the first field, and (2) LLVM IR loads the base object via %2 = load ptr, ptr @n1. The field and base objects are two different ones, therefore, you can't load the value from where you stored it. To address this, we need to enable -ff-eq-base to treat the first field as the entire base object when analysing this bitcode.

; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
define void @test_ptr() #1 {
  %1 = alloca %struct.interesting, align 8
  %2 = load ptr, ptr @n1, align 8.                             // require turning on `-ff-eq-base`
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %1, ptr align 8 %2, i64 16, i1 false)
  %3 = getelementptr inbounds %struct.interesting, ptr %1, i32 0, i32 0
  %4 = load ptr, ptr %3, align 8
  call void %4()
  ret void
}

If you add another field declaration (e.g., int dummy) in struct nested_ptr before the field struct interesting* ptr;, then there is no need to enable -ff-eq-base, as the first field int dummy and n1.ptr will be handled separately. In this case, the compiler-generated LLVM IR will not load @n1 to access the second field ptr.

A possible solution is to enhance SVF to handle the first field and the entire object in a smarter and automated manner, rather than requiring manual toggling of -ff-eq-base. For example, SVF could automatically treat the first field as the entire base object when a struct contains only one field or when the first field is a pointer that can be used interchangeably with the base object during pointer dereferencing.

@jumormt @bjjwwang we may need to look into this first-field-eq-base issue later.

The llvm bcs and callgraphs and pags are attached.

bcs-graphs.zip

@Frankenween
Copy link
Author

If you add another field declaration (e.g., int dummy) in struct nested_ptr before the field struct interesting* ptr;, then there is no need to enable -ff-eq-base, as the first field int dummy and n1.ptr will be handled separately. In this case, the compiler-generated LLVM IR will not load @n1 to access the second field ptr.
I believe my problem is not about bases: I added both dummy fields in struct interesting and in struct nested_ptr

struct interesting {
        int dummy;
        void (*f1)();
        void (*f2)();
};
struct nested_ptr {
        int dummy;
        struct interesting* ptr;
};
void f1();
void f2();
struct interesting i1 = {
        .f1 = f1,
        .f2 = f2
};
struct nested_ptr n1 = {
        .ptr = &i1
};
void test_ptr() {
        struct interesting interesting_stub;
        interesting_stub = *n1.ptr;
        interesting_stub.f1();
        interesting_stub.f2();
}

IR is straightforward and uses GEPs:

; Function Attrs: noinline nounwind optnone uwtable
define dso_local void @test_ptr() #1 {
  %1 = alloca %struct.interesting, align 8
  %2 = load %struct.interesting*, %struct.interesting** getelementptr inbounds (%struct.nested_ptr, %struct.nested_ptr* @n1, i32 0, i32 1), align 8
  %3 = bitcast %struct.interesting* %1 to i8*
  %4 = bitcast %struct.interesting* %2 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %3, i8* align 8 %4, i64 24, i1 false)
  %5 = getelementptr inbounds %struct.interesting, %struct.interesting* %1, i32 0, i32 1
  %6 = load void (...)*, void (...)** %5, align 8
  call void (...) %6()
  %7 = getelementptr inbounds %struct.interesting, %struct.interesting* %1, i32 0, i32 2
  %8 = load void (...)*, void (...)** %7, align 8
  call void (...) %8()
  ret void
}

However, neither f1 call, nor f2 are resolved(but if I remove dummy variable from struct interesting, f1 is resolved, but not f2)

@yuleisui
Copy link
Collaborator

It is good for the trying. It looks to me that the PAG is not fully built when handling llvm.memcpy. When translating llvm.memcpy.p0i8.p0i8.i64, it is supposed to copy each of the three fields of the object that %2 points to into the corresponding fields of %1. However, the PAG only shows one GEP edge (offset is 0). There should be two more GEP edges originating from ValVarID 26, as shown in the screenshot below. We will need to fix these two missing edges when translating llvm.memcpy. @shuangxiangkan. Need to take a look at these two lines to make sure the size is 3 rather than 1

const Type* stype = getBaseTypeAndFlattenedFields(S, srcFields, szValue);
const Type* dtype = getBaseTypeAndFlattenedFields(D, dstFields, szValue);

Image

@yuleisui
Copy link
Collaborator

@Frankenween could you try this pull request to see whether it fixes the problem (maybe try to tweak the programs to see any other issues)?

#1652 (comment)

@yuleisui
Copy link
Collaborator

@Frankenween Xiao made another update and could you try the pull request again?

@Frankenween
Copy link
Author

@yuleisui It's not crashing now and it resolves more calls, but it is still not working in all cases. Left one more test and LLVM IR in the PR.

@Frankenween
Copy link
Author

Sorry for the late reply.
I tested the already merged patch an it seems to fix the problem, I observed no crashes in my project

Thanks @jumormt and @yuleisui for the fast resolution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants