Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aarch64 be patches #985

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 50 additions & 6 deletions gum/arch-arm64/gumarm64writer.c
Original file line number Diff line number Diff line change
Expand Up @@ -220,6 +220,7 @@ gum_arm64_writer_init (GumArm64Writer * writer,
writer->label_defs = NULL;
writer->label_refs.data = NULL;
writer->literal_refs.data = NULL;
writer->data_endian = GUM_ENDIAN_NATIVE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assignment should go just before target_os.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


gum_arm64_writer_reset (writer, code_address);
}
Expand Down Expand Up @@ -1992,15 +1993,37 @@ gum_arm64_writer_commit_literals (GumArm64Writer * self)
if (r->width != GUM_LITERAL_64BIT)
continue;

/*
* Whilst instructions in aarch64 are always in little endian (even on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Whilst instructions in aarch64 are always in little endian (even on
* Whilst instructions in AArch64 are always in little endian (even on

(Here and elsewhere.) For consistency with our existing comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* big-endian systems), the data is in native endian. Thus since we wish to
* support writing code for big-endian systems on little-endian targets and
* vice versa, we need to check the writer configuration.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this comment to where the data_endian field is defined. We can then avoid duplicating it further down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

for (slot = first_slot; slot != last_slot; slot++)
{
if (GINT64_FROM_LE (*slot) == r->val)
WorksButNotTested marked this conversation as resolved.
Show resolved Hide resolved
break;
if (self->data_endian == GUM_ENDIAN_LITTLE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the optimizer able to avoid repeatedly reading and checking this field's value, or should we introduce a local variable used throughout the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be very surprised if the optimizer didn't take care of it anyway, but the actual code generation part isn't hugely performance critical (to the point where a few cycles matter), its the performance of the generated code which is most important (for applications such as fuzzing).

{
if (GINT64_FROM_LE (*slot) == r->val)
break;
}
else
{
if (GINT64_FROM_BE (*slot) == r->val)
break;
}

}

if (slot == last_slot)
{
*slot = GINT64_TO_LE (r->val);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Like above.)

if (self->data_endian == GUM_ENDIAN_LITTLE)
{
*slot = GINT64_TO_LE (r->val);
}
else
{
*slot = GINT64_TO_BE (r->val);
}
last_slot = slot + 1;
}

Expand All @@ -2024,15 +2047,36 @@ gum_arm64_writer_commit_literals (GumArm64Writer * self)
if (r->width != GUM_LITERAL_32BIT)
continue;

/*
* Whilst instructions in aarch64 are always in little endian (even on
* big-endian systems), the data is in native endian. Thus since we wish to
* support writing code for big-endian systems on little-endian targets and
* vice versa, we need to check the writer configuration.
*/
for (slot = first_slot; slot != last_slot; slot++)
{
if (GINT32_FROM_LE (*slot) == r->val)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Like above.)

break;
if (self->data_endian == GUM_ENDIAN_LITTLE)
{
if (GINT32_FROM_LE (*slot) == r->val)
break;
}
else
{
if (GINT32_FROM_BE (*slot) == r->val)
break;
}
}

if (slot == last_slot)
{
*slot = GINT32_TO_LE (r->val);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Like above.)

if (self->data_endian == GUM_ENDIAN_LITTLE)
{
*slot = GINT32_TO_LE (r->val);
}
else
{
*slot = GINT32_TO_BE (r->val);
}
last_slot = slot + 1;
}

Expand Down
10 changes: 10 additions & 0 deletions gum/arch-arm64/gumarm64writer.h
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ G_BEGIN_DECLS

typedef struct _GumArm64Writer GumArm64Writer;
typedef guint GumArm64IndexMode;
typedef guint GumArm64DataEndian;

struct _GumArm64Writer
{
Expand All @@ -51,6 +52,15 @@ struct _GumArm64Writer
GumMetalArray label_refs;
GumMetalArray literal_refs;
const guint32 * earliest_literal_insn;

GumArm64DataEndian data_endian;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GumArm64DataEndian data_endian;
gint data_endian;
  • Let's avoid introducing a new enum here. Our Vala bindings will be using the ByteOrder enum from glib-2.0.vapi.
  • This field should be moved up. The existing three configuration values are in their own section, starting with target_os. I'd say this one should go first, as it's more important/fundamental than target_os.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I kept the typedef guint GumArm64DataEndian;, was that right? Also added a comment to it for what the values should be (since the type is otherwise not very constrained).

};

enum _GumArm64DataEndian
{
GUM_ENDIAN_LITTLE = __ORDER_LITTLE_ENDIAN__,
GUM_ENDIAN_BIG = __ORDER_BIG_ENDIAN__,
GUM_ENDIAN_NATIVE = __BYTE_ORDER__,
};

enum _GumArm64IndexMode
Expand Down
7 changes: 5 additions & 2 deletions gum/gumdefs.h
Original file line number Diff line number Diff line change
Expand Up @@ -119,9 +119,12 @@ typedef GumMipsCpuContext GumCpuContext;
* The only non-legacy big-endian configuration on 32-bit ARM systems is BE8.
* In this configuration, whilst the data is in big-endian, the code stream is
* still in little-endian. Since Capstone is disassembling the code stream, it
* should work in little-endian even on BE8 systems.
* should work in little-endian even on BE8 systems. On big-endian 64-bit ARM
* systems, the code stream is likewise in little-endian.
*/
#if G_BYTE_ORDER == G_LITTLE_ENDIAN || defined (__arm__)
#if G_BYTE_ORDER == G_LITTLE_ENDIAN || \
defined (__arm__) || \
defined (__aarch64__)
# define GUM_DEFAULT_CS_ENDIAN CS_MODE_LITTLE_ENDIAN
#else
# define GUM_DEFAULT_CS_ENDIAN CS_MODE_BIG_ENDIAN
Expand Down
Loading