Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to support cross compilation to ARM #47

Open
GoogleCodeExporter opened this issue Apr 8, 2015 · 23 comments
Open

Changes to support cross compilation to ARM #47

GoogleCodeExporter opened this issue Apr 8, 2015 · 23 comments

Comments

@GoogleCodeExporter
Copy link

What need is this feature going to satisfy?
1. Support x86 or ARM, depending on desired target

What is the expected output? What do you have currently? Any workaround?

I have changes in my sandbox that support compiling to x86 or an ARM target. I 
can clean up the modifications and submit as a patch if the maintainers agree 
this is useful. 

Thanks - Vince
[email protected]


Original issue reported on code.google.com by [email protected] on 30 Oct 2013 at 9:12

@GoogleCodeExporter
Copy link
Author

We think this is useful and we'll be happy to add support for compiling on x86 
or ARM. 
Currently we don't have an option to test it on ARM, but if you have a working 
patch we will push it.

Original comment by [email protected] on 31 Oct 2013 at 6:57

@GoogleCodeExporter
Copy link
Author

Hi Vince,

We would be glad to support ARM; however, few guidelines for your patch:

1. make sure it is aligned with the current architecture of SockPerf.  Try to 
follow the way we added support for WIN32 - See os_abstract.cpp.h, and places 
in the code with #ifdef WIN32.
2. make sure your code doesn't break anything in the functionality and 
performance of sockperf for Linux and for WIN32.
(3. and of course, make sure that you don't break compilation for Linux and 
WIN32)

Also, feel free to add Wiki page or "README" file on how to use your feature.

Cheers,
 Avner

Original comment by [email protected] on 31 Oct 2013 at 8:26

  • Added labels: Pjm-Visited
  • Removed labels: Pjm-New

@GoogleCodeExporter
Copy link
Author

Original comment by [email protected] on 31 Oct 2013 at 8:28

  • Added labels: OpSys-Arm

@GoogleCodeExporter
Copy link
Author

Thanks, will work on cleaning up the changes per guidelines and add a Wiki. 
I'll get back to you when it's ready. 

Cheers

Original comment by [email protected] on 31 Oct 2013 at 3:23

@GoogleCodeExporter
Copy link
Author

one additional thing, make sure to build your changes on the top revision of 
the source

Original comment by [email protected] on 1 Nov 2013 at 6:40

@GoogleCodeExporter
Copy link
Author

Issue 33 has been merged into this issue.

Original comment by [email protected] on 6 Nov 2013 at 12:54

@GoogleCodeExporter
Copy link
Author

I've completed my changes and testing. My testing involved running
sockperf from an ARM-based system as a client and server to an x86 system
running as a client and server. Binaries used on both sides were compiled
from my modified source. Changes are described below, along with how to
build for ARM, and an attached patch file. Please let me know if there's
anything else I can to facilitate acceptance of this patch onto the project
trunk.
I pulled Sockperf from

svn co http://sockperf.googlecode.com/svn/trunk

Changes to Sockperf Source

I kept the source changes to a minimum. Only the things that absolutely had
to change in order to cross compile for ARM or compile to x86 were changed.
The patch is attached (created by svn diff > arm-patch.diff), and the
changes described below.

1)      Added entries in configure.ac to support an ARM Linux target
compilation.

2)      Had to append a GCC compiler option to extend the maximum number of
inline instructions for ARM code generation since Sockperf makes heavy use
of function inlining, examines all warnings, and treats warnings as errors.
GCC docs say

*max-inline-insns-single:*

Several parameters control the tree inliner used in gcc. This number sets
the maximum number of instructions (counted in GCC's internal
representation) in a single function that the tree inliner will consider
for inlining. This only affects functions declared inline and methods
implemented in a class declaration (C++). The default value is 500.

I extended the maximum inline instructions from the default of 500 to 1000.

3)      gettimeoftsc() in Ticks.h was modified to conditionally compile use
of clock_gettime in the case of ARM, and use rdtsc() otherwise since ARM
does not not ubiquitously and uniformly support an equivalent rdtsc()
instruction.

To build for ARM

1)      Define CROSS_COMPILE in the environment to point to the cross
compilation tools. In the case tested, CROSS_COMPILE was set as follows�

set
CROSS_COMPILE=/opt/gcc-linaro-arm-linux-gnueabihf-4.7-2012.11-20121123_linux/bin
/arm-linux-gnueabihf-

2)      Use ./autogen.sh to create the configure script.

3)      Invoke ./configure with the following options

$ ./configure CXX=${CROSS_COMPILE}g++ STRIP=${CROSS_COMPILE}strip
LD=${CROSS_COMPILE}ld CC=${CROSS_COMPILE}gcc --host i386



4)      Invoke make

5)      sockperf for ARM is located at ./src/sockperf

The same procedure can then be used, and switch between building for x86
and ARM.

1)      Use ./autogen.sh to create the configure script

2)      Invoke ./configure with no options to create the Makefile

3)      Invoke make

4)      sockperf for x86 is located at ./src/sockperf

Patch is as follows �

Index: configure.ac

===================================================================

--- configure.ac        (revision 236)

+++ configure.ac        (working copy)

@@ -23,6 +23,7 @@

   i?86-*-*) TARGET=X86; TARGETDIR=x86;;

   ia64*-*-*) TARGET=IA64; TARGETDIR=ia64;;

   powerpc*-*-linux* | powerpc-*-sysv*) TARGET=POWERPC; TARGETDIR=powerpc;;

+  arm*-*-linux*) TARGET=ARM; TARGETDIR=arm;;

   powerpc-*-beos*) TARGET=POWERPC; TARGETDIR=powerpc;;

   powerpc-*-darwin*) TARGET=POWERPC_DARWIN; TARGETDIR=powerpc;;

   powerpc-*-aix* | rs6000-*-aix*) TARGET=POWERPC_AIX; TARGETDIR=powerpc;;

@@ -41,6 +42,7 @@

 AM_CONDITIONAL(X86, test x$TARGET = xX86)

 AM_CONDITIONAL(IA64, test x$TARGET = xIA64)

 AM_CONDITIONAL(POWERPC, test x$TARGET = xPOWERPC)

+AM_CONDITIONAL(POWERPC, test x$TARGET = xARM)

 AM_CONDITIONAL(POWERPC_AIX, test x$TARGET = xPOWERPC_AIX)

 AM_CONDITIONAL(POWERPC_DARWIN, test x$TARGET = xPOWERPC_DARWIN)

 AM_CONDITIONAL(POWERPC_FREEBSD, test x$TARGET = xPOWERPC_FREEBSD)

@@ -74,7 +76,7 @@

 #AC_PROG_LIBTOOL





-CXXFLAGS="-Werror -Winline -Wall  --param inline-unit-growth=200"

+CXXFLAGS="-Werror -Winline -Wall  --param inline-unit-growth=200 --param
max-inline-insns-single=1000"

 CXXFLAGS="$CXXFLAGS  -D__LINUX__  -DVERSION=$VERSION"





Index: src/Ticks.h

===================================================================

--- src/Ticks.h (revision 236)

+++ src/Ticks.h (working copy)

@@ -114,13 +114,22 @@

 tscval_t get_tsc_rate_per_second();

 static inline tscval_t gettimeoftsc()

 {

+       tscval_t tm;

+#if defined(__arm__) && defined(__linux__)

+        struct timespec ts;

+       clock_gettime(CLOCK_REALTIME, &ts);

+       tm = ts.tv_sec;

+       tm = tm * 1000000000ULL;

+       tm += ts.tv_nsec;

+#else

        register uint32_t upper_32, lower_32;

-

        // ReaD Time Stamp Counter (RDTCS)

        __asm__ __volatile__("rdtsc" : "=a" (lower_32), "=d" (upper_32));



        // Return to user

-       return (((tscval_t)upper_32) << 32) | lower_32;

+       tm= (((tscval_t)upper_32) << 32) | lower_32;

+#endif

+       return tm;

 }

Original comment by [email protected] on 7 Nov 2013 at 4:22

@GoogleCodeExporter
Copy link
Author

Thanks Vince,

We appreciate your contribution and we'll inspect your work soon.  Thanks for 
your comprehensive explanations. 

one fast comment at the moment, you wrote "svn co ../trunk".  I hope you aware 
that the latest version of sockperf is not in the trunk, but in 
branches/sockperf_v2 and you meant for that.

you can see that described in 
https://code.google.com/p/sockperf/source/checkout 

(additional fast comment, I prefer that  --param max-inline-insns-single=1000, 
we'll only be applied for ARM and not for other environments.  Hence, leaving 
the optimizer the freedom to do its best job where possible).

Original comment by [email protected] on 7 Nov 2013 at 7:43

@GoogleCodeExporter
Copy link
Author

Thank you for the comments. I did use ../trunk and not the sockperf_v2
branch. I missed the point on the reference, assuming latest code was on
the trunk.

I'll look at addressing the inline parameter for an ARM specific compile
and create a patch for the /sockperf_v2 branch.

My apologies for missing those points. Please look forward to another
posting in a day or two.

Vince

Original comment by [email protected] on 7 Nov 2013 at 9:49

@GoogleCodeExporter
Copy link
Author

Please abandon my previous submission. I'll address your comments and provide 
another submission. Thank you for your comments and patience. 

Vince

Original comment by [email protected] on 7 Nov 2013 at 9:59

@GoogleCodeExporter
Copy link
Author

I've completed my changes on the sockperf_v2 branch and testing. I was
able to avoid using the --param max-inline-insns-single option on the
sockperf_v2 branch. My testing involved running sockperf from an ARM-based
system as a client and server to an x86 system running as a client and
server. Binaries used on both sides were compiled from my modified source.
Changes are described below, along with how to build for ARM, and an
attached patch file. The default use of rdtsc for ARM is same as using
clock_gettime(). I decided to leave that as a documented limitation for ARM
since someone may decide to address that for a particular ARM processor at
some future date. I'm not able to address that at this time with the
hardware I have available.
Please let me know if there's anything else I can to facilitate acceptance
of this patch onto the project trunk.
I pulled Sockperf from

svn checkout *http://sockperf.googlecode.com/svn/branches/sockperf_v2/
<http://sockperf.googlecode.com/svn/branches/sockperf_v2/> *sockperfv2

Changes to Sockperf Source

I kept the source changes to a minimum. Only the things that absolutely had
to change in order to cross compile for ARM or compile to x86 were changed.
The patch is attached (created by svn diff > arm-patch.diff), and the
changes described below.

1)      Added entries in configure.ac to support an ARM Linux target
compilation.

2)      os_gettimeoftsc() in ticks_os.h was modified to conditionally
compile use of clock_gettime in the case of ARM, and use rdtsc() otherwise
since ARM does not not ubiquitously and uniformly support an equivalent
rdtsc() instruction.

To build for ARM

1)      Define CROSS_COMPILE in the environment to point to the cross
compilation tools. In the case tested, CROSS_COMPILE was set as follows�

set
CROSS_COMPILE=/opt/gcc-linaro-arm-linux-gnueabihf-4.7-2012.11-20121123_linux/bin
/arm-linux-gnueabihf-

2)      Use ./autogen.sh to create the configure script.

3)      Invoke ./configure with the following options

$ ./configure CXX=${CROSS_COMPILE}g++ STRIP=${CROSS_COMPILE}strip
LD=${CROSS_COMPILE}ld CC=${CROSS_COMPILE}gcc --host i386



4)      Invoke make

5)      sockperf for ARM is located at ./src/sockperf

The same procedure can then be used, and switch between building for x86
and ARM.

1)      Use ./autogen.sh to create the configure script

2)      Invoke ./configure with no options to create the Makefile

3)      Invoke make
4)      sockperf for x86 is located at ./src/sockperf
Inline patch follows, and is attached ....


Index: configure.ac
===================================================================
--- configure.ac    (revision 236)
+++ configure.ac    (working copy)
@@ -23,6 +23,7 @@
   i?86-*-*) TARGET=X86; TARGETDIR=x86;;
   ia64*-*-*) TARGET=IA64; TARGETDIR=ia64;;
   powerpc*-*-linux* | powerpc-*-sysv*) TARGET=POWERPC; TARGETDIR=powerpc;;
+  arm*-*-linux*) TARGET=ARM; TARGETDIR=arm;;
   powerpc-*-beos*) TARGET=POWERPC; TARGETDIR=powerpc;;
   powerpc-*-darwin*) TARGET=POWERPC_DARWIN; TARGETDIR=powerpc;;
   powerpc-*-aix* | rs6000-*-aix*) TARGET=POWERPC_AIX; TARGETDIR=powerpc;;
@@ -41,6 +42,7 @@
 AM_CONDITIONAL(X86, test x$TARGET = xX86)
 AM_CONDITIONAL(IA64, test x$TARGET = xIA64)
 AM_CONDITIONAL(POWERPC, test x$TARGET = xPOWERPC)
+AM_CONDITIONAL(ARM, test x$TARGET = xARM)
 AM_CONDITIONAL(POWERPC_AIX, test x$TARGET = xPOWERPC_AIX)
 AM_CONDITIONAL(POWERPC_DARWIN, test x$TARGET = xPOWERPC_DARWIN)
 AM_CONDITIONAL(POWERPC_FREEBSD, test x$TARGET = xPOWERPC_FREEBSD)
Index: src/ticks_os.h
===================================================================
--- src/ticks_os.h  (revision 236)
+++ src/ticks_os.h  (working copy)
@@ -87,6 +87,11 @@
        lp.QuadPart = (LONGLONG) ((lp.QuadPart) / (PCFreq));
        return (ticks_t)(lp.QuadPart);
 #else
+#if defined(__arm__) && defined(__linux__)
+        struct timespec ts;
+   clock_gettime(CLOCK_REALTIME, &ts);
+        return timespec2nsec(ts);
+#else
    register uint32_t upper_32, lower_32;
    // ReaD Time Stamp Counter (RDTCS)
@@ -95,6 +100,7 @@
    // Return to user
    return (((ticks_t)upper_32) << 32) | lower_32;
 #endif
+#endif
 }
 inline void os_ts_gettimeofclock(struct timespec *pts)   {

Original comment by [email protected] on 7 Nov 2013 at 10:57

@GoogleCodeExporter
Copy link
Author

looks good.  Thanks Vince.

One minor thing, since you don't support rdtsc or any other high precision 
clock in ARM, I think you shouldn't return clock_gettime when user expects 
rdtsc.  Instead abort with error if rdtsc is requested for ARM (till 
you/someone decide to support it).
I recommend for you to change command-line default *for ARM only* to consider 
--no-rdtsc is on by default.
( Also, please notice that CLOCK_MONOTONIC seems to be better than 
CLOCK_REALTIME 
http://stackoverflow.com/questions/3523442/difference-between-clock-realtime-and
-clock-monotonic )

I think we don't have any additional comments, and with this change will take 
your patch.

Original comment by [email protected] on 10 Nov 2013 at 9:11

@GoogleCodeExporter
Copy link
Author

[deleted comment]

@GoogleCodeExporter
Copy link
Author

Here's the next submission addressing your latest comments. Note that I had to 
initialize upper_32 and lower_32 to something in os_gettimeoftsc(), and avoid 
compilation of the x86 specific rdtsc() instruction for an ARM target. I force 
the user of an ARM target compilation to specific --no-rdtsc, these seemed like 
the least intrusive modifications. I've tested this on x86 and ARM Linux. 

Thanks - Vince

$ svn diff
Index: SockPerf.cpp
===================================================================
--- SockPerf.cpp    (revision 236)
+++ SockPerf.cpp    (working copy)
@@ -1928,6 +1928,12 @@
        if ( !rc && aopt_check(common_obj, OPT_NO_RDTSC) ) {
            s_user_params.b_no_rdtsc = true;
        }
+#if defined (__arm__)  
+       if (s_user_params.b_no_rdtsc == false) {
+           log_msg("ARM target build does not support rdtsc, use --no-rdtsc");
+           rc = SOCKPERF_ERR_BAD_ARGUMENT;
+       }
+#endif

        if ( !rc && aopt_check(common_obj, OPT_LOAD_VMA) ) {
            const char* optarg = aopt_value(common_obj, OPT_LOAD_VMA);
Index: ticks_os.h
===================================================================
--- ticks_os.h  (revision 236)
+++ ticks_os.h  (working copy)
@@ -89,9 +89,15 @@
 #else
    register uint32_t upper_32, lower_32;

+#if defined(__arm__)
+   // so the compiler will not complain. for
+   // arm compile, this inline is not used 
+   // since no rdtsc supported on most arm processors
+   upper_32 = lower_32 = 0;
+#else
    // ReaD Time Stamp Counter (RDTCS)
    __asm__ __volatile__("rdtsc" : "=a" (lower_32), "=d" (upper_32));
-
+#endif
    // Return to user
    return (((ticks_t)upper_32) << 32) | lower_32;
 #endif


Original comment by [email protected] on 25 Nov 2013 at 12:32

Attachments:

@GoogleCodeExporter
Copy link
Author

Thanks Vince,

I saw you decided you to leave the default value for --no-rdtsc without change 
even for for ARM and you require ARM users to explicitly specify --no-rdtsc in 
command line.  

I am leaving this decision for you.  We'll take your patch as is.
Cheers,
  Avner

Original comment by [email protected] on 25 Nov 2013 at 1:45

@GoogleCodeExporter
Copy link
Author

Hi Vince,

I submitted your patch (http://code.google.com/p/sockperf/source/detail?r=237).

Please verify latest version is compiling for you on ARM target.

Thanks!

Original comment by [email protected] on 25 Nov 2013 at 2:39

  • Changed state: Fixed

@GoogleCodeExporter
Copy link
Author

Looks like a throw() was added to the patch I submitted this morning to the 
function os_gettimeoftsc(). This is causing the program to abort on arm with a 
statement to stderr

terminate called after throwing an instance of 'char const*'
Aborted

This occurs if I use the --no-rdtsc option or not. This means os_gettimeoftsc() 
is being used regardless of the command line options selected, at least during 
program startup. Is this what's intended?

If you remove the throw that was added, the patch that made it to the branch 
should be fine. 

Vince

Original comment by [email protected] on 25 Nov 2013 at 3:28

@GoogleCodeExporter
Copy link
Author

Does this also support aarch64 ( arm64)

Original comment by [email protected] on 25 Nov 2013 at 3:36

@GoogleCodeExporter
Copy link
Author

@Vince
The question is do you want to stay with calls to os_gettimeoftsc() under ARM 
even though ARM does not support reading TCS register?
I would expect that you remove any left call in the first place.  Perhaps it is 
good that the throw that Or added catch such things.  It might be good idea to 
grep for all these calls in the code and make sure you are safe with all of 
them (or maybe #ifdef them out).  What do you think?
Avner

Original comment by [email protected] on 25 Nov 2013 at 3:57

@GoogleCodeExporter
Copy link
Author

Original comment by [email protected] on 25 Nov 2013 at 4:02

  • Changed state: Started

@GoogleCodeExporter
Copy link
Author

No I do not want to stay with calls to os_gettimeoftsc() in the ARM case,
and I would expect that no calls to os_gettimeoftsc() would be made if I
specified the --no-rdtsc option on the command line, but that does not seem
to be the case. I would also expect that if a throw statement was added to
the code that a useful error message would be displayed instead of what I
reported. Seems there are issues in the code that assume the use of rdtsc.
I can sort through all of that, but it will take some more time. I was
trying to stay with the least intrusive changes to the code, and support
ARM.

The patch as submitted did not permit sockperf to run if the user was
attempting to use rdtsc for timing, and did not throw an exception and
abort with a cryptic error message. There is a conditional check in the
routine that checks the default and command line options. If the user
implicitly selects the use of rdtsc for the ARM target, it aborts and
prints a message stating the user should add the --no-rdtsc option, which
is how I interpreted comment #12.

I suggest taking out the throw statement that was added to the patch I
submitted since it doesn't produce a useful error message.

Vince

Original comment by [email protected] on 25 Nov 2013 at 4:10

@GoogleCodeExporter
Copy link
Author

To Chentanho, comment #18 - I've not been able to test on an ARM64 system
(yet - I don't have one). There's nothing that I'm aware of that prevents
this code from compiling to an ARM64 target architecture.

Vince

Original comment by [email protected] on 25 Nov 2013 at 4:13

@GoogleCodeExporter
Copy link
Author

Vince,

The --no-rdtsc flag was not designed for ARM, but only for systems that *do not 
trust* RDTSC sync across cores/threads.

The --no-rdtsc command line argument only tells sockperf not to use RDTSC for 
calculating timing of packets.  A programmer is still allowed to call this 
method for other needs.  When you return zero in this method it is not safe and 
could yield division by zero, or other errors, in situations that you didn't 
test.  Hence, throwing an exception is a good programming practice that helped 
you catch the error immediately before your customer encounter it.

Anyhow, I understood Or ifdef'ed out the last call to this method before our 
comments.  Hence, you can test it again.

Last thing, if you don't like sockperf error reporting, then decide by yourself 
what is the way you choose to deal with it.

Original comment by [email protected] on 25 Nov 2013 at 4:48

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant