-
"And for future steps, could you please provide me with a complicated example of a controller,PM,Graph triple? As a complex testcase." For your evaluation, you should be able to find the studies (graphml) and programs (python) in https://github.com/ControlCore-Project/concore/tree/main/demo If you are looking for something more complicated, https://github.com/ControlCore-Project/concore/tree/main/gi and https://github.com/ControlCore-Project/concore/tree/main/ratc should provide some. You should look more into the current C++ implementation https://github.com/ControlCore-Project/concore/blob/main/concore.hpp and compare against the Python implementation (which is used more as the standard). A Verilog implementation is at https://github.com/ControlCore-Project/concore/blob/main/concore.v When you write your proposal, please make sure to include a link to this discussion thread in your proposal. |
-
Hmm, I thought my proposal was quite detailed, but oh well, it was rejected... I hope it was because another contributor made a better proposal; in that case, please share my proposal with them so that they might benefit from it. Otherwise I can't really understand the reason for the rejection, and I'll be applying again next year?
-
@pradeeban Thank you for your work on this excellent project! I'm an applicant for GSoC 2023, and here are some key points for a draft of a real-time reimplementation in C++. I'm posting here what I think would be useful for someone working on this project.
The way I plan to implement it is this:
The Concore compiler wraps each file supplied in the graph file in its own namespace. These are then compiled together as a single file with only one entry point, leaving the rest to the compiler to optimize. Alongside these, a concore C++ file corresponding to the concore API is included. Tasks run as C++ threads, and since everything is compiled as a single file, the result is a single process with multiple threads. Inter-thread communication is therefore simply through shared memory protected by mutexes, which is efficient.
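To make the plan above more concrete, here is a minimal sketch of two namespace-wrapped tasks running as threads in one process. Everything in it is an assumption for illustration: `Mailbox`, `run_study`, and the toy controller and plant laws are hypothetical names, not part of the concore API, and a condition variable is added on top of the mutex so that the threads block instead of spinning.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical one-slot mailbox shared by two tasks (not a concore type).
struct Mailbox {
    std::mutex m;
    std::condition_variable cv;
    double value = 0.0;
    bool full = false;

    void put(double v) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !full; });  // block until the slot is free
        value = v;
        full = true;
        cv.notify_all();
    }

    double take() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return full; });   // block until a value arrives
        full = false;
        cv.notify_all();
        return value;
    }
};

// Each user-supplied file would be wrapped in its own namespace.
namespace controller {
void run(Mailbox& u, Mailbox& y, int steps) {
    double ym = 0.0;
    for (int i = 0; i < steps; ++i) {
        u.put(1.0 - ym);  // toy control law, for illustration only
        ym = y.take();
    }
}
}  // namespace controller

namespace pm {
void run(Mailbox& u, Mailbox& y, int steps, std::vector<double>& log) {
    double state = 0.0;
    for (int i = 0; i < steps; ++i) {
        state += 0.5 * u.take();  // toy plant model, for illustration only
        log.push_back(state);
        y.put(state);
    }
}
}  // namespace pm

// Single entry point: one process, one thread per task.
std::vector<double> run_study(int steps) {
    assert(steps >= 0);
    Mailbox u, y;
    std::vector<double> log;
    std::thread c(controller::run, std::ref(u), std::ref(y), steps);
    std::thread p(pm::run, std::ref(u), std::ref(y), steps, std::ref(log));
    c.join();
    p.join();
    return log;  // only read after both threads have finished
}
```

Calling `run_study(3)` from `main` runs three lock-step exchanges between the two tasks. With GCC or Clang on Linux this typically needs `-pthread` at link time.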
The first thing I plan to do is to create a documentation section on readthedocs.io covering every member of the Python API. This will benefit users of the library, and it will also help me check that I have correctly understood what every function does.
As you've noted, we need to decide how to handle concurrency. There are four candidates: C++ threads, pthreads, OpenMP, and Intel Threading Building Blocks. What I suggest is C++ threads (`std::thread`), and my plan is to implement in C++14, since I believe compilers supporting it are widely available. What's great about C++ threads is that they are part of the standard library, so the same code is compatible with Windows as well.
The other alternative is pthreads, the POSIX threads API that is native on Unix-like systems. It can be made to work on Windows only in fairly hacky ways, such as through a library called "pthreads-win32". I believe that this is not necessary.
We could also support both C++ threads and pthreads, chosen at compile time, but this increases the maintenance burden, since the code would have to be tested twice and would be less readable. Anyway, I think that C++ threads will be efficient enough; on Linux, the standard library typically implements `std::thread` on top of pthreads anyway.
As for OpenMP, I've seen it used mostly in scientific projects, but I'm not familiar with it. It scares me a little that it leans so heavily on `#pragma` directives, though there are some performance comparisons in favor of OpenMP; I'm not sure if these are still valid today. For this project we need more complex patterns than simply parallelizing a loop, especially heterogeneous tasks, and once inter-thread communication is considered, OpenMP becomes tedious.
There are some key points about maximizing performance.

Firstly, I believe that we should disable run-time type information (RTTI) and exceptions, because these add overhead with no benefit to the capabilities of the program. With GCC and Clang, `-fno-rtti -fno-exceptions` would work. I heard that the overhead of these when not used is negligible, but disabling them also protects the project from future contributions that might use them.

Secondly, we can use something called an "amalgamated build", which is telling the compiler to build the project as a single file. The relevant options are `-fwhole-program` (GCC) or `-fwhole-program-vtables` with `-flto` (Clang). This is what I referred to earlier as compiling as a single file.

Thirdly, we can eliminate dynamic library calls with `-static-libgcc -static-libstdc++` or `-static`.

Fourthly, if we want to optimize for the current machine, `-mtune=native` works well. One can also check by profiling whether `-march=native` increases speed.

Fifthly, we aren't going to use STDIN for input, so disabling C++ I/O synchronization (`std::ios::sync_with_stdio(false)`) significantly speeds up printing the results through iostreams. But should we use C-style `printf`? I think, yes. Multithreaded printing can cause problems, and the most performant approach would be the `_unlocked` stdio variants where available, used with a custom mutex. Note that there is no `printf_unlocked`; in practice this means formatting with `snprintf` and writing with, for example, glibc's `fwrite_unlocked`.

In my opinion, the compiler optimization should be set up using CMake. This is an extensible way of interacting with the compiler, and it allows us to check what the compiler supports and warn the user if the compiler is too old, etc. It does not increase the burden on the user, as one now just needs to run `cmake .` and then `make`.
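As a rough illustration of the C-style printing idea above, the sketch below formats each line outside the lock and then writes it under a custom mutex. The names `format_line`, `log_line`, and `log_from_threads` are hypothetical, and the output format is made up. It uses the portable `std::fwrite`; where an unlocked variant such as glibc's `fwrite_unlocked` is available, it could be swapped in at the marked line, since the custom mutex already serializes the writers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper names; none of these come from concore.
static std::mutex g_print_mutex;

// Format outside the lock so threads only serialize on the actual write.
std::string format_line(int task_id, double value) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "task %d: %.3f\n", task_id, value);
    if (n < 0) return std::string();               // encoding error
    if (static_cast<std::size_t>(n) >= sizeof buf)  // output was truncated
        n = static_cast<int>(sizeof buf) - 1;
    return std::string(buf, static_cast<std::size_t>(n));
}

// Write one complete line while holding the custom mutex.
void log_line(std::FILE* out, int task_id, double value) {
    assert(out != nullptr);
    const std::string line = format_line(task_id, value);
    std::lock_guard<std::mutex> lock(g_print_mutex);
    std::fwrite(line.data(), 1, line.size(), out);  // candidate for an unlocked variant
}

// Start n_tasks threads that each log one line, then wait for all of them.
void log_from_threads(std::FILE* out, int n_tasks) {
    std::vector<std::thread> tasks;
    for (int id = 0; id < n_tasks; ++id)
        tasks.emplace_back([out, id] { log_line(out, id, id * 0.5); });
    for (auto& t : tasks) t.join();
}
```

Calling `log_from_threads(stdout, 3)` prints three whole lines in an unspecified order. Lines never interleave, because each one is written in a single call under the mutex.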
And for future steps, could you please provide me with a complicated example of a controller, PM, Graph triple? As a complex testcase.