-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add memory pool and register memory regions only once. #3
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mempool seems complicated. Can we find a simpler solution or some library to handle this?
|
||
/* Receive memory region is the first half of the main buffer. */ | ||
dataBuffer.limit(dataBuffer.position() + sendBufferOffset); | ||
receiveBuffer = dataBuffer.slice(); | ||
|
||
/* Send memory region is the second half of the main buffer. */ | ||
dataBuffer.position(sendBufferOffset); | ||
dataBuffer.limit(dataBuffer.position() + sendBufferOffset); | ||
dataBuffer.limit(dataBuffer.capacity()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is dataBuffer.capacity always == position + sendBufferOffset? Or could getWRBuffer return a buffer with higher capacity and just limit set to something lower
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is the same.
@@ -56,7 +56,8 @@ | |||
private ConcurrentHashMap<Integer, SVCPostSend> pendingPostSend; | |||
private ArrayBlockingQueue<SVCPostSend> freePostSend; | |||
private AtomicLong ticketCount; | |||
private int pipelineLength; | |||
private int sendPipelineLength; | |||
private int recvPipelineLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like most of the private variables are set in the constructor. Can we make them final?
@@ -14,6 +14,7 @@ | |||
private static final Logger logger = LoggerFactory.getLogger("com.ibm.darpc"); | |||
|
|||
private DaRPCServerGroup<R, T> group; | |||
private int eventPoolSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, can we make these private variables final wherever applicable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
final int defaultAllocationSize = 16 * 1024 * 1024; // 16MB | ||
final int defaultMinAllocationSize = 4 * 1024; // 4KB | ||
final int defaultAlignmentSize = 4 * 1024; // 4KB | ||
final int defaultHugePageLimit = 0; // no huge pages by default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make the default arguments static
protected int allocationSize; | ||
protected int minAllocationSize; | ||
protected int alignmentSize; | ||
protected int hugePageLimit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
if (isOpen) { | ||
isOpen = false; | ||
pdMemPool = null; | ||
for (Iterator<IbvMr> it = mrs.iterator(); it.hasNext(); ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use for(IbvMr mr : mrs) { }
MappedByteBuffer mappedBuffer = null; | ||
try { | ||
mappedBuffer = channel.map(MapMode.READ_WRITE, 0, | ||
allocationSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about alignment here? We only know that it is at least hugepage aligned but what if the user wants a bigger alignment requirement?
final int defaultHugePageLimit = 0; // no huge pages by default | ||
|
||
private HashMap<IbvPd, PdMemPool> pdMemPool; // One buddy allocator per protection domain | ||
private LinkedList<IbvMr> mrs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer ArrayList over LinkedList and you can use the List interface here and choose your implementation in the constructor
/* Only do one memory registration with the IB card. */ | ||
dataMr = registerMemory(dataBuffer).execute().free().getMr(); | ||
try { | ||
dataBuffer = rpcGroup.getWRBuffer(this, sendPipelineLength * rawBufferSize + recvPipelineLength * rawBufferSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. Why make the mempool so complicated but than only get one big chunk out of it and slice them yourself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I get that each connection might have different send/recv queue size, but at least in the group they always have the same so it might make sense to push the mempool instanciation to the group and simplify it. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the idea behind this implementation is that the memory pool registers chunks of memory with the card end every endpoint/every QP allocates a pieces of this already registered memory chunk.
The reason, why it is not instantiated in the group is to make the application (Crail) able to read properties, instantiate and "configure" the mempool and then pass to the DaRPC library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this allocator. Instead we use a simple one with fixed blocksize.
pdm.freeBuddies.get(size >> 1).add(bi2); | ||
|
||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks quite complicated. See comment above.
|
||
public static int getVersion(){ | ||
return DARPC_VERSION; | ||
} | ||
|
||
protected DaRPCEndpointGroup(DaRPCProtocol<R,T> protocol, int timeout, int maxinline, int recvQueue, int sendQueue) throws Exception { | ||
protected DaRPCEndpointGroup(DaRPCProtocol<R,T> protocol, DaRPCMemPool memPool, int timeout, int maxinline, int recvQueue, int sendQueue) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part I'm probably missing here, but why don't we have a simple interface defining a DaRPCMemPool, and then have applications pass any implementation of that interface while also providing a default implementation within DaRPC. Currently it looks like DaRPCMemPool is both the new MemPool type and the only implementation. Or I'm I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I changed it. There is an interface now and the application can instantiate a specific implementation of this interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of decoupling interface and implementation for the MemPool. Then, making the default implementation simple and tailored to the existing use cases also might make sense.
04e1ce7
to
cc896b6
Compare
DaRPC uses now a very simple memory pool, which implements a simple interface. Most of the comments were about the memory allocator. I fixed most of them, but finally removed the memory allocator. Please have a look at the newest version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor things otherwise looks good!
this.alignmentSize = 0; | ||
this.hugePagePath = hugePagePath; | ||
|
||
this.access = IbvMr.IBV_ACCESS_LOCAL_WRITE | IbvMr.IBV_ACCESS_REMOTE_WRITE | IbvMr.IBV_ACCESS_REMOTE_READ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can call the other constructor with this(hugePagePath, defaultAllocationSize, 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
if (hugePagePath == null) { | ||
System.out.println("Hugepage path must be set"); | ||
throw new IOException("Hugepage path must be set"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check this in the constructor and e.g. throw an IllegalArgumentException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
throw new IOException("Requested size does not match block size managed by memory pool."); | ||
} | ||
} | ||
r = freeList.removeFirst(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This throws a NoSuchElementException if the freeList is empty: https://docs.oracle.com/javase/7/docs/api/java/util/LinkedList.html#removeFirst()
This is not obvious from the interface description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. Changed interface showing that it throws a NoSuchElementException.
…onstructors and throw NoSuchElementException, if no more memory blocks are available.
Addressed Jonas' comments and uploaded new version of the code to the PR. Please have a look. Thanks Adrian |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does MemPoolImpl write to stdout? All DaRPC code should only write to the logger, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we are moving in the right direction. I just have some minor comments about the design.
private ConcurrentHashMap<Long, IbvMr> memoryRegions; | ||
private int access; | ||
private DaRPCEndpointGroup<E,R,T> endpointGroup; | ||
private ConcurrentHashMap<IbvPd, LinkedBlockingQueue<ByteBuffer>> pdMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should pull this out and have a per PD memory pool in the endpoint group. And use a factory to create the memory pools whenever we need to create a new PD. This also simplifies the interface since you do not need to pass the endpoint to the memory pool.
} | ||
|
||
@Override | ||
public ByteBuffer getBuffer(RdmaEndpoint endpoint) throws IOException, NoSuchElementException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a new buffer type with the following methods added compared to the ByteBuffer: getRemoteKey(), getLocalKey() and a free() method which allows to return the buffer to the pool. This way we don't need to use hashmaps to find buffers when we free them and there is no uncertainty to which pool a buffer belongs to. I understand that this has some memory overhead but I believe that it outweighs the above.
+ ", alignmentSize = " + alignmentSize | ||
+ ", currentAllocationSize = " + currentAllocationSize | ||
+ ", allocationLimit = " + allocationLimit); | ||
throw new IOException("Out of memory. Cannot allocate more buffers from hugepages." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same string twice -> put it in variable
} catch (IOException e) { | ||
logger.error("Could not set allocation length of mapped random access file on huge page directory."); | ||
logger.error("allocaiton size = " + allocationSize + " , alignment size = " + alignmentSize); | ||
logger.error("allocation size and alignment must be a multiple of the hugepage size."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multiple calls to the logger or append everything like above? I think we should be consistent.
This pull request contains changes to improve the number of IOPS:
In addition
Note: This PR and the corresponding Crail PR have to merged together