Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add memory pool and register memory regions only once. #3

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

asqasq
Copy link
Member

@asqasq asqasq commented Nov 28, 2017

This pull request contains changes to improve the number of IOPS:

  • Register memory regions only once with the RDMA NIC (save resources and time on the hardware)
  • Add a memory pool out of which send and receive buffers can be allocated
  • Allow allocations either from the regular heap or from hugepages

In addition

  • Send and receive pipeline length are now independent.
  • Version increased to 1.4, because client and server endpoint group has a different signature

Note: This PR and the corresponding Crail PR have to merged together

Copy link
Contributor

@PepperJo PepperJo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mempool seems complicated. Can we find a simpler solution or some library to handle this?


/* Receive memory region is the first half of the main buffer. */
dataBuffer.limit(dataBuffer.position() + sendBufferOffset);
receiveBuffer = dataBuffer.slice();

/* Send memory region is the second half of the main buffer. */
dataBuffer.position(sendBufferOffset);
dataBuffer.limit(dataBuffer.position() + sendBufferOffset);
dataBuffer.limit(dataBuffer.capacity());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is dataBuffer.capacity always == position + sendBufferOffset? Or could getWRBuffer return a buffer with higher capacity and just limit set to something lower

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is the same.

@@ -56,7 +56,8 @@
private ConcurrentHashMap<Integer, SVCPostSend> pendingPostSend;
private ArrayBlockingQueue<SVCPostSend> freePostSend;
private AtomicLong ticketCount;
private int pipelineLength;
private int sendPipelineLength;
private int recvPipelineLength;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like most of the private variables are set in the constructor. Can we make them final?

@@ -14,6 +14,7 @@
private static final Logger logger = LoggerFactory.getLogger("com.ibm.darpc");

private DaRPCServerGroup<R, T> group;
private int eventPoolSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, can we make these private variables final wherever applicable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

final int defaultAllocationSize = 16 * 1024 * 1024; // 16MB
final int defaultMinAllocationSize = 4 * 1024; // 4KB
final int defaultAlignmentSize = 4 * 1024; // 4KB
final int defaultHugePageLimit = 0; // no huge pages by default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make the default arguments static

protected int allocationSize;
protected int minAllocationSize;
protected int alignmentSize;
protected int hugePageLimit;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

if (isOpen) {
isOpen = false;
pdMemPool = null;
for (Iterator<IbvMr> it = mrs.iterator(); it.hasNext(); ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use for(IbvMr mr : mrs) { }

MappedByteBuffer mappedBuffer = null;
try {
mappedBuffer = channel.map(MapMode.READ_WRITE, 0,
allocationSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about alignment here? We only know that it is at least hugepage aligned but what if the user wants a bigger alignment requirement?

final int defaultHugePageLimit = 0; // no huge pages by default

private HashMap<IbvPd, PdMemPool> pdMemPool; // One buddy allocator per protection domain
private LinkedList<IbvMr> mrs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer ArrayList over LinkedList and you can use the List interface here and choose your implementation in the constructor

/* Only do one memory registration with the IB card. */
dataMr = registerMemory(dataBuffer).execute().free().getMr();
try {
dataBuffer = rpcGroup.getWRBuffer(this, sendPipelineLength * rawBufferSize + recvPipelineLength * rawBufferSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. Why make the mempool so complicated but than only get one big chunk out of it and slice them yourself?

Copy link
Contributor

@PepperJo PepperJo Nov 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I get that each connection might have different send/recv queue size, but at least in the group they always have the same so it might make sense to push the mempool instanciation to the group and simplify it. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the idea behind this implementation is that the memory pool registers chunks of memory with the card end every endpoint/every QP allocates a pieces of this already registered memory chunk.

The reason, why it is not instantiated in the group is to make the application (Crail) able to read properties, instantiate and "configure" the mempool and then pass to the DaRPC library.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this allocator. Instead we use a simple one with fixed blocksize.

pdm.freeBuddies.get(size >> 1).add(bi2);

return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks quite complicated. See comment above.

@patrickstuedi patrickstuedi self-assigned this Nov 28, 2017

public static int getVersion(){
return DARPC_VERSION;
}

protected DaRPCEndpointGroup(DaRPCProtocol<R,T> protocol, int timeout, int maxinline, int recvQueue, int sendQueue) throws Exception {
protected DaRPCEndpointGroup(DaRPCProtocol<R,T> protocol, DaRPCMemPool memPool, int timeout, int maxinline, int recvQueue, int sendQueue) throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part I'm probably missing here, but why don't we have a simple interface defining a DaRPCMemPool, and then have applications pass any implementation of that interface while also providing a default implementation within DaRPC. Currently it looks like DaRPCMemPool is both the new MemPool type and the only implementation. Or I'm I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I changed it. There is an interface now and the application can instantiate a specific implementation of this interface.

Copy link
Member

@patrickstuedi patrickstuedi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of decoupling interface and implementation for the MemPool. Then, making the default implementation simple and tailored to the existing use cases also might make sense.

@asqasq
Copy link
Member Author

asqasq commented Dec 13, 2017

DaRPC uses now a very simple memory pool, which implements a simple interface.

Most of the comments were about the memory allocator. I fixed most of them, but finally removed the memory allocator.

Please have a look at the newest version.

Copy link
Contributor

@PepperJo PepperJo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor things otherwise looks good!

this.alignmentSize = 0;
this.hugePagePath = hugePagePath;

this.access = IbvMr.IBV_ACCESS_LOCAL_WRITE | IbvMr.IBV_ACCESS_REMOTE_WRITE | IbvMr.IBV_ACCESS_REMOTE_READ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can call the other constructor with this(hugePagePath, defaultAllocationSize, 0)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

if (hugePagePath == null) {
System.out.println("Hugepage path must be set");
throw new IOException("Hugepage path must be set");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check this in the constructor and e.g. throw an IllegalArgumentException

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

throw new IOException("Requested size does not match block size managed by memory pool.");
}
}
r = freeList.removeFirst();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This throws a NoSuchElementException if the freeList is empty: https://docs.oracle.com/javase/7/docs/api/java/util/LinkedList.html#removeFirst()
This is not obvious from the interface description.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. Changed interface showing that it throws a NoSuchElementException.

…onstructors and throw NoSuchElementException, if no more memory blocks are available.
@asqasq
Copy link
Member Author

asqasq commented Dec 14, 2017

Addressed Jonas' comments and uploaded new version of the code to the PR. Please have a look. Thanks Adrian

Copy link
Contributor

@PepperJo PepperJo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Member

@patrickstuedi patrickstuedi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does MemPoolImpl write to stdout? All DaRPC code should only write to the logger, I think.

Copy link
Contributor

@PepperJo PepperJo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we are moving in the right direction. I just have some minor comments about the design.

private ConcurrentHashMap<Long, IbvMr> memoryRegions;
private int access;
private DaRPCEndpointGroup<E,R,T> endpointGroup;
private ConcurrentHashMap<IbvPd, LinkedBlockingQueue<ByteBuffer>> pdMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should pull this out and have a per PD memory pool in the endpoint group. And use a factory to create the memory pools whenever we need to create a new PD. This also simplifies the interface since you do not need to pass the endpoint to the memory pool.

}

@Override
public ByteBuffer getBuffer(RdmaEndpoint endpoint) throws IOException, NoSuchElementException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a new buffer type with the following methods added compared to the ByteBuffer: getRemoteKey(), getLocalKey() and a free() method which allows to return the buffer to the pool. This way we don't need to use hashmaps to find buffers when we free them and there is no uncertainty to which pool a buffer belongs to. I understand that this has some memory overhead but I believe that it outweighs the above.

+ ", alignmentSize = " + alignmentSize
+ ", currentAllocationSize = " + currentAllocationSize
+ ", allocationLimit = " + allocationLimit);
throw new IOException("Out of memory. Cannot allocate more buffers from hugepages."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same string twice -> put it in variable

} catch (IOException e) {
logger.error("Could not set allocation length of mapped random access file on huge page directory.");
logger.error("allocaiton size = " + allocationSize + " , alignment size = " + alignmentSize);
logger.error("allocation size and alignment must be a multiple of the hugepage size.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiple calls to the logger or append everything like above? I think we should be consistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants