Changes required to run SIMX on HPCAC #71
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -61,6 +61,14 @@ RUN \ | |
unzip \ | ||
valgrind \ | ||
wget \ | ||
autoconf \ | ||
automake \ | ||
libtool \ | ||
g++ \ | ||
vim \ | ||
iperf \ | ||
crash \ | ||
zstd \ | ||
These packages are a development convenience. It might be worth letting users supply their own Dockerfile that incrementally adds what they need on top of the existing image.
I have always had that in the back of my mind, but I never investigated how to do it without rebuilding all the images.
Can you just apply something on top of the existing image?
||
&& dnf clean dbcache packages | ||
|
||
COPY --from=rpms /opt/rpms /opt/rpms | ||
|
@@ -69,4 +77,6 @@ ADD sshd_config ssh_host_rsa_key /etc/ssh/ | |
|
||
ADD basic-setup.sh kvm-setup.sh /root/ | ||
|
||
RUN /root/basic-setup.sh && /root/kvm-setup.sh | ||
RUN /root/basic-setup.sh | ||
|
||
RUN /root/kvm-setup.sh | ||
This can be ignored.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
#!/bin/bash | ||
# --- | ||
# git_url: http://l-gerrit.mtl.labs.mlnx:8080/simx | ||
# git_commit: 41f602dc05b3c115b176ac3f7869e8bd390cbd92 | ||
# git_url: /global/home/users/ztiffany/test/simx | ||
# git_commit: 3f3c2c9338f3bbb73cf3bd298152e020e394086f | ||
This can be ignored.
||
|
||
cat <<EOF > mlx-simx.spec | ||
%global debug_package %{nil} | ||
|
@@ -18,7 +18,7 @@ From simx.git | |
%build | ||
./mlnx_infra/config.status.mlnx --target=x86 --prefix=/opt/simx | ||
make %{?_smp_mflags} | ||
make %{?_smp_mflags} -C mellanox/ | ||
make %{?_smp_mflags} -C mellanox/ SIMX_PROJECT=mlx5 | ||
This tells SimX to build only the NIC part. I think that makes sense unless the switch part is going to be used.
Yeah, it makes sense for now. A long time ago I pitched this project to the switch team. They even tried it, but decided to stick with VMs because of the gap in technical expertise between the development team and the verification team.
||
|
||
#%install | ||
make DESTDIR=%{buildroot} install | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
#!/bin/bash | ||
# --- | ||
# git_url: git://repo.or.cz/smatch.git | ||
# git_commit: 9bb66fa2d7c73b3338a27fd6b38d7d509b2a1c1b | ||
# git_url: /global/home/users/artemp/scratch/.cache/mellanox/mkt/smatch.git | ||
# git_commit: 72c21a144a812cadbe349801da1b24bc331af256 | ||
For some reason, the site where we were building this can't access the original URL.
IIRC, if you preload the normal cache directory, no network access is required as long as the commit ID is already present. These disconnected cases are solved by transferring the cache directory from a network-connected machine.
What is the normal cache directory?
➜ kernel git:(master) ls ~/.cache/mellanox/mkt
That's the issue: downloading there from git fails with "connection refused". We just haven't cleaned up the version we ended up with, to save time.
||
|
||
cat <<EOF > smatch.spec | ||
Name: smatch | ||
|
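The point about preloading the cache can be checked before moving to a disconnected site. The following is only a sketch, assuming the cached repositories are ordinary git repositories; the helper name `commit_in_repo` and its `run` parameter are hypothetical and not part of MKT. It relies on `git cat-file -e`, which exits with status 0 only when the named object exists.

```python
import subprocess


def commit_in_repo(repo_dir, commit, run=subprocess.run):
    """Return True if `commit` exists as a commit object in `repo_dir`.

    Hypothetical helper, not part of MKT. `run` is injectable so the
    check can be exercised without a real repository.
    """
    result = run(
        # "<id>^{commit}" makes git verify the object is really a commit.
        ["git", "-C", repo_dir, "cat-file", "-e", commit + "^{commit}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Running this against each cached repository and its pinned `git_commit` on the connected machine shows whether the cache is complete enough to copy over as-is.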
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
#!/bin/bash | ||
# --- | ||
# git_url: git://git.kernel.org/pub/scm/devel/sparse/sparse.git | ||
# git_commit: 8af2432923486c753ab52cae70b94ee684121080 | ||
# git_url: /global/home/users/artemp/scratch/.cache/mellanox/mkt/sparse.git | ||
Same as above.
||
# git_commit: 49c98aa3ed1b315ed2f4fbe44271ecd5bdd9cbc7 | ||
|
||
cat <<EOF > sparse.spec | ||
Name: sparse | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,6 +42,12 @@ def make_simx(args): | |
|
||
subprocess.call(cmd + ['-j%d' %(args.num_jobs)]) | ||
|
||
def make_rdmo_app(args): | ||
I started adding things to make MKT build my rdmo app, but I abandoned that. Ignore the references to rdmo-app and the packages added to support.Dockerfile. I added packages to the VM image so that I can build rdmo-app inside my VM instead.
Building inside the VM looks simpler, but it misses the MKT concept. We wanted to separate the build environment from the run environment. That lets us benefit from specific optimizations and makes runs fast.
||
if args.clean: | ||
subprocess.check_output(['rm', '-rf', 'build']) | ||
return | ||
subprocess.call(['./build.sh']) | ||
|
||
def switch_to_user(args): | ||
with open("/etc/passwd","a") as F: | ||
F.write(args.passwd + "\n"); | ||
|
@@ -79,9 +85,13 @@ def setup_from_pickle(args, pickle_params): | |
subprocess.check_output(['make', 'headers_install', | ||
'INSTALL_HDR_PATH=/usr'], cwd=args.kernel) | ||
|
||
if not os.path.isdir('/images/ztiffany/ccache'): | ||
This is likely not needed, based on my later experience.
||
subprocess.check_output(['mkdir', '/images/ztiffany/ccache']) | ||
subprocess.check_output(['chmod', '0777', '/images/ztiffany/ccache']) | ||
|
||
switch_to_user(args) | ||
if os.path.isdir('/ccache'): | ||
os.environ['CCACHE_DIR'] = '/ccache' | ||
if os.path.isdir('/images/ztiffany/ccache'): | ||
os.environ['CCACHE_DIR'] = '/images/ztiffany/ccache' | ||
|
||
if args.shell: | ||
os.execvp('/bin/bash', ['/bin/bash']) | ||
|
@@ -97,3 +107,5 @@ def setup_from_pickle(args, pickle_params): | |
make_rdma_core(args) | ||
if args.project == "simx": | ||
make_simx(args) | ||
if args.project == "rdmo-app": | ||
Ignore this.
||
make_rdmo_app(args) |
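One way to avoid hardcoding a per-user ccache path in the hunk above is to try a list of candidate directories in order. This is a sketch only: `pick_ccache_dir` and the `MKT_CCACHE_DIR` variable in the usage note are hypothetical names, not existing MKT options. `CCACHE_DIR` itself is the environment variable the diff already sets.

```python
import os


def pick_ccache_dir(candidates, environ=None):
    """Point CCACHE_DIR at the first candidate that is an existing directory.

    Hypothetical helper, not part of MKT. Returns the chosen path, or
    None (leaving the environment untouched) when no candidate exists.
    """
    environ = os.environ if environ is None else environ
    for path in candidates:
        # An empty string is never a directory, so unset overrides are skipped.
        if os.path.isdir(path):
            environ["CCACHE_DIR"] = path
            return path
    return None
```

A call such as `pick_ccache_dir([os.environ.get("MKT_CCACHE_DIR", ""), "/ccache"])` would keep the existing `/ccache` behaviour while letting a site with diskless nodes point somewhere else.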
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -64,11 +64,17 @@ def remove_mounts(): | |
|
||
|
||
def is_passable_mount(v): | ||
On HPCAC, this was needed to get the rdma-core directory passed through:
Is this expected?
No, it means your config file is incomplete or there is another bug. We mount the whole src directory.
It could be because /images is on tmpfs, but I don't think we saw even an attempt to mount it.
||
print ("Checking mount: {}".format(v)) | ||
if v[2] == "nfs" or v[2] == "nfs4": | ||
"Permission denied": let's debug this, it shouldn't happen.
If I am root on the node, I cannot ls my user's home directory.
ls fails with "Permission denied" as well.
||
return False | ||
if v[1].startswith("/images/"): | ||
print ("YES!!!") | ||
return True | ||
if v[1].startswith("/plugins"): | ||
HPCAC nodes are diskless. Here is how plugins are mounted:
Here is from a working system:
AFAIK, Docker can't mount tmpfs; we need to think about a workaround.
It does work if we add the above.
So I think it should work as is.
||
print ("YES!!!") | ||
return True | ||
if not v[0].startswith("/"): | ||
return False | ||
if v[1] == "/lab_tools": | ||
print ("NOT START WITH") | ||
return False | ||
return True | ||
|
||
|
@@ -106,8 +112,10 @@ def setup_fs(): | |
# Copy over local bind mounts, eg from docker -v | ||
cnt = 0 | ||
for dfn, v in get_mtab().items(): | ||
print ("Evaluating: {}".format(v[1])); | ||
if not is_passable_mount(v): | ||
continue | ||
print ("Passing: {}".format(v[1])); | ||
|
||
qemu_args["-fsdev"].add( | ||
"local,id=host_bind_fs%u,security_model=passthrough,path=%s" % | ||
|
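The filter discussed in this file could be written without the debug prints and with the site-specific prefixes passed in as a parameter. The sketch below keeps the same order of checks as the diff. It assumes that `v` is indexable as (device, mount point, filesystem type), which matches how the diff uses `v[0]`, `v[1]` and `v[2]` but is not confirmed by the source. The name `is_passable_mount_sketch` and the `extra_prefixes` parameter are hypothetical.

```python
def is_passable_mount_sketch(v, extra_prefixes=("/images/", "/plugins")):
    """Decide whether a mount entry should be passed through to the VM.

    Assumed layout: v[0] = device, v[1] = mount point, v[2] = fs type.
    Hypothetical rewrite of the logic in the diff, not the merged code.
    """
    device, mount_point, fs_type = v[0], v[1], v[2]
    if fs_type in ("nfs", "nfs4"):
        return False  # NFS mounts are rejected before anything else
    if mount_point.startswith(tuple(extra_prefixes)):
        return True  # allow-listed prefixes, even when backed by tmpfs
    if not device.startswith("/"):
        return False  # pseudo filesystems such as proc or tmpfs
    if mount_point == "/lab_tools":
        return False
    return True
```

Keeping the prefixes in a parameter means a diskless site can extend the allow-list from configuration instead of editing the function.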
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -67,6 +67,7 @@ def run_ci_cmd(self, supos): | |
"rdma": "iproute2", | ||
"kernel": "kernel", | ||
"mlnx_infra": "simx", | ||
"rdmo-app": "rdmo-app", | ||
Ignore.
||
} | ||
|
||
def build_list(): | ||
|
@@ -78,6 +79,7 @@ def set_args_project(args, section): | |
|
||
# "custom" project can't be sensed and must be provided explicitly | ||
for key, value in project_marks.items(): | ||
print("comparing {} and {}".format(key, args.project)) | ||
if os.path.isdir(key): | ||
args.project = value | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,7 +27,7 @@ def get_cache_fn(fn): | |
an impact on the operation of mkt - at worst it will run slower.""" | ||
global cache_dir | ||
if cache_dir is None: | ||
cache_dir = os.path.expanduser("~/.cache/mellanox/mkt/") | ||
cache_dir = '/images/ztiffany/.cache/mellanox/mkt/' | ||
The home directory is too small to hold these caches.
.cache is a general mechanism; it is worth making a symlink.
Makes sense.
||
# In MTL network, user home directories are located on /labhome | ||
# and doesn't have enough space to build cache efficiently. | ||
# Do nasty hack and replace labhome with swgwork | ||
|
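The symlink suggestion needs no change to `get_cache_fn`: the default `~/.cache/mellanox/mkt/` path keeps working if it points at a larger volume. A minimal sketch follows, assuming the larger directory is writable by the user. The helper name `link_cache_dir` is hypothetical, and any target path passed to it is the caller's choice.

```python
import os


def link_cache_dir(big_dir, link="~/.cache/mellanox/mkt"):
    """Create `link` as a symlink to `big_dir` unless `link` already exists.

    Hypothetical helper, not part of MKT. Returns True when the symlink
    was created, False when something already occupies `link`.
    """
    link = os.path.expanduser(link)
    os.makedirs(big_dir, exist_ok=True)
    os.makedirs(os.path.dirname(link), exist_ok=True)
    # Never clobber an existing cache directory or an earlier symlink.
    if os.path.islink(link) or os.path.exists(link):
        return False
    os.symlink(big_dir, link)
    return True
```

If a cache already exists in the home directory, it would have to be moved to the larger volume first; the helper deliberately refuses to replace it.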
Fixed in c2f86ca