
Where to host cs.k8s.io #2182

Open
ameukam opened this issue Mar 9, 2021 · 59 comments · Fixed by #3529 or #6657 · May be fixed by #7718
Labels

lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/contributor-experience: Categorizes an issue or PR as relevant to SIG Contributor Experience.
sig/k8s-infra: Categorizes an issue or PR as relevant to SIG K8s Infra.

Comments

@ameukam
Member

ameukam commented Mar 9, 2021

https://cs.k8s.io is running on a bare-metal server provided by Equinix Metal (formerly Packet) under the CNCF budget and operated until now by @dims.

The question was raised about whether we should host CodeSearch on the aaa cluster.

Ref: https://kubernetes.slack.com/archives/CCK68P2Q2/p1615204807111900?thread_ts=1615189697.108500&cid=CCK68P2Q2

This issue is open to track the discussion and consensus about this.

@nikhita
Member

nikhita commented Mar 9, 2021

@dims where is the original source code for cs.k8s.io? 👀

@nikhita
Member

nikhita commented Mar 9, 2021

/wg k8s-infra

@ameukam
Member Author

ameukam commented Mar 9, 2021

/sig contributor-experience
/priority backlog

/assign @spiffxp
cc @mrbobbytables @alisondy @cblecker @munnerz

@ameukam
Member Author

ameukam commented Mar 9, 2021

@dims where is the original source code for cs.k8s.io? 👀

@nikhita You can find the config here https://github.com/dims/k8s-code.appspot.com/
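
For anyone unfamiliar with Hound, the whole service is driven by a single config.json listing the repositories to index. A minimal sketch, with an illustrative repo list rather than the actual cs.k8s.io config (which lives in the repo linked above):

# Sketch of a minimal Hound config.json; repo names/URLs here are illustrative.
cat > config.json <<'EOF'
{
  "max-concurrent-indexers": 2,
  "dbpath": "data",
  "repos": {
    "kubernetes": {
      "url": "https://github.com/kubernetes/kubernetes.git"
    }
  }
}
EOF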

@BenTheElder
Member

What's the argument against hosting it on AAA?

@dims
Member

dims commented Mar 11, 2021

@BenTheElder nothing other than someone has to do it :) also, i don't know how to wire up the ingress/dns stuff

i tried a long time ago :) #96

@ameukam
Member Author

ameukam commented Mar 11, 2021

What's the argument against hosting it on AAA?

I would say the lack of an artifact destined for aaa (i.e., no up-to-date container image for Hound). We could host the image on k8s-staging-infra-tools.
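
If that's the blocker, producing the artifact may be as simple as building upstream Hound's image and pushing it to the staging registry. A sketch, assuming upstream still ships a working Dockerfile and that gcr.io/k8s-staging-infra-tools is the right destination:

git clone https://github.com/hound-search/hound
cd hound
# Build from upstream's Dockerfile (assumption: it exists and builds) and push to staging.
docker build -t gcr.io/k8s-staging-infra-tools/hound:latest .
docker push gcr.io/k8s-staging-infra-tools/hound:latest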

@nikhita
Member

nikhita commented Mar 24, 2021

@ameukam should this issue be migrated to the k/k8s.io repo?

@ameukam
Member Author

ameukam commented Mar 24, 2021

@nikhita I'm not sure about the right place for this issue. I just wanted to put it on the radar of the SIG ContribEx TLs and chairs.

@BenTheElder
Member

it should be under k/k8s.io imho. I think we should host it on AAA fwiw.

@nikhita
Member

nikhita commented Jun 10, 2021

Moving to the k8s.io repo. Slack discussion: https://kubernetes.slack.com/archives/CCK68P2Q2/p1623300972130500

@nikhita nikhita transferred this issue from kubernetes/community Jun 10, 2021
@spiffxp
Member

spiffxp commented Aug 9, 2021

/sig contributor-experience
/wg k8s-infra

@k8s-ci-robot k8s-ci-robot added sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. wg/k8s-infra labels Aug 9, 2021
@jimdaga

jimdaga commented Aug 11, 2021

I took a stab at onboarding codesearch; @spiffxp, could I get your input? I want to make sure I didn't miss anything.
I want to stage all the infra and get it deployed via prow first. Then we can follow up with another PR to cut over DNS when we're ready.

#2513
kubernetes/test-infra#23201

I could also work on adding the docker build logic after, but I haven't worked in that repo yet so I'll have to do some digging.

cc @dims
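
For readers following along, the shape of what those PRs set up is roughly a Hound Deployment behind a Service/Ingress on the aaa cluster. A minimal sketch, with illustrative namespace, names, and image ref (the real manifests are in #2513):

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: codesearch          # illustrative name
  namespace: codesearch     # illustrative namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: codesearch
  template:
    metadata:
      labels:
        app: codesearch
    spec:
      containers:
      - name: hound
        image: gcr.io/k8s-staging-infra-tools/hound:latest  # illustrative image ref
        ports:
        - containerPort: 6080  # houndd's default HTTP port
        volumeMounts:
        - name: data
          mountPath: /data     # hound keeps its clones and index under dbpath
      volumes:
      - name: data
        emptyDir: {}           # a PVC would be the durable choice
EOF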

@spiffxp
Member

spiffxp commented Aug 17, 2021

/priority important-soon
/milestone v1.23

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 17, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Aug 17, 2021
@justaugustus
Member

What about using https://sourcegraph.com/kubernetes to minimize the maintenance burden here?
This is something I suggested to @dims in the past, but I didn't have the bandwidth to pursue it at the time.

@dims
Member

dims commented Aug 19, 2021

choices are:

  1. leave things where they are
  2. move to k8s wg infra
  3. redirect cs.k8s.io to sourcegraph

  • i have been taking care of 1 already for a while with minimal downtime, so i am ok with continuing to do so
  • if someone wants to do 2, i am happy to help, show how things are set up, and then we can shut down the equinix vm
  • i personally don't like option 3; i love the hound UX. if the consensus is we should go with 3, that is fine with me, and i am happy to run a personal instance on a custom domain for myself (community is welcome to use it)

if i missed any other options, please feel free to chime in.

@spiffxp
Member

spiffxp commented Sep 17, 2021

/unassign

@ameukam
Member Author

ameukam commented Apr 18, 2024

/assign @SohamChakraborty

@k8s-ci-robot
Contributor

@ameukam: GitHub didn't allow me to assign the following users: SohamChakraborty.

Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @SohamChakraborty

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@SohamChakraborty
Contributor

I think this is now ready for migration from the bare-metal server to the aaa cluster. I spoke with Arnaud and he will decide on a path for migration.

@BenTheElder
Member

We will need to raise this to
/priority important-soon
given https://www.theregister.com/2024/11/18/equinix_ends_metal_iaas/

We have time, and other pressing work, but we need to be tracking the move.

IMHO: while there are other options out there, we have a lot of links to this in issues etc., so if it's cheap to operate I think we should keep it for now and just lift it over to the AAA cluster or similar.

What's left for this one?

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Dec 3, 2024
@BenTheElder
Member

https://cs-canary.k8s.io/ seems to have performance issues, but it also may not be running the same version:

https://cs.k8s.io/?q=NodeStageVolume&i=nope&files=&excludeFiles=&repos=
[screenshot: search results, 2024-12-03 3:24:56 PM]

https://cs-canary.k8s.io/?q=NodeStageVolume&i=nope&files=&excludeFiles=&repos=
[screenshot: search results, 2024-12-03 3:24:59 PM]

(note the response times)
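
The same comparison can be reproduced from the command line against Hound's search API (same query as the links above):

# Time an identical search against both deployments.
for host in cs.k8s.io cs-canary.k8s.io; do
  curl -s -o /dev/null -w "$host: %{time_total}s\n" \
    "https://$host/api/v1/search?q=NodeStageVolume&i=nope&repos=*"
done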

@ameukam
Member Author

ameukam commented Dec 4, 2024

Maybe run this inside a GCP MIG or an AWS ASG?
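
A sketch of the MIG variant, assuming a startup script that installs and runs houndd (script name and machine type are illustrative):

# Template + managed instance group of size 1; start-hound.sh is hypothetical.
gcloud compute instance-templates create codesearch-template \
  --machine-type=e2-standard-4 \
  --metadata-from-file=startup-script=start-hound.sh
gcloud compute instance-groups managed create codesearch-mig \
  --template=codesearch-template --size=1 --zone=us-central1-a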

@vielmetti

Re-upping this as it came up in conversation today on the sig-infra call and because I am actively working on timelines and budgets for the Equinix Metal wind-down.

@BenTheElder
Member

We should check whether the bottleneck is disk, CPU, or memory; we have options (like switching AAA to larger nodes, using a faster disk type, etc.), but that needs investigating.
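
A first pass at that investigation can come straight from the cluster; a sketch, with illustrative namespace/name:

# CPU/memory per pod (needs metrics-server, which GKE provides).
kubectl top pod -n codesearch
# Rough load/disk check inside the pod while a slow query runs.
kubectl exec -n codesearch deploy/codesearch -- sh -c 'cat /proc/loadavg; df -h /data'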

@BenTheElder
Member

And then we should just rotate cs.k8s.io to point to the deployment at cs-canary.k8s.io and wind down the equinix machine.
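
If I understand the setup, that's a small change to the declarative DNS config in this repo; before and after the cutover, it's easy to confirm where each name points:

# Confirm current targets before and after rotating the record.
dig +short cs.k8s.io
dig +short cs-canary.k8s.io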

@BenTheElder
Member

I tried the test in #2182 (comment) again; this time cs-canary.k8s.io eventually gave an error page.

Looking at the pod logs (it is in the aaa cluster in the kubernetes-public GCP project), there are a lot of git fetch errors, and one replica is not ready:

Continuing...
2025/01/16 18:07:34 Failed to git reset [git reset --hard origin/main] at "/data/data/vcs-b90b1681de4da9db8d670ab238c7ed231fc1b8a2", see output below
fatal: ambiguous argument 'origin/main': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
: exit status 128
Continuing...
2025/01/16 18:07:34 Failed to git fetch [git fetch --prune --no-tags --depth 1 origin +main:remotes/origin/main] at "/data/data/vcs-191ecf511a99d78d8755448c91064a390b0762fe", see output below
fatal: couldn't find remote ref main
: exit status 128
Continuing...
2025/01/16 18:07:35 Failed to git reset [git reset --hard origin/main] at "/data/data/vcs-191ecf511a99d78d8755448c91064a390b0762fe", see output below
fatal: ambiguous argument 'origin/main': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
: exit status 128
Continuing...
2025/01/16 18:07:36 Failed to git fetch [git fetch --prune --no-tags --depth 1 origin +main:remotes/origin/main] at "/data/data/vcs-431070b759cc3f63b26e44d16ad8aeb7cd1fd4aa", see output below
fatal: couldn't find remote ref main
: exit status 128
Continuing...
2025/01/16 18:07:36 Failed to git reset [git reset --hard origin/main] at "/data/data/vcs-431070b759cc3f63b26e44d16ad8aeb7cd1fd4aa", see output below
fatal: ambiguous argument 'origin/main': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
: exit status 128
Continuing...
2025/01/16 18:07:37 Failed to git fetch [git fetch --prune --no-tags --depth 1 origin +main:remotes/origin/main] at "/data/data/vcs-2f5713f4bdffe87ea49b3f54b5a803caed48ba3c", see output below
fatal: couldn't find remote ref main
: exit status 128
Continuing...
2025/01/16 18:07:37 Failed to git reset [git reset --hard origin/main] at "/data/data/vcs-2f5713f4bdffe87ea49b3f54b5a803caed48ba3c", see output below
fatal: ambiguous argument 'origin/main': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
: exit status 128
Continuing...
2025/01/16 18:07:40 merge 0 files + mem
2025/01/16 18:07:41 43287889 data bytes, 7067142 index bytes
2025/01/16 18:07:41 Searcher started for etcd-io/raft
2025/01/16 18:07:42 merge 2 files + mem
2025/01/16 18:07:43 Failed to git fetch [git fetch --prune --no-tags --depth 1 origin +main:remotes/origin/main] at "/data/data/vcs-f34cbc91cc3c0e7599d0191326cbe17f95b59ade", see output below
fatal: couldn't find remote ref main
: exit status 128
Continuing...
2025/01/16 18:07:43 Failed to git reset [git reset --hard origin/main] at "/data/data/vcs-f34cbc91cc3c0e7599d0191326cbe17f95b59ade", see output below
fatal: ambiguous argument 'origin/main': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:

I probably won't be able to dig further for a bit, but clearly the canary deployment needs some work before we can switch.
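
The pattern in the logs suggests the indexer is fetching main for every repo while some of the configured repos still use master (or another default branch). A quick way to check a given remote, plus the shape of a per-repo fix in Hound's config (newer Hound releases have a detect-ref option; worth verifying against the deployed version):

# Print a remote's actual default branch.
git ls-remote --symref https://github.com/etcd-io/raft HEAD
# -> ref: refs/heads/<default-branch>  HEAD

# In config.json, pin the ref per repo or let hound detect it (assumption:
# the deployed hound version supports vcs-config/detect-ref):
#   "some-repo": {
#     "url": "https://github.com/org/repo.git",
#     "vcs-config": { "detect-ref": true }
#   }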

@dims
Member

dims commented Jan 19, 2025

Specs from the current cs.k8s.io machine:

16 CPU(s):

root@k8s-code-2022:~# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz
    CPU family:          6
    Model:               158
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            13
    CPU max MHz:         5000.0000
    CPU min MHz:         800.0000
    BogoMIPS:            6799.81
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    2 MiB (8 instances)
  L3:                    16 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT vulnerable
  Retbleed:              Mitigation; Enhanced IBRS
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                 Mitigation; Microcode
  Tsx async abort:       Mitigation; TSX disabled

32 GB Memory:

root@k8s-code-2022:~# free -mh
               total        used        free      shared  buff/cache   available
Mem:            31Gi       7.0Gi       658Mi        11Mi        23Gi        29Gi
Swap:          1.9Gi        40Mi       1.9Gi

@dims
Member

dims commented Jan 19, 2025

opened a PR to bump cs-canary to 4 CPU + 16 GB memory for now: #7695
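
For the record, the same bump can be applied imperatively to a running deployment while the PR merges; a sketch, with illustrative namespace/name:

kubectl -n codesearch set resources deployment/codesearch \
  --requests=cpu=4,memory=16Gi --limits=cpu=4,memory=16Gi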

@dims dims linked a pull request Jan 23, 2025 that will close this issue