Three things are true at the same time. Hold all three in your head and the whole plan makes sense.
Phones ring, lanes score, cameras record, cards clear. Day to day, it runs.
One firewall, one internet circuit, one compute cluster on dead-end hardware — each carries the whole business. Any one failing takes identity, phones, cameras, and sales down together.
A prior integrator can still reconfigure your Cisco fabric, your Wi-Fi controller lives in someone else's Oracle Cloud, and ~20 of 27 admins have no 2FA — including every owner.
Your firewall is effectively off. The first inbound rule is literally allow any → any:any, and it fires before default-deny — every tighter rule beneath it is dead weight. WAN is pingable. Every internal host — phone gateway, VMs, cameras, cardholder terminals — is potentially reachable from the open internet.
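Why rule order matters can be shown with a minimal first-match sketch. It assumes the firewall evaluates inbound rules top to bottom and stops at the first match. The `Rule` type, the helper names, and every rule after rule 1 are hypothetical illustrations, not an export of the real configuration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    action: str  # "allow" or "deny"
    src: str     # CIDR or "any"
    dst: str     # CIDR or "any"
    port: str    # port as text, or "any"
    comment: str = ""


def _cidr_match(pattern, addr):
    """True if addr falls inside pattern, where pattern is a CIDR or 'any'."""
    return pattern == "any" or ip_address(addr) in ip_network(pattern)


def evaluate(rules, src, dst, port):
    """Return (action, index) of the first matching rule; default-deny if none match."""
    for i, r in enumerate(rules):
        if _cidr_match(r.src, src) and _cidr_match(r.dst, dst) and r.port in ("any", str(port)):
            return r.action, i
    return "deny", None


def shadowed(rules):
    """Indices of rules that can never fire because an earlier any/any/any rule wins."""
    for i, r in enumerate(rules):
        if (r.src, r.dst, r.port) == ("any", "any", "any"):
            return list(range(i + 1, len(rules)))
    return []


# Hypothetical rule set shaped like the finding: rule 1 is allow any -> any:any.
RULES = [
    Rule("allow", "any", "any", "any", "rule 1"),
    Rule("allow", "any", "10.0.50.10/32", "5061", "hypothetical SIP-TLS pinhole"),
    Rule("deny", "any", "10.0.191.0/24", "any", "hypothetical cardholder VLAN block"),
]

if __name__ == "__main__":
    print(evaluate(RULES, "203.0.113.7", "10.0.191.100", 22))      # rule 1 wins
    print(shadowed(RULES))                                          # everything below rule 1
    print(evaluate(RULES[1:], "203.0.113.7", "10.0.191.100", 22))  # with rule 1 removed
```

With rule 1 in place, the deny beneath it is never consulted. Dropping rule 1 is what lets the tighter rules and the default-deny take effect.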
Six management planes exposed to the internet, several outside Meraki entirely: the Wi-Fi controller .193, Expressway-E collab edge .201, two FlexVPN heads .202/.203, and a second uncontrolled front door — Cisco ISR4431 "pacbowlcube1" on its own public IP 50.245.164.200 with public SSH, bypassing your firewall completely.
27 admins, ~15 full-org write, 2FA off on ~20 (including every owner). Four live API keys — two are personal Gmail; santiesteban54@gmail.com's key was active today. Full-org write held by an AV vendor, ~10 cisco.com accounts, and never-logged-in dangling invites. One phished password = full takeover.
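The admin review above can be repeated after cleanup with a small audit pass. This is a sketch over hypothetical records: the field names (`two_factor`, `org_access`, `last_active`, `has_api_key`) and the sample addresses are assumptions for illustration and would need mapping to whatever your dashboard export actually contains.

```python
def audit_admins(admins, trusted_domains):
    """Return (email, org_access, reasons) for every admin with at least one red flag."""
    findings = []
    for a in admins:
        reasons = []
        domain = a["email"].rsplit("@", 1)[-1].lower()
        if not a.get("two_factor"):
            reasons.append("2FA off")
        if domain not in trusted_domains:
            reasons.append("external domain")
        if a.get("last_active") is None:
            reasons.append("never logged in")
        if a.get("has_api_key"):
            reasons.append("holds API key")
        if reasons:
            findings.append((a["email"], a.get("org_access", "none"), reasons))
    # Full-org write first, then alphabetical, so the riskiest accounts lead the list.
    findings.sort(key=lambda f: (f[1] != "full", f[0]))
    return findings


# Hypothetical sample records, not the real admin list.
ADMINS = [
    {"email": "owner@example.com", "org_access": "full", "two_factor": False,
     "last_active": "recent", "has_api_key": False},
    {"email": "it@example.com", "org_access": "full", "two_factor": True,
     "last_active": "recent", "has_api_key": False},
    {"email": "someone@gmail.com", "org_access": "full", "two_factor": False,
     "last_active": "recent", "has_api_key": True},
    {"email": "invite@vendor.example", "org_access": "read-only", "two_factor": False,
     "last_active": None, "has_api_key": False},
]

if __name__ == "__main__":
    for email, access, reasons in audit_admins(ADMINS, {"example.com"}):
        print(f"{access:10} {email:25} {', '.join(reasons)}")
```

An empty result from a pass like this is a reasonable exit criterion for the identity cleanup.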
Cardholder / PCI exposure. Two Square readers sit on the open guest Wi-Fi; a spoofed-MAC host 10.0.191.100 is parked on the cardholder VLAN; a hidden "TV" SSID uses PSK 12345678 and a hidden "bar" SSID drops straight onto the POS VLAN.
Internet at the top, the shared compute cluster at the bottom. Color = risk. Everything critical funnels into one 3-host cluster on end-of-life hardware — that concentration is the story.
3× Cisco UCS C240-M3 on ESXi 6.7 — end of support since 2022. Identity, phones, all 28 cameras, POS back-ends, and vCenter are co-resident here. One chassis loss = identity + phones + cameras + sales down together. Phone path: Flowroute SIP → CUBE → CUCM → desk phones.
Worked top to bottom. The two contested facts at the end move everything else and get resolved first.
| # | Severity | Risk | Evidence |
|---|---|---|---|
| 1 | CRIT | Inbound firewall wide open: the allow any → any rule fires before default-deny, so every internal host is reachable | fw_inbound rule 1 |
| 2 | CRIT | Privileged control outside your identity — personal-Gmail key live today, external full-org admins, dangling invites, 2FA off on owners; prior MSP (ASP-NSO + cisco.com admins) can reconfigure the whole fabric | admins · ESXi |
| 3 | HIGH | Multiple internet-exposed mgmt planes + toll-fraud — vWLC, Expressway-E, 2× FlexVPN SSH, ISR bypass edge, public CUBE | nat_1to1 |
| 4 | HIGH | Edge single point of failure — one MX250, no WAN2, warm-spare disabled → any firewall/ISP failure downs internet + VPN + voice + wireless | warmspare |
| 5 | HIGH | Everything on one EoL cluster — AD/DNS/RADIUS, CUCM×2/Unity/VCS, all 28 cameras, POS, on 3× UCS C240-M3 / ESXi 6.7 (vCenter fixed tonight — was CRIT) | tonight |
| 6 | HIGH | Integrator + Oracle Cloud dependency — vWLC + collab transit in an OCI tenancy of unknown ownership; fabric operable only by the prior expert | multiple |
| 7 | HIGH | Flat 10/8 blast radius — MX routes all of 10.0.0.0/8 to one hop + into AutoVPN; native VLAN 1 mixes storage + HVAC + personal phones | staticroutes |
| 8 | MED | Cardholder / PCI exposure — Square readers on open guest Wi-Fi; spoofed-MAC host on cardholder VLAN | POS |
| 9 | MED | Extra remote-access doors — Client VPN pool + an un-pulled 2nd Meraki admin network | clients |
| 10 | MED | Weak Wi-Fi keys — "TV" 12345678, "bar" static PSK onto cardholder VLAN; no PMF | ssids |
| 11–13 | MED | Single-PSU cores · surveillance gaps (cam3/cam15/Lane-1) · power/UPS + CIMC monitoring offline | device status |
| 14–15 | LOW | Contested facts (resolve first): what is 10.0.254.1? · who owns the OCI tenancy + ASP-NSO orchestrator? | probe #1/#2 |
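Row 7's "flat 10/8" point can be made concrete with the standard `ipaddress` module. The sketch below assumes a hypothetical per-VLAN /24 for comparison; only the 10.0.0.0/8 route and the two host addresses come from the findings.

```python
from ipaddress import ip_address, ip_network

FLAT_ROUTE = ip_network("10.0.0.0/8")         # what the MX routes to one hop today
EXAMPLE_SCOPED = ip_network("10.0.191.0/24")  # hypothetical per-VLAN scope, for contrast

HOSTS = {
    "spoofed-MAC host on cardholder VLAN": "10.0.191.100",
    "unidentified 10.0.254.1": "10.0.254.1",
}


def covered_by(route, hosts):
    """Map each host label to whether the route covers its address."""
    return {name: ip_address(ip) in route for name, ip in hosts.items()}


if __name__ == "__main__":
    print(FLAT_ROUTE.num_addresses, covered_by(FLAT_ROUTE, HOSTS))
    print(EXAMPLE_SCOPED.num_addresses, covered_by(EXAMPLE_SCOPED, HOSTS))
```

The /8 covers both hosts and every other 10.x address. A scoped route covers only what it names, which is the blast-radius reduction the finding asks for.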
With vCenter down there was no safe, supported way to back up, export, or migrate a single VM — every move would have been blind. Root cause (understood, not guessed): after a July-1 cert regeneration, the Lookup Service kept trusting the expired old machine-SSL cert, so vpxd-svcs failed pre-start with "Invalid certificate." Fix: reconciled the trust anchors with ls_update_certs.py and restarted — not the destructive full-reset that the obvious diagnosis would have led to.
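The failure mode described here, service registrations still pinned to a superseded certificate, amounts to a fingerprint mismatch. The sketch below illustrates the idea only: the registration map, service names, and byte strings are hypothetical stand-ins, and it does not reproduce how `ls_update_certs.py` works internally.

```python
import hashlib


def fingerprint(der_bytes):
    """Colon-separated SHA-1 thumbprint of a DER-encoded certificate."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def stale_registrations(registrations, current_cert):
    """Names of services whose trusted cert differs from the current machine cert."""
    current = fingerprint(current_cert)
    return sorted(svc for svc, trusted in registrations.items()
                  if fingerprint(trusted) != current)


# Hypothetical stand-ins for certificate bytes and service registrations.
OLD_CERT = b"old-machine-ssl-cert"
NEW_CERT = b"new-machine-ssl-cert"
REGISTRATIONS = {
    "service-a": OLD_CERT,  # still pinned to the superseded cert
    "service-b": NEW_CERT,  # already reconciled
    "service-c": OLD_CERT,
}

if __name__ == "__main__":
    print(stale_registrations(REGISTRATIONS, NEW_CERT))
```

Reconciling only the stale entries is the targeted repair; a full reset would discard the registrations that were already correct.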
This is what flips the whole migration from risky to methodical: we can take clean per-VM backups before touching anything.
| Layer | Move | Destination | $/mo |
|---|---|---|---|
| Identity / AD | RENT | Google Cloud Identity (you already own the tenant) + JumpCloud cloud-RADIUS/MDM · decouple cameras from AD first | 200–300 |
| Cisco voice | RENT | Zoom Phone (best value) · port Flowroute DIDs · retiring VCS-E removes an internet exposure | 150–350 |
| Cameras + NVR | ONSITE | Dedicated NVR appliance — stays local, see §6 | ~0 |
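| Bar / POS | RENT | Square for Restaurants + P2PE readers → cuts PCI scope from SAQ-D to a reduced SAQ (typically SAQ P2PE for card-present terminals; SAQ-A covers card-not-present only, so confirm with your acquirer) | 70–165 |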
| Accounting | RENT | QuickBooks Online Plus (Desktop is being sunset anyway); keep Payentry payroll short-term | ~99 |
| File storage | RENT | Google Shared Drives (already in Workspace); footage → onsite NVR | 0–60 |
| Lane scoring | KEEP | QubicaAMF Conqueror X on the Dell fleet — modernize, don't retire | — |
| The rest of the VMs + UCS/QNAP | RETIRE | DC1/2, Unity, VCS, PowerChute VM, ASP-NSO, then the hosts + both QNAPs — final vCenter-backed export first | e-waste |
| 2nd WAN (resilience prereq) | ADD | Fiber + cable/5G failover — the go-live gate for voice/POS/RADIUS | 100–300 |
This replaces the aging-hardware, MSP, and power liability; it does not add to it. You already pay for card processing and Workspace.
One-time capex: onsite NVR + 60–80 TB RAID (~$5–10k) · Square hardware (~$2–4k) · handsets/paging (~$1–3k) · 2nd WAN + warm-spare MX (~$2–6k) · misc (~$1–2k).
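The monthly and one-time ranges above can be totalled with a few lines. In this sketch the "~" figures are treated as exact, "~0" as zero, and lane scoring (no monthly figure given) is left out. Those are assumptions of the sketch, not quotes.

```python
# Monthly run-rate ranges in USD, taken from the $/mo column.
MONTHLY = {
    "Identity / AD": (200, 300),
    "Cisco voice": (150, 350),
    "Cameras + NVR": (0, 0),
    "Bar / POS": (70, 165),
    "Accounting": (99, 99),
    "File storage": (0, 60),
    "2nd WAN": (100, 300),
}

# One-time ranges in thousands of USD.
CAPEX_K = {
    "Onsite NVR + RAID": (5, 10),
    "Square hardware": (2, 4),
    "Handsets / paging": (1, 3),
    "2nd WAN + warm-spare MX": (2, 6),
    "Misc": (1, 2),
}


def total(ranges):
    """Sum the low ends and the high ends of a dict of (low, high) ranges."""
    lows, highs = zip(*ranges.values())
    return sum(lows), sum(highs)


if __name__ == "__main__":
    print("monthly $:", total(MONTHLY))
    print("capex $k:", total(CAPEX_K))
```

On those assumptions the plan lands at roughly $619–1,274 per month and $11–25k one-time.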
Phase 1 (foundation + security): turn the firewall back on, close the exposed planes, stand up cloud identity/RADIUS, submit the Flowroute port order (long pole), order long-lead hardware. Phase 2 (cutovers): camera NVR first (no WAN dependency), then Wi-Fi/RADIUS, voice, POS, accounting, files — each proven in parallel before cutting. Phase 3 (decommission): only after each consumer is verified migrated — DCs → Cisco UC + MSP overlay → vCenter/hosts → both QNAPs last. Go-live gate: don't cut voice/POS/RADIUS live until the 2nd WAN is in.
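The Phase 2 sequencing and the go-live gate can be expressed as a simple check. This sketch assumes cutovers run strictly in the listed order, one at a time. That strictness, the function name, and the service labels are assumptions added for illustration.

```python
# Phase 2 order as listed; services behind the 2nd-WAN go-live gate.
CUTOVER_ORDER = ["camera NVR", "Wi-Fi/RADIUS", "voice", "POS", "accounting", "files"]
WAN_GATED = {"Wi-Fi/RADIUS", "voice", "POS"}


def next_cutover(done, proven, second_wan_live):
    """Return the next service cleared to cut over, or None if it is blocked.

    done:   services already cut over
    proven: services verified running in parallel with the old system
    """
    for svc in CUTOVER_ORDER:
        if svc in done:
            continue
        if svc not in proven:
            return None  # not yet proven in parallel
        if svc in WAN_GATED and not second_wan_live:
            return None  # go-live gate: the 2nd WAN must be in first
        return svc
    return None  # everything has been cut over


if __name__ == "__main__":
    print(next_cutover(set(), {"camera NVR"}, second_wan_live=False))
    print(next_cutover({"camera NVR"}, {"Wi-Fi/RADIUS"}, second_wan_live=False))
    print(next_cutover({"camera NVR"}, {"Wi-Fi/RADIUS"}, second_wan_live=True))
```

The camera NVR clears without the second circuit because it has no WAN dependency; everything gated waits for it.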
Keep every Axis camera. Retire the Windows "video" VM and the dying cluster it rides. Replace it with one dedicated warrantied NVR on its own storage, its own PoE switch, and its own UPS — a self-contained recording island on the camera VLAN that keeps recording even if the internet, the cloud, and the old datacenter are all down.
| Blended bitrate per camera | Storage per day (32 cams) | 30 days | 60 days | 90 days |
|---|---|---|---|---|
| 1.5 Mbps (H.265 + Zipstream) | 0.52 TB | 16 TB | 31 TB | 47 TB |
| 2.5 Mbps (planning anchor) | 0.86 TB | 26 TB | 52 TB | 78 TB |
| 4.0 Mbps (legacy H.264) | 1.38 TB | 41 TB | 83 TB | 124 TB |
Your current ~36 TB ÷ 0.86 TB/day ≈ 42 days, in line with the 42–60 days you retain today, so the 2.5 Mbps anchor holds. Design target: ~52 TB usable → 60 days at the anchor (60–90 real days once H.265 + Zipstream is on), i.e. ~72–96 TB raw in RAID-6.
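The sizing table follows from one formula: bitrate × cameras × seconds per day. The sketch below assumes decimal terabytes (10^12 bytes), continuous 24-hour recording, and the table's 32-camera count. The function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def tb_per_day(bitrate_mbps, cameras=32):
    """Decimal TB written per day at a blended per-camera bitrate in Mbps."""
    bytes_per_day = bitrate_mbps * 1e6 / 8 * SECONDS_PER_DAY * cameras
    return bytes_per_day / 1e12


def retention_days(usable_tb, bitrate_mbps, cameras=32):
    """Days of footage that fit in usable_tb of storage."""
    return usable_tb / tb_per_day(bitrate_mbps, cameras)


if __name__ == "__main__":
    for mbps in (1.5, 2.5, 4.0):
        daily = tb_per_day(mbps)
        print(f"{mbps} Mbps: {daily:.2f} TB/day, "
              f"30/60/90 d = {daily * 30:.0f}/{daily * 60:.0f}/{daily * 90:.0f} TB")
    print(f"36 TB at 2.5 Mbps: {retention_days(36, 2.5):.0f} days")
    print(f"52 TB at 2.5 Mbps: {retention_days(52, 2.5):.0f} days")
```

Changing `cameras` or the bitrate shows how far the 60-day target moves if camera count or codec settings differ from the planning anchor.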
| Option | Type | Capex | Verdict |
|---|---|---|---|
| AXIS S1264 (64 TB) + Camera Station Pro | Single-vendor appliance | $18–22k | Primary — cleanest. All-Axis, perpetual license, 5-yr warranty |
| Synology RS2821RP+ + Surveillance Station | Linux appliance, BTRFS self-heal | $5.5–6.5k | Best value — my pick if capex matters. ⅓ the cost, self-healing; put the savings toward the 2nd WAN |
| DIY server + Blue Iris / Frigate | Parted-out | $3–4k | Second-archive only — no SLA, fails like the current VM did |
My call: AXIS S1264 + Camera Station Pro as system-of-record if you want the single cleanest thing to hand an integrator; the Synology RS2821RP+ at ~⅓ the price is the honest best-value pick. Either way: RAID-6 + hot spare, surveillance-grade drives only, a 60-day baseline with 90 days on entrances, bar, ATM, and parking (confirm the alcohol-venue legal minimum with counsel + insurer), H.265 + Zipstream on, and fix cam3 / cam15 / Lane-1 during cutover.