How bad is it

Three things are true at the same time. Hold all three in your head and the whole plan makes sense.

It mostly works today

Phones ring, lanes score, cameras record, cards clear. Day to day, it runs.

One bad day from a very bad day

One firewall, one internet circuit, one compute cluster on dead-end hardware — each carries the whole business. Any one failing takes identity, phones, cameras, and sales down together.

You don't fully control it

A prior integrator can still reconfigure your Cisco fabric, your Wi-Fi controller lives in someone else's Oracle Cloud, and ~20 of 27 admins have no 2FA — including every owner.

The sharp edges — what's actually dangerous right now

🔴 Your firewall is effectively off. The first inbound rule is literally allow any → any:any, and it matches before the default-deny, so every tighter rule beneath it is dead weight. The WAN interface answers ping. Every internal host (phone gateway, VMs, cameras, cardholder terminals) is potentially reachable from the open internet.
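To see why one leading rule neutralizes everything under it, here is a minimal sketch of first-match rule evaluation. It is an illustration, not the MX's actual rule engine: the `Rule` fields, the example rule set, and the 198.51.100.x / 203.0.113.x source addresses (documentation ranges) are all assumptions made for the demo.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """One inbound rule; "any" is a wildcard. Field names are illustrative."""
    policy: str  # "allow" or "deny"
    src: str     # source address or "any"
    dst: str     # destination host or "any"
    port: str    # destination port or "any"

def matches(rule, src, dst, port):
    """A rule matches when every field is either "any" or equal to the packet's value."""
    pairs = ((rule.src, src), (rule.dst, dst), (rule.port, port))
    return all(want in ("any", got) for want, got in pairs)

def evaluate(rules, src, dst, port):
    """First match wins; fall through to an implicit deny if nothing matches."""
    for rule in rules:
        if matches(rule, src, dst, port):
            return rule.policy
    return "deny"

def shadowed(rules):
    """Rules listed after an any/any/any rule can never be reached."""
    for i, rule in enumerate(rules):
        if (rule.src, rule.dst, rule.port) == ("any", "any", "any"):
            return rules[i + 1:]
    return []

# Hypothetical rule set shaped like the finding: catch-all allow on top.
RULES = [
    Rule("allow", "any", "any", "any"),
    Rule("allow", "203.0.113.10", "10.0.150.100", "443"),  # a tighter rule, never reached
    Rule("deny", "any", "any", "any"),
]

print(evaluate(RULES, "198.51.100.7", "10.0.150.100", "3389"))      # allowed by rule 1
print(len(shadowed(RULES)), "rules are dead weight")
print(evaluate(RULES[1:], "198.51.100.7", "10.0.150.100", "3389"))  # denied once rule 1 is gone
```

Dropping the first rule is what lets the tighter rules and the final deny take effect, which is why the rewrite is a small change with a large payoff.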

🔴 Six management planes are exposed to the internet, several outside Meraki entirely: the Wi-Fi controller (.193), the Expressway-E collab edge (.201), two FlexVPN heads (.202/.203), and a second, uncontrolled front door, the Cisco ISR4431 "pacbowlcube1", which sits on its own public IP 50.245.164.200 with public SSH and bypasses your firewall completely.

🟠 27 admins, ~15 with full-org write, 2FA off on ~20 (including every owner). Four live API keys, two of them tied to personal Gmail accounts; santiesteban54@gmail.com's key was active today. Full-org write is also held by an AV vendor, ~10 cisco.com accounts, and dangling invites that have never logged in. One phished password = full takeover.
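The admin findings above are the kind of thing a short script can re-check after every cleanup pass. A minimal sketch, assuming the admin list has been exported as a list of dictionaries: the field names (`twoFactorAuthEnabled`, `orgAccess`, `accountStatus`) are assumptions modeled on the shape of a Meraki Dashboard admin listing and should be verified against the real export, and the sample records and the `example.com` owned domain are hypothetical.

```python
OWNED_DOMAINS = {"example.com"}  # hypothetical: the venue's own mail domain

def audit_admins(admins, owned_domains=OWNED_DOMAINS):
    """Return {email: [reasons]} for every admin that needs attention."""
    findings = {}
    for admin in admins:
        reasons = []
        if not admin.get("twoFactorAuthEnabled", False):
            reasons.append("no 2FA")
        if admin.get("orgAccess") == "full":
            reasons.append("full-org write")
        if admin.get("accountStatus") == "pending":  # assumed value for an unaccepted invite
            reasons.append("dangling invite")
        domain = admin["email"].rsplit("@", 1)[-1].lower()
        if domain not in owned_domains:
            reasons.append("external account")
        if reasons:
            findings[admin["email"]] = reasons
    return findings

# Hypothetical records, not the real admin list.
SAMPLE_ADMINS = [
    {"email": "owner@example.com", "twoFactorAuthEnabled": False,
     "orgAccess": "full", "accountStatus": "ok"},
    {"email": "vendor@example.net", "twoFactorAuthEnabled": False,
     "orgAccess": "full", "accountStatus": "pending"},
    {"email": "staff@example.com", "twoFactorAuthEnabled": True,
     "orgAccess": "read-only", "accountStatus": "ok"},
]

for email, reasons in audit_admins(SAMPLE_ADMINS).items():
    print(email, "->", ", ".join(reasons))
```

An empty result is the target state: every remaining admin has 2FA, sits in an owned domain, and holds only the access they need.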

🟠 Cardholder / PCI exposure. Two Square readers sit on the open guest Wi-Fi; a spoofed-MAC host 10.0.191.100 is parked on the cardholder VLAN; a hidden "TV" SSID uses PSK 12345678 and a hidden "bar" SSID drops straight onto the POS VLAN.

The network, on one screen

Internet at the top, the shared compute cluster at the bottom. Color = risk. Everything critical funnels into one 3-host cluster on end-of-life hardware — that concentration is the story.

Pacific Avenue Bowl — logical topology (Meraki-visible + Cisco-inferred + tonight's ESXi ground truth)

exposed

🌐 Public Internet

50.245.164.0/28

6 management planes reachable from the open internet

Edge

no failover

Meraki MX250 "bowlmx"

50.245.164.205

The one front door · no WAN2 · warm-spare off

bypass

Cisco ISR4431 "pacbowlcube1"

50.245.164.200

2nd uncontrolled door · public SSH · SIP CUBE (toll-fraud) · DMVPN→OCI

Core / routing

2× Meraki MS425-32

10.0.254.2 / .3

"DC CORE" — but L2 only, single-PSU each

unknown

Cisco 3850 "pacbowlswitch1"

10.0.254.101

Real L3 core candidate · is this the 10.0.254.1 /8 next-hop? — probe #1

Wi-Fi controller (vWLC)

10.8.0.100 · in OCI

Controls ALL Cisco Wi-Fi · lives in someone's Oracle Cloud

VLAN zones

🖥️ Servers + Voice VLAN 150

DC1 / DC2 — Active Directory .1 / .2
CUCM ×2 · Unity · VCS-C .150 / .151 / .111
Expressway-E — internet edge 10.0.111.1
vCenter — recovered .250

Crown jewels + EoL RHEL6 voice — all on the one cluster below

🎥 Cameras VLAN 199

28 Axis cameras .101–.131
Windows NVR "video" 10.0.150.100
Auth against Active Directory

Stays onsite → new dedicated NVR (§6). cam15 dead, cam3 + Lane-1 mis-VLANed.

💳 POS + Cardholder 190 / 191 / 192

Qubica lane/scoring PCs VLAN 190
Bar POS · Castles · bar-vm-1/2/3 VLAN 191
Castles term + Hantle ATM VLAN 192

PCI exposure · rogue host 10.0.191.100 · static "bar" PSK

🗄️ Storage VLAN 250 / 1

QNAP bowlnas 10.0.250.150
QNAP bowlnas3 .170 / OOB .9
Storage legs on native VLAN 1 192.168.14.x

NFS datastores · un-isolated · half-up bond (APIPA)

🔧 Management 254 / 100

6× Cisco switches .101–.106
Term Server (OOB) 10.0.254.100
UCS CIMC · APC UPS · PowerChute

Term Server = the OOB path to all Cisco gear

📶 Guest + Phones VLAN 500

~313 personal devices
2 Square card readers on open Wi-Fi
~11 Cisco SEP phones VLAN 194

Cardholder devices on the open guest network

⚠️ One shared compute cluster carries all of it

esxi2 · 10.0.250.2 ✓
esxi3 · 10.0.250.3 ✓
esxi1 · 10.0.250.1 (down · play box)
QNAP NFS storage

3× Cisco UCS C240-M3 on ESXi 6.7 — end of support since 2022. Identity, phones, all 28 cameras, POS back-ends, and vCenter are co-resident here. One chassis loss = identity + phones + cameras + sales down together. Phone path: Flowroute SIP → CUBE → CUCM → desk phones.

Ranked risk register

Worked top to bottom. The two contested facts at the end move everything else and get resolved first.

#	Severity	Risk	Evidence
1	CRIT	Inbound firewall wide open — ANYTHING allow fires before default-deny; every internal host reachable	fw_inbound rule 1
2	CRIT	Privileged control outside your identity — personal-Gmail key live today, external full-org admins, dangling invites, 2FA off on owners; prior MSP (ASP-NSO + cisco.com admins) can reconfigure the whole fabric	admins · ESXi
3	HIGH	Multiple internet-exposed mgmt planes + toll-fraud — vWLC, Expressway-E, 2× FlexVPN SSH, ISR bypass edge, public CUBE	nat_1to1
4	HIGH	Edge single point of failure — one MX250, no WAN2, warm-spare disabled → any firewall/ISP failure downs internet + VPN + voice + wireless	warmspare
5	HIGH	Everything on one EoL cluster — AD/DNS/RADIUS, CUCM×2/Unity/VCS, all 28 cameras, POS, on 3× UCS C240-M3 / ESXi 6.7 (vCenter fixed tonight — was CRIT)	tonight
6	HIGH	Integrator + Oracle Cloud dependency — vWLC + collab transit in an OCI tenancy of unknown ownership; fabric operable only by the prior expert	multiple
7	HIGH	Flat 10/8 blast radius — MX routes all of 10.0.0.0/8 to one hop + into AutoVPN; native VLAN 1 mixes storage + HVAC + personal phones	staticroutes
8	MED	Cardholder / PCI exposure — Square readers on open guest Wi-Fi; spoofed-MAC host on cardholder VLAN	POS
9	MED	Extra remote-access doors — Client VPN pool + an un-pulled 2nd Meraki admin network	clients
10	MED	Weak Wi-Fi keys — "TV" 12345678, "bar" static PSK onto cardholder VLAN; no PMF	ssids
11–13	MED	Single-PSU cores · surveillance gaps (cam3/cam15/Lane-1) · power/UPS + CIMC monitoring offline	device status
14–15	LOW	Contested facts (resolve first): what is 10.0.254.1? · who owns the OCI tenancy + ASP-NSO orchestrator?	probe #1/#2
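Risk 7 can be checked with plain address math. A minimal sketch using Python's standard `ipaddress` module and only host addresses quoted in this report; per-VLAN subnet masks are not stated here, so none are assumed, and the labels are just descriptive.

```python
import ipaddress

# The single static route described in risk 7.
FLAT_ROUTE = ipaddress.ip_network("10.0.0.0/8")

# Host addresses taken from this report.
HOSTS = {
    "Windows NVR 'video'": "10.0.150.100",
    "spoofed-MAC host on cardholder VLAN": "10.0.191.100",
    "QNAP bowlnas": "10.0.250.150",
    "Term Server (OOB)": "10.0.254.100",
    "vWLC in OCI": "10.8.0.100",
    "MX250 public address": "50.245.164.205",
}

def covered_by(route, hosts):
    """Map each host label to True if the route's prefix contains its address."""
    return {name: ipaddress.ip_address(ip) in route for name, ip in hosts.items()}

for name, inside in covered_by(FLAT_ROUTE, HOSTS).items():
    print(f"{name:40s} {'inside 10/8' if inside else 'outside'}")
```

Every internal system, including the Wi-Fi controller that lives in Oracle Cloud, falls inside the one prefix, so a single route (and anything that can reach its next hop) spans cameras, cardholder systems, storage, and management alike.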

What we fixed tonight

✅ vCenter is back — after being down since February

With vCenter down there was no safe, supported way to back up, export, or migrate a single VM — every move would have been blind. Root cause (understood, not guessed): after a July-1 cert regeneration, the Lookup Service kept trusting the expired old machine-SSL cert, so vpxd-svcs failed pre-start with "Invalid certificate." Fix: reconciled the trust anchors with ls_update_certs.py and restarted — not the destructive full-reset that the obvious diagnosis would have led to.
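For readers who want to see what "the Lookup Service kept trusting the old cert" looks like in practice, here is a minimal sketch of the comparison. It assumes the trust-anchor certificate of each service registration and the current machine-SSL certificate have already been exported as PEM text; the export step is not shown, the registration names are hypothetical, and this is not the logic of ls_update_certs.py itself.

```python
import hashlib
import ssl

def sha1_fingerprint(pem: str) -> str:
    """SHA-1 fingerprint of a PEM certificate, as colon-separated uppercase hex."""
    der = ssl.PEM_cert_to_DER_cert(pem)  # strips the PEM armor and base64-decodes
    digest = hashlib.sha1(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

def stale_registrations(registrations: dict, current_pem: str) -> list:
    """Names of registrations whose trust anchor differs from the current certificate."""
    current = sha1_fingerprint(current_pem)
    return sorted(
        name for name, anchor_pem in registrations.items()
        if sha1_fingerprint(anchor_pem) != current
    )
```

Called with a mapping such as `{"service-a": old_pem, "service-b": new_pem}` and the current certificate, it returns the registrations still anchored to the old one. Those are the entries that need reconciling, rather than a full reset.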

✓ vCenter healthy · ✓ esxi2 + esxi3 CONNECTED · ⚠ esxi1 down (your play box) · ✓ clean backup/export path unlocked

This is what flips the whole migration from risky to methodical: we can take clean per-VM backups before touching anything.

The cloud plan — what, where, why, cost

The commodity principle, in one line: rent every commodity — identity, email, phones, POS, accounting, files — from vendors who run it better and cheaper than we ever could; own only the two things that are differentiating or physically wrong to rent: lane scoring and camera footage. Everything else becomes a predictable monthly bill instead of a hardware liability, and the whole EoL UCS/ESXi/QNAP estate goes to e-waste.

Layer	Move	Destination	$/mo
Identity / AD	RENT	Google Cloud Identity (you already own the tenant) + JumpCloud cloud-RADIUS/MDM · decouple cameras from AD first	200–300
Cisco voice	RENT	Zoom Phone (best value) · port Flowroute DIDs · retiring VCS-E removes an internet exposure	150–350
Cameras + NVR	ONSITE	Dedicated NVR appliance — stays local, see §6	~0
Bar / POS	RENT	Square for Restaurants + P2PE readers → collapses PCI from SAQ-D to SAQ-A	70–165
Accounting	RENT	QuickBooks Online Plus (Desktop is being sunset anyway); keep Payentry payroll short-term	~99
File storage	RENT	Google Shared Drives (already in Workspace); footage → onsite NVR	0–60
Lane scoring	KEEP	QubicaAMF Conqueror X on the Dell fleet — modernize, don't retire	—
The rest of the VMs + UCS/QNAP	RETIRE	DC1/2, Unity, VCS, PowerChute VM, ASP-NSO, then the hosts + both QNAPs — final vCenter-backed export first	e-waste
2nd WAN (resilience prereq)	ADD	Fiber + cable/5G failover — the go-live gate for voice/POS/RADIUS	100–300

New recurring SaaS: ~$650–1,300 / mo

This replaces the aging-hardware + MSP + power liability; it does not add to it. You already pay for card processing and Workspace.

One-time capex: ~$11–25k

Onsite NVR + 60–80 TB RAID (~$5–10k) · Square hardware (~$2–4k) · handsets/paging (~$1–3k) · 2nd WAN + warm-spare MX (~$2–6k) · misc (~$1–2k).
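The two totals can be re-derived from the line items. A minimal sketch that sums the low and high end of each range as printed in the table and the capex list above; rows marked "~0" or "~99" are treated as exact, and rows with no dollar figure (lane scoring, e-waste) are left out.

```python
# (low, high) in dollars per month, from the $/mo column of the plan table.
MONTHLY = {
    "Identity / AD": (200, 300),
    "Cisco voice": (150, 350),
    "Cameras + NVR": (0, 0),
    "Bar / POS": (70, 165),
    "Accounting": (99, 99),
    "File storage": (0, 60),
    "2nd WAN": (100, 300),
}

# (low, high) in thousands of dollars, from the one-time capex list.
CAPEX_K = {
    "Onsite NVR + RAID": (5, 10),
    "Square hardware": (2, 4),
    "Handsets / paging": (1, 3),
    "2nd WAN + warm-spare MX": (2, 6),
    "Misc": (1, 2),
}

def total(items):
    """Sum the low ends and the high ends of a dict of (low, high) ranges."""
    return (sum(lo for lo, _ in items.values()),
            sum(hi for _, hi in items.values()))

print("monthly $:", total(MONTHLY))  # (619, 1274)
print("capex $k: ", total(CAPEX_K))  # (11, 25)
```

The monthly line items sum to $619–1,274, which the headline rounds to ~$650–1,300; the capex items sum to exactly $11–25k.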

Sequence — parallel-run, then cut, never a flag day

Phase 1 (foundation + security): turn the firewall back on, close the exposed planes, stand up cloud identity/RADIUS, submit the Flowroute port order (long pole), order long-lead hardware. Phase 2 (cutovers): camera NVR first (no WAN dependency), then Wi-Fi/RADIUS, voice, POS, accounting, files — each proven in parallel before cutting. Phase 3 (decommission): only after each consumer is verified migrated — DCs → Cisco UC + MSP overlay → vCenter/hosts → both QNAPs last. Go-live gate: don't cut voice/POS/RADIUS live until the 2nd WAN is in.
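The ordering rules above can be written down as a dependency graph so nobody cuts a step early. A minimal sketch using the standard-library `graphlib` module (Python 3.9+); the step names are shorthand for this report's phases, and encoding the cutover list as a strict chain is a simplifying assumption, since the plan allows parallel running before each cut.

```python
from graphlib import TopologicalSorter

# step -> steps that must be finished first
DEPENDS_ON = {
    "camera NVR": set(),
    "2nd WAN": set(),
    "Wi-Fi/RADIUS": {"camera NVR", "2nd WAN"},  # gated on the 2nd WAN
    "voice": {"Wi-Fi/RADIUS", "2nd WAN"},       # gated on the 2nd WAN
    "POS": {"voice", "2nd WAN"},                # gated on the 2nd WAN
    "accounting": {"POS"},
    "files": {"accounting"},
    "decommission DCs": {"files"},
    "decommission Cisco UC + MSP overlay": {"decommission DCs"},
    "decommission vCenter/hosts": {"decommission Cisco UC + MSP overlay"},
    "decommission QNAPs": {"decommission vCenter/hosts"},
}

def plan(depends_on):
    """One valid end-to-end order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(depends_on).static_order())

def ready(depends_on, done):
    """Steps not yet done whose prerequisites are all done."""
    return sorted(step for step, deps in depends_on.items()
                  if step not in done and deps <= done)

print(plan(DEPENDS_ON))
print(ready(DEPENDS_ON, {"camera NVR"}))  # only the 2nd WAN is unblocked
```

With the camera NVR cut over but no second WAN, `ready` offers nothing except the WAN install, which is the go-live gate expressed as data.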

Cameras + NVR — stays onsite, outlives the move

Keep every Axis camera. Retire the Windows "video" VM and the dying cluster it rides. Replace it with one dedicated warrantied NVR on its own storage, its own PoE switch, and its own UPS — a self-contained recording island on the camera VLAN that keeps recording even if the internet, the cloud, and the old datacenter are all down.

Retention math — this drives the cost (32 cameras, continuous 24/7)

Blended bitrate / cam	32 cams / day	30 days	60 days	90 days
1.5 Mbps (H.265 + Zipstream)	0.52 TB	16 TB	31 TB	47 TB
2.5 Mbps (planning anchor)	0.86 TB	26 TB	52 TB	78 TB
4.0 Mbps (legacy H.264)	1.38 TB	41 TB	83 TB	124 TB

Your current ~36 TB ÷ 0.86 TB/day ≈ 42–60 days today — matches the anchor. Design target: ~52 TB usable → 60 days (60–90 real days once H.265 + Zipstream is on), i.e. ~72–96 TB raw in RAID-6.
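The retention table comes down to one formula: bitrate × cameras × seconds per day. A minimal sketch, assuming decimal units (1 Mbps = 10^6 bits/s, 1 TB = 10^12 bytes), which is what the table's figures imply; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400

def tb_per_day(mbps_per_cam: float, cams: int) -> float:
    """Storage written per day, in decimal TB, for continuous recording."""
    bytes_per_second = mbps_per_cam * 1_000_000 / 8 * cams
    return bytes_per_second * SECONDS_PER_DAY / 1e12

def storage_tb(mbps_per_cam: float, cams: int, days: int) -> float:
    """Usable capacity needed to hold `days` of footage."""
    return tb_per_day(mbps_per_cam, cams) * days

def retention_days(usable_tb: float, mbps_per_cam: float, cams: int) -> float:
    """How many days of footage a given usable capacity holds."""
    return usable_tb / tb_per_day(mbps_per_cam, cams)

for mbps in (1.5, 2.5, 4.0):
    row = [round(storage_tb(mbps, 32, d)) for d in (30, 60, 90)]
    print(f"{mbps} Mbps: {tb_per_day(mbps, 32):.2f} TB/day, 30/60/90 days = {row} TB")

print(f"36 TB at the 2.5 Mbps anchor: {retention_days(36, 2.5, 32):.0f} days")
```

At the 2.5 Mbps anchor, 36 TB works out to about 42 days, the low end of the range quoted above. Swapping in a different camera count or bitrate shows how quickly the capacity target moves.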

Option	Type	Capex	Verdict
AXIS S1264 (64 TB) + Camera Station Pro	Single-vendor appliance	$18–22k	Primary — cleanest. All-Axis, perpetual license, 5-yr warranty
Synology RS2821RP+ + Surveillance Station	Linux appliance, BTRFS self-heal	$5.5–6.5k	Best value — my pick if capex matters. ⅓ the cost, self-healing; put the savings toward the 2nd WAN
DIY server + Blue Iris / Frigate	Parted-out	$3–4k	Second-archive only — no SLA, fails like the current VM did

My call: AXIS S1264 + ACS Pro as system-of-record if you want the single cleanest thing to hand an integrator; the Synology RS2821RP+ at ~⅓ the price is the honest best-value pick. Either way: RAID-6 + hot spare, surveillance-grade drives only, 60-day baseline / 90-day on entrances-bar-ATM-parking (confirm the alcohol-venue legal minimum with counsel + insurer), H.265 + Zipstream on, and fix cam3 / cam15 / Lane-1 during cutover.

What happens next — and what needs you

🔴 Needs you — nothing moves without these

Credentials + a management tunnel into the Meraki org and the Cisco/ESXi gear (read-only first). Everything downstream depends on this.
MSP offboarding — the ownership decision. Recover Cisco/CUCM/ISR/Flowroute/OCI creds from ciscoasp.net; resolve who owns the Oracle Cloud tenancy (it's the kill-switch for your Wi-Fi controller). A contract action only you can start.
Owner decisions that size the project: camera retention window (with counsel/insurer) · Sekure-vs-direct Square rate review · Julianne's QuickBooks + payroll handover · capital approval (~$11–25k) · is esxi1 truly just your play box?
Approve the two contested-fact probes (§3): box-login pacbowlswitch1 via the Term Server, and the OCI/ASP-NSO ownership trace.

🟠 Security quick wins — do first, low interdependency

Turn the firewall back on: rewrite the MX inbound rules to default-deny, removing the ANYTHING allow without stranding the Client VPN. This is the single highest-leverage change.
Close the six internet-facing mgmt planes — kill public SSH on the ISR bypass edge + FlexVPN heads; lock down Expressway-E + vWLC; power off internet-exposed VCS-E early.
Lock the phone gateway to Flowroute (pin dial-peers) to kill the toll-fraud surface.
Fix the admin plane — org-wide MFA (owners first), revoke dangling invites + external full-org admins, rotate all four API keys now (the personal-Gmail key was live today).
Rotate weak Wi-Fi keys, enable PMF, quarantine the spoofed-MAC host on the cardholder VLAN.
Order long-lead hardware + submit the Flowroute number-port order in parallel.

🟢 Resilience, cutover, decommission

Take full per-VM backups now via the restored vCenter — before any power-cycle.
Install the 2nd diverse WAN + warm-spare MX — the go-live gate for voice/POS/RADIUS.
Cutover (parallel-run → cut): camera NVR first, then Wi-Fi/RADIUS → voice → Square P2PE → QBO → Google Drives.
Decommission (only after each is verified migrated): DC1/DC2 → Cisco UC + MSP/OCI overlay → vCenter/hosts → both QNAPs last, rehoming our onsite agent to a mini-PC first.