# Mythos Model Deep Dive: Cost, Safety, and Hands-On Evaluation
Original author: zek (@zekramu) | Original link: https://x.com/zekramu/status/2064131568495677490 | Published: June 8, 2026
As promised, here are my thoughts after spending all day with Mythos. I hope to god Anthropic doesn't sue the fuck outta me, but yolo. Fair warning: this is a long one.
# 1. The Cost
Mythos pricing, at least for our enterprise, was, uhh, expensive. I thought being a pilot company would mean they'd let us try it for free, but no lmao. They did give us a decent amount of free API tokens at least, but cost estimates put us well above a million dollars spent on it. For comparison, my company spent 2 million dollars on inference for the entirety of last month, for everyone in the company. So yeah, shit is pricey as hell.
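To put those two figures side by side, here is a rough back-of-the-envelope sketch. It assumes the "well above a million dollars" estimate can be treated as a $1M lower bound; the post does not say what period that estimate covers, so the result is only a floor, and the function and constant names are made up for illustration.

```python
# Figures quoted in the post; both are rough estimates, not billing data.
MYTHOS_PILOT_COST_USD = 1_000_000        # lower bound: "well above a million dollars"
MONTHLY_INFERENCE_SPEND_USD = 2_000_000  # company-wide inference spend last month


def share_of_monthly_spend(pilot_cost: float, monthly_spend: float) -> float:
    """Return the pilot cost as a fraction of one month's total inference spend."""
    if monthly_spend <= 0:
        raise ValueError("monthly_spend must be positive")
    return pilot_cost / monthly_spend


share = share_of_monthly_spend(MYTHOS_PILOT_COST_USD, MONTHLY_INFERENCE_SPEND_USD)
print(f"Mythos pilot: at least {share:.0%} of a full month's inference spend")
```

Since the real number is "well above" a million, the actual share would be higher than this floor.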
# 2. The Harness
The biggest surprise to me was that they actually sent us a harness that was NOT Claude Code. It's sort of dinky and looks to me largely AI generated. Most of it is focused on ensuring Mythos does not "escape containment," along with some shitty security skills. So they are definitely taking the sandboxing seriously.
IMO it's a pretty shit, restrictive harness. Half of the guardrails don't work, lmao, and apparently this is basically what "Project Glasswing" is, which is pretty funny considering the harness is shit. I'm also not sure whether the harness will be released with the model API when it drops; it seemed like that was part of the deal.
I'm quite interested to see what they do when it drops and how it gets opened up. I was able to use Mythos outside of the harness (OMP btw)… more on that in a sec. I did have to hack around, though, as they really don't want people to do this (that's what I was told, at least).
# 3. The Model
Probably the part everyone is most interested in. I will say, the model is good. Is it expensive? Fuck yes. But it's good.
To me, it feels like it was fine-tuned explicitly for this sort of security research task. For general coding, which I wasn't able to play with much, it wasn't that surprising. But it is indeed very good at security-based tasks. Far better than Opus / 5.5 xhigh.
That said, I don't feel as though it's some omnipresent danger or threat to society. I watched it get confused trying to use our build tool, to the point where I had to build the code for it and then run the model against the full build. You'd think an omnipresent model could do this, but nothing on the market has been able to figure it out. And it's just Bazel with some custom shit we built. Nothing crazy.
That said, if people have a shit ton of money AND extensive harness knowledge, yeah, they can probably use it to do some malicious shit. But only if they're a genuinely skilled engineer or security researcher.
# 4. The Results
Mythos was able to find quite a few vulnerabilities across a few of our products (products that probably everyone on this app has interacted with indirectly, and maybe a small few directly). I think the final total was something like ~800 major threats. Definitely enough to make us rethink some of our security strategy.
# 5. Final Thoughts
It's a good model, sir. It's not an existential threat to humanity, as Anthropic might lead you to believe, but it's genuinely good. Cost-wise, I would like to try a comparison with 5.5 xhigh, but alas, I don't have a million dollars to throw at a proper comparison.
# Appendix: Background
I have it on good word (from Anthropic) that Mythos is in fact dropping this summer… get to play with it next week… my current expectations are that it's shit and Anthropic is selling hot garbage, but we will see.
Related links: