🥟 Chao-Down #353: Anthropic funds third-party evaluations of AI models, AI outperforms financial analysts on routine tasks, where autonomous vehicles still have room to improve

Jul 10, 2024

Enjoy today’s slice of the AI internet!

-Alex, your resident Chaos Coordinator.

Anthropic launches fund to measure capabilities of AI models (InfoWorld)

How Disinformation From a Russian AI Spam Farm Ended up on Top of Google Search Results (WIRED)

Can AI be superhuman? Flaws in top gaming bot cast doubt (Nature)

AI can outperform some financial analysts, draft study finds (qz.com)

GPT-4 autonomously hacks zero-day security flaws with 53% success rate (New Atlas)

Autonomous Vehicles Are Great at Driving Straight (IEEE Spectrum)

Mitigating Skeleton Key, a new type of generative AI jailbreak technique (Microsoft Security Blog)

GenAI in Hollywood: Threat or Opportunity? (Substack)

AI is the reason interviews are harder now (softwaredesign.ing)

reworkd/tarsier: Vision utilities for web interaction agents 👀 (GitHub)

willccbb/mlx_parallm: Fast parallel LLM inference for MLX (GitHub)

microsoft/typespec: TypeSpec is a language for defining cloud service APIs and shapes (GitHub)

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? (arXiv)

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning (arXiv)

Scaling Synthetic Data Creation with 1,000,000,000 Personas (arXiv)

Airport complaints hit near record according to Department of Transportation (qz.com)

How Much Happiness Can Your Salary Buy? Researchers Can’t Agree (WSJ)

Why Chronic Illness Symptoms are Commonly Dismissed as Just Stress (The New York Times)

Rich Freeze Their Bodies and Their Fortunes in Bet on Cryogenics (Bloomberg)

How public universities hooked America on meat (Vox)

The New Age of Endless Parenting (The Atlantic)