Llama 3.1 Nemotron Instruct 70B

NVIDIA

Open

Context

Release Date: Oct 15, 2024
Knowledge Cutoff: Dec 01, 2023
Window: 128k tokens

Pricing (per 1M tokens)

Input: $0.12
Output: $0.30
Blended 3:1: $0.165
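
The blended figure is consistent with a weighted average of the input and output prices at a 3:1 input-to-output token ratio. A minimal sketch of that arithmetic, assuming this is how the blend is computed (the function name and parameters are illustrative, not from the source):

```python
def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average price per 1M tokens for a given input:output token ratio."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Prices per 1M tokens as listed above; the 3:1 ratio is the default.
price = blended_price(0.12, 0.30)
print(f"${price:.3f}")  # prints "$0.165"
```

A workload with a different input-to-output mix can be priced by changing the two weights.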

Capabilities

Speed: 41 t/s
Input
Output
Reasoning tokens

Latency

TTFT: 0.53 s
500 token response: 12.68 s
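
The 500 token figure can be roughly reproduced as time to first token plus generation time at the listed speed. A minimal sketch, assuming that simple model and treating the TTFT value as 0.53 seconds (the function name is illustrative, and the unit is an assumption):

```python
def estimated_response_time(ttft_s, tokens, tokens_per_s):
    """Estimate total response time: time to first token plus steady-state generation."""
    return ttft_s + tokens / tokens_per_s


# Assumed inputs: TTFT of 0.53 s, the listed speed of 41 t/s, a 500 token response.
estimate = estimated_response_time(0.53, 500, 41)
print(f"{estimate:.2f} s")  # prints "12.73 s"
```

The estimate lands close to the listed 12.68 s. The small gap is plausibly due to the speed being rounded to a whole number of tokens per second.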

Benchmarks

Intelligence: ●●○○○
Math: ●●○○○
Coding: ●○○○○
MMLU Pro: 69.0%
GPQA: 46.5%
HLE: 4.6%
SciCode: 23.3%
AIME: 24.7%
MATH 500: 73.3%
LiveCodeBench: 16.9%
HumanEval: 81.5%
