rank  model                      correctness %  correct  incorrect
15    qwen:110b                  46.25          777      903
16    qwen:110b-chat-v1.5-q8_0   45.65          767      913
21    qwen:110b-chat-v1.5-q2_K   42.76          667      893

rank  model                      correctness %  correct  incorrect
2     llama3:70b                 66.25          2195     1118
11    dolphin-llama3:70b-v2.9    51.58          865      812
20    deepseek-llm:67b           43.99          1653     2105
29    qwen:72b                   34.12          1146     2213
31    llama2:70b                 33.27          1118     2242
49    llama2-uncensored:70b      25.21          847      2513
,testplan,testset,model,correct,correct_answer,answer,question
0,Basic Financial Q&A,20240425-1,qwen:32b-chat-v1.5-fp16,true,"{""answer"": ""NVDA""}","{""answer"": ""NVDA""}","During multiple trading sessions, an investor buys 200 shares of NVIDIA (NVDA) at $510, sells 100 at $515, buys 50 more at $512. Buys 300 shares of AMD (AMD) at $85, sells 150 at $90, buys 75 more at $88. Buys 180 shares of Salesforce (CRM) at $250, sells 90 at $255, buys 45 more at $253. Buys 210 shares of PayPal (PYPL) at $190, sells 105 at $195, buys 52 more at $192. Which stock has the highest average selling price? use only stock ticker in answer."
1,Basic Financial Q&A,20240425-1,qwen:32b-chat-v1.5-fp16,false,"{""answer"": ""PYPL""}","{""answer"": ""CRM""}","A portfolio includes: Buys 220 shares of Salesforce (CRM) at $250, sells 110 at $255, buys 55 more at $252. Buys 240 shares of PayPal (PYPL) at $190, sells 120 at $195, buys 60 more at $192. Buys 250 shares of Meta (META) at $275, sells 125 at $280, buys 62 more at $278. Buys
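The preview rows above store each answer as a small JSON object inside a quoted CSV field, with inner quotes doubled. A minimal sketch of reading that layout with Python's standard library follows. The sample reuses the header and the first row from the preview, with the question text shortened for readability; the `parse_rows` helper and its output keys are illustrative names, not part of the original gist.

```python
import csv
import io
import json

# Header and first row in the layout of the preview above.
# The question text is shortened here; the other fields are copied as shown.
SAMPLE = (
    ',testplan,testset,model,correct,correct_answer,answer,question\n'
    '0,Basic Financial Q&A,20240425-1,qwen:32b-chat-v1.5-fp16,true,'
    '"{""answer"": ""NVDA""}","{""answer"": ""NVDA""}",'
    '"Which stock has the highest average selling price?"\n'
)

def parse_rows(text):
    """Parse result rows and unwrap the JSON-encoded answer fields."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        rows.append({
            "model": row["model"],
            "correct": row["correct"] == "true",
            # The csv module turns doubled quotes back into single quotes,
            # so each field is valid JSON by the time json.loads sees it.
            "expected": json.loads(row["correct_answer"])["answer"],
            "given": json.loads(row["answer"])["answer"],
        })
    return rows

rows = parse_rows(SAMPLE)
print(rows[0])
```

A row cut off in the middle of a quoted field, like the second preview row, cannot be parsed reliably, so a sketch like this only applies to the complete file.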
@alvincho
alvincho / gist:cbc450e7ab27d390b6a34925848ceddb
Last active May 4, 2024 02:36
20240425-1 Models Results
rank  testplan             testset     model                              correctness  correct  incorrect
0     Basic Financial Q&A  20240425-1  llama3:70b-instruct-q8_0           0.6693       2206     1090
1     Basic Financial Q&A  20240425-1  llama3:70b-instruct-fp16           0.6645       2216     1119
2     Basic Financial Q&A  20240425-1  llama3:70b                         0.6625       2195     1118
3     Basic Financial Q&A  20240425-1  mixtral:8x22b-instruct-v0.1-q8_0   0.6419       2156     1203
4     Basic Financial Q&A  20240425-1  mixtral:8x22b                      0.6051       2033     1327
5     Basic Financial Q&A  20240425-1  wizardlm2:8x22b                    0.594        1996     1364
6     Basic Financial Q&A  20240425-1  wizardlm2:8x22b-q8_0               0.5925       1989     1368
7     Basic Financial Q&A  20240425-1  qwen:32b-chat-v1.5-q8_0            0.5649       949      731
8     Basic Financial Q&A  20240425-1  qwen:32b-chat-v1.5-fp16            0.55         924      756
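The correctness column in the table above appears to be correct / (correct + incorrect). That relationship is inferred from the numbers and is not stated in the gist. A short sketch that recomputes it for three rows copied from the table (the `correctness` function name is illustrative):

```python
# (model, correct, incorrect) copied from three rows of the table above.
ROWS = [
    ("llama3:70b-instruct-q8_0", 2206, 1090),
    ("llama3:70b", 2195, 1118),
    ("qwen:32b-chat-v1.5-fp16", 924, 756),
]

def correctness(correct, incorrect):
    """Fraction of scored answers that were correct (assumed definition)."""
    return correct / (correct + incorrect)

for model, correct, incorrect in ROWS:
    print(f"{model}: {correctness(correct, incorrect):.4f}")
```

The totals differ between rows (for example 3296 scored answers for llama3:70b-instruct-q8_0 against 1680 for qwen:32b-chat-v1.5-fp16), so the ranking compares ratios computed over different numbers of questions.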
@alvincho
alvincho / Main.hs
Created June 24, 2021 09:59
New Project
{-# LANGUAGE OverloadedStrings #-}
module Escrow where

import Language.Marlowe.Extended

main :: IO ()
main = print . pretty $ contract

-- We can set explicitRefunds True to run Close refund analysis
-- but we get a shorter contract if we set it to False