🌸NeurIPS roundup: LLM papers 🌸
#nlp #про_nlp #nlp_papers
NeurIPS 2024, the largest machine learning conference, has wrapped up. Below is a short selection of the papers I found most interesting. Some of them definitely deserve a separate review.
Agents
🟣StreamBench: Towards Benchmarking Continuous Improvement of Language Agents arxiv
🟣SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering arxiv
🟣AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents arxiv
🟣DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents arxiv

Benchmarks
🟣DevBench: A multimodal developmental benchmark for language learning arxiv
🟣CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark arxiv
🟣LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages arxiv
🟣CLUE - Cross-Linked Unified Embedding for cross-modality representation learning arxiv
🟣EmoBench: Evaluating the Emotional Intelligence of Large Language Models arxiv

LLM
🟣The PRISM Alignment dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models arxiv
🟣UniGen: A Unified Framework for Textual Dataset Generation via Large Language Models arxiv
🟣A Watermark for Black-Box Language Models arxiv