Name: Beyond Scaling: Making Large Language Models Efficient
Start: 2026-06-17T13:00:00.000Z
End: 2026-06-17T14:00:00.000Z
Location: Bengaluru, Karnataka

This workshop explores practical research directions for improving transformer efficiency in small and medium-sized language models. As models grow, compute cost, memory usage, inference latency, and deployment complexity all rise with them. The session covers architectural experiments, training optimisations, efficient attention mechanisms, memory-efficient techniques, and inference-focused design choices that FrontiersMind is exploring while building open-source LLMs.

Speaker: Abhay Kumar, co-founder of FrontiersMind, an AI research lab focused on efficient small and medium-sized language models optimised for enterprise and real-world deployment.

What to expect:

Overview of transformer efficiency challenges
Practical techniques for reducing compute and memory
Insights from building open-source LLMs
Q&A with the speaker

Pre-read: Basics of Transformer architectures, The Illustrated Transformer, Memory-Efficient Attention (MHA vs. MQA vs. GQA vs. MLA), Understanding DeepSeek's Multi-Head Latent Attention.