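Effort estimation in story points from user story text with machine learning and LLMs (original title: Estimativa de esforço em story point a partir do texto da user story com aprendizagem de máquina e LLM)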

Record dates: 2026-06-16; 2026-06-16; 2025-09-16
Citation: Néo, Giseldo da Silva. Estimativa de esforço em Story Point a partir do texto da User Story com aprendizagem de máquina e LLM / Giseldo da Silva Néo. 2026. 306 leaves: color illustrations. Thesis (Doctorate in Computer Science), Universidade Federal de Campina Grande, Centro de Engenharia Elétrica e Informática, 2025. Advisor: Prof. Dr. José Antão Beltrão Moura. Includes references. 1. Effort estimation. 2. Large language models. 3. Story point. 4. User story. I. Moura, José Antão Beltrão. II. Title.
URI: https://repositorio.ifal.edu.br/handle/123456789/3089

Abstract: Effort estimation in agile software projects remains a persistent challenge in the industry, especially when textual artifacts such as User Stories are used to predict Story Points. This thesis investigates the use of Natural Language Processing (NLP) and Machine Learning (ML) techniques for effort prediction, taking the textual description of User Stories as the main source of information. Initially, a systematic literature review identified the prevalent techniques for this task, such as Term Frequency-Inverse Document Frequency (TF-IDF) combined with Support Vector Machine (SVM), and highlighted gaps related to the use of readability, sentiment, and subjectivity attributes, as well as the scarcity of applications of Large Language Models (LLMs) to this task. The research proposed and evaluated three main approaches: (i) the Neo Legibility Effort Model, which uses attributes automatically extracted from User Story text to predict effort; (ii) the Neo User Story Tutor, an LLM-based application that suggests improvements in User Story writing to improve estimation accuracy; and (iii) the Neo LLM Predictor, which uses LLMs to estimate Story Points directly using different strategies (few-shot, zero-shot, and fine-tuning). To support the experiments, a new dataset, the NeoDataset, was built from real projects hosted on GitLab.
The proposed models were evaluated using metrics such as Mean Absolute Error (MAE) and compared with established baselines from the literature. The results demonstrated that both readability attributes and LLMs can contribute significantly to improving effort estimates in agile environments. The thesis presents evidence that estimation accuracy can be increased by combining textual analysis with machine learning, and it highlights the relevance of linguistic aspects to the quality of User Stories.

Language: pt-BR
Keywords: Effort estimation; Large language models; Story point; User story
Type: Thesis
Subject area: Exact and Earth Sciences: Computer Science
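
Illustrative note on the MAE metric mentioned in the abstract: the following is a minimal sketch of how MAE can be computed and compared against a naive baseline. The story point values, the function name mean_absolute_error, and the choice of a median baseline are assumptions made for illustration; the abstract does not state which baselines or data splits the thesis used.

```python
from statistics import median


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted story points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical story points (illustrative only, not taken from the thesis).
actual = [1, 2, 3, 5, 8]
model_predictions = [2, 2, 3, 3, 5]

# Assumed naive baseline: always predict the median of the actual values.
baseline_predictions = [median(actual)] * len(actual)

model_mae = mean_absolute_error(actual, model_predictions)
baseline_mae = mean_absolute_error(actual, baseline_predictions)
print(f"model MAE: {model_mae:.2f}, baseline MAE: {baseline_mae:.2f}")
```

A lower MAE means the predictions are, on average, closer to the story points assigned by the team. In this toy example the hypothetical model would be preferred only because its MAE is below that of the median baseline.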