Background

This project was built for APS360 Applied Fundamentals of Deep Learning, a course I am taking in Summer 2025, with teammates Jason Guo and Jessie Zhu. We wanted to create something innovative and useful for modern society, with a medical angle. After exploring MRI datasets and looking for an idea we could picture ourselves using, we settled on this project topic!

Technologies

PyTorch
NumPy
Streamlit

Project Overview

The aim of this project was to investigate whether machine learning could provide a more objective way of assessing personality. Traditional personality tests like MBTI [1] and even the scientifically recognized NEO-FFI [2] rely on self-reporting, which introduces bias and subjectivity [3]. By contrast, brain imaging offers measurable biological data that could reveal consistent markers of personality traits [4], [5]. We set out to build a deep learning pipeline that takes resting-state fMRI connectivity matrices and predicts the five NEO-FFI traits. While the project was experimental, the long-term vision is that models like this could complement existing personality assessments, opening new possibilities for applications in mental health and neuroscience [6].

A demo video for the project, part of the final presentation of the APS360 course.

Dataset

Our team obtained the data from the Human Connectome Project (HCP), which hosts one of the largest open-access neuroimaging datasets in the world [7]. Specifically, we used the 1,206 Subjects Data Release, which includes resting-state fMRI data and approximately 600 behavioral assessments (including the NEO-FFI traits) for each participant. In this project, we focused on predicting participants’ personality traits from the resting-state fMRI data.

Example connectivity matrix for Subject 962058.

Preprocessing

Each subject’s resting-state fMRI data consists of 20,408 nodes, each associated with a time series of 1,200 time points, yielding around 14 minutes of data per session. Computing a full 20,408 × 20,408 connectivity matrix directly would give around 416 million entries, or about 208 million unique connections after keeping only the upper or lower triangle (excluding the diagonal). To reduce the file size while preserving meaningful connections, we used Yeo’s 17-network parcellation [8] to partition the nodes into 17 functional networks and computed a connectivity matrix for each network, which brings the total down to approximately 15 million unique connections. To shrink the data further, we applied Principal Component Analysis (PCA), a standard technique for dimensionality reduction in neuroimaging [9]. PCA transforms each high-dimensional connectivity matrix into a lower-dimensional representation, which let us reduce every network from an [Ni, Ni] matrix to an [Ni, 10] matrix, where Ni varies across networks but stays consistent across subjects.
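To make the connection counts above concrete, here is a minimal NumPy sketch. It assumes Pearson correlation as the connectivity measure and uses random toy data with an evenly split, made-up network assignment; the function names are hypothetical and this is not necessarily how our pipeline computed its matrices.

```python
import numpy as np


def unique_connections(n_nodes: int) -> int:
    """Entries in one triangle of an n x n symmetric matrix, diagonal excluded."""
    return n_nodes * (n_nodes - 1) // 2


def connectivity_matrix(timeseries: np.ndarray) -> np.ndarray:
    """Pearson correlation between node time series.

    timeseries has shape [n_nodes, n_timepoints]; the result is [n_nodes, n_nodes].
    Pearson correlation is an assumption made for this sketch.
    """
    return np.corrcoef(timeseries)


def within_network_matrices(timeseries: np.ndarray, labels: np.ndarray) -> dict:
    """One connectivity matrix per network, using only that network's nodes."""
    return {
        int(net): connectivity_matrix(timeseries[labels == net])
        for net in np.unique(labels)
    }


# Counts for the full-resolution matrix described in the text.
n_nodes = 20_408
print(n_nodes ** 2)                 # 416486464 entries in the full matrix
print(unique_connections(n_nodes))  # 208233028 unique connections

# Toy example: 68 nodes, 1,200 time points, 17 networks of 4 nodes each.
rng = np.random.default_rng(0)
toy_timeseries = rng.standard_normal((68, 1200))
toy_labels = np.arange(68) % 17  # hypothetical network assignment

full = connectivity_matrix(toy_timeseries)
networks = within_network_matrices(toy_timeseries, toy_labels)

full_count = unique_connections(full.shape[0])
network_count = sum(unique_connections(m.shape[0]) for m in networks.values())
print(full_count, network_count)  # 2278 102
```

Keeping only within-network connections drops every cross-network pair, which is where most of the reduction comes from: the count scales with the sum of Ni(Ni - 1)/2 rather than with the square of the total node count.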

Dimensionality reduction using PCA on network 1 connectivity matrix.
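The [Ni, Ni] to [Ni, 10] reduction can be sketched with a plain SVD-based PCA in NumPy. This is an illustrative sketch under assumptions: it treats each row of one network's connectivity matrix as a sample and fits PCA on that single matrix, which may differ from how the components were fitted in our pipeline. The function name `pca_reduce` and the toy data are hypothetical.

```python
import numpy as np


def pca_reduce(matrix: np.ndarray, n_components: int = 10):
    """Project the rows of `matrix` onto its top principal components.

    Returns (scores, explained_ratio), where scores has shape
    [n_rows, n_components] and explained_ratio gives the fraction of
    total variance captured by each kept component.
    """
    # Center each column so the components describe variance around the mean.
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    variance = singular_values ** 2
    explained_ratio = variance[:n_components] / variance.sum()
    return scores, explained_ratio


# Toy network with Ni = 40 nodes and 1,200 time points of random data.
rng = np.random.default_rng(0)
toy_timeseries = rng.standard_normal((40, 1200))
toy_connectivity = np.corrcoef(toy_timeseries)  # shape [40, 40]

reduced, explained_ratio = pca_reduce(toy_connectivity, n_components=10)
print(reduced.shape)  # (40, 10)
```

The explained-variance ratios are a quick way to check how much information the 10 kept components retain for a given network.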

Data Labels

The target labels of this project are the five NEO-FFI personality traits, each scored from 0 to 48: Agreeableness (A), Openness (O), Conscientiousness (C), Neuroticism (N), and Extraversion (E). The labels are sourced from the behavioural assessments in the HCP dataset [2], and the values are mapped to each subject’s unique ID in an open-access CSV file. To ensure label completeness, we excluded any subjects who did not complete the personality test when defining the PyTorch dataset.
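The filtering step described above might look like the following sketch, which uses only the Python standard library. The column names (`Subject`, `NEOFAC_A`, and so on) are assumptions about the CSV layout, the helper `load_complete_labels` is hypothetical, and the rows are made-up toy values rather than real HCP data.

```python
import csv
import io

# Assumed column names for the five traits, in A, O, C, N, E order.
TRAITS = ("NEOFAC_A", "NEOFAC_O", "NEOFAC_C", "NEOFAC_N", "NEOFAC_E")


def load_complete_labels(csv_file, id_column="Subject", traits=TRAITS):
    """Map subject ID -> list of five trait scores, skipping incomplete rows."""
    labels = {}
    for row in csv.DictReader(csv_file):
        values = [(row.get(trait) or "").strip() for trait in traits]
        if any(value == "" for value in values):
            continue  # subject did not complete the personality test
        labels[row[id_column]] = [float(value) for value in values]
    return labels


# Toy CSV with made-up subjects; sub-002 is missing one score.
toy_csv = io.StringIO(
    "Subject,NEOFAC_A,NEOFAC_O,NEOFAC_C,NEOFAC_N,NEOFAC_E\n"
    "sub-001,32,28,35,16,30\n"
    "sub-002,,31,29,20,27\n"
    "sub-003,40,22,38,11,33\n"
)

labels = load_complete_labels(toy_csv)
print(sorted(labels))     # ['sub-001', 'sub-003']
print(labels["sub-001"])  # [32.0, 28.0, 35.0, 16.0, 30.0]
```

A PyTorch dataset could then index only the subject IDs present in this mapping, so every sample it returns is guaranteed to have all five labels.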

Final output shape.