- Research
- Open access
- Published:
Integrated single-cell and bulk transcriptomic analysis identifies a novel macrophage subtype associated with poor prognosis in breast cancer
Cancer Cell International volume 25, Article number: 119 (2025)
Abstract
Background
Tumor-associated macrophages (TAMs) are pivotal components of the breast cancer (BC) tumor microenvironment (TME), significantly influencing tumor progression and response to therapy. However, the heterogeneity and specific roles of TAM subpopulations in BC remain inadequately understood.
Methods
We performed an integrated analysis of single-cell RNA sequencing (scRNA-seq) and bulk RNA sequencing (RNA-seq) data from BC patients to comprehensively characterize TAM heterogeneity. Utilizing the MetaTiME computational framework and consensus clustering, we identified distinct TAM subtypes and assessed their associations with clinical outcomes and treatment responses. A machine learning-based predictive model was developed to evaluate the prognostic significance of TAM-related gene expression profiles.
Results
Our analysis revealed three distinct TAM subgroups. Notably, we identified a novel macrophage subtype, M_Macrophage-SPP1-C1Q, characterized by high expression of SPP1 and C1QA, representing an intermediate differentiation state with unique proliferative and oncogenic properties. High infiltration of M_Macrophage-SPP1-C1Q was significantly associated with poor overall survival (OS) and chemotherapy resistance in BC patients. We developed a Random Forest (RF)-based predictive model, Macro.RF, which accurately stratified patients based on survival outcomes and chemotherapy responses, independent of established prognostic parameters.
Conclusion
This study uncovers a previously unrecognized TAM subtype that drives poor prognosis in BC. The identification of M_Macrophage-SPP1-C1Q enhances our understanding of TAM heterogeneity within the TME and offers a novel prognostic biomarker. The Macro.RF model provides a robust tool for predicting clinical outcomes and guiding personalized treatment strategies in BC patients.
Introduction
Breast cancer (BC) is the most common cancer among women globally, accounting for nearly 30% of all female cancers [1], with high incidence and mortality rates. Despite advances in surgery, chemotherapy, and immunotherapy, clinical outcomes for many BC patients remain suboptimal, even in early-stage disease [2]. This highlights the need for more effective treatment strategies and prognostic markers.
BC is traditionally classified based on the expression of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2), which guides therapeutic decisions. However, this classification does not fully capture the complexity of BC’s heterogeneity. Identifying additional biomarkers is essential for more precise patient stratification and individualized treatment.
The tumor microenvironment (TME), consisting of cancer cells, immune cells, stromal cells, and vasculature, plays a pivotal role in tumor progression and response to therapy. Tumor-infiltrating immune cells (TIICs), particularly tumor-associated macrophages (TAMs), are crucial in regulating tumor growth, metastasis, and therapy responses [3, 4]. While traditional methods like bulk RNA sequencing (RNA-seq) have provided valuable insights [5], they lack the resolution to assess cellular heterogeneity [6]. Single-cell RNA sequencing (scRNA-seq) has overcome this limitation, offering detailed insights into immune cell subpopulations and their molecular characteristics [7]. In BC, the heterogeneity and functional diversity of TAMs at the single-cell level remain poorly understood [8]. Identifying TAM subgroups associated with clinical outcomes and treatment responses could lead to novel prognostic biomarkers and therapeutic targets [9, 10]. Some TAM subpopulations may contribute to treatment resistance and poor prognosis, emphasizing the need for further investigation.
In this study, we combined scRNA-seq and bulk RNA-seq analyses to characterize the macrophage landscape in BC. We identified three distinct TAM subgroups, including a novel subgroup characterized by SPP1 and C1QA expression, designated M_Macrophage-SPP1-C1Q. This subgroup exhibited high proliferative and oncogenic potential, correlating with poor prognosis and chemotherapy resistance. Additionally, we developed a machine learning-based predictive model, Macro.RF, which accurately forecasts clinical outcomes and chemotherapy responses based on TAM-related gene expression profiles. Our findings enhance the understanding of TAM heterogeneity in BC and provide a foundation for precision medicine by identifying potential biomarkers and therapeutic targets within the TME.
Materials and methods
Data sources
We obtained scRNA-seq data of BC samples from the Gene Expression Omnibus (GEO) under the accession number GSE169246 [11]. After excluding non-breast tissue samples, a total of 46,440 single cells from five samples were included in our study, comprising three samples from chemotherapy-resistant patients and two from chemotherapy-sensitive patients (Supplementary Table 1). For bulk RNA-seq data, we downloaded BC gene expression datasets with complete prognostic and clinicopathological annotations from The Cancer Genome Atlas (TCGA) and the Molecular Taxonomy of BC International Consortium (METABRIC). Additionally, we acquired datasets relevant to BC and chemotherapy response from GEO, specifically GSE58812 [12], GSE18728 [13], and GSE14094 [14].
Data processing and quality control
In our study, we performed initial quality control and downstream analyses of scRNA-seq data using the Seurat package in R. Our quality control measures included excluding cells that expressed fewer than 200 genes to eliminate low-quality or empty droplets and removing cells with fewer than 1,000 Unique Molecular Identifiers (UMIs) to ensure a high complexity of transcriptomic data. Additionally, cells with mitochondrial gene expression exceeding 20% of their total gene expression were removed, as high mitochondrial content is often indicative of cell stress or apoptosis. After implementing these stringent filters, we retained a total of 46,440 high-quality cells for further analysis. We then log-transformed the expression matrix of TME cells and processed it through the MetaTiME annotator tool (https://github.com/yi-zhang/MetaTiME). This tool provided each cell with a MeC signature score and identified the enriched MeC status for each cluster, using the top-enriched MeC for each to facilitate uniform manifold approximation and projection (UMAP) visualization [15].
Differential MeC feature analysis
Following the annotation with MetaTiME, we conducted differential feature analysis on the scRNA-seq data to explore changes in MeC characteristics under various conditions, such as pre- versus post-treatment and between responders versus non-responders. We named each cluster based on its top-enriched MeC and compared the feature strengths of all enriched MeCs within each cluster across the different conditions. For statistical comparisons, we utilized two-sample t-tests for data that followed a normal distribution and Wilcoxon rank-sum tests for data that did not. We calculated the effect sizes by determining the mean differences in MeC scores between conditions to quantify the impact of treatments or response status on cellular characteristics.
Ligand-receptor communication analysis among TIICs
To elucidate the cell–cell communication networks among TIICs, we utilized the CellChat package (version 1.1.3) [16]. CellChat is equipped with a curated database of 1,939 ligand-receptor pairs, which enabled us to analyze the number and strength of interactions between different BC groups and to compare the overall signal flow across various signaling pathways. For this analysis, we established parameters including a minimum cell count of ten per group to ensure robustness in our intercellular communication analysis. We also set a significance threshold where ligand-receptor pairs with a p-value less than 0.05 were considered significant. Using these criteria, we identified significant pathways and corresponding ligand-receptor pairs that are related to BC prognosis. Furthermore, we visualized the average expression levels across groups and samples using the ComplexHeatmap package (version 2.6.2), which helped in depicting the complex interactions and signaling dynamics within the TME.
Single-cell trajectory analysis
We employed Monocle 2 (version 2.8.0) [17] and ClusterGVis (version 0.1.0) to assess the stemness and pseudotime trajectories of macrophage subgroups within the scRNA-seq data. Using Monocle 2, we prepared the data by creating a CellDataSet object with the expression family set to negative binomial. We then proceeded with dimension reduction using the default settings and constructed trajectories that included genes with an average expression level greater than 1. This approach allowed us to perform differential gene testing across pseudotime, identifying dynamically differentially expressed genes (DEGs) with a significance threshold of q < 0.001. For assessing cell stemness, we utilized CytoTRACE, a tool that calculates a stemness score for each cell based on gene expression profiles correlated with gene counts in scRNA-seq data. The scores range from 0 to 1, where higher scores indicate greater stemness, suggesting less differentiation.
Functional enrichment analysis
To explore the biological functions of TAM subgroups within our scRNA-seq data, we conducted a comprehensive functional enrichment analysis. Initially, we identified cluster-specific DEGs that showed an absolute average fold change greater than 1.5 and an adjusted p-value less than 0.05. We specifically excluded mitochondrial and ribosomal genes to avoid confounding effects that might skew the biological interpretation. Following DEG selection, we performed pathway analysis using the clusterProfiler package (version 4.1.4) [18].This analysis helped us to identify enriched biological pathways based on the established DEGs, providing insights into the functional roles of different TAM subgroups in the TME. Additionally, we utilized both ssGSEA and integrative rank-based GSEA (irGSEA) to calculate enrichment scores for specific gene sets or pathway activities.
Consensus clustering
To identify molecular subtypes based on TAM characteristics, we performed consensus clustering using the ConsensusClusterPlus package [19]. Our approach involved employing K-means clustering paired with the Pearson correlation coefficient as a similarity measure. We chose the Euclidean distance metric to assess dissimilarity among data points in our analysis. The parameters for our clustering were carefully selected to optimize the robustness and reliability of our subtype classifications. Specifically, we evaluated a range of possible cluster numbers from 2 to 8 to determine the optimal subdivision of data. In addition, we set the resampling parameters with pItem at 0.8 and pFeature at 1, ensuring comprehensive data coverage during cluster computation. To ensure stability and reproducibility of our results, we performed 1,000 iterations of clustering. The final clustering algorithm used was the ‘Partitioning around medoids’ (Pam) method, which is well-suited for identifying robust clusters in molecular data.
Model development and validation
To develop a predictive model for patient survival benefit, we employed seven different machine learning algorithms: Naive Bayes (NB), Random Forest (RF), K-Nearest Neighbors (KNN), Logistic Boosting (LogiBoost), Extreme Gradient Boosting (XGBoost), Elastic Net, and Lasso Regression. Each model was trained using the training dataset, with parameters optimized through a 10-repeated fivefold cross-validation to prevent overfitting. After training, the performance of each model was assessed using both the training and validation sets. The model with the highest predictive accuracy, the RF algorithm, was selected as the final Macro.RF model. The predictive value of the Macro.RF model was evaluated using Receiver Operating Characteristic (ROC) analysis conducted with the timeROC R package. Additionally, survival analyses were performed to compare outcomes between different patient groups using the survival R package (version 3.2.13), and the results were illustrated with Kaplan–Meier survival curves.
Statistical analysis
All statistical analyses were performed using R software (version 4.2.1). Continuous variables between two groups were compared using two-tailed Wilcoxon rank-sum tests to assess statistical differences. For categorical variables, the chi-squared (χ2) test was employed to evaluate differences between groups. Univariate and multivariate Cox proportional hazards regression analyses were conducted to identify factors associated with survival outcomes. The results of these Cox analyses were visualized using forest plots generated with the forestplot R package (version 1.10.1). A p-value less than 0.05 was considered statistically significant for all statistical tests.
Results
TAMs as major contributors to clinical outcomes in BC
To identify tumor infiltrating lymphocyte (TIL) subsets associated with anti-tumor immunity in BC, we reanalyzed scRNA-seq datasets from BC samples [11] using the recently developed data-driven framework MetaTiME [15]. This method addresses limitations in resolution and consistency inherent in manual annotation by known gene markers, allowing for a comprehensive and continuous characterization of cell states.
Initially, we categorized the single cells into three primary immune cell clusters based on classical marker genes: myeloid cells (marked by CSF1R and AIF1), T cells and natural killer (NK) cells (marked by CD3D, PTPRC, and NKG7), and B cells and plasma cells (marked by CD79A) (Fig. 1A and B; Supplementary Fig. 1).
TAM as a Major Contributor to BC Clinical Outcomes. A UMAP visualization depicts three primary immune cell clusters from BC (n = 5) tissues, including T cells & NK cells, B cells & plasma cells, and myeloid cells. B Expression of classical marker genes used to identify the main clusters (myeloid cells: CSF1R, AIF1; T & NK cells: CD3D, PTPRC, NKG7; B & plasma cells: CD79A). C MetaTiME cell state annotations based on top-enriched MeC in BC cell clusters. D Differential enrichment of cell states in BC pre- and post-treatment, evaluated using MeC feature testing. X-axis: difference in average feature scores between pre- and post-chemotherapy conditions. Y-axis: -log(p-value) from two-sided t-tests. Significant cluster difference signatures are labeled as “EnrichedMeC@ClusterName”; when the enriched MeC coincides with the cluster name, the signature is marked as “ClusterName”. Red dots represent signatures with increased difference; dot size is proportional to the average feature score of cells in the cluster under post-immunotherapy conditions. Blue dots indicate suppressed features; dot size is proportional to the average feature score of cells in the cluster under pre-chemotherapy conditions. E Differential enrichment of cell states between BC treatment responders and non-responders, assessed using MeC feature testing
Using MetaTiME, we further annotated these three primary immune cell clusters, subdividing them into 37 distinct immune cell subclasses (Fig. 1C; Supplementary Table 2). Given the variability in cell type proportions and the small number of patients in each cohort, we focused on changes in MeC characteristics rather than absolute cell count proportions to better understand immune responses following tumor treatment.
We analyzed samples from the same BC cohort collected both before and after treatment (Fig. 1D; Supplementary Table 3) and compared treatment responders to non-responders (Fig. 1E; Supplementary Table 4). Notably, TAMs exhibited the most pronounced dynamic changes in MeC characteristics, irrespective of treatment duration or response status.
These dynamic changes were primarily concentrated in a specific TAM subgroup characterized by high expression of SPP1 and C1QA, which we designated as M_Macrophage-SPP1-C1Q. Quantitative comparisons reinforced this observation, showing consistent changes in this subgroup across different conditions (Supplementary Figs. 2A and 2B).
In summary, our findings highlight the significant functional complexity and dynamic nature of TAMs in the BC TME. The pronounced changes observed in the M_Macrophage-SPP1-C1Q subgroup suggest that it may play a crucial role in influencing prognosis and treatment efficacy in BC patients.
Functional characteristics of TAM subgroups in BC
To investigate the functional heterogeneity of TAM subgroups in BC, we analyzed DEGs within each cluster (Supplementary Fig. 2E, Supplementary Table 5). Enrichment analysis of classical M1 and M2 macrophage gene signatures revealed distinct patterns among the three TAM clusters. The M_Macrophage-IL1-JUN cluster exhibited high enrichment for M1-associated genes, indicative of a pro-inflammatory phenotype, while the M_Macrophage-RNASE1-C1Q cluster showed higher enrichment for M2-associated genes, reflecting an anti-inflammatory or tissue-repair phenotype. By contrast, the M_Macrophage-SPP1-C1Q cluster displayed low enrichment for both M1 and M2 signatures, suggesting it does not conform to the traditional M1/M2 classification (Fig. 2A). Further GSEA demonstrated that M_Macrophage-SPP1-C1Q was enriched in cell proliferation pathways such as MTORC1 signaling and MYC targets, as well as oncogenic pathways including KRAS signaling, P53 pathway, and epithelial-mesenchymal transition (EMT) (Fig. 2B). These findings highlight the distinct proliferative and oncogenic properties of this TAM subgroup.
Functional Characteristics of TAM Subgroups in BC. A Scatter plot displaying the enrichment scores for M1- and M2-macrophage gene signatures across three TAM subsets. B GSEA heatmap illustrating the enrichment of specific gene sets in each TAM subgroup. Blue cells indicate no statistically significant enrichment, while red cells indicate statistically significant enrichment (P < 0.05). The number of asterisks within a cell denotes the level of statistical significance, with more asterisks corresponding to smaller P-values. The dendrogram on the left clusters gene sets based on their expression patterns across TAM subgroups, while the bar at the top indicates whether the enriched gene sets are upregulated or downregulated in each subgroup. C Density heatmap illustrating tumor-related functional activities enriched in different TAM subgroups. The color gradient represents the density of enrichment, with red indicating higher density and blue indicating lower density. The dashed lines represent statistical distribution summaries: the mean (solid black line), the 25th percentile, the 50th percentile (median), and the 75th percentile of the data for each subgroup. D Box plots showing stemness scores of different TAM subgroups. E Potential polarization trajectories of different macrophage subgroups inferred by Monocle 2. F Visualization of functional characteristics and gene changes in TAM subgroups among responders and non-responders using the ClusterGVis package
To delve deeper into the unique features of M_Macrophage-SPP1-C1Q, we examined the enrichment of specific functional pathways within this cluster. It showed high enrichment in cell proliferation (MTORC1 signaling, MYC targets V1), inflammatory responses, complement signaling pathways, oxidative phosphorylation, and interferon response (Fig. 2C). These results further support the notion that M_Macrophage-SPP1-C1Q has a unique functional profile distinct from traditional TAM subsets.
We next explored the unique features of M_Macrophage-SPP1-C1Q by analyzing its functional pathway enrichment. This cluster showed high enrichment in cell proliferation (MTORC1 signaling, MYC targets V1), inflammatory responses, complement signaling, oxidative phosphorylation, and interferon response pathways (Fig. 2C), reinforcing its distinct functional profile. Further, stemness analysis using cytoTRACE revealed that M_Macrophage-SPP1-C1Q exhibited the highest developmental potential among the three TAM clusters, while M_Macrophage-RNASE1-C1Q exhibited the lowest (Fig. 2D; Supplementary Fig. 2F). Differentiation trajectory analysis using Monocle suggested that M_Macrophage-SPP1-C1Q serves as an intermediate transitional state in TAM polarization, while M_Macrophage-RNASE1-C1Q and M_Macrophage-IL1-JUN occupied opposite ends of the trajectory, representing more differentiated states (Fig. 2E). This result suggests that M_Macrophage-SPP1-C1Q may serve as a key transitional state in TAM dynamics.
In the context of treatment response, M_Macrophage-SPP1-C1Q was enriched in genes and pathways associated with oncogenesis and an inhibitory TME, such as TGFB1, SPP1 [20], FABP5 [21], SLC11A1 [22], and VCAN [23], in treatment responders (Supplementary Fig. 2E). Meanwhile, M_Macrophage-RNASE1-C1Q was linked to immunoglobulin-mediated immune responses and B cell-mediated immunity, promoting an anti-tumor TME [24, 25], while M_Macrophage-IL1-JUN was associated with apoptosis-related functions. In non-responders, M_Macrophage-SPP1-C1Q was enriched in pathways related to neutrophil chemotaxis and migration, which have been implicated in chemotherapy resistance through TGF-β activation [26]. Conversely, M_Macrophage-RNASE1-C1Q was enriched in complement activation pathways [27], known for promoting immunosuppression and tumor progression, while M_Macrophage-IL1-JUN was enriched in cell development and growth-related pathways (Fig. 2F).
These results demonstrate that the functional characteristics and differentiation potential of TAM subgroups, particularly the unique properties of M_Macrophage-SPP1-C1Q, significantly influence the TME and treatment outcomes in BC.
High infiltration of M_macrophage-SPP1-C1Q is associated with poor prognosis in BC
TAMs play a pivotal role in tumor progression and response to therapy. To investigate the prognostic significance of TAM subgroups in BC, we employed CellChat analysis to examine potential ligand-receptor interactions among the main TIIC subgroups in treatment responders and non-responders. In the responder group, M_Macrophage-RNASE1-C1Q and M_Macrophage-SPP1-C1Q demonstrated strong interactions with other immune cells, particularly as predominant ligand-expressing cell types (Fig. 3A). This suggests that these TAM subgroups may actively communicate with immune cells in the TME, influencing anti-tumor immune responses [28]. Interestingly, M_Macrophage-SPP1-C1Q showed a higher signal-sending propensity in both responders and non-responders, with a more pronounced effect in non-responders (Fig. 3B), indicating its potential role in modulating the TME, particularly under conditions of treatment resistance.
High Infiltration of M_Macrophage-SPP1-C1Q in BC Correlates with Poor Prognosis. A Heatmap displaying the relative expression levels of ligand-receptor pairs among all immune cells in treatment responders compared to non-responders. B Key ligand-receptor pairs between different treatment response groups. Dot size represents the probability of communication for specific ligand-receptor pairs between sender and receiver clusters. C Heatmap showing Spearman correlation coefficients between proportions of different cell subsets. D Forest plot illustrating univariate Cox regression analysis of OS for each TAM subset in the GSE58812 cohort. E Infiltration abundance of M_Macrophage-SPP1-C1Q in chemotherapy-sensitive and chemotherapy-resistant patients. F Infiltration abundance of M_Macrophage-SPP1-C1Q in patients with and without metastasis. G, H Kaplan–Meier survival curves for MFS (left) and OS (right) among patients with high and low M_Macrophage-SPP1-C1Q infiltration in the GSE58812 and METABRIC cohorts
Further exploration of TAM interactions revealed that M_Macrophage-IL1-JUN exhibited strong correlations with regulatory T cells (Tregs) and naive T cells, suggesting its involvement in creating an immunosuppressive TME (Fig. 3C). Conversely, M_Macrophage-SPP1-C1Q displayed significant correlations with broader functional gene sets, including Pan-Metallothionein and Pan-Proliferation, as well as with dendritic cell subsets DC_pDC and DC_cDC2-MHCII, reflecting its distinct immunological role in the TME. These findings highlight the heterogeneity and complex immunological functions of TAM subgroups in BC.
To assess the clinical relevance of M_Macrophage-SPP1-C1Q, we evaluated its infiltration levels in a cohort of 107 BC patients using ssGSEA and scRNA-seq-based overexpression signatures (Fig. 3D; Supplementary Table 7). Notably, M_Macrophage-SPP1-C1Q infiltration was significantly higher in chemotherapy-resistant patients compared to chemotherapy-sensitive patients (Fig. 3E). Similarly, patients with metastatic tumors exhibited higher infiltration levels of M_Macrophage-SPP1-C1Q than those with primary tumors (Fig. 3F), suggesting a strong association between M_Macrophage-SPP1-C1Q and treatment resistance. Furthermore, univariate Cox regression analysis revealed that M_Macrophage-SPP1-C1Q infiltration was significantly associated with overall survival (OS), while survival analyses demonstrated that patients with higher infiltration levels of this subgroup had significantly reduced OS and shorter metastasis-free survival (MFS) (Fig. 3G). These findings were validated in the METABRIC dataset, which confirmed the association between high M_Macrophage-SPP1-C1Q infiltration and poor survival outcomes in an independent BC cohort.
These results suggest that high infiltration of M_Macrophage-SPP1-C1Q is strongly associated with treatment resistance, reduced survival, and poorer clinical outcomes in BC.
M_macrophage-SPP1-C1Q identifies clinically distinct subgroups of BC patients
The inherent heterogeneity of BC necessitates effective stratification of patients into clinically meaningful subgroups. To explore the potential of M_Macrophage-SPP1-C1Q as a marker for such stratification, we identified 37 genes uniquely associated with this TAM subtype, referred to as M_Macrophage-SPP1-C1Q-specific core metagenes (Supplementary Tables 5 and 6, Fig. 4A). These genes, representing the distinctive transcriptional signature of the subtype, were used for consensus clustering analysis in the METABRIC BC cohort. This analysis classified patients into two distinct molecular subtypes, termed LM (Low Metagene expression) and HM (High Metagene expression) (Fig. 4B). The robustness of these clusters was validated using t-distributed stochastic neighbor embedding (t-SNE) and spectral pattern clustering, which confirmed consistent subgrouping (Supplementary Fig. 3A).
Identification of Potential Macrophage Subtypes in BC. A Venn diagram of core metagenes for M_Macrophage-SPP1-C1Q. B Consensus clustering in the METABRIC cohort based on 37 core metagenes. C Box plot illustrating the varying estimated proportions of M_Macrophage-SPP1-C1Q across different subtypes. D Heatmap of core metagenes of M_Macrophage-SPP1-C1Q and clinical features. E Alluvial diagram depicting the interrelations among BC patient subtypes, survival status, and Claudin subtype. F, G Kaplan–Meier analysis of OS (F) and relapse-free period (G) for BC patients belonging to two different molecular clusters. H Univariate and multivariate analyses of the association between clinical-pathological features and macrophage subtypes in the METABRIC cohort
Patients in the HM cluster exhibited significantly higher expression of the 37 core metagenes compared to those in the LM cluster (Fig. 4D). Correspondingly, ssGSEA analysis demonstrated a higher infiltration abundance of M_Macrophage-SPP1-C1Q in the HM cluster (Fig. 4C), indicating that this subgroup is enriched with the M_Macrophage-SPP1-C1Q subtype within the TME. Survival analysis revealed that patients in the HM cluster had significantly poorer outcomes, with reduced OS and shorter relapse-free periods compared to patients in the LM cluster (Fig. 4F and G; Supplementary Fig. 3B). This suggests that high expression of M_Macrophage-SPP1-C1Q-specific core metagenes is closely associated with adverse clinical outcomes.
To further evaluate the clinical relevance of this classification, we examined its relationship with the Claudin Subtype, a recognized molecular classification in BC. Both LM and HM clusters included patients from the Claudin Subtype, with significant survival differences observed within these subgroups (Fig. 4E). Additionally, univariate and multivariate Cox regression analyses demonstrated that the HM cluster classification remained significantly associated with OS after adjusting for confounding factors such as age, tumor size, grade, and hormone receptor status (Fig. 4H). These findings underscore the prognostic value of the M_Macrophage-SPP1-C1Q-based subtype classification as an independent predictor of patient outcomes.
These results demonstrate that molecular subtyping of BC patients based on M_Macrophage-SPP1-C1Q-specific core metagenes effectively stratifies patients into clinically distinct subgroups. The identification of the HM cluster, characterized by high infiltration of M_Macrophage-SPP1-C1Q and associated with poorer prognoses.
Development of an M_macrophage-SPP1-C1Q derived subtype system for predicting patient survival benefit
To evaluate the clinical utility of the M_Macrophage-SPP1-C1Q molecular subtype in predicting patient outcomes, we compiled bulk RNA-seq data and clinical information from multiple BC cohorts. These datasets were divided into a training set (n = 529), a test set (n = 228), an internal validation set (n = 325), and an external validation set (n = 1,980). Using the expression profiles of the 37 M_Macrophage-SPP1-C1Q-specific core metagenes, we developed predictive models based on seven machine learning algorithms, including NB, RF, KNN, LogiBoost, XGBoost, Elastic Net, and Lasso Regression (Fig. 5A and B). After a 10-repeated fivefold cross-validation to prevent overfitting, the RF model demonstrated the highest predictive accuracy with an AUC of 0.89 (Fig. 5C). This model, named the Macro.RF model, was selected as the final predictive tool.
Development of the Macro.RF Derived Subtype System for Prognosis and Treatment Efficacy Prediction in BC. A Flowchart illustrating the construction of the Macro.RF model using machine learning processes. B Comparison of multiple ROC curves representing the performance of different machine learning algorithms in the validation set. C Schematic representation of molecular subtypes based on M_Macrophage-SPP1-C1Q. D ROC curve depicting the performance of the final Macro.RF model in validation and test cohorts. E Kaplan–Meier curves comparing OS between patients in the LM (good prognosis) and HM (poor prognosis) groups in the validation and test sets. F, G Clinical response rates to chemotherapy in patients stratified into LM (good response) and HM (poor response) groups across different cohorts
To further validate the Macro.RF model’s robustness and generalizability, we tested it on an independent dataset. The model consistently exhibited high predictive performance, with an AUC comparable to the training cohort (Fig. 5D). Survival analyses revealed that patients classified as belonging to the M_Macrophage-SPP1-C1Q subtype had significantly shorter OS compared to those in the non-M_Macrophage-SPP1-C1Q group (all p < 0.001, Fig. 5E), underscoring its ability to stratify patients based on survival risk. These findings highlight the model’s potential to predict clinical outcomes accurately.
We also investigated the utility of the Macro.RF model in predicting chemotherapy responses, given the critical role of chemotherapy in BC treatment. Applying the model to two independent chemotherapy cohorts (GSE14094 and GSE18728), we categorized patients into ‘good’ and ‘poor’ prognostic groups. In the GSE14094 cohort, the remission rate in the ‘good’ group was 40.1%, significantly higher than the 33% observed in the ‘poor’ group (Fig. 5F). Similarly, in the GSE18728 cohort, remission rates were 35.2% in the ‘good’ group versus 20.6% in the ‘poor’ group (Fig. 5G). These results demonstrate that the Macro.RF model effectively predicts chemotherapy response, providing valuable clinical insights for patient management.
In summary, the analysis confirmed that the Macro.RF model, based on the M_Macrophage-SPP1-C1Q core metagenes, effectively stratifies BC patients, reflecting its robust predictive capability in clinical settings.
Discussion
In this study, we employed a targeted, clinically-driven scRNA-seq analysis to comprehensively characterize the transcriptional landscape of TAMs during BC treatment. Our findings reveal the existence of three distinct TAM subgroups, each with unique transcriptional programs and functional states. Notably, we identified a novel TAM subgroup, M_Macrophage-SPP1-C1Q, which does not conform to the traditional M1/M2 macrophage classification. This subgroup represents an intermediate TAM state characterized by high expression of SPP1 and C1QA, exhibiting proliferative signal transduction and oncogenic properties. Furthermore, we developed a machine learning-based model, Macro.RF, which effectively predicts clinical outcomes and chemotherapy responses in BC patients, independent of established prognostic parameters.
The interplay between tumor cells and host immune components within the TME is crucial in cancer initiation and progression. While early research focused predominantly on T cells as key mediators of anti-tumor immunity [29], recent studies have underscored the significant roles of other immune cells, including B cells, dendritic cells, and macrophages [30]. TAMs, in particular, have been recognized as pivotal players in the TME, influencing tumor growth, metastasis, and response to therapy. However, previous studies on TAMs have yielded variable and sometimes contradictory results regarding their prognostic value [31, 32], highlighting the need for a more nuanced understanding of TAM heterogeneity [33, 34].
Our use of the MetaTiME computational framework [15] allowed us to move beyond the conventional M1/M2 macrophage dichotomy by revealing the heterogeneity and plasticity of TAMs based on their metabolic preferences and transcriptional profiles. This approach enabled a more precise characterization of TAM subgroups and their roles in tumor biology. Specifically, the M_Macrophage-SPP1-C1Q subgroup emerged as a distinct entity with unique functional attributes.
The M_Macrophage-SPP1-C1Q subgroup was found to exhibit high developmental potential and was positioned as an intermediate state in the TAM differentiation trajectory, as revealed by Monocle analysis. This suggests that M_Macrophage-SPP1-C1Q may serve as a transitional state capable of influencing the polarization of other TAM subtypes. The high expression of SPP1 (osteopontin) in this subgroup is particularly noteworthy. SPP1 has been reported to be highly expressed in various cancers and is associated with poor prognosis [35]. Unlike traditional M1/M2 macrophage markers, SPP1-defined macrophage polarization has been significantly correlated with adverse outcomes [36, 37].
Our scRNA-seq analysis demonstrated that M_Macrophage-SPP1-C1Q possesses biological functions and cellular interactions markedly different from other TAM subtypes. This subgroup was enriched in pathways related to hypoxia, epithelial-mesenchymal transition (EMT), and metabolic processes—all indicative of pro-tumorigenic characteristics. Previous studies have shown that SPP1+ TAMs are closely associated with hypoxic conditions within tumors, promoting EMT and contributing to aggressive cancer phenotypes [38, 39]. The presence of M_Macrophage-SPP1-C1Q may thus facilitate tumor progression and metastasis through these mechanisms.
Importantly, our study systematically investigated the association between TAM functional states and clinical outcomes using large-scale bulk RNA-seq and scRNA-seq datasets. We identified M_Macrophage-SPP1-C1Q as a distinct TAM subgroup whose high infiltration was consistently associated with poor survival outcomes and chemotherapy resistance in BC patients. This association was validated across multiple independent cohorts, reinforcing the robustness of our findings. Notably, the prognostic significance of M_Macrophage-SPP1-C1Q infiltration remained independent of other clinical variables and gene expression measurement methodologies, suggesting its potential as a reliable prognostic biomarker.
To translate these insights into a clinically applicable tool, we developed the Macro.RF model—a random forest-based predictive algorithm integrating the transcriptional characteristics of M_Macrophage-SPP1-C1Q. The model demonstrated moderate to high predictive accuracy for OS and chemotherapy response across multiple datasets. By stratifying patients into ‘good’ and ‘poor’ prognostic groups, it offers a potential framework for risk assessment and personalized treatment planning. However, further validation is required before clinical implementation.
Despite these promising findings, several limitations must be acknowledged. First, our analyses relied on publicly available datasets, which may introduce selection bias and heterogeneity in patient demographics, sequencing platforms, and treatment regimens, potentially affecting generalizability. Second, while the Macro.RF model showed encouraging performance, its predictive accuracy depends on transcriptomic data quality and lacks prospective clinical validation. Third, our approach focused primarily on TAM-related gene expression, without incorporating other key prognostic factors such as tumor mutational burden or immune checkpoint expression. Lastly, the model's applicability to immunotherapy response remains untested. Future studies, including prospective validation and functional investigations, are needed to establish its clinical relevance.
Conclusion
In conclusion, our integrative analysis provides insight into the heterogeneity of TAMs in BC, particularly emphasizing the role of the M_Macrophage-SPP1-C1Q subgroup in influencing tumor progression and treatment resistance. This TAM subtype has potential as a prognostic biomarker, and the development of the Macro.RF model could enhance patient stratification and therapeutic decision-making. Further research is needed to elucidate the specific molecular mechanisms by which the M_Macrophage-SPP1-C1Q subgroup impacts tumor biology, which may lead to the discovery of new therapeutic targets and improve clinical outcomes for BC patients.
Availability of data and materials
No datasets were generated or analysed during the current study.
Abbreviations
- BC:
-
Breast cancer
- scRNA-seq:
-
Single-cell RNA sequencing
- TME:
-
Tumor microenvironment
- TAM:
-
Tumor-associated macrophages
- ER:
-
Estrogen receptor
- PR:
-
Progesterone receptor
- HER2:
-
Human epidermal growth factor receptor 2
- DEGs:
-
Differentially expressed genes
References
Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. 2019;69(1):7.
Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2022. CA Cancer J Clin. 2022;72(1):7.
Salgado R, Denkert C, Demaria S, Sirtaine N, Klauschen F, Pruneri G, et al. The evaluation of tumor-infiltrating lymphocytes (TILs) in breast cancer: recommendations by an International TILs Working Group 2014. Ann Oncol. 2015;26(2):259–71.
Lundgren C, Bendahl P-O, Ekholm M, Fernö M, Forsare C, Krüger U, et al. Tumour-infiltrating lymphocytes as a prognostic and tamoxifen predictive marker in premenopausal breast cancer: data from a randomised trial with long-term follow-up. Breast Cancer Res. 2020;22(1):140.
Tan Z, Chen X, Zuo J, Fu S, Wang H, Wang J. Comprehensive analysis of scRNA-Seq and bulk RNA-Seq reveals dynamic changes in the tumor immune microenvironment of bladder cancer and establishes a prognostic model. J Transl Med. 2023;21(1):223.
Pagès F, Mlecnik B, Marliot F, Bindea G, Ou F-S, Bifulco C, et al. International validation of the consensus Immunoscore for the classification of colon cancer: a prognostic and accuracy study. Lancet. 2018;391(10135):2128–39.
Zhang L, Li Z, Skrzypczynska KM, Fang Q, Zhang W, O’Brien SA, et al. Single-cell analyses inform mechanisms of myeloid-targeted therapies in colon cancer. Cell. 2020;181(2):442.
Ma P, Amemiya HM, He LL, Gandhi SJ, Nicol R, Bhattacharyya RP, et al. Bacterial droplet-based single-cell RNA-seq reveals antibiotic-associated heterogeneous cellular states. Cell. 2023;186(4):877.
Yamazaki CM, Yamaguchi A, Anami Y, Xiong W, Otani Y, Lee J, et al. Antibody-drug conjugates with dual payloads for combating breast tumor heterogeneity and drug resistance. Nat Commun. 2021;12(1):3528.
Luo H, Xia X, Huang L-B, An H, Cao M, Kim GD, et al. Pan-cancer single-cell analysis reveals the heterogeneity and plasticity of cancer-associated fibroblasts in the tumor microenvironment. Nat Commun. 2022;13(1):6619.
Zhang Y, Chen H, Mo H, Hu X, Gao R, Zhao Y, et al. Single-cell analyses reveal key immune cell subsets associated with response to PD-L1 blockade in triple-negative breast cancer. Cancer Cell. 2021;39(12):1578.
Jézéquel P, Loussouarn D, Guérin-Charbonnel C, Campion L, Vanier A, Gouraud W, et al. Gene-expression molecular subtyping of triple-negative breast cancer tumours: importance of immune response. Breast Cancer Res. 2015;17:43.
Korde LA, Lusa L, McShane L, Lebowitz PF, Lukes L, Camphausen K, et al. Gene expression pathway analysis to predict response to neoadjuvant docetaxel and capecitabine for breast cancer. Breast Cancer Res Treat. 2010;119(3):685–99.
Caimari A, Oliver P, Rodenburg W, Keijer J, Palou A. Slc27a2 expression in peripheral blood mononuclear cells as a molecular marker for overweight development. Int J Obes (Lond). 2010;34(5):831–9.
Zhang Y, Xiang G, Jiang AY, Lynch A, Zeng Z, Wang C, et al. MetaTiME integrates single-cell gene expression to characterize the meta-components of the tumor immune microenvironment. Nat Commun. 2023;14(1):2634.
Jin S, Guerrero-Juarez CF, Zhang L, Chang I, Ramos R, Kuan CH, et al. Inference and analysis of cell-cell communication using Cell Chat. Nat Commun. 2021;12(1):1088.
Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014;32(4):381–6.
Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb). 2021;2(3):100141.
Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010;26(12):1572–3.
Fabre T, Barron AMS, Christensen SM, Asano S, Bound K, Lech MP, et al. Identification of a broadly fibrogenic macrophage subset induced by type 3 inflammation. Sci Immunol. 2023;8(82):eadd8945.
Zhang C, Liao Y, Liu P, Du Q, Liang Y, Ooi S, et al. FABP5 promotes lymph node metastasis in cervical cancer by reprogramming fatty acid metabolism. Theranostics. 2020;10(15):6561–80.
Ma Y, Zhan L, Yang J, Zhang J. SLC11A1 associated with tumor microenvironment is a potential biomarker of prognosis and immunotherapy efficacy for colorectal cancer. Front Pharmacol. 2022;13:984555.
Li W, Han F, Fu M, Wang Z. High expression of VCAN is an independent predictor of poor prognosis in gastric cancer. J Int Med Res. 2020;48(1):300060519891271.
Yang P, Qin H, Li Y, Xiao A, Zheng E, Zeng H, et al. CD36-mediated metabolic crosstalk between tumor cells and macrophages affects liver metastasis. Nat Commun. 2022;13(1):5782.
Li Z, Sun C, Qin Z. Metabolic reprogramming of cancer-associated fibroblasts and its effect on cancer cell reprogramming. Theranostics. 2021;11(17):8322–36.
Mousset A, Lecorgne E, Bourget I, Lopez P, Jenovai K, Cherfils-Vicini J, et al. Neutrophil extracellular traps formed during chemotherapy confer treatment resistance via TGF-β activation. Cancer Cell. 2023;41(4):757.
Magrini E, Minute L, Dambra M, Garlanda C. Complement activation in cancer: Effects on tumor-associated myeloid cells and immunosuppression. Semin Immunol. 2022;60:101642.
Chen Y, Li R, Duan Q, Wu L, Li X, Luo A, et al. A DNA-modularized STING agonist with macrophage-selectivity and programmability for enhanced anti-tumor immunotherapy. Adv Sci (Weinh). 2024;11(32):e2400149.
Guo X, Zhang Y, Zheng L, Zheng C, Song J, Zhang Q, et al. Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing. Nat Med. 2018;24(7):978–85.
de Visser KE, Joyce JA. The evolving tumor microenvironment: from cancer initiation to metastatic outgrowth. Cancer Cell. 2023;41(3):374–403.
Mao X, Xu J, Wang W, Liang C, Hua J, Liu J, et al. Crosstalk between cancer-associated fibroblasts and immune cells in the tumor microenvironment: new findings and future perspectives. Mol Cancer. 2021;20(1):131.
Cheng K, Cai N, Zhu J, Yang X, Liang H, Zhang W. Tumor-associated macrophages in liver cancer: from mechanisms to therapy. Cancer Commun (Lond). 2022;42(11):1112–40.
Mosser DM, Edwards JP. Exploring the full spectrum of macrophage activation. Nat Rev Immunol. 2008;8(12):958–69.
Murray PJ, Wynn TA. Protective and pathogenic functions of macrophage subsets. Nat Rev Immunol. 2011;11(11):723–37.
Hu Z, Jin X, Hong W, Sui Q, Zhao M, Huang Y, et al. Dissecting the single-cell transcriptome network of macrophage and identifies a signature to predict prognosis in lung adenocarcinoma. Cell Oncol (Dordr). 2023;46(5):1351–68.
Feng Y, Wang S, Xie J, Ding B, Wang M, Zhang P, et al. Spatial transcriptomics reveals heterogeneity of macrophages in the tumor microenvironment of granulomatous slack skin. J Pathol. 2023;261(1):105–19.
Bill R, Wirapati P, Messemaker M, Roh W, Zitti B, Duval F, et al. CXCL9:SPP1 macrophage polarity identifies a network of cellular programs that control human cancers. Science. 2023;381(6657):515–24.
Wei J, Chen Z, Hu M, He Z, Jiang D, Long J, et al. Characterizing intercellular communication of pan-cancer reveals SPP1+ tumor-associated macrophage expanded in hypoxia and promoting cancer malignancy through single-cell RNA-Seq data. Front Cell Dev Biol. 2021;9:749210.
Bai R, Li Y, Jian L, Yang Y, Zhao L, Wei M. The hypoxia-driven crosstalk between tumor and tumor-associated macrophages: mechanisms and clinical treatment strategies. Mol Cancer. 2022;21(1):177.
Acknowledgements
We would like to thank the staff members of the TCGA and METABRIC Research Network, as well as all the authors for making their valuable research data public.
Funding
This research was supported by research Projects of Ningde Normal University (No.: 2024ZX74), Fujian Provincial Natural Science Foundation (No.: 2024J01935) and Natural Science Foundation of ningde, China (No.: 2023J14).
Author information
Authors and Affiliations
Contributions
ZRJ and JL supported and designed the study. QW and YYS collected, processed, and analyzed the datasets. LQR, MYH, XMS and WC verified the data. QW and ZRJ wrote the manuscript. All authors have read and approved the final version of the manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
12935_2025_3750_MOESM1_ESM.tif
Supplementary material 1: Figure 1. (A–C) UMAP visualization of treatment efficacy, treatment duration, and patients from BC tissues (n=5). (D) Bar graph showing the proportion of three immune cell types across five BC patients.
12935_2025_3750_MOESM2_ESM.tif
Supplementary material 2: Figure 2. (A–B) Bar graphs visualizing the quantity of TAMs in BC before and after chemotherapy (A) and between different responses (B). (C, D) Kaplan-Meier survival curves for OS (left) and MFS (right) among patients with high and low infiltration of M_Macrophage-IL1-JUN (C) and M_Macrophage-RNASE1-C1Q (D) in the GSE58812 cohort.
12935_2025_3750_MOESM3_ESM.tif
Supplementary material 3: Figure 3. (A) t-SNE analysis of two subtypes. (B) Kaplan-Meier curves comparing OS between patients stratified into the “good prognosis” group (LM subtype) and “poor prognosis” group (HM subtype) in an external test set.
12935_2025_3750_MOESM6_ESM.csv
Supplementary material 6: Table 3. Differential enrichment of cell states in BC pre- and post-treatment, evaluated using MeC feature testing.
12935_2025_3750_MOESM7_ESM.csv
Supplementary material 7: Table 4. Differential enrichment of cell states between BC treatment responders and non-responders, assessed using MeC feature testing.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, Q., Yu, Y., Ruan, L. et al. Integrated single-cell and bulk transcriptomic analysis identifies a novel macrophage subtype associated with poor prognosis in breast cancer. Cancer Cell Int 25, 119 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12935-025-03750-w
Received:
Accepted:
Published:
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12935-025-03750-w