Understanding online health issue-focused communities with text mining

An exemplary analysis of a German ileostomy forum with more than 6,000 threads and 100,000 posts

2022-12-01

“Without love we could not survive. Human beings are social creatures, and a concern for each other is the very basis of our life together.”

— Tenzin Gyatso, 14th Dalai Lama

Introduction

From the perspective of a market researcher interested in human behavior, the majority of existing data is now online. While this is convenient in some ways, it also poses two challenges: 1) much of the data exists in an unstructured format that was not designed to answer the kinds of questions we are interested in, and 2) the sheer amount and complexity of the data can be overwhelming.

The classic online forum (or message board), however, is an often overlooked social media platform that offers some interesting advantages in tackling these challenges. First of all, a forum usually already centers around a topic of interest, which can be anything from films to vintage sports cars to medical issues. Secondly, the structure of a forum is fairly simple and hierarchical: The main forum is usually divided into sub-forums that zoom deeper into some aspects of the general topic, and the sub-forums consist of discussion threads that in turn consist of posts by individual users. The users create the content of the forum by starting, and participating in, discussion threads on specific topics. Importantly for the users, the forum is also a community and can act as a source of emotional as well as informational support, which is especially relevant for forums centered around health issues.

Online forums, combined with advanced text analytics, are one way to lay the groundwork for understanding a specific community or target group. In this blog post, we demonstrate the use of text mining [1] as an online research method for gaining insight into the topics and trends that move the users of a medically focused online community. We discuss relevant characteristics and meta-information of the online forum and the role of prominent users as opinion leaders. Although not demonstrated extensively here, the approach could easily be extended to answer further, more focused questions on trending topics, mentions and sentiment regarding treatment options, and relevant products such as medications.

Background & Data

The analyzed online forum is popular among people who, either temporarily or permanently, have an artificial bowel outlet, or in medical terms, an ileostomy. There are multiple reasons why such a surgical opening becomes necessary. Patients who suffer from severe inflammatory bowel disease (IBD), specifically ulcerative colitis and Crohn’s disease, are often recipients of an ileostomy and the necessary ostomy pouch to collect intestinal waste. In any case, when drug treatment options are exhausted and the bowel must be removed, an ileostomy becomes indispensable. The procedure is of course a very invasive one and marks a major life change for the person undergoing it, so it’s easy to see how a community of people in the same situation would be an invaluable resource.

The forum was founded in 2003 and changed ownership in 2010. Today, the forum follows a classic online forum format and is divided into two parts: (1) a public forum that consists of 7 sub-forums, which any internet user can read but only registered users can write in, and (2) a private forum with 5 sub-forums, where only registered users can read and write.

As part of our text mining approach, we use web scraping to automatically and iteratively extract, collect and store the nested text content and meta-information of the public forum at one point in time [2]. The resulting data structure forms the foundation for our analysis and understanding of the community up to that point. Across all 7 sub-forums of the public section of the forum, we extracted 6,804 threads containing 104,662 posts in total.
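The nested structure described above (sub-forums contain threads, threads contain posts) can be sketched as a small data model. This is only an illustration under our own assumptions: the class and field names are hypothetical, the example values are made up, and nothing here reflects the actual storage format used for the analysis.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Post:
    author: str        # hypothetical field: user name shown next to the post
    created: datetime  # hypothetical field: timestamp of the post
    text: str


@dataclass
class Thread:
    title: str
    posts: list[Post] = field(default_factory=list)


@dataclass
class SubForum:
    name: str
    threads: list[Thread] = field(default_factory=list)


def count_threads_and_posts(sub_forums: list[SubForum]) -> tuple[int, int]:
    """Return (number of threads, number of posts) across all sub-forums."""
    n_threads = sum(len(sf.threads) for sf in sub_forums)
    n_posts = sum(len(t.posts) for sf in sub_forums for t in sf.threads)
    return n_threads, n_posts


# Tiny made-up example, not real forum data.
example = [
    SubForum("Sub-forum A", [
        Thread("Thread 1", [
            Post("user_a", datetime(2009, 1, 5), "first post"),
            Post("user_b", datetime(2009, 1, 6), "reply"),
        ]),
        Thread("Thread 2", [Post("user_a", datetime(2010, 3, 1), "question")]),
    ]),
    SubForum("Sub-forum B", [
        Thread("Thread 3", [Post("user_c", datetime(2011, 7, 9), "hello")]),
    ]),
]

print(count_threads_and_posts(example))  # (3, 4)
```

Keeping the hierarchy explicit makes summary counts such as the thread and post totals a simple traversal of the nested objects.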

Results

The community is diverse in its topics and reflects that receiving an ostomy pouch touches on nearly every aspect of life. At Q, we have been researching IBD-related online communities for several years but are keen to learn more about ileostomy patients specifically.

We find that users come to the forum to ask for advice on day-to-day challenges, treatment options and advances, to share experiences, and to offer support to others in similar situations. This is reflected by the topics of the most discussed or active threads (all-time Top-1%) of each sub-forum, as depicted in the treemap below. The visualization is interactive: you can scroll, click and drag. The names of the sub-forums and threads are in German, the original language of the community.

Exploring the Top-1% of all-time threads, we discover challenges of everyday life with tips on how to master them, emotional patient stories, support requests, requests for information on bureaucratic processes and also threads that build a community.
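Selecting the most active threads per sub-forum can be sketched as follows. We assume here that "Top-1%" means the 1% of threads with the most posts within each sub-forum, rounded up to at least one thread; the function name and the example data are hypothetical.

```python
def top_percent(threads, percent=1):
    """Return the most active `percent` of threads, but at least one.

    threads: list of (title, n_posts) tuples for one sub-forum.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    k = max(1, -(-len(threads) * percent // 100))
    return sorted(threads, key=lambda t: t[1], reverse=True)[:k]


# Made-up data: thread i has i posts.
sub_forums = {
    "Sub-forum A": [(f"thread {i}", i) for i in range(1, 251)],  # 250 threads
    "Sub-forum B": [("only thread", 12)],
}

top_threads = {name: top_percent(threads) for name, threads in sub_forums.items()}
print(top_threads["Sub-forum A"])
```

With 250 threads, 1% rounds up to three threads, so the three with the highest post counts are kept.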

To learn more about the community, we broaden our perspective and also look at the all-time Top-20 threads of the forum as a whole.

We see that many of them are on topics such as fear, cancer, relapse, and related concerns calling for emotional support. Such threads receiving as many posts as they do is indicative of a community that is looking out for its members.

As in every long-lived online community, the supportive environment depends on, and is shaped by, the most active and influential users. We therefore zoom in on the most active users to get a better idea of their posting habits. In the following table, we rank all 3,359 registered forum users by their total number of posts. Forum activity is measured as the average number of posts per week throughout a user’s tenure in the forum [3], and posts over time shows the user’s total yearly posts since the founding of the forum in 2003. The table is interactive: you can click on the column headers to sort and on the page numbers to browse.
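The activity measure can be written down in a few lines. The sketch below reproduces the footnote example of 30 posts in 10 weeks; how a user's tenure is delimited (for instance, registration date to last post or to the date of data collection) is our assumption, and the function name is hypothetical.

```python
from datetime import date


def posts_per_week(n_posts, first_day, last_day):
    """Average posts per week between two dates (tenure is an assumed definition)."""
    weeks = (last_day - first_day).days / 7
    if weeks <= 0:
        raise ValueError("tenure must be positive")
    return n_posts / weeks


# 1 January 2021 to 12 March 2021 spans 70 days, i.e. 10 weeks.
rate = posts_per_week(30, date(2021, 1, 1), date(2021, 3, 12))
print(rate)  # 3.0
```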

The top users average multiple posts per day for more than a decade of activity and act as opinion leaders, thus having the power to define and shape the community. Although the group of top users is relatively small, the persistence of highly involved individuals reflects the significance of the forum in some users’ lives.

Let’s look at the most active user in the history of the forum in more detail: The user was an active member of the forum from February 2005 until January 2020 and contributed to the community with 8 posts per week, on average. Today, the user is most likely an elderly woman from northern Germany, the region of Hamburg, and lives with an ostomy pouch due to the removal of her large intestine. Her forum posts are full of advice and loving support, flavored with the typical northern German dialect.

The longevity of her condition and her experience with it, both in and outside of the forum threads, enable her to offer educated suggestions and recommendations, often even for specific services and products. In that regard, she is among the Top-5 users who make up 22% of all Coloplast [4] mentions. She refers to the Coloplast website multiple times and recommends products such as Coloplast sealing rings.
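A share of brand mentions attributable to the most active users could be computed roughly as below. This is a sketch under our own assumptions: "top users" are taken to be those with the most posts, a mention is any case-insensitive occurrence of the brand name, and the function name and example posts are made up.

```python
import re
from collections import Counter


def mention_share_of_top_users(posts, brand, top_n=5):
    """Share of all brand mentions written by the `top_n` most active users.

    posts: list of (author, text) tuples.
    """
    pattern = re.compile(re.escape(brand), re.IGNORECASE)
    posts_per_user = Counter(author for author, _ in posts)
    mentions_per_user = Counter()
    for author, text in posts:
        mentions_per_user[author] += len(pattern.findall(text))

    total = sum(mentions_per_user.values())
    if total == 0:
        return 0.0
    top_users = [user for user, _ in posts_per_user.most_common(top_n)]
    return sum(mentions_per_user[user] for user in top_users) / total


# Made-up posts, not real forum content.
posts = [
    ("anna", "Coloplast rings work well"),
    ("anna", "see the coloplast site"),
    ("anna", "hello"),
    ("ben", "I use Coloplast too"),
    ("ben", "thanks"),
    ("cara", "Coloplast?"),
]

print(mention_share_of_top_users(posts, "Coloplast", top_n=1))  # 0.5
```

A plain substring match like this is deliberately crude; product names, misspellings and abbreviations would need a curated keyword list.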

Mentions of brands, services and products are of course of key importance in online market research. However, in order to correctly interpret fluctuating trends (i.e., changes in the number of mentions) based on web data, you need to be aware of changes in the number of users and in user activity. To look into this, we analyzed user activity over time, i.e., the number of monthly postings.

The success story of the forum began in early 2007: About 3.5 years after the first postings, forum activity exploded and put the forum on the map for German ileostomy patients. Monthly postings were steadily rising and finally peaked in January 2009 - one year prior to the change in ownership.

Since the peak in monthly postings, we observe a general downward trend, which may have several causes. Interestingly, we also see that the month-to-month activity is highly variable. We were interested in seeing whether there is a seasonal pattern behind this variability, so we delved further into the time series data.

In order to better understand seasonal trends in forum activity, we quantify the importance of every calendar month in terms of posting behavior. We achieve that by looking at each month’s share of the total postings in a year. The following plot shows (1) one grey line for each year between 2007 and 2020 [5], (2) a blue line representing the average percentage of posts in each calendar month, and (3) a dashed green line highlighting the naturally expected average share of posts for a calendar month, which is 8.33% (or 1/12).

The blue line suggests a seasonal pattern: The months January, February and March stand out with a higher-than-average percentage of posts, although there is considerable year-to-year variability in this pattern. The months July, November and December, in contrast, display the lowest activity on average.
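The monthly shares behind such a plot can be sketched as follows: each month's postings are divided by that year's total, and the shares are then averaged per calendar month across years. The function names and the counts are made up for illustration and are not the forum's data.

```python
from collections import defaultdict


def monthly_shares(posts_per_month):
    """posts_per_month: {(year, month): n_posts}. Return {year: {month: share}}."""
    yearly_total = defaultdict(int)
    for (year, _), n in posts_per_month.items():
        yearly_total[year] += n
    shares = defaultdict(dict)
    for (year, month), n in posts_per_month.items():
        shares[year][month] = n / yearly_total[year]
    return shares


def average_share_per_month(shares):
    """Average each calendar month's share across all years that contain it."""
    result = {}
    for month in range(1, 13):
        values = [by_month[month] for by_month in shares.values() if month in by_month]
        if values:
            result[month] = sum(values) / len(values)
    return result


# Made-up counts: 2007 has a busy January, 2008 is perfectly flat.
counts = {(2007, m): 100 for m in range(1, 13)}
counts[(2007, 1)] = 200
counts.update({(2008, m): 100 for m in range(1, 13)})

average = average_share_per_month(monthly_shares(counts))
baseline = 1 / 12  # expected share if posts were spread evenly, about 8.33%
print(round(average[1] * 100, 2), round(baseline * 100, 2))
```

Because every year's shares sum to one, the averaged shares also sum to one when all twelve months are present, which makes the 1/12 baseline a natural reference line.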

Additionally, to separate the general longer timescale trends in the forum activity from seasonal ones, we perform a “Seasonal and Trend decomposition using LOESS” (STL) with multiplicative annual seasonality. The results show that - even though we do see a seasonal pattern - the seasonality has a relatively small effect on the forum activity compared to the general downward trend. Moreover, we conclude that there were large - and likely external - events around 2009 and 2010 that influenced forum activity, because activity in those years is not well explained by trend or seasonality.
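One common way to obtain a multiplicative decomposition with STL, which is additive by construction, is to decompose the logarithm of the series. Whether the analysis here took exactly this route is our assumption, and the symbols are introduced only for illustration: y_t denotes the monthly postings (assumed positive), T_t the trend, S_t the annual seasonal factor and R_t the remainder.

```latex
\begin{align}
y_t &= T_t \cdot S_t \cdot R_t \\
\log y_t &= \log T_t + \log S_t + \log R_t
\end{align}
```

Under this reading, a seasonal factor of S_t = 1.10 would mean ten percent more postings than the trend alone suggests, and remainders R_t far from 1 mark months that neither trend nor seasonality explains.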

Conclusion

The need for an artificial bowel outlet is a major life change, and doctors and other Health Care Professionals (HCPs) can only help so much. Patients have many questions, a need to connect with others in the same situation, and a need to find an emotional outlet. An anonymous online forum is a place where those needs can be met, especially when the real-life support system is not enough in the eyes of the patient.

We leveraged text mining as an online research method to create a deeper understanding of a medical issue-focused online community. We saw that health-related fears can be a constant unpleasant companion, that even the smallest daily challenges are worth talking about, and that emotional support from others in a similar situation is key for many users. Dedicated long-term users stand out and are recognized opinion leaders. Their expressed values and opinions, e.g., on services or products, can shape other users’ perceptions.

Furthermore, we saw that the overall activity in this forum decreases over time. This general trend can have many causes, such as the rise in popularity of other competing social media channels. We can conclude from the mined data that the seasonality of the time series - where the first quarter sees more posts than the other months of the year, on average - cannot account for the decline in forum activity.

Finally, and importantly for us, we witness users develop and adapt to a forum-culture and a language-code that influences lives even beyond the borders of the forum. This highlights the importance of studying and understanding human behavior in relevant online communities in order to understand it also outside of them.

Acknowledgments

We thank Martí Medina-Hernández for his contributions to this article during his internship at Q.

Author Contributions

Data allocation and analysis were a joint effort of all authors. The final text was written by Swen Sieben and edited by Paavo Huoviala.


  1. Text mining in general refers to the process of turning text into data, e.g., via text structuring, text categorization and pattern recognition, in order to find relevant and novel information.

  2. We collected all public forum posts up to September 2021.

  3. For example: A user who registered 10 weeks ago and has written 30 posts would have an average of 3 posts per week.

  4. Coloplast is a Danish company that, among other things, specializes in medical devices and services for ostomy patients.

  5. We truncated the data to only include the years from 2007 to 2020, because data for 2021 is incomplete and to de-emphasize the highly variable posting behavior in the early foundation phase of the forum.