I worked at Royal Free Hospital for one year. When the first national Friends and Family Test was published at the end of July 2013, I was surprised that Royal Free Hospital had the lowest net promoter score. I have worked in many hospitals and I actually thought that patients were treated reasonably well. In fact, in the 2012 National Staff Survey, when asked **‘If a friend or relative needed treatment I would be happy with the standard of care provided by this organisation’**, 72.5% answered ‘Agree/Strongly Agree’, which was in the 4th quartile of performance.

So I asked myself a question: is there a correlation between the ‘Friends and Family Test’ and the ‘NHS Staff Survey’?

I took both datasets, combined them and ran a linear regression analysis.

As you can see, the R-squared value is 0.1513, which represents a weak positive correlation. I’m afraid my statistics are not up to scratch, so I can’t say whether this is statistically significant.
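For anyone who wants to check significance, here is a minimal sketch in Python. It converts the R-squared into a correlation coefficient and applies the Fisher z-transformation, which is a large-sample approximation rather than an exact test. The function name and `n_trusts` are placeholders of mine; the real number of trusts has to come from the spreadsheet.

```python
import math
from statistics import NormalDist


def correlation_p_value(r_squared, n, positive_slope=True):
    """Return (r, approximate two-sided p-value) for a simple linear regression.

    Uses the Fisher z-transformation, which assumes roughly bivariate-normal
    data and a reasonably large number of points.
    """
    if n < 4:
        raise ValueError("need at least 4 data points")
    r = math.sqrt(r_squared)
    if not positive_slope:
        r = -r
    z = math.atanh(r) * math.sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r, p


# n_trusts is a placeholder: replace it with the number of rows in the spreadsheet.
n_trusts = 100
r, p = correlation_p_value(0.1513, n_trusts)
print(f"r = {r:.3f}, approximate two-sided p = {p:.5f}")
```

With an R-squared of this size, the answer depends heavily on how many trusts are in the dataset, so the placeholder needs replacing before drawing any conclusion.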

Interestingly, my old hospital does look to be an outlier.

I don’t know what this means, but there are a few things to keep in mind:

- The response rate for the friends and family tests for inpatients was only 10% at Royal Free Hospital
- The net promoter score is calculated very differently from the NHS Staff Survey (please see links above for an explanation).

From the website:

*Those that say they are ‘extremely likely’ are counted as promoters. ‘Likely’ is neutral, ‘neither unlikely nor likely’, ‘unlikely’ and ‘extremely unlikely’ are all counted as detractors.*
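To make the difference concrete, here is a small sketch in Python. It assumes the score is the percentage of promoters minus the percentage of detractors, which is the usual net promoter definition, and compares it with a simple ‘percentage positive’ figure of the kind the staff survey reports. The response counts are invented for illustration, and any ‘don’t know’ answers are simply left out.

```python
PROMOTER = "extremely likely"
NEUTRAL = "likely"
DETRACTORS = ("neither unlikely nor likely", "unlikely", "extremely unlikely")


def net_promoter_score(counts):
    """Percentage of promoters minus percentage of detractors (-100 to +100)."""
    total = sum(counts.get(k, 0) for k in (PROMOTER, NEUTRAL) + DETRACTORS)
    if total == 0:
        raise ValueError("no responses")
    promoters = counts.get(PROMOTER, 0)
    detractors = sum(counts.get(k, 0) for k in DETRACTORS)
    return 100 * (promoters - detractors) / total


def percent_positive(counts):
    """Share answering 'likely' or 'extremely likely' (0 to 100)."""
    total = sum(counts.get(k, 0) for k in (PROMOTER, NEUTRAL) + DETRACTORS)
    if total == 0:
        raise ValueError("no responses")
    return 100 * (counts.get(PROMOTER, 0) + counts.get(NEUTRAL, 0)) / total


# Invented counts, not real survey data.
example_counts = {
    "extremely likely": 60,
    "likely": 30,
    "neither unlikely nor likely": 4,
    "unlikely": 3,
    "extremely unlikely": 3,
}
print(net_promoter_score(example_counts), percent_positive(example_counts))
```

In this made-up example, 90% of respondents are positive, yet the net promoter score is only 50, because ‘likely’ answers count for nothing and everything else counts against.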

I welcome others to analyse this data. The sources and methods on how the raw data was collected and calculated are available from the sources (links above).

You can download my Excel Spreadsheet: NHS Staff Survey vs Friends Family or LibreOffice Spreadsheet
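For anyone who would rather repeat the analysis outside a spreadsheet, here is a minimal sketch in Python of the two steps: pairing the datasets by trust and fitting a straight line by ordinary least squares. The trust codes and scores are invented placeholders, and putting the staff survey on the x axis is an arbitrary choice for this sketch.

```python
def merge_by_trust(fft_scores, staff_scores):
    """Pair the two measures for trusts that appear in both datasets.

    Returns a list of (staff_score, fft_score) tuples, sorted by trust code.
    """
    shared = sorted(set(fft_scores) & set(staff_scores))
    return [(staff_scores[code], fft_scores[code]) for code in shared]


def linear_regression(pairs):
    """Ordinary least squares of y on x; returns (slope, intercept, r_squared)."""
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 points")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Invented placeholder data: trust code -> score.
fft = {"T01": 55.0, "T02": 70.0, "T03": 62.0, "T04": 48.0}
staff = {"T01": 60.0, "T02": 75.0, "T03": 71.0, "T05": 66.0}

pairs = merge_by_trust(fft, staff)
slope, intercept, r_squared = linear_regression(pairs)
print(f"{len(pairs)} trusts matched, R-squared = {r_squared:.4f}")
```

Trusts that appear in only one dataset are dropped silently here, so it is worth checking how many rows survive the merge.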

Any thoughts?

**Update: 0734 19/08/2013**

This post has generated quite a bit of interest with a lot of talk about sample sizes and response rates.

I wonder if someone is able to create a dynamic graph where the user can filter by sample size and, on mouse-over, a tooltip shows the hospital the data point refers to? Maybe the size of the data point could correspond to sample size too (like a bubble graph?). All help greatly appreciated.
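As a starting point, the data preparation for such a graph might look something like this sketch in Python. The record fields (`trust`, `staff_pct`, `fft_score`, `responses`) and the example rows are my own invention, not the layout of the spreadsheet, and the drawing itself is left to whichever charting library is used.

```python
import math


def bubble_points(records, min_responses=0, max_radius=20.0):
    """Filter records by sample size and prepare points for a bubble chart.

    Radius scales with the square root of the response count, so bubble
    area (not diameter) is proportional to sample size.
    """
    kept = [r for r in records if r["responses"] >= min_responses]
    if not kept:
        return []
    largest = max(r["responses"] for r in kept)
    points = []
    for r in kept:
        share = r["responses"] / largest if largest > 0 else 0.0
        points.append({
            "x": r["staff_pct"],
            "y": r["fft_score"],
            "radius": max_radius * math.sqrt(share),
            "tooltip": f'{r["trust"]} ({r["responses"]} responses)',
        })
    return points


# Invented example rows, not real survey data.
example_records = [
    {"trust": "Trust A", "staff_pct": 60.0, "fft_score": 40, "responses": 400},
    {"trust": "Trust B", "staff_pct": 70.0, "fft_score": 55, "responses": 100},
    {"trust": "Trust C", "staff_pct": 65.0, "fft_score": 20, "responses": 25},
]

for point in bubble_points(example_records, min_responses=100):
    print(point["tooltip"], round(point["radius"], 1))
```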


Thanks Carl Plant (@carlplant) for creating this visualisation using #d3js. I’m sure there will be more to come too.

http://ideapad.org.uk/drwong/drwong_correl.html

Interesting observation.

Since you have posted about open source, why not use an open source stats programme like R, and post the data in a safe, open format, rather than the excel file, with macros in it, that you have?

You can get R from http://www.r-project.org, and the fantastic R IDE, R Studio, from http://www.rstudio.com, both of which are fully FOSS. If these base packages are not up to the job you need then you can easily write your own code, or go to CRAN, http://cran.r-project.org, to find a multitude of packages that will solve almost any stats problem for you.

OK, now you’re open source so what:

If you paste the code below into an R source file and run it, it will show you some of the things that are relevant to your problem:

```r
x <- rnorm(100, 50, 15)     # generate 100 normally distributed random numbers with mean 50 and sd 15
y <- x + rnorm(100, 0, 15)  # generate another 100 by adding a random number (mean 0, sd 15) to each x

fit1 <- lm(y ~ x)  # fit y on x with a linear model
fit2 <- lm(x ~ y)  # fit x on y

print(summary(fit1))  # look at some numbers calculated from the models
print(summary(fit2))

plot(x, y)  # plot a scatter diagram of the points
curve(coef(fit1)[1] + coef(fit1)[2] * x, add = TRUE)  # add the regression line for y ~ x

plot(x, resid(fit1))  # plot the residuals against x
abline(0, 0)          # horizontal reference line at zero

# NOT RUN
# lm  # gives you the code that actually runs when you use the lm() function
```

Then you can get to work on importing your file into R and doing the same.

You'll see that it will calculate R^2, the F-statistic and a p value for you in a single step, as well as the regression coefficients and p values for them. And if you want to know how it does it you can easily find out, because it's open source.

But just because you can, doesn't mean you should. For example, the two versions of the graph presented have the x and y axes interchanged. This matters, because the regression line is not the same for both versions, as you can see from my fake data version above. It also fundamentally affects how you interpret the results. There is evidence of correlation between storks and births: doi/10.1111/j.1365-3016.2003.00534.x. But what is your theory? There are at least three possible theories, which in cartoon form are:

You also need some idea of whether linear regression is a reasonable thing to do. I've used fake normally distributed data, and have randomly distributed residuals, as can be seen from the residual plot, so it is. But the real values cannot go above 100 or below 0.

Your point about outliers is important, because these can have a large effect on the results of linear regression. Detecting them and what to do about them is a whole other problem though.
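One simple first check is to refit the line with each point left out in turn and see how much the slope moves. The sketch below does this in Python purely for illustration (the same idea carries over directly to R); the data are made up, with one deliberately awkward point, and the function names are my own.

```python
def ols_slope(pairs):
    """Slope of the ordinary least squares fit of y on x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("x values must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


def leave_one_out_slopes(pairs):
    """Return (full_slope, slopes), where slopes[i] is the fit without point i."""
    full_slope = ols_slope(pairs)
    slopes = [ols_slope(pairs[:i] + pairs[i + 1:]) for i in range(len(pairs))]
    return full_slope, slopes


# Made-up data: five points on the line y = x, plus one point far from it.
data = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 0)]
full_slope, slopes = leave_one_out_slopes(data)

# The point whose removal changes the slope the most.
most_influential = max(range(len(data)), key=lambda i: abs(slopes[i] - full_slope))
print(f"full slope {full_slope:.3f}; most influential point: {data[most_influential]}")
```

A point that moves the slope a lot is influential, but that alone does not make it wrong; it only tells you where to look more closely.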

Let me know how you get on.

Thanks Duncan. I will give your suggestions deeper thought, but in the meantime I have removed the macros from the Excel spreadsheet (they were embedded in the original file from the DoH) and also provided a LibreOffice spreadsheet as an alternative for users.

Posted on behalf of Anas El Turabi

Interesting stuff.

A few observations:

- Validity of FFT
- Some statistics things

In summary – great work on the blog. Brilliant to see you converting your curiosity into stimulating material. Problem is FFT is a crap measure – GIGO. If you still wanted to use it though, you’d want to use something a bit more sophisticated (but not at all hard) like multivariable regression +/- time series analysis.

Thank you for the FFT discussion, I have had concerns over the usefulness of the friends and family test ever since the ‘pilot’ stage where it was obvious the Government were going to push the test through before it could be evaluated.

My contribution so far to this thread has been the bubble/scatter chart (http://ideapad.org.uk/drwong/drwong_correl.html) providing a visual viewpoint of the FFT vs Staff survey results.

I had a couple of hours spare so I’ve created a survey search/graphing tool bringing together GMC trainee satisfaction results (thank you Dr Wong for the direction here), FFT and NHS staff survey results. http://ideapad.org.uk/drwong/search/

My thoughts on using NHS data more effectively centre mainly on the formatting of the data. Many of the data releases still use Excel or clunky table generators, with near-zero effort made to improve accessibility for people interested in understanding the inner workings of the NHS.

To create the search/graphing tool I had to spend the majority of the time cleaning up the data before importing it into a database to merge the survey results.
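To give a flavour of that work, here is a much simplified sketch in Python using an in-memory SQLite database. It is not the code behind the tool: the table layout, the cleaning rule and the example rows are all assumptions made for illustration.

```python
import sqlite3


def clean_trust_name(raw):
    """Normalise a trust name so the same trust matches across releases."""
    return " ".join(raw.replace("&", " and ").lower().split())


def merge_surveys(fft_rows, staff_rows, gmc_rows):
    """Load three lists of (trust_name, score) rows and join them on the cleaned name.

    Returns rows of (trust, fft_score, staff_score, gmc_score) for trusts
    present in all three surveys.
    """
    conn = sqlite3.connect(":memory:")
    for table, rows in (("fft", fft_rows), ("staff", staff_rows), ("gmc", gmc_rows)):
        conn.execute(f"CREATE TABLE {table} (trust TEXT, score REAL)")
        conn.executemany(
            f"INSERT INTO {table} VALUES (?, ?)",
            [(clean_trust_name(name), score) for name, score in rows],
        )
    merged = conn.execute(
        """
        SELECT fft.trust, fft.score, staff.score, gmc.score
        FROM fft
        JOIN staff ON staff.trust = fft.trust
        JOIN gmc ON gmc.trust = fft.trust
        ORDER BY fft.trust
        """
    ).fetchall()
    conn.close()
    return merged


# Invented rows showing the kind of inconsistency that needs cleaning.
fft_rows = [("Example  Trust A", 50.0), ("Example Trust B", 60.0)]
staff_rows = [("example trust a", 70.0)]
gmc_rows = [("EXAMPLE TRUST A", 80.0)]
print(merge_surveys(fft_rows, staff_rows, gmc_rows))
```

Real releases need far more than case and spacing fixes (renamed trusts, mergers, differing codes), which is where most of the time goes.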

I wish the NHS would involve Geeks and lay people in the process of releasing data to the public.

(cross posted with carlplant.com)