
RE: [EnergyPlus_Support] Proper use of R-squared evaluation for energy use



Jim,

 

I am not sure that the way you calculated R squared is correct. See the attached spreadsheet, with scatter plots on which the trend lines and their R2 values are shown. In general, it does not make sense to pool different data sets in one regression analysis unless you have reason to believe that the data sets measure the same thing and that the possible causes of error are the same. Although this is not shown in this case, the correlations between measured and predicted data for gas and electricity can be very different, so pooling them may give you a much worse R2.

 

Another point: R2 alone is insufficient to assess the accuracy of model predictions, because systematic errors (for example, a consistent over-prediction) may be present without being reflected in R2. You may want to include a measure of absolute error as well, such as the root-mean-square error (RMSE).
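To illustrate the point about systematic error, here is a minimal Python sketch with made-up numbers in which every prediction is exactly 20% too high. The squared correlation (the R2 a linear trend line reports) is still 1.0, while RMSE and the mean bias reveal the error. The function names, the sign convention of the bias (predicted minus actual), and normalizing RMSE by the mean measured value to get a CV(RMSE) are illustrative assumptions, not taken from the thread.

```python
import math


def r_squared(actual, predicted):
    """Squared Pearson correlation, i.e. the R2 of a linear trend line."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    sap = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    saa = sum((a - ma) ** 2 for a in actual)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sap * sap / (saa * spp)


def rmse(actual, predicted):
    """Root-mean-square error, in the same units as the data."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def mean_bias(actual, predicted):
    """Average of (predicted - actual); positive means over-prediction."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


# Made-up data: every predicted value is 20% higher than the metered value.
actual = [50, 100, 150, 200]
predicted = [60, 120, 180, 240]

print(r_squared(actual, predicted))          # 1.0
print(round(rmse(actual, predicted), 2))     # 27.39
print(mean_bias(actual, predicted))          # 25.0
cv_rmse = rmse(actual, predicted) / (sum(actual) / len(actual))
print(round(cv_rmse, 3))                     # 0.219
```

A perfect R2 here coexists with a 25-unit average over-prediction, which only the error measures expose.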

 

Cheers,

 

Yi

 

From: EnergyPlus_Support@xxxxxxxxxxxxxxx [mailto:EnergyPlus_Support@xxxxxxxxxxxxxxx] On Behalf Of Jim Dirkes
Sent: 24 March 2013 20:35
To: EnergyPlus_Support@xxxxxxxxxxxxxxx
Subject: RE: [EnergyPlus_Support] Proper use of R-squared evaluation for energy use

 

 

Niraj,

Thanks for the offer!

I normally try to make one fuel correlate well with actual metered consumption first (normally gas, because fewer factors affect it). Then I work on the other one, and finally review both for believability. That’s why I use the separate R2 values plus the overall one.

P.S. The spreadsheet I’ve attached has been copied from a much larger one so that you don’t have to wade through all of the other data needlessly. I think it includes all relevant data and formulae. Note that all energy use is normalized by floor area (per square foot).
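Normalizing by floor area does not change R2, provided measured and predicted values are divided by the same area: R2 is unaffected by a common rescaling, whereas RMSE scales with the units. A minimal Python check, assuming a hypothetical floor area and made-up monthly values:

```python
import math


def r_squared(actual, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    sap = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    saa = sum((a - ma) ** 2 for a in actual)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sap * sap / (saa * spp)


def rmse(actual, predicted):
    """Root-mean-square error, in the units of the inputs."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


floor_area_sf = 25000.0  # hypothetical floor area
actual_kwh = [41000, 38500, 36000, 30500, 28000, 33000]  # made-up data
predicted_kwh = [39000, 40000, 34500, 31000, 29500, 31500]

actual_per_sf = [v / floor_area_sf for v in actual_kwh]
predicted_per_sf = [v / floor_area_sf for v in predicted_kwh]

# R2 is the same before and after normalization (up to rounding)...
print(r_squared(actual_kwh, predicted_kwh))
print(r_squared(actual_per_sf, predicted_per_sf))
# ...but RMSE is divided by the floor area, so its units become kWh/sf.
print(rmse(actual_kwh, predicted_kwh) / floor_area_sf)
print(rmse(actual_per_sf, predicted_per_sf))
```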

 

 

James V Dirkes II, PE, BEMP, LEED AP
www.buildingperformanceteam.com
Energy Analysis, Commissioning & Training Services
1631 Acacia Drive, Grand Rapids, MI 49504 USA
616 450 8653

 

From: EnergyPlus_Support@xxxxxxxxxxxxxxx [mailto:EnergyPlus_Support@xxxxxxxxxxxxxxx] On Behalf Of Niraj Poudel
Sent: Sunday, March 24, 2013 4:02 PM
To: EnergyPlus_Support@xxxxxxxxxxxxxxx
Subject: Re: [EnergyPlus_Support] Proper use of R-squared evaluation for energy use

 

 

Jim,

 

May I ask you:


a) Are you looking for 1 regression model that captures the energy use for both, when you use electricity and gas?

 

b) The reason I am asking is this:

 


 

When you say that using 24 data points gives you an R^2 of 84% for electricity and an R^2 of 61% for natural gas, I do not understand how you get separate R^2 values for each parameter (electricity and natural gas). Are you setting up your regression model with the energy type as a dummy variable (1 if electricity, 0 if not), or something of that kind?
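For readers unfamiliar with the dummy-variable approach mentioned here, a minimal Python sketch: regress metered energy on the model's prediction plus a 0/1 fuel indicator, so that a constant offset between the fuels is absorbed by the indicator's coefficient instead of counting as error. The function names are hypothetical, the data are made up and constructed to fit exactly, and in practice a statistics package such as SAS would do this fit.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    m = [row + [v] for row, v in zip(xtx, xty)]  # augmented matrix
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def dummy_regression(actual, predicted, is_electric):
    """Fit actual = b0 + b1*predicted + b2*is_electric; return (coefs, R2)."""
    rows = [[1.0, p, 1.0 if e else 0.0] for p, e in zip(predicted, is_electric)]
    b = ols(rows, actual)
    fitted = [sum(c * x for c, x in zip(b, r)) for r in rows]
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return b, 1 - ss_res / ss_tot


# Made-up stacked data: 4 electricity months, then 4 gas months.
predicted = [10, 20, 30, 40, 15, 25, 35, 45]
is_electric = [True] * 4 + [False] * 4
actual = [2 + 0.9 * p + (5 if e else 0) for p, e in zip(predicted, is_electric)]

coef, r2 = dummy_regression(actual, predicted, is_electric)
print([round(c, 6) for c in coef], round(r2, 6))  # [2.0, 0.9, 5.0] 1.0
```

The fuel coefficient (5.0 here) is the systematic electricity-versus-gas offset that a single pooled correlation would treat as scatter.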

 

Anyway, if I could see the data and understand it better, I could run the regression model in SAS and see whether I get anything different. I will only be able to access the computer in my office on Monday, though. If that is no problem, I could take a stab at it.

 

Niraj

 

 

 

 

On Sun, Mar 24, 2013 at 3:42 PM, Jim Dirkes <jim@xxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

 

Dear Forum,

I don’t recall if this has been a topic in the past – my apologies if it has! (Just point me to a past post)

When creating an energy model for an existing building that has historical (metered) consumption data, I typically calculate an R2 value for the goodness of fit between actual and predicted energy.  I do this for electricity use and also for natural gas / fossil fuel use.
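Two different quantities are commonly called R2 in this setting, and they can disagree. A spreadsheet linear trend line (or Excel's RSQ function) reports the squared correlation between measured and predicted values, which ignores any constant offset or scaling error; the coefficient of determination 1 - SS_res/SS_tot, using the model's predictions directly as the fitted values, penalizes such errors. A minimal Python comparison with made-up data in which every prediction is 5 units high (function names are illustrative):

```python
def r2_correlation(actual, predicted):
    """Squared Pearson correlation (trend-line / RSQ-style R2)."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    sap = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    saa = sum((a - ma) ** 2 for a in actual)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sap * sap / (saa * spp)


def r2_fit(actual, predicted):
    """1 - SS_res / SS_tot, treating the predictions as the fitted values."""
    ma = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - ma) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [10, 20, 30, 40]     # made-up metered values
predicted = [15, 25, 35, 45]  # model consistently 5 units high

print(r2_correlation(actual, predicted))    # 1.0
print(round(r2_fit(actual, predicted), 3))  # 0.8
```

Unlike the squared correlation, `r2_fit` can even become negative when the predictions are worse than simply using the mean of the measured data.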

That is fairly straightforward (or this statistics-challenged engineer wouldn’t calculate it at all!)   

Here’s the question:

·         There are normally 12 data pairs for electricity and 12 for gas

·         Assuming that I have made the dates coincident for electricity and gas,

·         and also use consistent units (e.g., kWh for both)

·         If I want the overall R2 value (composite of all energy), should I use 24 data pairs in the calculation or 12?
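The two ways of building the overall data set can be written out explicitly. In this minimal Python sketch (function names are hypothetical and the monthly values are made up), the 12-pair version adds electricity and gas for each month, after both are in kWh, and so asks how well the model tracks total monthly energy; the 24-pair version stacks the two fuels into one list, treating electricity months and gas months as interchangeable observations of the same quantity.

```python
def r_squared(actual, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    sap = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    saa = sum((a - ma) ** 2 for a in actual)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sap * sap / (saa * spp)


def twelve_pairs(elec_act, elec_pred, gas_act, gas_pred):
    """One pair per month: total energy (electricity + gas, same units)."""
    actual = [e + g for e, g in zip(elec_act, gas_act)]
    predicted = [e + g for e, g in zip(elec_pred, gas_pred)]
    return actual, predicted


def twenty_four_pairs(elec_act, elec_pred, gas_act, gas_pred):
    """Two pairs per month: electricity months followed by gas months."""
    return elec_act + gas_act, elec_pred + gas_pred


# Made-up monthly data in kWh (gas already converted), January to December.
elec_act = [30, 28, 29, 27, 32, 38, 42, 41, 35, 30, 29, 31]
elec_pred = [31, 27, 30, 28, 31, 36, 43, 40, 34, 31, 28, 30]
gas_act = [60, 55, 45, 30, 15, 8, 6, 6, 12, 28, 45, 58]
gas_pred = [52, 50, 44, 33, 20, 9, 7, 7, 15, 30, 40, 50]

for name, build in [("12 pairs", twelve_pairs), ("24 pairs", twenty_four_pairs)]:
    a, p = build(elec_act, elec_pred, gas_act, gas_pred)
    print(name, len(a), round(r_squared(a, p), 3))
```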

 

This question arises because I am looking at a model which, when using 24 data pairs, shows:

·         R2 = 84% for electricity,

·         R2 = 61% for natural gas,

·         and R2 = 57% overall. 

 

Note that the overall value is less than either of its components. If I use 12 data pairs, the overall R2 = 69%. Intuitively, 69% makes more sense because it falls between the other two values, so I am inclined toward using 12 data pairs. I’m even more inclined, however, to make the calculation correctly! So I’m reaching out for input, hoping that some of you are more knowledgeable in the area of statistics than I. Which is more correct: 12 or 24 data pairs? Thanks in advance.
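One mechanism that can push the pooled (24-pair) R2 below both per-fuel values is a fuel-specific systematic error. In this minimal Python sketch with made-up data, electricity is predicted perfectly and every gas prediction is 30 units too high; each fuel on its own has R2 = 1, but once the two sets are stacked, the offset shows up as scatter and R2 drops to about 0.36. This illustrates the mechanism only; it is not a diagnosis of the model described above.

```python
def r_squared(actual, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    sap = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    saa = sum((a - ma) ** 2 for a in actual)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sap * sap / (saa * spp)


elec_actual = [100, 110, 120, 130]
elec_predicted = [100, 110, 120, 130]  # perfect prediction
gas_actual = [100, 110, 120, 130]
gas_predicted = [130, 140, 150, 160]   # every month 30 units high

r2_elec = r_squared(elec_actual, elec_predicted)
r2_gas = r_squared(gas_actual, gas_predicted)
r2_pooled = r_squared(elec_actual + gas_actual, elec_predicted + gas_predicted)
print(r2_elec, r2_gas, round(r2_pooled, 3))  # 1.0 1.0 0.357
```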

 

 

 

James V Dirkes II, PE, BEMP, LEED AP
www.buildingperformanceteam.com
Energy Analysis, Commissioning & Training Services
1631 Acacia Drive, Grand Rapids, MI 49504 USA
616 450 8653



 

--

Niraj Poudel, Architectural Engineer.

PhD student, PDBE Program.

Clemson University, Clemson, SC.

 

Attachment not found:
d:\eudora\attach\Copy of R-squared calcs.xlsb