Thursday, 4 October 2012

Convert REDD & FigureEnergy raw data

I have finished the code for the mixture model of the EGH extension. The prediction performance of this model is better than I expected, which is good. More tests are required to confirm how robust the algorithm is.

There are two real datasets: i) REDD and ii) FigureEnergy. For the previous algorithm, I converted these raw datasets into a machine-readable format using Matlab. However, it is a pain to reuse that code and to generate new input formats by navigating the Matlab code. So I have switched to Python, which is object-oriented and makes it much easier to navigate objects and functions.

The REDD raw dataset contains 6 houses. Each house has a list of appliances whose energy consumption is measured by sub-meters and recorded every 3 seconds. Since I am interested in predicting events, I need to convert the raw data into ON-OFF events. The algorithm works as follows:
  • Firstly, set a threshold of energy consumption (typically 55 W) and store all segments during which the appliance is turned on, i.e. its reading is above the threshold.
  • Secondly, set a GAP allowance parameter for two consecutive segments. If the gap between two consecutive segments is no greater than the GAP allowance, we connect the two segments and treat them as one segment.
  • Thirdly, select the NOISE removal parameter, then filter out every segment whose duration is less than the NOISE parameter.
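
The three steps above can be sketched in Python roughly as follows. This is only a minimal illustration, not the exact code I use: the function names and the default values of `max_gap` and `min_duration` are assumptions made for the example, as is the choice to count a reading at or above the threshold as ON. It also assumes that two segments are joined when the gap between them is within the GAP allowance.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_time, end_time), in seconds


def extract_on_segments(timestamps: Sequence[float],
                        power: Sequence[float],
                        threshold: float = 55.0) -> List[Segment]:
    """Step 1: collect runs of samples whose power is at or above the threshold."""
    segments: List[Segment] = []
    start = None
    last_on = None
    for t, p in zip(timestamps, power):
        if p >= threshold:
            if start is None:
                start = t          # a new ON segment begins here
            last_on = t            # remember the latest ON sample
        elif start is not None:
            segments.append((start, last_on))
            start = None
    if start is not None:          # the recording ended while still ON
        segments.append((start, last_on))
    return segments


def merge_close_segments(segments: Sequence[Segment],
                         max_gap: float) -> List[Segment]:
    """Step 2: join consecutive segments separated by at most max_gap seconds."""
    merged: List[Segment] = []
    for start, end in segments:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def drop_short_segments(segments: Sequence[Segment],
                        min_duration: float) -> List[Segment]:
    """Step 3: remove segments shorter than min_duration seconds (noise)."""
    return [(s, e) for s, e in segments if e - s >= min_duration]


def to_on_off_segments(timestamps: Sequence[float],
                       power: Sequence[float],
                       threshold: float = 55.0,   # from the post
                       max_gap: float = 60.0,     # hypothetical default
                       min_duration: float = 30.0  # hypothetical default
                       ) -> List[Segment]:
    """Run the three steps; each result is one (ON time, OFF time) pair."""
    segments = extract_on_segments(timestamps, power, threshold)
    segments = merge_close_segments(segments, max_gap)
    return drop_short_segments(segments, min_duration)
```

Here the gap is measured from the last ON sample of one segment to the first ON sample of the next, so with 3-second readings a single missing ON sample shows up as a 6-second gap.
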
In the FigureEnergy dataset, the events of all users and appliances come together as one list. So the idea is to write a Python program that separates them into one file per appliance per user. This only requires some time devoted to coding.
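
A rough sketch of that separation step is shown below. I am not describing the actual FigureEnergy export format here: the CSV layout, the column names `user_id` and `appliance`, and the output file naming are all hypothetical and would need to be adapted to the real data.

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_events(rows, user_key="user_id", appliance_key="appliance"):
    """Group event rows (dicts) by (user, appliance).

    The column names are hypothetical placeholders.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row[user_key], row[appliance_key])].append(row)
    return dict(groups)


def write_groups(groups, out_dir):
    """Write one CSV file per (user, appliance) pair and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for (user, appliance), rows in sorted(groups.items()):
        # Assumed naming scheme; names with unusual characters need escaping.
        path = out_dir / f"{user}_{appliance}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths


def split_events_file(in_path, out_dir):
    """Read one combined event list and split it into per-appliance files."""
    with open(in_path, newline="") as handle:
        groups = group_events(csv.DictReader(handle))
    return write_groups(groups, out_dir)
```
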

At the moment, the problem with the REDD and FigureEnergy data is that the number of days they cover is small, which is not enough to test how robust the prediction algorithm is. One solution could be to sample more data from the empirical distribution of each appliance.
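
One simple way to do such sampling is to resample the observed events with replacement, as sketched below. This is just one possible reading of "sampling from the empirical distribution", not a settled design: the function name, the representation of an event as a (start second of day, duration in seconds) pair, and the decision to draw the daily event count and the events independently are all assumptions.

```python
import random


def sample_synthetic_days(observed_days, n_days, seed=None):
    """Generate synthetic days for one appliance by resampling observed events.

    observed_days: list of days, each a list of
                   (start_second_of_day, duration_seconds) tuples.
    Returns a list of n_days synthetic days in the same format.
    """
    if not observed_days:
        raise ValueError("need at least one observed day")
    rng = random.Random(seed)
    # Empirical distribution of the number of events per day.
    daily_counts = [len(day) for day in observed_days]
    # Empirical joint distribution of (start time, duration).
    all_events = [event for day in observed_days for event in day]
    synthetic = []
    for _ in range(n_days):
        count = rng.choice(daily_counts)
        # Draw events with replacement and order them by start time.
        events = sorted(rng.choice(all_events) for _ in range(count))
        synthetic.append(events)
    return synthetic
```

Because events are drawn independently, two sampled events in the same synthetic day can overlap, so a real generator would probably need an extra check for that.
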

2 comments:

  1. Hi Henry,

    I am planning to do a project in this area. Could you elaborate a little more on how you labeled events as "on or off"? In the REDD dataset (low-frequency data) I just see the timestamps and the corresponding power measurements. How do I check whether a certain appliance is on or off?

  2. Thanks for your interest.
    I described how to do this in the post above:

    - Firstly, set a threshold of energy consumption (typically 55 W) and store all segments during which the appliance is turned on, i.e. its reading is above the threshold.

    - Secondly, set a GAP allowance parameter for two consecutive segments. If the gap between two consecutive segments is no greater than the GAP allowance, we connect the two segments and treat them as one segment.

    - Thirdly, select the NOISE removal parameter, then filter out every segment whose duration is less than the NOISE parameter.

    This is the method I use. Please note that the algorithm's parameters can differ between devices. For example, a washing machine normally runs for about 2 hours, while a microwave typically runs for around 1-5 minutes.

    Hope this helps!
