Handling Data Errors in Forest Planning

Antti Mäkinen, University of Helsinki, Latokartanonkaari 7 (PL 27), Helsinki, 00790, Finland, antti.makinen@helsinki.fi
Jouni Kalliovirta, University of Helsinki, Latokartanonkaari 7 (PL 27), Helsinki, 00790, Finland, jouni.kalliovirta@helsinki.fi
Jussi Rasinmäki, University of Helsinki, Latokartanonkaari 7 (PL 27), Helsinki, 00790, Finland, jussi.rasinmaki@helsinki.fi

Decisions on the management of forest estates are usually based on projections of future scenarios generated with forest planning systems. Data quality plays a very important role in producing reliable and accurate projections. Inaccurate or flawed data can lead to erroneous projections and thus to wrong decisions. In Finland, data for forest management planning are collected in part by subjective sampling and mainly through visual standwise assessment, which is known to be prone to error. Although data collection and processing methods have been improving, missing and erroneous data records still occur and cause problems in forest planning systems. As part of an effort to develop a next-generation forest planning system, we aim to develop tools for reducing the effects of data errors on the forest planning process.

In this study we discuss two main questions in handling missing and erroneous data in the context of a forest planning system. The first is how to find missing and erroneous values in forest databases, which are often large. The second is how to replace these probably erroneous values with more realistic ones in order to predict forest development better.

Finding missing values is a trivial task, but finding incorrectly classified categorical variables and erroneous continuous variables is more challenging. We examine data mining techniques as an alternative to traditional statistical outlier detection methods for finding erroneous data records. The techniques of particular interest are distance-based, density-based and clustering-based outlier detection algorithms. These techniques have previously been applied, for example, to detecting criminal activity in e-commerce and credit card fraud.
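As an illustration of the distance-based approach mentioned above, the following is a minimal sketch, not the method used in the study. It scores each stand record by the distance to its k-th nearest neighbour and flags records whose score is far above the median. The function names, the threshold factor, and the sample stand values (mean diameter in cm, mean height in m) are all hypothetical, and the variables are left unscaled for brevity.

```python
import math


def knn_distance_scores(records, k=2):
    """Score each record by the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, a in enumerate(records):
        dists = sorted(math.dist(a, b) for j, b in enumerate(records) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(records, k=2, factor=3.0):
    """Return indices of records whose score exceeds factor times the median score.

    The factor of 3.0 is an arbitrary illustrative threshold (an assumption).
    """
    scores = knn_distance_scores(records, k)
    median = sorted(scores)[len(scores) // 2]
    return [i for i, s in enumerate(scores) if s > factor * median]


# Hypothetical stand records: (mean diameter in cm, mean height in m).
stands = [
    (18.0, 15.5),
    (19.2, 16.1),
    (20.1, 17.0),
    (18.7, 15.9),
    (19.5, 16.4),
    (19.0, 45.0),  # implausible height for this diameter
]

flagged = flag_outliers(stands)
print(flagged)
```

In practice the variables would need to be standardised before distances are computed, since diameter and height are measured on different scales.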

The second question, replacing missing and erroneous data values, is addressed by examining different imputation methods, in particular some of the well-established imputation techniques. Even though imputed values cannot be considered as good as original measured values, in the case of erroneous measurements they can provide more realistic data values and thus more reliable forest management plans.
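One well-established family of imputation techniques is nearest-neighbour imputation, in which a missing or rejected value is copied from the most similar complete record. The sketch below is only an assumption about how this might look, not the technique chosen in the study. The field names (age, diameter_cm, height_m), the function name, and the sample values are hypothetical.

```python
import math


def impute_nearest(records, target, predictors):
    """Fill missing target values from the nearest complete record (the donor).

    Similarity is the Euclidean distance over the predictor fields.
    The input records are not modified; a new list of dicts is returned.
    """
    donors = [r for r in records if r[target] is not None]
    completed = []
    for r in records:
        filled = dict(r)
        if r[target] is None:
            nearest = min(
                donors,
                key=lambda d: math.dist(
                    [r[p] for p in predictors], [d[p] for p in predictors]
                ),
            )
            filled[target] = nearest[target]
        completed.append(filled)
    return completed


# Hypothetical stand records; None marks a missing or rejected value.
stands = [
    {"age": 40, "diameter_cm": 18.0, "height_m": 15.5},
    {"age": 60, "diameter_cm": 24.0, "height_m": 20.0},
    {"age": 80, "diameter_cm": 29.0, "height_m": 24.5},
    {"age": 62, "diameter_cm": 25.0, "height_m": None},
]

completed = impute_nearest(stands, "height_m", ["age", "diameter_cm"])
print(completed[3]["height_m"])
```

A value flagged as erroneous by an outlier detection step could be set to None first and then passed through the same function.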

Decisions for Sustainability
June 12-14, 2007
Victoria, British Columbia, Canada

Forest Estate Models for the Future
