Notes of Allama Iqbal Open University
Concept of Measurement, Assessment and Evaluation
Measurement: In general, the term measurement is used to determine the attributes or dimensions of an object. For example, we measure an object to know how big, tall, or heavy it is.
Assessment: Assessment is a broad term that includes testing. For example, a teacher may assess knowledge of the English language through a test, and assess students' language proficiency through another instrument, such as an oral quiz or a presentation.
Evaluation: Evaluation is the continuous inspection of all available information in order to form a valid judgment of students' learning and/or the effectiveness of an education program.
Classroom Assessment: Why, What, How and When
According to Carol Ann Tomlinson, "Assessment is today's means of modifying tomorrow's instruction." It is an integral part of the teaching-learning process.
Why to Assess: Teachers have clear goals for instruction, and they assess to ensure that these goals have been or are being met. If objectives are the destination and instruction is the path to it, then assessment is the tool that keeps the effort on track and ensures the path is right. After the journey is complete, assessment confirms that the destination has been reached.
What to Assess: Teachers cannot assess whatever they themselves like. In classroom assessment, teachers are supposed to assess students' current abilities in a given skill or task. A teacher can assess students' knowledge, skills, or behaviour related to a particular field.
Who to Assess: It may seem strange to ask whom a teacher should assess in the classroom, but the issue is of great concern. Teachers should treat students as real learners, not merely as coverers of a course or unit. They should also expect that some students are more active than others, and that some learn quickly while others learn slowly.
How to Assess: Teachers employ different instruments, formal or informal, to assess their students. Brown and Hudson (1998) reported that teachers use three types of assessment methods – selected-response assessments, constructed-response assessments, and personal-response assessments. Teachers can match the assessment type to what they are going to assess.
When to Assess: Educationists strongly agree that assessment should be interwoven with instruction. Teachers continue to assess students' learning throughout the process of teaching. They particularly use formal assessments when they are going to make instructional decisions at the formative and summative levels, even if those decisions are small.
How Much to Assess: There is no fixed standard for how much a teacher should assess students. But this does not mean that teachers can assess their students as much as they please. It is generally agreed that, as students differ in ability, learning styles, interests, and needs, assessment should be tailored to each individual's needs, ability, and knowledge. Teachers' careful and wise judgment in this regard can prevent over-assessment or under-assessment.
Characteristics of Classroom Assessment
1. Effective assessment of student learning begins with educational goals.
2. Assessment is most effective when it reflects an understanding of learning as multidimensional, integrated, and revealed in performance over time.
3. Assessment works best when it has clear, explicitly stated purposes.
4. Assessment requires attention to outcomes but also and equally to the experiences that lead to those outcomes.
5. Assessment works best when it is ongoing, not episodic.
6. Assessment is effective when representatives from across the educational community are involved.
7. Assessment makes a difference when it begins with issues of use and illuminates questions that people really care about.
8. Through effective assessment, educators meet responsibilities to students and to the public.
Objectives and Educational Outcomes, Bloom Taxonomy, SOLO Taxonomy
Definition of Objectives: Education is, without doubt, a purposeful activity. Every step of this activity has, and should have, a particular purpose. Therefore, learning objectives are a prime and integral part of the teaching-learning process.
Taxonomy of Educational Objectives:
Following the 1948 Convention of the American Psychological Association, a group of college examiners considered the need for a system of classifying educational goals for the evaluation of student performance. Years later, as a result of this effort, Benjamin Bloom formulated a classification of "the goals of the educational process". Eventually, Bloom established a hierarchy of educational objectives for categorizing the levels of abstraction of questions that commonly occur in educational settings (Bloom, 1956). This classification is generally referred to as Bloom's Taxonomy. Taxonomy means 'a set of classification principles' or 'structure'. The following are the six levels of this taxonomy: Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation. The detail is given below:
Cognitive domain: The cognitive domain (Bloom, 1956) involves the development of intellectual skills. This includes the recall or recognition of specific facts, procedural patterns, and concepts that serve in the development of intellectual abilities and skills. There are six levels in this domain, ranging from the simplest cognitive behaviour to the most complex. The levels can be thought of as degrees of difficulty; that is, the first levels must normally be mastered before the next ones can take place.
Affective domain: The affective domain is related to the manner in which we deal with things emotionally, such as feelings, values, appreciation, enthusiasms, motivations, and attitudes. The five levels of this domain include: receiving, responding, valuing, organization, and characterizing by value.
Psychomotor domain: The focus is on physical and kinesthetic skills. The psychomotor domain includes physical movement, coordination, and use of the motor-skill areas. Development of these skills requires practice and is measured in terms of speed, precision, distance, procedures, or techniques in execution. There are seven levels in this domain, from the simplest behaviour to the most complex: perception, set, guided response, mechanism, complex overt response, adaptation, and origination.
SOLO Taxonomy: The SOLO taxonomy describes levels of increasing complexity in a student's understanding of a subject through five stages, and it is claimed to be applicable to any subject area. Not all students progress through all five stages, of course, and not all teaching addresses them.
1. Pre-structural: here students are simply acquiring bits of unconnected information, which have no organisation and make no sense.
2. Unistructural: simple and obvious connections are made, but their significance is not grasped.
3. Multistructural: a number of connections may be made, but the meta-connections between them are missed, as is their significance for the whole.
4. Relational: the student is now able to appreciate the significance of the parts in relation to the whole.
5. Extended abstract: the student makes connections not only within the given subject area but also beyond it, and is able to generalise and transfer the principles and ideas underlying the specific instance.
Achievement Tests, Aptitude Tests
Achievement Tests: Achievement tests may assess any or all of reading, math, and written language as well as subject areas such as science and social studies.
Types of Achievement Tests
(a) Summative Evaluation: Testing is done at the end of the instructional unit. The test score is seen as the summation of all knowledge learned during a particular subject unit.
(b) Formative Evaluation: Testing occurs constantly with learning so that teachers can evaluate the effectiveness of teaching methods along with the assessment of students' abilities.
Advantages of Achievement Test:
• One of the main advantages of testing is that it is able to provide assessments that are psychometrically valid and reliable, as well as results that are generalizable and replicable.
• Another advantage is aggregation. A well designed test provides an assessment of an individual's mastery of a domain of knowledge or skill which at some level of aggregation will provide useful information. That is, while individual assessments may not be accurate enough for practical purposes, the mean scores of classes, schools, branches of a company, or other groups may well provide useful information.
Aptitude Tests: Aptitude tests assume that individuals have inherent strengths and weaknesses and are naturally inclined toward success or failure in certain areas based on those inherent characteristics.
Types of Aptitude Tests: The following is a list of the different types of aptitude tests used in the assessment process.
(a) Critical Thinking: Critical thinking is defined as a form of reflective reasoning which analyses and evaluates information and arguments by applying a range of intellectual skills in order to reach clear, logical, and coherent judgments within a given context. Critical thinking tests require candidates to analyse and evaluate short passages of written information and make deductions to form answers.
(b) Numerical Reasoning Tests: Numerical tests, sometimes known as numerical reasoning tests, are used during the application process at all major investment banks and accountancy & professional services firms. Tests can be either written or taken online, and are usually provided by a third party.
(c) Perceptual Speed Tests: Perceptual speed is the ability to quickly and accurately compare letters, numbers, objects, pictures, or patterns. In tests of perceptual speed, the things to be compared may be presented at the same time or one after the other. Candidates may also be asked to compare a presented object with a remembered object.
(d) Spatial Visualization Tests: Spatial visualization ability, or visual-spatial ability, refers to the ability to mentally manipulate 2-dimensional and 3-dimensional figures.
(e) Logical Reasoning Tests: Logical reasoning aptitude tests (also known as critical reasoning tests) may be verbal (word based, e.g. "Verbal Logical Reasoning"), numerical (number based, e.g. "Numerical Logical Reasoning"), or diagrammatic (picture based; see diagrammatic tests for more information).
(f) Verbal Reasoning Tests: In a verbal reasoning test, you are typically provided with a passage, or several passages, of information and required to evaluate a set of statements by selecting one of a set of possible answers.
Reliability and Validity
Reliability: Reliability means trustworthiness. A test score is called reliable when we have reason to believe it to be stable and objective. For example, if the same test is given to two classes and marked by different teachers, and it still produces similar results, it may be considered reliable.
Types of Reliability: There are six general classes of reliability estimates, each of which estimates reliability in a different way. They are:
i) Inter-Rater or Inter-Observer Reliability: Assesses the degree to which different raters/observers give consistent estimates of the same phenomenon. That is, if two teachers mark the same test and the results are similar, this indicates inter-rater or inter-observer reliability.
ii) Test-Retest Reliability: Assesses the consistency of a measure from one time to another. When the same test is administered twice and the results of both administrations are similar, this constitutes test-retest reliability. Students may remember items, or may mature after the first administration, which creates a problem for test-retest reliability.
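Test-retest reliability is typically quantified as the correlation between scores from the two administrations. A minimal sketch, using hypothetical scores for five students, might look like this:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for five students on two administrations of the same test
first_admin  = [70, 65, 80, 55, 90]
second_admin = [72, 63, 85, 58, 88]

r = pearson(first_admin, second_admin)
print(round(r, 2))  # a value close to 1 suggests high test-retest reliability
```

Here the two score lists track each other closely, so the coefficient comes out near 1; widely diverging scores would pull it toward 0.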
iii) Parallel-Form Reliability: Assesses the consistency of the results of two tests constructed in the same way from the same content domain. Here the test designer develops two tests of a similar kind; if the results of both administrations are similar, this indicates parallel-form reliability.
iv) Internal Consistency Reliability: Assesses the consistency of results across items within a test; it is the correlation of individual item scores with the score on the entire test.
v) Split-Half Reliability: Assesses the consistency of results by comparing two halves of a single test; the halves may be the even- and odd-numbered items.
vi) Kuder-Richardson Reliability: Assesses the consistency of the results using all possible split halves of a test.
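The last two estimates can be illustrated numerically. The sketch below, using a hypothetical table of 0/1 item scores (one row per student, one column per item), computes a split-half estimate with the Spearman-Brown correction and the Kuder-Richardson formula 20 (KR-20), the standard form for dichotomously scored items:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half(scores):
    """Split-half reliability: correlate odd- and even-item half scores,
    then apply the Spearman-Brown correction for full test length."""
    odd  = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    r = pearson(odd, even)
    return 2 * r / (1 + r)

def kr20(scores):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores."""
    n, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance of totals
    pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in scores) / n  # proportion answering this item correctly
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_t)

# Hypothetical results: six students, six dichotomously scored items
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
print(round(split_half(scores), 2), round(kr20(scores), 2))
```

Both estimates come out high for this made-up data because students who answer one half of the items correctly also tend to answer the other half correctly; the numbers themselves are illustrative only.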
Factors Affecting Reliability: Reliability is an important characteristic of a test, as we use test results for future decisions about students' educational advancement, for job selection, and much more. The methods of assuring the reliability of tests have been discussed, and many examples have been provided to give an in-depth understanding of the concepts. Here we shall focus on the different factors that may affect the reliability of a test.
Nature of Validity: The validity of an assessment tool is the degree to which it measures what it is designed to measure. For example, if a test is designed to measure the skill of adding three-digit numbers in mathematics, but the problems are presented in language that is too difficult for the students' ability level, then it may not measure the skill of three-digit addition and consequently will not be a valid test.
Methods of Measuring Validity:
Content Validity: Evidence of content validity comes from a judgmental process, which may be formal or informal. The formal process follows a systematic procedure to arrive at a judgment; its important components are the identification of behavioural objectives and the construction of a table of specifications. Content validity evidence involves the degree to which the content of the test matches the content domain associated with the construct.
Curricular Validity: The extent to which the content of the test matches the objectives of a specific curriculum as it is formally described. Curricular validity takes on particular importance in situations where tests are used for high-stakes decisions, such as the Punjab Examination Commission exams for fifth- and eighth-grade students and the Boards of Intermediate and Secondary Education examinations.
Construct Validity: Construct validity is a test's ability to measure factors that are relevant to the field of study; it is thus an assessment of the quality of an instrument or experimental design. It asks, 'Does it measure the construct it is supposed to measure?' Construct validity is rarely applied to achievement tests.
Multiple Choice Questions, Completion items, Short Questions, Essay
Multiple Choice Questions: Multiple-choice test items consist of a stem or question and three or more alternative answers (options), with the correct answer sometimes called the keyed response and the incorrect answers called distracters. The question form is generally better than the incomplete stem because it is simpler and more natural.
Multiple Choice Questions Good for: Application, synthesis, analysis, and evaluation levels
Completion Items: Like true-false items, completion items are relatively easy to write; perhaps the first tests classroom teachers construct, and students take, are completion tests. Like items of all other formats, though, there are good and poor completion items. The student fills in one or more blanks in a statement, which is why these are also known as "gap-fillers". They are most effective for assessing knowledge- and comprehension-level learning outcomes but can be written for higher-level outcomes, e.g. The capital city of Pakistan is -----------------.
Short Answer: The student supplies a response to a question, which might consist of a single word or phrase. Most effective for assessing knowledge and comprehension learning outcomes, but can be written for higher-level outcomes. Short answer items are of two types:
• Simple direct questions
Who was the first president of Pakistan?
• Completion items
The name of the first president of Pakistan is ___________.
Advantages of Short Answer:
• Easy to construct
• Good for "who," "what," "where," "when" content
• Minimizes guessing
Disadvantages of Short Answer:
• May overemphasize memorization of facts
• Take care: questions may have more than one correct answer
• Scoring is laborious
Essay: Essay questions provide a complex prompt that requires written responses, which can vary in length from a couple of paragraphs to many pages. Like short answer questions, they provide students with an opportunity to explain their understanding and demonstrate creativity, but they make it harder for students to arrive at an acceptable answer by bluffing. They can be constructed reasonably quickly and easily, but marking these questions can be time-consuming and grade agreement can be difficult.
There are two major categories of essay questions: short response (also referred to as restricted or brief) and extended response.
• Restricted Response: more consistent scoring, outlines parameters of responses
• Extended Response Essay Items: synthesis and evaluation levels; a lot of freedom in answers
Advantages of Essay:
• Students less likely to guess
• Easy to construct
• Stimulates more study
• Allows students to demonstrate ability to organize knowledge, express opinions, show originality.
Disadvantages of Essay:
• Can limit amount of material tested, therefore has decreased validity.
• Subjective, potentially unreliable scoring.
• Time consuming to score.