Wednesday, 09 November 2016

 Chapter IV 

Standardized Testing


    Approach for Determining Performance Testing Objectives

Determining performance-testing objectives can be thought of in terms of the following activities:
  • Determine the objectives of the performance-testing effort.
  • Capture or estimate resource usage targets and thresholds.  
  • Capture or estimate resource budgets or allocations.
  • Identify metrics.
  • Communicate results.
  • Stay aware of changing objectives, targets, and budgets.
These activities are discussed in detail in the following sections.

Determine the Objectives of Performance Testing

The methods described in this chapter have proven effective in performance-testing projects. Whether you apply these methods precisely as stated or adapt them to fit your specific project and work environment is unimportant. What is important is to remember that objectives are intentionally collaborative; that is, they are a tool for helping to ensure that the performance-testing effort provides great value to the team — in particular the architects, developers, and administrators — as early as possible in the project life cycle.

Determine Overall Objectives

The first task is to determine the overall objectives for the performance-testing effort. Some common objectives include:
  • Determine if the application complies with contracts, regulations, and service level agreements (SLAs).
  • Detect bottlenecks to be tuned.
  • Assist the development team in determining the performance characteristics for various configuration options.
  • Provide input data for scalability and capacity-planning efforts.
  • Determine if the application is ready for deployment to production.

Review the Project Plan

Review the project plan with individual team members or small groups. Remember that a project plan does not have to be in the form of a document; for example, it may be a whiteboard sketch, a series of e-mail messages, or a vague idea in the minds of various team members. The point is that no matter how informal the project plan might be, every project has some sort of underlying plan. While reviewing or extracting the plan, whenever you encounter something that looks like a checkpoint, iteration, or milestone, you should ask questions such as:
  • What functionality, architecture, and/or hardware will be changing between the last iteration and this iteration?
  • Are there performance budgets or thresholds associated with that change? If so, what are they? Can I test them for you? What are the consequences if the budgets or thresholds are not being met?
  • Is tuning likely to be required as a result of this change? Are there any metrics that I can collect to help you with the tuning?
  • Is this change likely to impact other areas for which we have previously tested/collected metrics? If so, which areas? What tests can I run or what metrics can I collect to help determine if everything is working as expected?
  • What risks or significant concerns are related to these changes? What will the consequences be if the changes do not work?

Review the Architecture

Review both the physical and logical architecture with individual team members or small groups. Again, keep in mind that this information may not yet be documented, but someone will at least have a conceptual model in mind — or if they do not, it is probably valuable to find that out as well. As you review or extract the architecture, ask questions such as:
  • Have you ever done this/used this before?
  • How can we determine if this is performing within acceptable parameters early in the process? Are there experiments or architectural validations that we can use to check some of our assumptions?
  • Is this likely to need tuning? What tests can I run or what metrics can I collect to assist in making this determination?

Ask Team Members

Ask individual team members about their biggest performance-related concern(s) for the project and how you could detect these problems as early as possible. You might need to establish trust with team members before you get the best answers. Reassure the team individually and collectively that you are soliciting this information so that you can better assist them in building a high-quality product.

Capture or Estimate Resource Usage Targets and Thresholds

This activity is sometimes misapplied. Remember that targets and thresholds are specific metrics related to particular resources. For example, it is generally agreed that a server’s performance degrades significantly if the processor utilization regularly exceeds 80 percent. Based on this, many teams will set a processor utilization target of 70 percent and a threshold of 80 percent. By doing so, you know to alert the team if you observe readings of more than 70-percent processor utilization sustained for more than a few seconds, and to register a defect if a processor utilization rate of more than 80 percent is observed for more than a few seconds. It is worth noting that developing these targets and thresholds can be very time-consuming. Do not continue to set targets and thresholds after their value becomes questionable.
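To make the preceding example concrete, here is a minimal Python sketch of how sampled processor-utilization readings might be compared against a 70-percent target and an 80-percent threshold. The sample data, the five-second reading of "a few seconds", and the helper name are assumptions for illustration only, not part of any particular monitoring tool.

```python
# Hypothetical sketch: compare sampled CPU readings against a target and a threshold.
# Values and helper names are illustrative, not from any specific monitoring tool.

TARGET_PCT = 70      # alert the team if utilization stays above this
THRESHOLD_PCT = 80   # register a defect if utilization stays above this
SUSTAIN_SECONDS = 5  # "more than a few seconds", assumed here to be 5

def sustained_breach(samples_pct, limit_pct, sustain_seconds, interval_seconds=1):
    """Return True if samples stay above limit_pct for at least sustain_seconds."""
    needed = max(1, sustain_seconds // interval_seconds)
    run = 0
    for value in samples_pct:
        run = run + 1 if value > limit_pct else 0
        if run >= needed:
            return True
    return False

# One reading per second from a load-test run (illustrative data).
cpu_samples = [55, 62, 71, 74, 76, 79, 81, 83, 85, 84, 78, 72]

if sustained_breach(cpu_samples, THRESHOLD_PCT, SUSTAIN_SECONDS):
    print("Register a defect: CPU stayed above the 80% threshold.")
elif sustained_breach(cpu_samples, TARGET_PCT, SUSTAIN_SECONDS):
    print("Alert the team: CPU stayed above the 70% target.")
else:
    print("CPU utilization within target.")
```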
Except in extremely rare circumstances, it is not appropriate for the performance tester to determine targets and thresholds, but only to capture data and compare test results to the targets and thresholds. Even if the performance tester is the most qualified individual to set the targets and thresholds, s/he is not the individual responsible for ensuring that they are met; rather, s/he is responsible for providing information to the team members responsible for ensuring that these targets and thresholds are met so that those persons can make informed decisions. It is important to resist the urge to set targets yourself. Consider the following when performing this activity:
  • Talk to the production support team. Determine what they measure and where they set their thresholds. This is their job; they have been doing this for years and they know where the problems occur.
  • Ask the architects, or other team members who may be responsible for enforcing and/or making decisions regarding targets and thresholds, to share those decisions with you.
  • Find out what the rest of the industry is doing. Even though it is not your job to set targets and thresholds, it is always a good idea to do a Web search or refer to other documentation to find the latest recommendations. If these recommendations seem relevant to your project, make a note of them. This target- and threshold-related data may provide a useful context for the actual data you collect during your testing.
  • Work with key performance indicators (network, disk, memory, and processor) for the technology.
  • Work with key performance indicators that map to the business requirements. This will help to bridge engineering with the business.
  • Work with both key performance indicators and business metrics to better understand the current volume and future growth indicators of the business and the infrastructure.
  • Work with the business metrics. Many performance metrics have a strong semantic relationship with the business metrics; for example, database transactions per second and number of orders per second, or number of searches per second with Web hits per second.
  • Work with stakeholders when articulating and understanding performance metrics. While most stakeholders are not experts on performance testing, diagnosis, debugging, or analysis, most of them do have expertise in the performance metrics requirements of the business. These stakeholders can articulate metrics around their systems that correlate with the operations. This will facilitate exposing performance metrics in a more intuitive way.

Capture or Estimate Resource Budgets

As mentioned in the previous section, remember that the performance tester’s job is to collect and provide information about budgets and allocations, not to enforce them. Determining resource budgets or allocations is one way that teams work together to ensure that targets and thresholds are realistic. For example, if one of your targets is to keep the total RAM usage of a particular server under 1 gigabyte (GB) and that server hosts both a database and application server software, the database software may be given a RAM allocation of 600 megabytes (MB) and the application server software 400 MB. It is the responsibility of the developers and administrators of those software components to stay within those budgets. By making sure that you are aware of these budgets or allocations as a performance tester, you can let the team know when a resource is approaching or exceeding its budget almost immediately, thus giving the team more time to react; a sketch of this kind of budget check follows the list below. Consider the following proven practices when performing this activity:
  • Ask the architects, or other team members who may be responsible for enforcing and/or making decisions regarding targets and thresholds, to share those decisions with you.
  • Review project documents. Performance testers are not always specifically invited to review design and architecture documents, so remember to ask.
  • Attend developer and architect meetings. Take note of comments such as “see if you can get that object under X memory consumption.” Although instructions such as these rarely appear on paper, and thus would not be known to you if you didn’t attend the meeting, the developer still might appreciate another set of eyes watching his object’s memory consumption.
  • Work with key performance indicator thresholds that indicate the health of the technologies being used.
  • Work with business metrics that indicate whether you are meeting the business requirements; for example, orders per second, number of failed order requests, and so on.
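Returning to the 1-GB server example above, the following minimal Python sketch shows one way a tester might track observed RAM usage against the agreed allocations of 600 MB for the database and 400 MB for the application server. The warning ratio, sample readings, and helper names are illustrative assumptions, not part of any specific tool.

```python
# Hypothetical sketch of tracking per-component RAM budgets on a shared server.
# Budget figures come from the 1-GB example above; helper names are illustrative.

BUDGETS_MB = {"database": 600, "app_server": 400}   # allocations agreed by the team
WARN_RATIO = 0.9                                    # warn when within 10% of budget

def budget_report(observed_mb):
    """Compare observed RAM usage (MB) per component against its allocation."""
    lines = []
    for component, budget in BUDGETS_MB.items():
        used = observed_mb.get(component, 0)
        if used > budget:
            status = "OVER BUDGET"
        elif used >= budget * WARN_RATIO:
            status = "approaching budget"
        else:
            status = "ok"
        lines.append(f"{component}: {used} MB of {budget} MB ({status})")
    return lines

# Readings captured during a test run (illustrative data).
for line in budget_report({"database": 575, "app_server": 412}):
    print(line)
```

Reporting "approaching budget" as well as outright overruns is what gives the team the extra reaction time described above.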

Identify Metrics

Most of the time, this activity is rather transparent. For example, if an objective states that the processor utilization of the Web server should not exceed 80 percent for more than 1 second in 10, it is clear that one metric you should be monitoring is the processor utilization of the Web server, polled at intervals of no more than one second. You may not want to do this during every test, but there is no question what you need to measure. However, sometimes the associated metrics are not so clear or are not so simple to collect. In these cases, consider the following approach (a sketch of the resulting grid follows this list):
  • Create a grid or a simple spreadsheet that maps each of the collected objectives to the metric(s) that will indicate if the objective is being met.
  • If it is not obvious how to collect each metric without skewing the test or any of the other data you hope to collect at the same time, do some research or work with the development team to determine the optimal approach.
  • Collaborate with the developers, architects, and administrators. These parties know which metrics are valuable for their specific purposes and how to capture most of them. Their input will ensure that you know how to put the application in the state that makes those metrics most valuable.
  • Consider where you will keep this information and how you will label it so that it is accessible after the tests.
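As a minimal sketch of the grid described in the first bullet, the following Python snippet keeps each objective next to the metrics and collection method that will show whether it is being met. The objectives, counters, and polling intervals shown are illustrative assumptions; a simple spreadsheet serves the same purpose.

```python
# Hypothetical sketch of an objective-to-metric grid kept in code rather than a
# spreadsheet. Entries are illustrative examples only.

objective_metric_grid = [
    {
        "objective": "Web server CPU <= 80% for no more than 1 second in 10",
        "metrics": ["Web server % processor time"],
        "how_collected": "performance counter polled at 1-second intervals",
    },
    {
        "objective": "Database RAM stays within its 600 MB allocation",
        "metrics": ["database process working set (MB)"],
        "how_collected": "process-level memory counter, sampled every 5 seconds",
    },
]

# Print the grid so it can be reviewed with developers and administrators.
for row in objective_metric_grid:
    print(row["objective"])
    for metric in row["metrics"]:
        print(f"  metric: {metric}  ({row['how_collected']})")
```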

Communicate Results

Communicating the results of tests that capture data related to performance objectives is different than communicating results related to overall performance goals and requirements. Objective-related results are intended to be useful information for the team rather than to determine an application’s overall fitness for release. Therefore it is beneficial to share the information freely. In most cases, the fact that an objective is not being met is not something that gets recorded in a defect-tracking system but is simply information to help the team do its job better.
Consider the following techniques when performing this activity:
  • Report results versus targets, budgets, and previous measurements as well as your own research. You never know what the team will find most valuable.
  • Share reports with the entire team.
  • Make the raw data available to the team and invite them to parse it in other ways and to suggest more helpful ways of presenting the data.
  • Be ready, willing, interested, and able to re-execute and/or modify the tests as needed.
  • Do not send raw data outside the team unless instructed to do so by someone willing and able to take responsibility for any consequences that might arise from doing so.
  • Avoid reporting potential causes of poor performance. Instead, report symptoms and conditions. Reporting a cause incorrectly may damage your credibility.

Stay Aware of Changing Objectives, Targets, and Budgets

It is important to remember that objectives are bound to change during the life of a project. As requirements change, features are moved into or out of a particular build, hardware decisions are made, code is refactored, and so on. Performance-testing objectives are bound to change as well. Maintain a running dialogue with your team. Ask the team what is changing and how it impacts the objectives. Whether you do this in person or electronically is up to you; just remember that you will be wasting your own time if you are testing against an old, no-longer-relevant objective.

Case Studies – Identifying Performance-testing Objectives

The following case studies help illustrate the approach to identifying performance-testing objectives.

Case Study 1

Scenario

A 40-year-old financial services company with 3,000 employees is implementing its annual Enterprise Resource Planning (ERP) software upgrade, including new production hardware. The last upgrade resulted in disappointing performance and many months of tuning during production.

Performance Objectives

The performance-testing effort was based on the following overall performance objectives:
  • Ensure that the new production hardware is no slower than the previous release.
  • Determine configuration settings for the new production hardware.
  • Tune customizations. 

Performance Budget/Constraints

The following budget limitations constrained the performance-testing effort:
  • No server should have sustained processor utilization above 80 percent under any anticipated load. (Threshold)
  • No single requested report is permitted to lock more than 20 MB of RAM and 15-percent processor utilization on the Data Cube Server.
  • No combination of requested reports is permitted to lock more than 100 MB of RAM and 50-percent processor utilization on the Data Cube Server at one time. (A sketch of how these budgets might be checked follows this list.)
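A check of the Data Cube Server budgets listed above might look like the following minimal Python sketch. The limits are taken from the constraints above; the per-report readings and helper names are illustrative assumptions.

```python
# Hypothetical sketch of checking report resource usage on the Data Cube Server
# against the budgets listed above. Sample readings are illustrative.

SINGLE_REPORT_LIMIT = {"ram_mb": 20, "cpu_pct": 15}
COMBINED_LIMIT = {"ram_mb": 100, "cpu_pct": 50}

def check_reports(report_usage):
    """report_usage: list of dicts with per-report 'ram_mb' and 'cpu_pct' readings."""
    violations = []
    for i, usage in enumerate(report_usage, start=1):
        if (usage["ram_mb"] > SINGLE_REPORT_LIMIT["ram_mb"]
                or usage["cpu_pct"] > SINGLE_REPORT_LIMIT["cpu_pct"]):
            violations.append(f"report {i} exceeds the single-report budget: {usage}")
    total_ram = sum(u["ram_mb"] for u in report_usage)
    total_cpu = sum(u["cpu_pct"] for u in report_usage)
    if total_ram > COMBINED_LIMIT["ram_mb"] or total_cpu > COMBINED_LIMIT["cpu_pct"]:
        violations.append(f"combined report load exceeds the shared budget: "
                          f"{total_ram} MB, {total_cpu}% CPU")
    return violations

# Readings taken while several reports ran concurrently (illustrative data).
sample = [{"ram_mb": 18, "cpu_pct": 12},
          {"ram_mb": 25, "cpu_pct": 14},
          {"ram_mb": 19, "cpu_pct": 13}]
for problem in check_reports(sample):
    print(problem)
```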

Performance-Testing Objectives

The following priority objectives focused the performance testing:
  • Verify that there is no performance degradation over the previous release.
  • Verify the ideal configuration for the application in terms of response time, throughput, and resource utilization.
  • Resolve existing performance inadequacy with the Data Cube Server.

Questions

The following questions helped to determine relevant testing objectives:
  • What is the reason for deciding to test performance? 
  • In terms of performance, what issues concern you most in relation to the upgrade?
  • Why are you concerned about the Data Cube Server?

Case Study 2

Scenario

A financial institution with 4,000 users distributed among the central headquarters and several branch offices is experiencing performance problems with business applications that deal with loan processing.
Six major business operations have been affected by problems related to slowness as well as high resource consumption and error rates identified by the company’s IT group. The consumption issue is due to high processor usage in the database, while the errors are related to database queries that raise exceptions.

Performance Objectives

The performance-testing effort was based on the following overall performance objectives:
  • The system must support all users in the central headquarters and branch offices who use the system during peak business hours.
  • The system must meet backup-duration requirements within the minimal possible timeframe.
  • Database queries should be optimal, resulting in processor utilization no higher than 50-75 percent during normal and peak business activities.

Performance Budget/Constraints

The following budget limitations constrained the performance-testing effort:
  • No server should have sustained processor utilization above 75 percent under any anticipated load (normal and peak) when users in headquarters and branch offices are using the system. (Threshold)
  • When system backups are being performed, the response times of business operations should not exceed, by more than 8 percent, the response times experienced when no backup is being done. (See the sketch after this list.)
  • Response times for all business operations during normal and peak load should not exceed 6 seconds.
  • No database transaction errors that could result in the loss of user-submitted loan applications are allowable.
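The backup constraint above can be checked as in the following minimal Python sketch, which compares response times measured during a backup against a no-backup baseline plus the 8-percent allowance, and against the 6-second budget. The operation names and timings are illustrative assumptions.

```python
# Hypothetical sketch of the backup-window constraint: response times measured while
# a backup runs may exceed the no-backup baseline by at most 8 percent.
# Operation names and timings are illustrative.

MAX_DEGRADATION = 0.08   # 8 percent
RESPONSE_TIME_CAP_S = 6  # normal/peak response-time budget

baseline_s = {"submit_loan": 2.4, "credit_check": 3.1, "approve_loan": 4.9}
during_backup_s = {"submit_loan": 2.5, "credit_check": 3.5, "approve_loan": 5.1}

for operation, baseline in baseline_s.items():
    observed = during_backup_s[operation]
    allowed = baseline * (1 + MAX_DEGRADATION)
    if observed > allowed:
        print(f"{operation}: {observed:.2f}s exceeds allowed {allowed:.2f}s during backup")
    if observed > RESPONSE_TIME_CAP_S:
        print(f"{operation}: {observed:.2f}s exceeds the {RESPONSE_TIME_CAP_S}s budget")
```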

Performance-Testing Objectives

The following priority objectives focused the performance testing:
  • Help to optimize the loan-processing applications to ensure that the system meets stated business requirements.
  • Test for 100-percent coverage of all six business processes affected by the loan-manufacturing applications.
  • Target database queries that were confirmed to be extremely sub-optimal, with improper hints and nested sub-query hashing.
  • Help to remove superfluous database queries in order to minimize transactional cost.
  • Tests should monitor for relevant component metrics: end-user response time, error rate, database transactions per second, and overall processor, memory, network, and disk status for the database server.

Questions

The following questions helped to determine relevant testing objectives:
  • What is the reason for deciding to test performance? 
  • In terms of performance, what issues concern you most in relation to the queries that may be causing processor bottlenecks and transactional errors?
  • What business cases related to the queries might be causing processor and transactional errors?
  • What database backup operations might affect performance during business operations?
  • What are the timeframes for backup procedures that might affect business operations, and what are the most critical scenarios involved in those timeframes?
  • How many users are there and where are they located (headquarters, branch offices) during times of critical business operations?
These questions helped performance testers identify the most important concerns in order to help prioritize testing efforts. The questions also helped determine what information to include in conversations and reports.

Case Study 3

Scenario

A Web site is responsible for conducting online surveys with 2 million users in a one-hour timeframe. The site infrastructure was built with wide area network (WAN) links all over the world. The site administrators want to test the site’s performance to ensure that it can sustain 2 million user visits in one hour.

Performance Objectives

The performance-testing effort was based on the following overall performance objectives:
  • The Web site should be able to support a peak load of 2 million user visits in a one-hour timeframe.
  • Survey submissions should not be compromised due to application errors.

Performance Budget/Constraints

The following budget limitations constrained the performance-testing effort:
  • No server can have sustained processor utilization above 75 percent under any anticipated load (normal and peak) during submission of surveys (2 million at peak load).
  • Response times for all survey submissions must not exceed 8 seconds during normal and peak loads.
  • No survey submissions can be lost due to application errors.

Performance-Testing Objectives

The following priority objectives focused the performance testing:
  • Simulate one scripted user transaction with 2 million total virtual users in one hour, distributed across two data centers with 1 million active users at each. (See the sketch after this list.)
  • Simulate the peak load of 2 million user visits in a one-hour period.
  • Test for 100-percent coverage of all survey types.
  • Monitor for relevant component metrics: end-user response time, error rate, database transactions per second, and overall processor, memory, network and disk status for the database server.
  • Test the error rate to determine the reliability metrics of the survey system.
  • Test by using firewall and load-balancing configurations.
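As referenced in the first objective above, the following minimal Python sketch turns the 2-million-visits-in-one-hour objective into approximate pacing figures for each of the two data centers. The 60-second visit duration used to estimate concurrent virtual users is an assumption for illustration only.

```python
# Hypothetical sketch of turning the 2-million-visits-per-hour objective into
# per-data-center pacing figures for a load-test tool.

TOTAL_VISITS = 2_000_000
WINDOW_SECONDS = 3600
DATA_CENTERS = 2

total_rate = TOTAL_VISITS / WINDOW_SECONDS   # about 556 visits per second overall
per_dc_rate = total_rate / DATA_CENTERS      # about 278 visits per second per data center

print(f"Overall arrival rate: {total_rate:.1f} visits/second")
print(f"Per data center:      {per_dc_rate:.1f} visits/second")

# If each virtual user completes one survey in roughly 60 seconds (an assumption),
# the approximate number of concurrent virtual users per data center is:
ASSUMED_VISIT_DURATION_S = 60
concurrent_vus_per_dc = per_dc_rate * ASSUMED_VISIT_DURATION_S
print(f"Approx. concurrent virtual users per data center: {concurrent_vus_per_dc:.0f}")
```

The concurrency estimate follows Little's law: concurrent users are roughly the arrival rate multiplied by the average visit duration.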

Questions

The following questions helped to determine relevant testing objectives:
  • What is the reason for deciding to test performance?
  • In terms of performance, what issues concern you most in relation to survey submissions that might cause data loss or user abandonment due to slow response time?
  • What types of submissions need to be simulated for surveys related to business requirements?
  • Where are the users located geographically when submitting the surveys?

Summary

Determining and recording performance testing objectives involves communicating with the team to establish and update these objectives as the project advances through milestones. Although it is not always easy to schedule time with each team member —especially when you consider that the project team includes executive stakeholders, analysts, and possibly even representative users — they are generally receptive to sharing information that will help you establish valuable performance-testing objectives. Such objectives might include providing business-related metrics, obtaining resource utilization data under load, generating specific loads to assist with tuning an application server, or providing a report of the number of objects requested by each Web page. While it is most valuable to collect performance-testing objectives early in the project life cycle, it is also important to periodically revisit these objectives and ask team members if they would like to see any new objectives added.

Wednesday, 02 November 2016


Chapter III


DESIGNING CLASSROOM LANGUAGE TESTS

Test Types

The first task you face in designing a test for your students is to determine the purpose of the test. Defining your purpose will help you choose the right kind of test, and it will also help you focus on the specific objectives of the test. Of the test types discussed here, there are two that you will probably never create as a classroom teacher (language aptitude tests and language proficiency tests) and three that you will almost certainly need to create (placement tests, diagnostic tests, and achievement tests).

Language Aptitude Tests

A language aptitude test is designed to measure capacity or general ability to learn a foreign language and ultimate success in that undertaking. Language aptitude tests are ostensibly designed to apply to the classroom learning of any language.

Tasks in the Modern Language Aptitude Test
1. Number learning: Examinees must learn a set of numbers through aural input and then discriminate different combinations of those numbers.
2. Phonetic script: Examinees must learn a set of correspondences between speech sounds and phonetic symbols.
3. Spelling clues: Examinees must read words that are spelled somewhat phonetically.
4. Words in sentences: Examinees are given a key word in a sentence and are then asked to select a word in a second sentence that performs the same grammatical function as the key word.
5. Paired associates: Examinees must quickly learn a set of vocabulary words from another language and memorize their English meanings.

Proficiency Tests

A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability. Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and aural comprehension. Sometimes a writing sample is added as well.
Proficiency tests are almost always summative and norm-referenced. An example is the Test of English as a Foreign Language (TOEFL), produced by the Educational Testing Service.

Placement Tests

Certain proficiency tests can act in the role of placement tests, the purpose of which is to place a student into a particular level or section of a language curriculum or school. Placement tests come in many varieties: assessing comprehension and production, responding through written and oral performance, open-ended and limited responses, selection (multiple-choice) and gap-filling formats, depending on the nature of a program and its needs.

Diagnostic Tests

A diagnostic test is designed to diagnose specified aspects of a language. A test in pronunciation, for example, might diagnose the phonological features of English that are difficult for learners and should therefore become part of a curriculum.
There is also a fine line of difference between a diagnostic test and a general achievement test. Achievement tests analyze the extent to which students have acquired language features that have already been taught, whereas diagnostic tests should elicit information on what students need to work on in the future. In a curriculum that has a form-focused phase, for example, a diagnostic test might offer information about a learner’s acquisition of verb tenses, modal auxiliaries, definite articles, relative clauses, and the like.

Achievement Tests

An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests are (or should be) limited to particular material addressed in a curriculum within a particular time frame and are offered after a course has focused on the objectives in question.
Achievement tests are often summative because they are administered at the end of a unit or term of study. The specifications for an achievement test should be determined by:
• The objectives of the lesson, unit, or course being assessed
• The relative importance (or weight) assigned to each objective
• The tasks employed in classroom lessons during the unit of time.

Achievement tests range from five- or ten-minute quizzes to three-hour final examinations, with an almost infinite variety of item types and formats. Here is the outline for a midterm examination offered at the high-intermediate level of an intensive English program in the US:

Section A: Vocabulary
Part 1 (5 items): Match words and definitions
Part 2 (5 items): Use the words in a sentence
Section B: Grammar
(10 sentences): Error detection (underline or circle the error)
Section C: Reading Comprehension
(2 one-paragraph passages): Four short-answer items for each
Section D: Writing
Respond to a two-paragraph article on Native American culture.

Wednesday, 26 October 2016

PRINCIPLES OF ASSESSMENT

Chapter II

 

Principles of assessment

Reliability

If a particular assessment were totally reliable, assessors acting independently using the same criteria and mark scheme would come to exactly the same judgment about a given piece of work. In the interests of quality assurance, standards, and fairness, and whilst recognising that complete objectivity is impossible to achieve, it is a goal worth aiming for when it comes to summative assessment. To this end, what has been described as the 'connoisseur' approach to assessment (like a wine-taster or tea-blender of many years' experience, not able to describe exactly what they are looking for but 'knowing it when they find it') is no longer acceptable. Explicitness in terms of learning outcomes and assessment criteria is vitally important in attempting to achieve reliability. They should be explicit to the students when the task is set, and where there are multiple markers they should be discussed, and preferably used on some sample cases, prior to being used 'for real'.

Validity

Just as important as reliability is the question of validity. Does the assessed task actually assess what you want it to? Just because an exam question includes the instruction 'analyse and evaluate' does not actually mean that the skills of analysis and evaluation are going to be assessed. They may be, if the student is presented with a case study scenario and data they have never seen before. But if they can answer perfectly adequately by regurgitating the notes they took from the lecture you gave on the subject then little more may be being assessed than the ability to memorise. There is an argument that all too often in British higher education we assess the things which are easy to assess, which tend to be basic factual knowledge and comprehension rather than the higher order objectives of analysis, synthesis and evaluation.

Relevance and transferability

There is much evidence that human beings do not find it easy to transfer skills from one context to another, and there is in fact a debate as to whether transferability is in itself a separate skill which needs to be taught and learnt. Whatever the outcome of that, the transfer of skills is certainly more likely to be successful when the contexts in which they are developed and used are similar. It is also true to say that academic assessment has traditionally been based on a fairly narrow range of tasks with arguably an emphasis on knowing rather than doing; it has therefore tended to develop a fairly narrow range of skills. For these two reasons, when devising an assessment task it is important that it both addresses the skills you want the student to develop and that as much as possible it puts them into a recognisable context with a sense of 'real purpose' behind why the task would be undertaken and a sense of a 'real audience', beyond the tutor, for whom the task would be done.

Criterion- vs. norm-referenced assessment

In criterion-referenced assessment particular abilities, skills or behaviours are each specified as a criterion which must be reached. The driving test is the classic example of a criterion-referenced test. The examiner has a list of criteria each of which must be satisfactorily demonstrated in order to pass - completing a three-point turn without hitting either kerb for example. The important thing is that failure in one criterion cannot be compensated for by above average performance in others; neither can you fail despite meeting every criterion simply because everybody else that day surpassed the criteria and was better than you.
Norm-referenced assessment makes judgments on how well the individual did in relation to others who took the test. Often used in conjunction with this is the curve of 'normal distribution' which assumes that a few will do exceptionally well and a few will do badly and the majority will peak in the middle as average. Despite the fact that a cohort may not fit this assumption for any number of reasons (it may have been a poor intake, or a very good intake, they have been taught well, or badly, or in introductory courses in particular you may have half who have done it all before and half who are just starting the subject giving a bimodal distribution) there are even some assessment systems which require results to be manipulated to fit.
The logic of a model of course design built on learning outcomes is that the assessment should be criterion-referenced at least to the extent that sufficiently meeting each outcome becomes a 'threshold' minimum to passing the course. If grades and marks have to be generated, a more complex system than pass/fail can be devised by defining the criteria for each grade either holistically grade by grade, or grade by grade for each criterion (see below).

Writing and using assessment criteria

Assessment criteria describe how well a student has to be able to achieve the learning outcome, either in order to pass (in a simple pass/fail system) or in order to be awarded a particular grade; essentially they describe standards. Most importantly they need to be more than a set of headings. Use of theory, for example, is not on its own a criterion. Criteria about theory must describe what aspects of the use of theory are being looked for. You may value any one of the following: the students' ability to make an appropriate choice of theory to address a particular problem, or to give an accurate summary of that theory as it applies to the problem, or to apply it correctly, or imaginatively, or with originality, or to critique the theory, or to compare and contrast it with other theories. And remember, as soon as you have more than one assessment criterion you will also have to make decisions about their relative importance (or weighting).
Graded criteria are criteria related to a particular band of marks or honours classification or grade framework such as Pass, Merit, Distinction. If you write these, be very careful about the statement at the 'pass' level. Preferably start writing at this level and work upwards. The danger in starting from, eg first class honours, is that as you move downwards, the criteria become more and more negative. When drafted, ask yourself whether you would be happy for someone meeting the standard expressed for pass, or third class, to receive an award from your institution. Where possible, discuss draft assessment activities, and particularly criteria, with colleagues before issuing them.
Once decided, the criteria and weightings should be given to the students at the time the task is set, and preferably some time should be spent discussing and clarifying what they mean. Apart from the argument of fairness, this hopefully then gives the student a clear idea of the standard they should aim for and increases the chances they will produce a better piece of work (and hence have learnt what you wanted them to). And feedback to the student on the work produced should be explicitly in terms of the extent to which each criterion has been met.

Chapter I

 

Differences between Testing, Assessment, and Evaluation


What Do We Mean by Testing, Assessment, and Evaluation?

When defined within an educational setting, assessment, evaluation, and testing are all used to measure how much of the assigned materials students are mastering, how well students are learning the materials, and how well students are meeting the stated goals and objectives. Although you may believe that assessments only provide instructors with information on which to base a score or grade, assessments also help you to assess your own learning.
Education professionals make distinctions between assessment, evaluation, and testing. However, for the purposes of this tutorial, all you really need to understand is that these are three different terms for referring to the process of figuring out how much you know about a given topic and that each term has a different meaning. To simplify things, we will use the term "assessment" throughout this tutorial to refer to this process of measuring what you know and have learned.
In case you are curious, here are some definitions:
  • A test or quiz is used to examine someone's knowledge of something to determine what he or she knows or has learned. Testing measures the level of skill or knowledge that has been reached.
  • Evaluation is the process of making judgments based on criteria and evidence.
  • Assessment is the process of documenting knowledge, skills, attitudes and beliefs, usually in measurable terms. The goal of assessment is to make improvements, as opposed to simply being judged. In an educational context, assessment is the process of describing, collecting, recording, scoring, and interpreting information about learning.