AIOU Assignment BEd 1.5 Year 2.5 Year 8614 Educational Statistics Assignment 2
Q.1 Explain the concept of reliability. Explain types of reliability and methods used to calculate each type.
Answer:
The term reliability in psychological research refers to the consistency of a research study or measuring test. For example, if a person weighs themselves several times during the course of a day, they would expect to see a similar reading each time. Scales that measured weight differently each time would be of little use. The same analogy applies to a tape measure that measures inches differently each time it is used; it would not be considered reliable. If findings from research are replicated consistently, they are reliable.
A correlation coefficient can be used to assess the degree of reliability. If a test is reliable, it should show a high positive correlation. Of course, it is unlikely that exactly the same results will be obtained each time, as participants and situations vary, but a strong positive correlation between the results of the same test indicates reliability.
There are two types of reliability: internal and external. Internal reliability assesses the consistency of results across items within a test. External reliability refers to the extent to which a measure varies from one use to another.
Assessing Reliability
Split-half method
The split-half method assesses the internal consistency of a test, such as a psychometric test or questionnaire. That is, it measures the extent to which all parts of the test contribute equally to what is being measured. This is done by comparing the results of one half of a test with the results from the other half. A test can be split in half in several ways, e.g. first half and second half, or odd- and even-numbered items. If the two halves of the test provide similar results, this suggests that the test has internal reliability.
The reliability of a test can also be improved using this method: for example, any items on separate halves of a test which have a low correlation (e.g. r = .25) should either be removed or rewritten.
The split-half method is a quick and easy way to establish reliability. However, it is only effective with large questionnaires in which all questions measure the same construct; it would not be appropriate for tests which measure different constructs.
For example, the Minnesota Multiphasic Personality Inventory has subscales measuring different behaviours such as depression, schizophrenia, and social introversion. Therefore, the split-half method would not be an appropriate way to assess the reliability of this personality test.
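To make the procedure concrete, here is a minimal Python sketch of the split-half calculation; the respondents-by-items score matrix is invented, and the Spearman-Brown correction is applied to estimate the reliability of the full-length test from the half-test correlation.

```python
# Illustrative sketch: split-half reliability with the Spearman-Brown correction.
# `scores` is an invented respondents-by-items matrix of item scores.
import numpy as np

scores = np.array([
    [4, 5, 4, 3, 5, 4],
    [2, 3, 2, 2, 3, 3],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 4, 3, 3, 4],
    [1, 2, 2, 1, 2, 1],
])

odd_half = scores[:, 0::2].sum(axis=1)   # totals on items 1, 3, 5, ...
even_half = scores[:, 1::2].sum(axis=1)  # totals on items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]   # correlation between the two halves
r_full = 2 * r_half / (1 + r_half)                # Spearman-Brown estimate for the full test

print(f"half-test correlation: {r_half:.2f}")
print(f"estimated full-test reliability: {r_full:.2f}")
```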
Types of reliability and methods used to calculate each type:
Reliability is a measure of the consistency of a metric or a method. Every metric or method we use, including things like methods for uncovering usability problems in an interface and expert judgment, must be assessed for reliability. In fact, before you can establish validity, you need to establish reliability. Here are the four most common ways of measuring reliability for any empirical method or metric:
- inter-rater reliability
- test-retest reliability
- parallel forms reliability
- internal consistency reliability
Because reliability has its roots in educational measurement (think standardized tests), many of the terms we use to assess reliability come from the testing lexicon. But don't let bad memories of testing lead you to dismiss their relevance to measuring the customer experience.
Inter-Rater Reliability
The extent to which raters or observers respond the same way to a given phenomenon is one measure of reliability. Where there's judgment, there's disagreement.
Even highly trained experts disagree among themselves when observing the same phenomenon. Kappa and the correlation coefficient are two common measures of inter-rater reliability. Some examples include:
- Evaluators identifying interface problems
- Experts rating the severity of a problem
For example, we found that the average inter-rater reliability of usability experts rating the severity of usability problems was r = .52. You can also measure intra-rater reliability, whereby you correlate multiple scores from one observer. In that same study, we found that the average intra-rater reliability when judging problem severity was r = .58 (which is generally considered low reliability).
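As a minimal sketch of one such measure, the following example computes Cohen's kappa for two raters assigning severity categories to the same set of usability problems; the ratings are invented for illustration.

```python
# Illustrative sketch: Cohen's kappa for two raters assigning severity categories
# to the same set of usability problems (invented ratings).
from collections import Counter

rater_a = ["minor", "major", "major", "minor", "critical", "minor", "major", "minor"]
rater_b = ["minor", "major", "minor", "minor", "critical", "minor", "major", "major"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Agreement expected by chance, from each rater's marginal category frequencies.
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
categories = set(rater_a) | set(rater_b)
expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement: {observed:.2f}, kappa: {kappa:.2f}")
```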
Test-Retest Reliability
Do customers provide the same set of responses when nothing about their experience or their attitudes has changed? You don’t want your measurement system to fluctuate when all other things are static.
Have a set of participants answer a set of questions (or perform a set of tasks). Later (by at least a few days, typically), have them answer the same questions again. When you correlate the two sets of measures, look for very high correlations (r > 0.7) to establish retest reliability.
As you can see, there's some effort and planning involved: you need participants to agree to answer the same questions twice. Few questionnaires measure test-retest reliability (mostly because of the logistics), but with the proliferation of online research, we should encourage more of this type of measure.
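A minimal sketch of the retest calculation, assuming invented scores from the same participants at two points in time:

```python
# Illustrative sketch: test-retest reliability as the correlation between two
# administrations of the same questionnaire to the same participants (invented scores).
import numpy as np

time_1 = np.array([72, 65, 80, 58, 90, 77, 61, 84])  # first administration
time_2 = np.array([70, 68, 78, 60, 88, 75, 64, 86])  # same participants, a few days later

r = np.corrcoef(time_1, time_2)[0, 1]
print(f"test-retest correlation: r = {r:.2f}")
print("acceptable retest reliability" if r > 0.7 else "low retest reliability")
```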
Parallel Forms Reliability
Getting the same or very similar results from slight variations on the question or evaluation method also establishes reliability. One way to achieve this is to have, say, 20 items that measure one construct (satisfaction, loyalty, usability) and to administer 10 of the items to one group and the other 10 to another group, and then correlate the results. You’re looking for high correlations and no systematic difference in scores between the groups.
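As a minimal sketch, and using a slightly different but common design in which the same participants complete both 10-item forms, the two form totals can be correlated directly; the scores below are invented.

```python
# Illustrative sketch: parallel forms reliability, computed here by correlating the
# totals the same participants obtain on two parallel 10-item forms (invented data).
import numpy as np

form_a = np.array([38, 42, 29, 45, 33, 40, 36, 31])  # totals on the first 10-item form
form_b = np.array([36, 44, 31, 43, 35, 38, 37, 30])  # totals on the parallel form

r = np.corrcoef(form_a, form_b)[0, 1]
print(f"parallel forms correlation: r = {r:.2f}")
```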
Internal Consistency Reliability
This is by far the most commonly used measure of reliability in applied settings. It’s popular because it’s the easiest to compute using software—it requires only one sample of data to estimate the internal consistency reliability. This measure of reliability is described most often using Cronbach’s alpha (sometimes called coefficient alpha).
It measures how consistently participants respond to one set of items. You can think of it as a sort of average of the correlations between items. Cronbach’s alpha ranges from 0.0 to 1.0 (a negative alpha means you probably need to reverse some items). Since the late 1960s, the minimally acceptable measure of reliability has been 0.70; in practice, though, for high-stakes questionnaires, aim for greater than 0.90. For example, the SUS has a Cronbach’s alpha of 0.92.
The more items you have, the more internally reliable the instrument, so to increase internal consistency reliability, you would add items to your questionnaire. Since there’s often a strong need to have few items, however, internal reliability usually suffers. When you have only a few items, and therefore usually lower internal reliability, having a larger sample size helps offset the loss in reliability.
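A minimal sketch of the alpha calculation from a respondents-by-items matrix of invented scores:

```python
# Illustrative sketch: Cronbach's alpha from a respondents-by-items score matrix.
# alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores)
import numpy as np

items = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 3, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
])

k = items.shape[1]                              # number of items
item_variances = items.var(axis=0, ddof=1)      # variance of each item across respondents
total_variance = items.sum(axis=1).var(ddof=1)  # variance of respondents' total scores

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha: {alpha:.2f}")
```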
In Summary
Here are a few things to keep in mind about measuring reliability:
- Reliability is the consistency of a measure or method over time.
- Reliability is necessary but not sufficient for establishing a method or metric as valid.
- There isn't a single measure of reliability; instead, there are four common ways of measuring the consistency of responses.
- You’ll want to use as many measures of reliability as you can (although in most cases one is sufficient to understand the reliability of your measurement system).
- Even if you can't collect reliability data, be aware of the ways in which low reliability may affect the validity of your measures, and ultimately the veracity of your decisions.
*********************************************************************************
Q.2 What is measure of difference? Explain different types of tests in detail with examples. How are these tests used in hypothesis testing?
Answer:
In mathematics and statistics, deviation is a measure of difference between the observed value of a variable and some other value, often that variable's mean. The sign of the deviation (positive or negative) reports the direction of that difference: the deviation is positive when the observed value exceeds the reference value. The magnitude of the deviation indicates the size of the difference.
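As a small worked example with made-up values 4, 8, 6 and 10 (mean 7), each deviation is simply the observed value minus the mean:

```latex
d_i = x_i - \bar{x}, \qquad
\bar{x} = \frac{4 + 8 + 6 + 10}{4} = 7, \qquad
d_1 = -3,\; d_2 = +1,\; d_3 = -1,\; d_4 = +3
```

Note that deviations from the mean always sum to zero, which is why squared or absolute deviations are used when summarising spread.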
Whether you like them or not, tests are a way of checking your knowledge or comprehension. They are the main instrument used to evaluate your learning by most educational institutions. According to research studies, tests have another benefit: they make you learn and remember more than you might have otherwise. Although it may seem that all tests are the same, many different types of tests exist and each has a different purpose and style.
Diagnostic Tests
These tests are used to diagnose how much you know and what you know. They can help a teacher know what needs to be reviewed or reinforced in class. They also enable the student to identify areas of weakness.
Placement Tests
These tests are used to place students in the appropriate class or level. For example, in language schools, placement tests are used to check a student’s language level through grammar, vocabulary, reading comprehension, writing, and speaking questions. After establishing the student’s level, the student is placed in the appropriate class to suit his/her needs.
Progress or Achievement Tests
Achievement or progress tests measure the students' improvement in relation to their syllabus. These tests only contain items which the students have been taught in class. There are two types of progress tests: short-term and long-term.
Short-term progress tests check how well students have understood or learned material covered in specific units or chapters. They enable the teacher to decide if remedial or consolidation work is required.
Proficiency Tests
These tests check learner levels in relation to general standards. They provide a broad picture of knowledge and ability. In English language learning, examples are the TOEFL and IELTS exams, which are mandatory for foreign-language speakers seeking admission to English-speaking universities. In addition, the TOEIC (Test of English for International Communication) checks students' knowledge of Business English, as a prerequisite for employment.
Internal Tests
Internal tests are those given by the institution where the learner is taking the course. They are often given at the end of a course in the form of a final exam.
External Tests
External tests are those given by an outside body. Examples are the TOEFL, TOEIC, IELTS, SAT, ACT, LSAT, GRE and GMAT. The exams themselves are the basis for admission to university, job recruitment, or promotion.
Objective Tests
Objective tests are those that have clear right or wrong answers. Multiple-choice tests fall into this group. Students have to select a pre-determined correct answer from three or four possibilities.
Subjective Tests
Subjective tests require the marker or examiner to make a subjective judgment regarding the marks deserved. Examples are essay questions and oral interviews. For such tests, it is especially important that both examiner and student are aware of the grading criteria in order to increase their validity.
Combination Tests
Many tests are a combination of objective and subjective styles. For example, on the TOEFL iBT, the Test of English as a Foreign Language, the reading and listening sections are objective, and the writing and speaking sections are subjective.
*********************************************************************************
Q.3 What is correlation? How level of measurement help us in selecting correct type of correlation? Write comprehensive note on range of correlation coefficient and what does it explain? Can we predict future correlation by current relationship? If yes, then how?
Answer:
The correlation is one of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables. Let's work through an example to show you how this statistic is computed.
Correlation Example
Let's assume that we want to look at the relationship between two variables, height (in inches) and self esteem. Perhaps we have a hypothesis that how tall you are affects your self esteem (incidentally, I don't think we have to worry about the direction of causality here -- it's not likely that self esteem causes your height!). Let's say we collect some information on twenty individuals (all male -- we know that the average height differs for males and females so, to keep this example simple we'll just use males). Height is measured in inches. Self esteem is measured based on the average of 10 1-to-5 rating items (where higher scores mean higher self esteem). Here's the data for the 20 cases (don't take this too seriously -- I made this data up to illustrate what a correlation is):
Person Height Self Esteem
1 68 4.1
2 71 4.6
3 62 3.8
4 75 4.4
5 58 3.2
6 60 3.1
7 67 3.8
8 68 4.1
9 71 4.3
10 69 3.7
11 68 3.5
12 67 3.2
13 63 3.7
14 62 3.3
15 60 3.4
16 63 4.0
17 65 4.1
18 67 3.8
19 63 3.4
20 61 3.6
Now, suppose we plot a histogram of each variable and then look at a simple bivariate (i.e., two-variable) scatterplot of height against self esteem.
In such a plot you would immediately see that the relationship between the variables is a positive one (if you can't see that, review the section on types of relationships), because if you were to fit a single straight line through the dots it would have a positive slope, moving up from left to right. Since the correlation is nothing more than a quantitative estimate of the relationship, we would expect a positive correlation.
What does a "positive relationship" mean in this context? It means that, in general, higher scores on one variable tend to be paired with higher scores on the other, and that lower scores on one variable tend to be paired with lower scores on the other. You can confirm that this is generally true in the data listed above.
Calculating the Correlation
We use the symbol r to stand for the correlation. Through the magic of mathematics it turns out that r will always be between -1.0 and +1.0. If the correlation is negative, we have a negative relationship; if it's positive, the relationship is positive. You don't need to know how the formula for r is derived unless you want to be a statistician, but you probably will need to know how the formula relates to real data -- how you can use it to compute the correlation. Let's look at how it applies to the data we collected.
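As a minimal sketch, the standard Pearson product-moment formula can be applied directly to the height and self-esteem data listed above:

```python
# Illustrative sketch: Pearson's r for the height / self-esteem data listed above.
# r = sum((x - mean_x) * (y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
import numpy as np

height = np.array([68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
                   68, 67, 63, 62, 60, 63, 65, 67, 63, 61])
self_esteem = np.array([4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
                        3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6])

dx = height - height.mean()
dy = self_esteem - self_esteem.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(f"r = {r:.2f}")  # a strong positive correlation for this made-up data
```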
*********************************************************************************
Q.4 Explain the following terms with examples.
a) Degree of Freedom
Answer:
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. The number of independent ways by which a dynamic system can move, without violating any constraint imposed on it, is called the number of degrees of freedom. In other words, the number of degrees of freedom can be defined as the minimum number of independent coordinates that can specify the position of the system completely.
Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom. In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself (e.g. the sample variance has N − 1 degrees of freedom, since it is computed from N random scores minus the one parameter estimated as an intermediate step, the sample mean).
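In symbols, the sample variance makes this counting explicit: all N scores enter the sum, but the sample mean has already been estimated from those same scores, leaving N - 1 degrees of freedom.

```latex
s^2 = \frac{1}{N - 1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2
```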
Mathematically, degrees of freedom is the number of dimensions of the domain of a random vector, or essentially the number of "free" components (how many components need to be known before the vector is fully determined).
The term is most often used in the context of linear models (linear regression, analysis of variance), where certain random vectors are constrained to lie in linear subspaces, and the number of degrees of freedom is the dimension of the subspace. The degrees of freedom are also commonly associated with the squared lengths (or "sum of squares" of the coordinates) of such vectors, and the parameters of chi-squared and other distributions that arise in associated statistical testing problems.
While introductory textbooks may introduce degrees of freedom as distribution parameters or through hypothesis testing, it is the underlying geometry that defines degrees of freedom, and is critical to a proper understanding of the concept. Walker (1940) has stated this succinctly as "the number of observations minus the number of necessary relations among these observations."
*********************************************************************************
b) Spread of Scores
Answer:
Measures of spread describe how similar or varied the set of observed values are for a particular variable (data item). Measures of spread include the range, quartiles and the interquartile range, variance and standard deviation.
When can we measure spread?
The spread of the values can be measured for quantitative data, as the variables are numeric and can be arranged into a logical order with a low end value and a high end value.
Why do we measure spread?
Summarising the dataset can help us understand the data, especially when the dataset is large. As discussed in the Measures of Central Tendency page, the mode, median, and mean summarise the data into a single value that is typical or representative of all the values in the dataset, but this is only part of the 'picture' that summarises a dataset. Measures of spread summarise the data in a way that shows how scattered the values are and how much they differ from the mean value.
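A minimal Python sketch of these common measures of spread for a small invented data set:

```python
# Illustrative sketch: common measures of spread for a small invented data set.
import numpy as np

values = np.array([12, 15, 15, 17, 18, 21, 22, 25, 28, 31])

data_range = values.max() - values.min()
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
variance = values.var(ddof=1)   # sample variance (N - 1 in the denominator)
std_dev = values.std(ddof=1)    # sample standard deviation

print(f"range = {data_range}, IQR = {iqr}, variance = {variance:.2f}, SD = {std_dev:.2f}")
```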
*********************************************************************************
c) Sample
Answer:
In statistics and quantitative research methodology, a data sample is a set of data collected and/or selected from a statistical population by a defined procedure. The elements of a sample are known as sample points, sampling units or observations.
Typically, the population is very large, making a census or a complete enumeration of all the values in the population either impractical or impossible. The sample usually represents a subset of manageable size. Samples are collected and statistics are calculated from the samples, so that one can make inferences or extrapolations from the sample to the population.
The data sample may be drawn from a population without replacement (i.e. no element can be selected more than once in the same sample), in which case it is a subset of a population; or with replacement (i.e. an element may appear multiple times in the one sample), in which case it is a multisubset.
*********************************************************************************
d) Confidence Interval
Answer:
In statistics, a confidence interval (CI) is a type of interval estimate (of a population parameter) that is computed from the observed data. The confidence level is the frequency (i.e., the proportion) of possible confidence intervals that contain the true value of their corresponding parameter. In other words, if confidence intervals are constructed using a given confidence level in an infinite number of independent experiments, the proportion of those intervals that contain the true value of the parameter will match the confidence level.
Confidence intervals consist of a range of values (interval) that act as good estimates of the unknown population parameter. However, the interval computed from a particular sample does not necessarily include the true value of the parameter. Since the observed data are random samples from the true population, the confidence interval obtained from the data is also random. If a corresponding hypothesis test is performed, the confidence level is the complement of the level of significance; for example, a 95% confidence interval reflects a significance level of 0.05. If it is hypothesized that a true parameter value is 0 but the 95% confidence interval does not contain 0, then the estimate is significantly different from zero at the 5% significance level.
The desired level of confidence is set by the researcher (not determined by data). Most commonly, the 95% confidence level is used. However, other confidence levels can be used, for example, 90% and 99%.
Factors affecting the width of the confidence interval include the size of the sample, the confidence level, and the variability in the sample. A larger sample size normally will lead to a better estimate of the population parameter. Confidence intervals were introduced to statistics by Jerzy Neyman in a paper published in 1937.
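A minimal Python sketch of a 95% confidence interval for a mean, using the t distribution and an invented sample of scores:

```python
# Illustrative sketch: 95% confidence interval for a mean, using the t distribution.
# The sample of scores is invented.
import numpy as np
from scipy import stats

scores = np.array([72, 68, 75, 81, 64, 70, 77, 73, 69, 74])

mean = scores.mean()
sem = scores.std(ddof=1) / np.sqrt(len(scores))    # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(scores) - 1)    # two-sided 95% critical value

lower, upper = mean - t_crit * sem, mean + t_crit * sem
print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f})")
```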
*********************************************************************************
e) Z Score
Answer:
Simply put, a z-score is the number of standard deviations a data point lies from the mean. More technically, it is a measure of how many standard deviations below or above the population mean a raw score is. A z-score is also known as a standard score, and it can be placed on a normal distribution curve. Most z-scores fall between -3 standard deviations (to the far left of the normal distribution curve) and +3 standard deviations (to the far right of the curve). In order to use a z-score, you need to know the mean μ and the population standard deviation σ.
Z-scores are a way to compare results from a test to a “normal” population. Results from tests or surveys have thousands of possible results and units. However, those results can often seem meaningless. For example, knowing that someone’s weight is 150 pounds might be good information, but if you want to compare it to the “average” person’s weight, looking at a vast table of data can be overwhelming (especially if some weights are recorded in kilograms). A z-score can tell you where that person’s weight is compared to the average population’s mean weight.
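A minimal sketch of the calculation for the weight example above, assuming a hypothetical population mean of 140 pounds and a standard deviation of 20 pounds (both values are invented for illustration):

```python
# Illustrative sketch: z-score for the 150-pound example above, assuming a
# hypothetical population mean of 140 pounds and standard deviation of 20 pounds.
def z_score(x, mu, sigma):
    """Number of standard deviations the raw score x lies above or below the mean."""
    return (x - mu) / sigma

z = z_score(150, mu=140, sigma=20)
print(f"z = {z:.2f}")  # 0.50: half a standard deviation above the mean
```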
*********************************************************************************
Q.5 What is data cleaning? Write down its importance and benefits. How to ensure it before analysis of data?
Answer:
Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting.
After cleansing, a data set should be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores. Data cleaning differs from data validation in that validation almost invariably means data is rejected from the system at entry and is performed at the time of entry, rather than on batches of data.
The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities. The validation may be strict (such as rejecting any address that does not have a valid postal code) or fuzzy (such as correcting records that partially match existing, known records). Some data cleansing solutions clean data by cross-checking it with a validated data set. A common data cleansing practice is data enhancement, where data is made more complete by adding related information; for example, appending addresses with any phone numbers related to that address. Data cleansing may also involve the harmonization and standardization of data: for example, harmonization of short codes (st, rd, etc.) to actual words (street, road, etcetera), while standardization of data is a means of changing a reference data set to a new standard, e.g. the use of standard codes.
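As a rough illustration of some of these steps (removing duplicates, harmonizing short codes, and applying a simple validation rule), here is a small pandas sketch; the column names, records, and rules are all invented.

```python
# Illustrative sketch of common cleansing steps on an invented record set.
import pandas as pd

records = pd.DataFrame({
    "name":     ["Ali Khan", "Ali Khan", "Sara  Ahmed", None],
    "address":  ["12 Main st", "12 Main st", "4 Park rd", "9 Hill rd"],
    "postcode": ["44000", "44000", "4600", "46000"],
})

# 1. Remove exact duplicate records.
records = records.drop_duplicates()

# 2. Harmonise short codes to full words and tidy stray whitespace.
records["address"] = (records["address"]
                      .str.replace(r"\bst\b", "street", regex=True)
                      .str.replace(r"\brd\b", "road", regex=True))
records["name"] = records["name"].str.replace(r"\s+", " ", regex=True)

# 3. Flag rows that fail a simple validation rule (5-digit postcode, name present).
records["valid"] = records["postcode"].str.fullmatch(r"\d{5}") & records["name"].notna()

print(records)
```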
Data cleansing is a valuable process that can help companies save time and increase their efficiency. Data cleansing software tools are used by various organisations to remove duplicate data and to fix badly formatted, incorrect and incomplete data in marketing lists, databases and CRMs. They can achieve in a short period of time what could take an administrator working manually days or weeks to fix. This means that companies can save not only time but money by acquiring data cleansing tools.
Data cleansing is of particular value to organisations that have vast swathes of data to deal with. These organisations can include banks or government bodies, but small to medium enterprises can also find a good use for such programmes. In fact, many sources suggest that any firm that works with and holds data should invest in cleansing tools. The tools should also be used on a regular basis, as inaccurate data levels can grow quickly, compromising databases and decreasing business efficiency.
Data Cleansing for a Cleaner Database
Companies may also find that cleansing enables them to remain compliant with standards that are legally expected of them. In most territories, companies are duty-bound to ensure that their data is as accurate and current as possible. The tools can be used for everything from correcting spelling mistakes to fixing postcodes, while removing unnecessary records from systems. This means that space can be preserved and that information that is no longer needed – or data which companies are no longer permitted to keep – can be removed simply, quickly and efficiently.
Users of data cleansing software can set their own rules to increase the efficiency of a database, making the capabilities of the cleansing software as applicable to the company's needs and requirements as possible. Some common problems with databases include incorrectly formatted phone numbers and e-mail addresses, rendering clients and customers uncontactable. The software can be used to put these right in a matter of seconds, which makes it a perfect tool for companies that need to stay in touch with outside parties.
Meanwhile, companies that employ more than one database – for example, companies that are spread across various branches or offices – can use the tools to ensure that each branch of the organisation shares the same accurate information.
*********************************************************************************