Package org.apache.parquet.format
Class SizeStatistics
- java.lang.Object
-
- org.apache.parquet.format.SizeStatistics
-
- All Implemented Interfaces:
Serializable, Cloneable, Comparable<SizeStatistics>, org.apache.thrift.TBase<SizeStatistics,SizeStatistics._Fields>, org.apache.thrift.TSerializable
@Generated(value="Autogenerated by Thrift Compiler (0.22.0)", date="2025-12-22")
public class SizeStatistics
extends Object
implements org.apache.thrift.TBase<SizeStatistics,SizeStatistics._Fields>, Serializable, Cloneable, Comparable<SizeStatistics>
A structure for capturing metadata for estimating the unencoded, uncompressed size of data written. This is useful for readers to estimate how much memory is needed to reconstruct data in their memory model and for fine-grained filter pushdown on nested structures (the histograms contained in this structure can help determine the number of nulls at a particular nesting level and the maximum length of lists).
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class SizeStatistics._Fields
    The set of fields this struct contains, along with convenience methods for finding and manipulating them.
-
Field Summary
Fields
Modifier and Type    Field    Description
List<Long> definition_level_histogram
    Same as repetition_level_histogram except for definition levels.
static Map<SizeStatistics._Fields,org.apache.thrift.meta_data.FieldMetaData> metaDataMap
List<Long> repetition_level_histogram
    When present, there is expected to be one element corresponding to each repetition level (i.e. size = max_repetition_level + 1) where each element represents the number of times the repetition level was observed in the data.
long unencoded_byte_array_data_bytes
    The number of physical bytes stored for BYTE_ARRAY data values assuming no encoding.
-
Constructor Summary
Constructors
Constructor    Description
SizeStatistics()
SizeStatistics(SizeStatistics other)
    Performs a deep copy on other.
-
Method Summary
Modifier and Type    Method    Description
void addToDefinition_level_histogram(long elem)
void addToRepetition_level_histogram(long elem)
void clear()
int compareTo(SizeStatistics other)
SizeStatistics deepCopy()
boolean equals(Object that)
boolean equals(SizeStatistics that)
SizeStatistics._Fields fieldForId(int fieldId)
List<Long> getDefinition_level_histogram()
    Same as repetition_level_histogram except for definition levels.
Iterator<Long> getDefinition_level_histogramIterator()
int getDefinition_level_histogramSize()
Object getFieldValue(SizeStatistics._Fields field)
List<Long> getRepetition_level_histogram()
    When present, there is expected to be one element corresponding to each repetition level (i.e. size = max_repetition_level + 1) where each element represents the number of times the repetition level was observed in the data.
Iterator<Long> getRepetition_level_histogramIterator()
int getRepetition_level_histogramSize()
long getUnencoded_byte_array_data_bytes()
    The number of physical bytes stored for BYTE_ARRAY data values assuming no encoding.
int hashCode()
boolean isSet(SizeStatistics._Fields field)
    Returns true if field corresponding to fieldID is set (has been assigned a value) and false otherwise.
boolean isSetDefinition_level_histogram()
    Returns true if field definition_level_histogram is set (has been assigned a value) and false otherwise.
boolean isSetRepetition_level_histogram()
    Returns true if field repetition_level_histogram is set (has been assigned a value) and false otherwise.
boolean isSetUnencoded_byte_array_data_bytes()
    Returns true if field unencoded_byte_array_data_bytes is set (has been assigned a value) and false otherwise.
void read(org.apache.thrift.protocol.TProtocol iprot)
SizeStatistics setDefinition_level_histogram(List<Long> definition_level_histogram)
    Same as repetition_level_histogram except for definition levels.
void setDefinition_level_histogramIsSet(boolean value)
void setFieldValue(SizeStatistics._Fields field, Object value)
SizeStatistics setRepetition_level_histogram(List<Long> repetition_level_histogram)
    When present, there is expected to be one element corresponding to each repetition level (i.e. size = max_repetition_level + 1) where each element represents the number of times the repetition level was observed in the data.
void setRepetition_level_histogramIsSet(boolean value)
SizeStatistics setUnencoded_byte_array_data_bytes(long unencoded_byte_array_data_bytes)
    The number of physical bytes stored for BYTE_ARRAY data values assuming no encoding.
void setUnencoded_byte_array_data_bytesIsSet(boolean value)
String toString()
void unsetDefinition_level_histogram()
void unsetRepetition_level_histogram()
void unsetUnencoded_byte_array_data_bytes()
void validate()
void write(org.apache.thrift.protocol.TProtocol oprot)
-
-
-
Field Detail
-
unencoded_byte_array_data_bytes
public long unencoded_byte_array_data_bytes
The number of physical bytes stored for BYTE_ARRAY data values assuming no encoding. This is exclusive of the bytes needed to store the length of each byte array. In other words, this field is equivalent to `(size of PLAIN-ENCODING the byte array values) - (4 bytes * number of values written)`. To determine unencoded sizes of other types, readers can use schema information multiplied by the number of non-null and null values. The number of null/non-null values can be inferred from the histograms below. For example, if a column chunk is dictionary-encoded with dictionary ["a", "bc", "cde"], and a data page contains the indices [0, 0, 1, 2], then this value for that data page should be 7 (1 + 1 + 2 + 3). This field should only be set for types that use BYTE_ARRAY as their physical type.
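The dictionary example above can be checked with a small standalone sketch. It does not use the generated SizeStatistics class; the class name `UnencodedByteArraySizeSketch` and both helper methods are hypothetical, and the values are assumed to be UTF-8 strings purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of parquet-format: illustrates the arithmetic only.
class UnencodedByteArraySizeSketch {

    /** Sums the raw byte lengths of the dictionary entries that the page's indices point at. */
    static long unencodedBytes(List<String> dictionary, int[] indices) {
        long total = 0;
        for (int index : indices) {
            total += dictionary.get(index).getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    /** Size of the same values under PLAIN encoding: a 4-byte length prefix plus the bytes of each value. */
    static long plainEncodedBytes(List<String> dictionary, int[] indices) {
        return unencodedBytes(dictionary, indices) + 4L * indices.length;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("a", "bc", "cde");
        int[] indices = {0, 0, 1, 2};

        long unencoded = unencodedBytes(dictionary, indices);
        long plain = plainEncodedBytes(dictionary, indices);

        System.out.println(unencoded);                   // 7
        System.out.println(plain - 4L * indices.length); // 7, the PLAIN size minus the length prefixes
    }
}
```

Both routes give 7 for this page, which is the value a writer would be expected to pass to setUnencoded_byte_array_data_bytes.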
-
repetition_level_histogram
public List<Long> repetition_level_histogram
When present, there is expected to be one element corresponding to each repetition level (i.e. size = max_repetition_level + 1) where each element represents the number of times the repetition level was observed in the data. This field may be omitted if max_repetition_level is 0 without loss of information.
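As a rough illustration of the layout described here, the following standalone sketch counts how often each level occurs and returns a list of size max level + 1. The class `LevelHistogramSketch` and the method `buildHistogram` are hypothetical names, and the sample levels assume a list column holding the three rows [[1, 2], [3], [4, 5, 6]].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of parquet-format.
class LevelHistogramSketch {

    /** Counts occurrences of each level in 0..maxLevel; the result always has maxLevel + 1 elements. */
    static List<Long> buildHistogram(int[] levels, int maxLevel) {
        long[] counts = new long[maxLevel + 1];
        for (int level : levels) {
            counts[level]++;
        }
        List<Long> histogram = new ArrayList<>(counts.length);
        for (long count : counts) {
            histogram.add(count);
        }
        return histogram;
    }

    public static void main(String[] args) {
        // Repetition levels for rows [[1, 2], [3], [4, 5, 6]] with max_repetition_level = 1:
        // level 0 starts a new row, level 1 continues the current list.
        int[] repetitionLevels = {0, 1, 0, 0, 1, 1};
        List<Long> histogram = buildHistogram(repetitionLevels, 1);
        System.out.println(histogram); // [3, 3]
    }
}
```

Element 0 counts the values that start a new row, so it equals the number of rows (3 here). The resulting list has the shape that setRepetition_level_histogram accepts.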
-
definition_level_histogram
public List<Long> definition_level_histogram
Same as repetition_level_histogram except for definition levels. This field may be omitted if max_definition_level is 0 or 1 without loss of information.
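One way a reader might use this histogram is sketched below, under the assumption that only entries at the maximum definition level carry a non-null leaf value, while lower levels mean the value is null or the list is empty at some nesting level. The class `DefinitionHistogramSketch`, its methods, and the `nullsOccupySlots` flag are hypothetical. Whether null slots take up space depends on the reader's memory model.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of parquet-format.
class DefinitionHistogramSketch {

    /** Entries at the maximum definition level (the last element) are fully defined values. */
    static long nonNullCount(List<Long> histogram) {
        return histogram.get(histogram.size() - 1);
    }

    /** Entries below the maximum level are null (or an empty list) at some nesting level. */
    static long notFullyDefinedCount(List<Long> histogram) {
        long total = 0;
        for (int level = 0; level < histogram.size() - 1; level++) {
            total += histogram.get(level);
        }
        return total;
    }

    /** Estimates the unencoded size of a fixed-width column from the histogram and the type width. */
    static long fixedWidthBytes(List<Long> histogram, int bytesPerValue, boolean nullsOccupySlots) {
        long slots = nonNullCount(histogram);
        if (nullsOccupySlots) {
            slots += notFullyDefinedCount(histogram);
        }
        return slots * bytesPerValue;
    }

    public static void main(String[] args) {
        // Assumed sample: max_definition_level = 2, so the histogram has three elements.
        List<Long> histogram = Arrays.asList(1L, 2L, 5L);
        System.out.println(nonNullCount(histogram));              // 5
        System.out.println(notFullyDefinedCount(histogram));      // 3
        System.out.println(fixedWidthBytes(histogram, 4, false)); // 20
        System.out.println(fixedWidthBytes(histogram, 4, true));  // 32
    }
}
```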
-
metaDataMap
public static final Map<SizeStatistics._Fields,org.apache.thrift.meta_data.FieldMetaData> metaDataMap
-
-
Constructor Detail
-
SizeStatistics
public SizeStatistics()
-
SizeStatistics
public SizeStatistics(SizeStatistics other)
Performs a deep copy on other.
-
-
Method Detail
-
deepCopy
public SizeStatistics deepCopy()
- Specified by:
deepCopy in interface org.apache.thrift.TBase<SizeStatistics,SizeStatistics._Fields>
-
clear
public void clear()
- Specified by:
clear in interface org.apache.thrift.TBase<SizeStatistics,SizeStatistics._Fields>
-
getUnencoded_byte_array_data_bytes
public long getUnencoded_byte_array_data_bytes()
The number of physical bytes stored for BYTE_ARRAY data values assuming no encoding. This is exclusive of the bytes needed to store the length of each byte array. In other words, this field is equivalent to `(size of PLAIN-ENCODING the byte array values) - (4 bytes * number of values written)`. To determine unencoded sizes of other types, readers can use schema information multiplied by the number of non-null and null values. The number of null/non-null values can be inferred from the histograms below. For example, if a column chunk is dictionary-encoded with dictionary ["a", "bc", "cde"], and a data page contains the indices [0, 0, 1, 2], then this value for that data page should be 7 (1 + 1 + 2 + 3). This field should only be set for types that use BYTE_ARRAY as their physical type.
-
setUnencoded_byte_array_data_bytes
public SizeStatistics setUnencoded_byte_array_data_bytes(long unencoded_byte_array_data_bytes)
The number of physical bytes stored for BYTE_ARRAY data values assuming no encoding. This is exclusive of the bytes needed to store the length of each byte array. In other words, this field is equivalent to `(size of PLAIN-ENCODING the byte array values) - (4 bytes * number of values written)`. To determine unencoded sizes of other types, readers can use schema information multiplied by the number of non-null and null values. The number of null/non-null values can be inferred from the histograms below. For example, if a column chunk is dictionary-encoded with dictionary ["a", "bc", "cde"], and a data page contains the indices [0, 0, 1, 2], then this value for that data page should be 7 (1 + 1 + 2 + 3). This field should only be set for types that use BYTE_ARRAY as their physical type.
-
unsetUnencoded_byte_array_data_bytes
public void unsetUnencoded_byte_array_data_bytes()
-
isSetUnencoded_byte_array_data_bytes
public boolean isSetUnencoded_byte_array_data_bytes()
Returns true if field unencoded_byte_array_data_bytes is set (has been assigned a value) and false otherwise.
-
setUnencoded_byte_array_data_bytesIsSet
public void setUnencoded_byte_array_data_bytesIsSet(boolean value)
-
getRepetition_level_histogramSize
public int getRepetition_level_histogramSize()
-
addToRepetition_level_histogram
public void addToRepetition_level_histogram(long elem)
-
getRepetition_level_histogram
public List<Long> getRepetition_level_histogram()
When present, there is expected to be one element corresponding to each repetition level (i.e. size = max_repetition_level + 1) where each element represents the number of times the repetition level was observed in the data. This field may be omitted if max_repetition_level is 0 without loss of information.
-
setRepetition_level_histogram
public SizeStatistics setRepetition_level_histogram(List<Long> repetition_level_histogram)
When present, there is expected to be one element corresponding to each repetition level (i.e. size = max_repetition_level + 1) where each element represents the number of times the repetition level was observed in the data. This field may be omitted if max_repetition_level is 0 without loss of information.
-
unsetRepetition_level_histogram
public void unsetRepetition_level_histogram()
-
isSetRepetition_level_histogram
public boolean isSetRepetition_level_histogram()
Returns true if field repetition_level_histogram is set (has been assigned a value) and false otherwise.
-
setRepetition_level_histogramIsSet
public void setRepetition_level_histogramIsSet(boolean value)
-
getDefinition_level_histogramSize
public int getDefinition_level_histogramSize()
-
addToDefinition_level_histogram
public void addToDefinition_level_histogram(long elem)
-
getDefinition_level_histogram
public List<Long> getDefinition_level_histogram()
Same as repetition_level_histogram except for definition levels. This field may be omitted if max_definition_level is 0 or 1 without loss of information.
-
setDefinition_level_histogram
public SizeStatistics setDefinition_level_histogram(List<Long> definition_level_histogram)
Same as repetition_level_histogram except for definition levels. This field may be omitted if max_definition_level is 0 or 1 without loss of information.
-
unsetDefinition_level_histogram
public void unsetDefinition_level_histogram()
-
isSetDefinition_level_histogram
public boolean isSetDefinition_level_histogram()
Returns true if field definition_level_histogram is set (has been assigned a value) and false otherwise.
-
setDefinition_level_histogramIsSet
public void setDefinition_level_histogramIsSet(boolean value)
-
setFieldValue
public void setFieldValue(SizeStatistics._Fields field, Object value)
- Specified by:
setFieldValue in interface org.apache.thrift.TBase<SizeStatistics,SizeStatistics._Fields>
-
getFieldValue
public Object getFieldValue(SizeStatistics._Fields field)
- Specified by:
getFieldValue in interface org.apache.thrift.TBase<SizeStatistics,SizeStatistics._Fields>
-
isSet
public boolean isSet(SizeStatistics._Fields field)
Returns true if field corresponding to fieldID is set (has been assigned a value) and false otherwise.
- Specified by:
isSet in interface org.apache.thrift.TBase<SizeStatistics,SizeStatistics._Fields>
-
equals
public boolean equals(SizeStatistics that)
-
compareTo
public int compareTo(SizeStatistics other)
- Specified by:
compareTo in interface Comparable<SizeStatistics>
-
fieldForId
public SizeStatistics._Fields fieldForId(int fieldId)
- Specified by:
fieldForId in interface org.apache.thrift.TBase<SizeStatistics,SizeStatistics._Fields>
-
read
public void read(org.apache.thrift.protocol.TProtocol iprot) throws org.apache.thrift.TException
- Specified by:
read in interface org.apache.thrift.TSerializable
- Throws:
org.apache.thrift.TException
-
write
public void write(org.apache.thrift.protocol.TProtocol oprot) throws org.apache.thrift.TException
- Specified by:
write in interface org.apache.thrift.TSerializable
- Throws:
org.apache.thrift.TException
-
validate
public void validate() throws org.apache.thrift.TException
- Throws:
org.apache.thrift.TException
-
-